SHARC: ‘S’uper ‘H’arvard ‘ARC’hitecture
Nagendra Doddapaneni

Overview
− Harvard Architecture
− Super Harvard Architecture
− TigerSHARC processor

Outline
− Background
− Harvard Architecture
   − Why?
   − What?
− Modern CPU Chip Design
− Super Harvard Architecture
− TigerSHARC Processor

Background
− von Neumann Architecture
   − Single storage for instructions and data
− Digital Signal Processors
   − Specialized microprocessors designed specifically for digital signal processing, generally in real time
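
Why Harvard Architecture?
− von Neumann bottleneck (‘memory bound’)
− DSP applications are hit hard by this bottleneck because they move large amounts of data
− In the von Neumann architecture, the single memory bus in any one cycle is
   − either reading an instruction
   − or reading/writing data from/to memory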
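
To make the bottleneck concrete, the sketch below counts bus cycles for the inner loop of an FIR filter. It is a simplified model built on our own assumptions, not figures from the slides: each filter tap needs one instruction fetch plus two data reads (a coefficient and a sample), every memory access takes one cycle, and each bus performs one access per cycle.

```python
def von_neumann_cycles(taps: int) -> int:
    """One shared bus: the instruction fetch, the coefficient read and the
    sample read are serialized, so each tap costs 3 bus cycles (assumed model)."""
    accesses_per_tap = 1 + 2  # 1 instruction + 2 data operands
    return taps * accesses_per_tap


def harvard_cycles(taps: int) -> int:
    """Separate instruction and data buses: the instruction fetch overlaps the
    data reads, but both operands still share the data bus (assumed model)."""
    instruction_fetches_per_tap = 1  # on the instruction bus
    data_reads_per_tap = 2           # on the data bus
    # The two buses work in parallel, so the busier bus sets the pace.
    return taps * max(instruction_fetches_per_tap, data_reads_per_tap)


for n in (16, 64, 256):
    print(n, von_neumann_cycles(n), harvard_cycles(n))
```

Under these assumptions the single von Neumann bus needs three cycles per tap and the Harvard layout needs two; the remaining limit is that both operands still travel over one data bus.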

[Figure: Harvard Architecture (cont.)]

What is Harvard Architecture?
− Physically separate storage and signal pathways for instructions and data
− The next instruction can be fetched while the current instruction is executing
− Program memory can be small and wide
− Data memory can be large and narrower
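
The overlap of fetching and executing can be sketched as a two-stage pipeline. This assumes, for illustration only, that every instruction takes exactly one cycle to fetch and one cycle to execute and that there are no branches; `overlapped_schedule` is a hypothetical helper that lists which instruction occupies each stage in each cycle.

```python
def sequential_cycles(n_instructions: int) -> int:
    """Fetch, then execute, one instruction at a time: 2 cycles each."""
    return 2 * n_instructions


def overlapped_cycles(n_instructions: int) -> int:
    """Fetch of instruction i+1 overlaps execution of instruction i, so after
    the first fetch one instruction completes every cycle."""
    if n_instructions == 0:
        return 0
    return n_instructions + 1


def overlapped_schedule(n_instructions: int) -> list[tuple[int, str, str]]:
    """Return (cycle, fetch stage, execute stage) rows for the overlapped case."""
    rows = []
    for cycle in range(overlapped_cycles(n_instructions)):
        fetch = f"I{cycle}" if cycle < n_instructions else "-"
        execute = f"I{cycle - 1}" if cycle >= 1 else "-"
        rows.append((cycle, fetch, execute))
    return rows


for row in overlapped_schedule(3):
    print(row)
```

For three instructions this gives four cycles instead of six: (0, 'I0', '-'), (1, 'I1', 'I0'), (2, 'I2', 'I1'), (3, '-', 'I2').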

Modern CPU Chip Design
− Incorporates features from both architectures
− ‘On-chip’ cache memory is divided into an instruction cache and a data cache
   − The Harvard architecture is used when the CPU accesses cache memory
− On a cache miss, ‘off-chip’ main memory is accessed using the von Neumann architecture
   − Main memory is not separated into data and instruction sections
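
A toy model of this split design is sketched below. The class name, the unbounded caches (nothing is ever evicted) and the one-cycle penalty per main-memory access are all assumptions made for illustration; real caches have limited capacity, cache lines and replacement policies.

```python
from dataclasses import dataclass, field


@dataclass
class SplitCacheCPU:
    """Hypothetical model: separate instruction and data caches are read in
    parallel (Harvard style); misses go to one unified main memory over a
    single shared path (von Neumann style)."""
    icache: set[int] = field(default_factory=set)
    dcache: set[int] = field(default_factory=set)
    main_memory_accesses: int = 0

    def step(self, instr_addr: int, data_addr: int) -> int:
        """Fetch one instruction and one data word; return the stall cycles
        spent on main memory (0 when both caches hit)."""
        stall = 0
        if instr_addr not in self.icache:
            self.icache.add(instr_addr)  # fill from unified main memory
            self.main_memory_accesses += 1
            stall += 1  # assumed: one cycle per main-memory access
        if data_addr not in self.dcache:
            self.dcache.add(data_addr)
            self.main_memory_accesses += 1
            stall += 1  # misses are serialized on the single memory path
        return stall


cpu = SplitCacheCPU()
first_pass = [cpu.step(i, 100 + i) for i in range(4)]
second_pass = [cpu.step(i, 100 + i) for i in range(4)]
print(first_pass, second_pass)  # [2, 2, 2, 2] [0, 0, 0, 0]
```

The first pass pays for main-memory accesses one at a time; the second is served entirely from the two caches in parallel.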

Super Harvard Architecture
− An instruction cache stores instructions, leaving both the instruction (program memory) bus and the data bus free to fetch operands
− Harvard Architecture + instruction cache = Extended Harvard Architecture, or Super Harvard Architecture
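
The gain from the instruction cache shows up in the same kind of FIR cycle count. This sketch assumes, as a simplification, a loop body of one instruction per tap, filter coefficients kept in program memory and samples in data memory, and one access per bus per cycle.

```python
def harvard_no_cache_cycles(taps: int) -> int:
    """Plain Harvard: every iteration fetches its instruction over the
    program-memory bus, so coefficient and sample share the data bus."""
    return 2 * max(taps, 0)


def super_harvard_cycles(taps: int) -> int:
    """Harvard plus instruction cache (assumed model).

    Iteration 1 misses the cache and behaves like plain Harvard (2 cycles).
    Later iterations take the instruction from the cache, so the
    program-memory bus reads the coefficient while the data bus reads the
    sample in the same cycle (1 cycle).
    """
    if taps <= 0:
        return 0
    return 2 + (taps - 1)


for n in (16, 64, 256):
    print(n, harvard_no_cache_cycles(n), super_harvard_cycles(n))
```

For long loops the cost approaches one cycle per tap, because after the first iteration both buses carry operands.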

TigerSHARC Processor
− Processor Architecture
− Instruction Parallelism and SIMD Operation
− Integer ALU
− Computational Blocks
   − X and Y Register File
   − X and Y ALU
   − Multiplier
   − Shifter
   − CLU
− Program Sequencer
− I, J and K Buses
− DMA Controller
− Applications

TigerSHARC Processor Architecture
− Three 128-bit data buses
− Two IALUs
− Two Computational Blocks, each with
   − ALU (floating-point and integer)
   − Shifter
   − Multiplier
   − CLU

TigerSHARC Instruction Parallelism and SIMD Operation
− The core can execute one to four 32-bit instructions simultaneously, encoded in a single instruction line (VLIW)
− Whether instructions can execute in parallel depends on
   − the resources each instruction in the line requires
   − the source and destination registers they use
− SIMD operations are supported by using both Computational Blocks in parallel
− Each Computational Block can execute four 16-bit or eight 8-bit SIMD computations in parallel
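
The grouping rules can be sketched as a checker for a candidate instruction line. This is a deliberately simplified, hypothetical model: the unit names such as "X_ALU" are illustrative, and only three rules are checked (one to four instructions, no shared execution resource, no register written twice). Source-register rules and the many other restrictions of the real assembler are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """One instruction slot in a simplified, hypothetical instruction-line model."""
    unit: str              # execution resource it needs, e.g. "X_ALU" (illustrative)
    dests: frozenset[str]  # registers the instruction writes


def line_conflicts(line: list[Instr]) -> list[str]:
    """Return the reasons these instructions cannot share one line.

    An empty list means the line is accepted under the assumed rules.
    """
    problems: list[str] = []
    if not 1 <= len(line) <= 4:
        problems.append(f"line has {len(line)} instructions (1 to 4 allowed)")
    used_units: set[str] = set()
    written: set[str] = set()
    for ins in line:
        if ins.unit in used_units:
            problems.append(f"resource {ins.unit} used twice")
        used_units.add(ins.unit)
        for reg in sorted(written & ins.dests):
            problems.append(f"register {reg} written twice")
        written |= ins.dests
    return problems


# Usage: X and Y ALU operations can pair; two X ALU operations cannot.
add_x = Instr("X_ALU", frozenset({"XR0"}))
add_y = Instr("Y_ALU", frozenset({"YR0"}))
sub_x = Instr("X_ALU", frozenset({"XR1"}))
print(line_conflicts([add_x, add_y]))  # []
print(line_conflicts([add_x, sub_x]))  # ['resource X_ALU used twice']
```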

TigerSHARC Integer ALU
− 31 32-bit general registers + 1 status register + 8 dedicated registers for circular buffers
− Performs integer ALU operations and data addressing
− ALU instructions: ADD, SUB, ARS, LRS (right shifts only), ROT (left and right), AND NOT, NOT, OR, XOR, ABS, MIN, MAX, CMP
− Status flags: zero (Z), negative (N), overflow (V), carry (C)
− Instruction conditions: EQ, LT, LE, NEQ, NLT, NLE
− Instruction options: unsigned (U), circular buffer (CB), bit reverse (BR), computed jump (CJMP)
− Address-related operations: data address generation, circular buffers, bit reverse, UREG moves, DAB control
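
The circular buffer (CB) and bit reverse (BR) options are address calculations that the IALU performs in hardware. The sketch below reproduces the same arithmetic in software; the function names are hypothetical, and in hardware the buffer base and length live in the dedicated circular-buffer registers rather than being passed as arguments.

```python
def circular_next(addr: int, step: int, base: int, length: int) -> int:
    """Post-modify an address, wrapping it into [base, base + length)."""
    return base + (addr - base + step) % length


def bit_reverse(index: int, bits: int) -> int:
    """Reverse the low `bits` bits of index, as used to reorder FFT data."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)  # move the lowest bit of index into result
        index >>= 1
    return result


# Usage: step through an 8-entry buffer at address 100, three entries at a time.
addr = 100
trace = []
for _ in range(4):
    trace.append(addr)
    addr = circular_next(addr, 3, base=100, length=8)
print(trace)                                  # [100, 103, 106, 101]
print([bit_reverse(i, 3) for i in range(8)])  # [0, 4, 2, 6, 1, 5, 3, 7]
```

Doing the wrap-around and bit reversal in the address unit means the computational blocks never spend instructions on index bookkeeping.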

TigerSHARC Computational Blocks: X and Y Register File
Register File Syntax
− Each block has 32 × 32-bit data registers
− Each register can store 4 × 8-bit, 2 × 16-bit or 1 × 32-bit words
− Registers can be combined into dual or quad groups; these groups can store 8-, 16-, 32-, 40- or 64-bit words
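
The different views of a register can be sketched with shifts and masks. This is an illustration only: lane 0 is taken to be the least significant lane, and the helpers assume that the higher-numbered register of a pair such as R1:0 holds the upper half. That ordering is our assumption and should be checked against the processor's programming reference.

```python
def split_word(word: int, lane_bits: int) -> list[int]:
    """View a 32-bit register as unsigned lanes, least significant lane first."""
    if lane_bits not in (8, 16, 32):
        raise ValueError("lane width must be 8, 16 or 32 bits")
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(32 // lane_bits)]


def join_pair(high: int, low: int) -> int:
    """Combine two 32-bit registers into one 64-bit value (dual group, e.g. R1:0)."""
    return ((high & 0xFFFF_FFFF) << 32) | (low & 0xFFFF_FFFF)


def join_quad(r3: int, r2: int, r1: int, r0: int) -> int:
    """Combine four 32-bit registers into one 128-bit value (quad group, e.g. R3:0)."""
    return (join_pair(r3, r2) << 64) | join_pair(r1, r0)


print([hex(v) for v in split_word(0x11223344, 8)])   # ['0x44', '0x33', '0x22', '0x11']
print([hex(v) for v in split_word(0x11223344, 16)])  # ['0x3344', '0x1122']
print(hex(join_pair(0x1, 0x2)))                      # 0x100000002
```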

[Figure: X and Y Register File, register file syntax]

Volatile Registers in Each Block
− 24 volatile data registers in each block
   − XR0 – XR23
   − YR0 – YR23
− 2 ALU summation registers in each block
   − XPR0, XPR1, YPR0, YPR1
− 5 MAC accumulate registers in each block
   − XMR0 – XMR3, YMR0 – YMR3
   − XMR4, YMR4: overflow registers

TigerSHARC X and Y ALU
− 2 × 64-bit input paths and 2 × 64-bit output paths
− Fixed-point: 8-, 16-, 32- or 64-bit addition/subtraction
− Fixed-point: 32- or 64-bit logical operations
− Floating-point: 32- or 40-bit operations
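
Sample ALU Instruction
− Example of 16-bit addition:
   XYSR1:0 = R31:30 + R25:24
− The S in XYSR selects short (16-bit) operands, so each 64-bit register pair holds four 16-bit values
− The addition is performed in both the X and Y Compute Blocks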
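
What the instruction computes can be sketched lane by lane. This is a software model under stated assumptions: each 64-bit register pair is treated as four 16-bit lanes, each lane wraps around independently (no carry crosses into the next lane), and saturating arithmetic is not modelled. The function names are hypothetical.

```python
MASK16 = 0xFFFF


def simd_add16(a: int, b: int) -> int:
    """Add the four 16-bit lanes of two 64-bit values lane by lane.

    Each lane wraps around on its own; no carry crosses lane boundaries.
    """
    result = 0
    for lane in range(4):
        shift = 16 * lane
        lane_sum = ((a >> shift) & MASK16) + ((b >> shift) & MASK16)
        result |= (lane_sum & MASK16) << shift
    return result


def xy_simd_add16(x_a: int, x_b: int, y_a: int, y_b: int) -> tuple[int, int]:
    """Apply the same 16-bit SIMD add in the X and Y blocks, each on its own
    register file: 2 blocks x 4 lanes = 8 additions per instruction."""
    return simd_add16(x_a, x_b), simd_add16(y_a, y_b)


x_sum, y_sum = xy_simd_add16(0x0001_0002_0003_0004, 0x0010_0020_0030_0040, 0xFFFF, 0x0001)
print(hex(x_sum), hex(y_sum))  # 0x11002200330044 0x0
```

The Y result shows the lane independence: 0xFFFF + 1 wraps to 0 in lane 0 without carrying into lane 1.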

TigerSHARC Multiplier
− Operates on fixed-point, floating-point and complex numbers
− Fixed-point numbers
   − 32 × 32-bit with 32- or 64-bit results
   − 4 × (16 × 16-bit) with 4 × 16- or 4 × 32-bit results
− Floating-point numbers
   − 32 × 32-bit with 32-bit result
   − 40 × 40-bit with 40-bit result
− Complex numbers
   − 32 × 32-bit with 32-bit result
   − Fixed-point only
− Results can be stored in the MR (accumulate) registers
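
The quad 16 × 16-bit mode with 32-bit results can be sketched as four independent signed multiplies. This assumes signed integer operands, which is only one of the formats the hardware handles; fractional formats, rounding and the 16-bit-result variant are not modelled, and the helper names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def quad_mul16(a: int, b: int) -> list[int]:
    """Multiply the four signed 16-bit lanes of two 64-bit values pairwise,
    returning four full-precision products (lane 0 first)."""
    products = []
    for lane in range(4):
        x = to_signed(a >> (16 * lane), 16)
        y = to_signed(b >> (16 * lane), 16)
        products.append(x * y)  # a 16 x 16 product always fits in 32 bits
    return products


print(quad_mul16(0x0004_0003_0002_FFFF, 0x0002_0002_0002_0002))  # [-2, 4, 6, 8]
```

Keeping the full 32-bit product per lane avoids overflow, which is why the 4 × 32-bit result form exists alongside the 4 × 16-bit one.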
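
TigerSHARC Multiplier: Examples (‘;;’ ends an instruction line)
XR0 = R1*R2;;                      // 32 x 32-bit fixed-point, 32-bit result
XR1:0 = R3*R5;;                    // 32 x 32-bit fixed-point, 64-bit result in a register pair
XMR1:0 = R3*R5;;                   // result to MR registers; uses XMR4 for overflow
XR2 = MR3:2, XMR3:2 = R3*R5;;
XR3:2 = MR1:0, XMR1:0 = R3*R5;;
XFR0 = R1*R2;;                     // 32-bit floating-point multiply
XFR1:0 = R3:2*R5:4;;               // 40-bit floating-point multiply, 32-bit mantissa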

TigerSHARC Shifter
− Operates on one 64-bit, one or two 32-bit, two or four 16-bit, and four or eight 8-bit fixed-point operands
− Shifts and rotates
− Bit manipulation operations, such as bit set, clear, toggle and test
− Bit FIFO operations to support bit streams
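
A bit FIFO lets code append and extract fields of arbitrary width, which is what packing and unpacking a bit stream (for example variable-length codes) needs. The class below is a hypothetical software model: it relies on Python's unbounded integers, whereas the hardware works within fixed-width registers, and it stores each field most significant bit first.

```python
class BitFIFO:
    """Hypothetical bit FIFO: append variable-width bit fields and read them
    back in the same order."""

    def __init__(self) -> None:
        self._bits = 0   # pending bits, oldest in the most significant position
        self._count = 0  # number of pending bits

    def put(self, value: int, width: int) -> None:
        """Append the low `width` bits of value."""
        self._bits = (self._bits << width) | (value & ((1 << width) - 1))
        self._count += width

    def get(self, width: int) -> int:
        """Remove and return the oldest `width` bits."""
        if width > self._count:
            raise ValueError("not enough bits in FIFO")
        shift = self._count - width
        value = self._bits >> shift
        self._bits &= (1 << shift) - 1  # keep only the newer bits
        self._count = shift
        return value

    def __len__(self) -> int:
        return self._count


fifo = BitFIFO()
fifo.put(0b101, 3)
fifo.put(0b1, 1)
fifo.put(0x3C, 6)
print(len(fifo), bin(fifo.get(4)), hex(fifo.get(6)))  # 10 0b1011 0x3c
```

Fields go in with one width (3 + 1 + 6 bits) and can be read back with another (4 + 6 bits), which is the point of a bit-stream FIFO.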

TigerSHARC CLU
− CLU (Communications Logic Unit) instructions are designed to support algorithms used in communications applications
− Algorithms supported:
   − Viterbi decoding (a minimum-distance decoding algorithm)
   − Turbo-code decoding (iterative decoding built on trellis operations similar to those in Viterbi decoding)
   − De-spreading for Code Division Multiple Access (CDMA) systems (recovering a signal that was spread over a wide bandwidth by a pseudo-noise code)
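
De-spreading amounts to correlating each block of received chips with the pseudo-noise (PN) code. The sketch below shows the idea in plain Python rather than CLU instructions. It assumes ±1 symbols and chips, a spreading factor of 8, and an arbitrary example PN code chosen only for illustration.

```python
def spread(symbols: list[int], pn: list[int]) -> list[int]:
    """Spread each +/-1 symbol over len(pn) chips by multiplying with the PN code."""
    return [s * c for s in symbols for c in pn]


def despread(chips: list[float], pn: list[int]) -> list[float]:
    """Correlate each block of len(pn) received chips with the PN code.

    Because c * c == 1 for every +/-1 chip, the sum is +len(pn) for a +1 symbol
    and -len(pn) for a -1 symbol when there is no noise.
    """
    n = len(pn)
    return [
        sum(chips[i + k] * pn[k] for k in range(n))
        for i in range(0, len(chips) - n + 1, n)
    ]


pn = [1, -1, 1, 1, -1, -1, 1, -1]  # example code, not a real CDMA sequence
tx = spread([1, -1, 1], pn)
print(despread(tx, pn))  # [8, -8, 8]

# Small per-chip noise is averaged out by the correlation.
noisy = [c + (0.4 if i % 3 else -0.4) for i, c in enumerate(tx)]
print([1 if v > 0 else -1 for v in despread(noisy, pn)])  # [1, -1, 1]
```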

TigerSHARC Program Sequencer
− Supplies instruction addresses to memory
− The instruction alignment buffer (IAB) caches up to five fetched instruction lines waiting to execute
− Extracts an instruction line from the IAB and distributes it to the appropriate core components for execution
− Determines program flow for instructions such as JMP and CALL
− Reduces branch delays using branch prediction and a branch target buffer (BTB)
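
How a BTB cuts branch delays can be sketched with a tiny predictor. This is a hypothetical model, not the TigerSHARC's actual BTB organization: it remembers the target of each taken branch, predicts "taken" whenever an entry exists, evicts the oldest entry when full, and counts how often that prediction is wrong.

```python
from typing import Optional


class BranchTargetBuffer:
    """Hypothetical BTB: remembers targets of taken branches and predicts
    'taken, to the stored target' the next time that branch is fetched."""

    def __init__(self, capacity: int = 4) -> None:
        self.capacity = capacity
        self.entries: dict[int, int] = {}  # branch address -> target address

    def predict(self, pc: int) -> Optional[int]:
        """Return the predicted target, or None to predict fall-through."""
        return self.entries.get(pc)

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the resolved outcome of the branch at address pc."""
        if taken:
            if pc not in self.entries and len(self.entries) >= self.capacity:
                oldest = next(iter(self.entries))  # dicts keep insertion order
                del self.entries[oldest]
            self.entries[pc] = target
        else:
            self.entries.pop(pc, None)


def count_mispredictions(btb: BranchTargetBuffer, pc: int, target: int,
                         outcomes: list[bool]) -> int:
    """Run one branch through a sequence of taken/not-taken outcomes."""
    misses = 0
    for taken in outcomes:
        predicted_taken = btb.predict(pc) == target
        if predicted_taken != taken:
            misses += 1  # each misprediction would cost pipeline cycles
        btb.update(pc, taken, target)
    return misses


# Usage: a loop-closing branch taken 9 times, then falling through once.
loop = [True] * 9 + [False]
print(count_mispredictions(BranchTargetBuffer(), 0x40, 0x10, loop))  # 2
```

Only the first iteration and the loop exit are mispredicted; the eight iterations in between fetch from the predicted target without a branch delay.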

[Figure: TigerSHARC architecture at a glance]

TigerSHARC Buses
− On-chip DRAM is divided into 6 blocks of 4 Mbits
− The 6 blocks connect to four 128-bit wide internal buses through a crossbar connection
− The internal bus architecture provides a total memory bandwidth of 32 Gbytes/sec
− Per cycle, the core and I/O can access
   − twelve 32-bit data words
   − four 32-bit instructions
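
The figures on this slide can be checked against each other. Four 128-bit buses move 64 bytes, or sixteen 32-bit words, per cycle, which matches twelve data words plus four instructions. Assuming decimal gigabytes and that every bus transfers on every cycle, 32 Gbytes/sec then implies a 500 MHz clock. That clock rate is our inference from the numbers, not something the slide states.

```python
def bytes_per_cycle(buses: int, bus_width_bits: int) -> int:
    """Bytes all internal buses can move together in one cycle."""
    return buses * bus_width_bits // 8


def bandwidth_bytes_per_s(buses: int, bus_width_bits: int, clock_hz: int) -> int:
    """Peak bandwidth if every bus transfers on every cycle."""
    return bytes_per_cycle(buses, bus_width_bits) * clock_hz


BUSES, WIDTH = 4, 128
ASSUMED_CLOCK_HZ = 500_000_000  # assumption: the clock implied by 32 Gbytes/sec

per_cycle = bytes_per_cycle(BUSES, WIDTH)  # 64 bytes
words = per_cycle // 4                     # 32-bit words per cycle
print(per_cycle, words, 12 + 4)            # 64 16 16
print(bandwidth_bytes_per_s(BUSES, WIDTH, ASSUMED_CLOCK_HZ) / 1e9)  # 32.0
print(6 * 4, "Mbits of on-chip DRAM")      # 24 Mbits of on-chip DRAM
```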

TigerSHARC DMA Controller
− On-chip, with 14 DMA channels
− Provides zero-overhead data transfers
− Operates independently of, and invisibly to, the DSP core
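
Because the DMA controller runs independently of the core, a common pattern is ping-pong (double) buffering: while the core processes one buffer, a DMA channel fills the other. The timing sketch below compares that with the core moving each block itself. It assumes a fixed transfer time and compute time per block in arbitrary units, and that DMA transfers take no core cycles, as the slide's "zero-overhead" describes.

```python
def serial_time(n_blocks: int, transfer: int, compute: int) -> int:
    """The core moves each block itself, then processes it: no overlap."""
    return n_blocks * (transfer + compute)


def double_buffered_time(n_blocks: int, transfer: int, compute: int) -> int:
    """DMA fills buffer B while the core processes buffer A, then they swap.

    Only the first transfer and the last computation are not overlapped; in
    between, the slower of the two activities sets the pace.
    """
    if n_blocks == 0:
        return 0
    return transfer + (n_blocks - 1) * max(transfer, compute) + compute


for transfer, compute in ((3, 5), (5, 3)):
    print(transfer, compute,
          serial_time(10, transfer, compute),
          double_buffered_time(10, transfer, compute))  # 80 vs 53 in both cases
```

With enough blocks the total approaches the larger of the two per-block times, so the transfers are effectively hidden behind computation.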

[Figure: TigerSHARC Applications]

Summary
− What is Harvard Architecture?
− What is Super Harvard Architecture?
− TigerSHARC processor architecture
− How is TigerSHARC ‘faster’ for targeted DSP applications?

Questions?
Thank You.