Clare Smtih SHARC Presentation 1 The SHARC: Super Harvard Architecture Computer
2 The SHARC Developed by Analog Devices. Optimized for demanding DSP and imaging applications. 32-bit floating point, with 40-bit extended floating-point capability. Large on-chip memory. Ideal for scalable multi-processing applications.
3 Harvard Architecture Program memory can store data. Able to simultaneously read or write data at one location and fetch an instruction from another location in memory. Two buses: (1) data memory bus, (2) program memory bus. Implemented with either two separate memories or a single dual-port memory.
4 Super Harvard Architecture Many processors employ a Harvard architecture by integrating two separate memories or caches into the processor chip. The SHARC is unique in that its internal memory can hold a large program as well as a large amount of data. This is what makes it SUPER!
5 DSP Digital Signal Processor. Requires high-speed, low-overhead data movement and rapid computation. Usually has a small on-board ROM and RAM and a single-cycle multiply. Designed to run single-line, serial-in, serial-out signal processing applications very fast.
6 DSP Computations The inner product of two vectors is a common computation for determining energy or correlation. The following C code is an example: for (n=0; n
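The C listing on this slide is cut off in the transcript. As a minimal sketch of the kind of loop it describes, an inner product might be written as follows; the function name `inner_product` and the parameters `x`, `y`, and `n` are assumptions chosen for illustration, not names taken from the slides.

```c
#include <stddef.h>

/* Inner product of two length-n vectors: the sum of x[i] * y[i].
 * All names here are illustrative assumptions, not from the slides. */
float inner_product(const float *x, const float *y, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        sum += x[i] * y[i];   /* one multiply-accumulate per element */
    }
    return sum;
}
```

Passing the same vector for both arguments gives its energy (sum of squares); passing two different vectors gives their correlation at zero lag.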
7 SHARC DSP The SHARC incorporates features aimed at optimizing such loops: high-speed floating-point capability and extended floating point. These features are DSP-specific, so performance on non-DSP applications may be less than optimal.
8 Floating Point and Extended Floating Point The SHARC supports floating-point, extended floating-point, and fixed-point data. Floating-point computations take no additional clock cycles. Data is automatically truncated (register to memory) or zero-padded (memory to register) when moved between 32-bit memory and internal registers. Not accurate enough for scientific algorithms. Excellent signal-to-noise ratio.
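The truncation and zero padding can be pictured with plain bit operations. This is only a sketch under a stated assumption: a 40-bit extended word is treated as a 32-bit word with 8 extra low-order mantissa bits, and is held in the low 40 bits of a `uint64_t`. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: a 40-bit extended word = 32-bit word + 8 extra low-order
 * mantissa bits, stored here in the low 40 bits of a uint64_t. */

/* 32-bit memory -> 40-bit register: zero-pad the 8 low bits. */
uint64_t widen_to_40(uint32_t mem_word)
{
    return (uint64_t)mem_word << 8;
}

/* 40-bit register -> 32-bit memory: truncate (drop) the 8 low bits. */
uint32_t narrow_to_32(uint64_t reg_word)
{
    return (uint32_t)((reg_word & 0xFFFFFFFFFFULL) >> 8);
}
```

In this model a value that goes from memory to a register and back is unchanged, while extra precision accumulated in the low 8 bits of a register is lost on the store.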
9 SHARC's Internal Memory Makes the SHARC unique. Size: allows many complex functions to be performed on-chip, eliminating the need to move data between internal and external memory. Memory size is significantly larger than that of most other high-speed computational devices. Dual-block, dual-port: optimizes the Harvard architecture by allowing instruction fetches while data memory accesses are performed.
10 Multiply and Accumulate Instructions on the SHARC Like most DSPs, the SHARC can compute a product and add it to a running total in a single clock cycle. The SHARC's "super" instruction can multiply and accumulate while adding, subtracting, or averaging data in two other registers. These instructions give the SHARC its 120-megaflop rating.
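C cannot express single-cycle execution, but the results that one such combined instruction produces can be sketched. The struct and function below are hypothetical names used only to model the data flow: one multiply-accumulate plus an add and a subtract on two other operands. Three floating-point results per step would be consistent with the 120-megaflop figure if one assumes a 40 MHz instruction rate, which the slides do not state.

```c
/* Results of one modeled combined step (illustrative names only). */
typedef struct {
    float acc;   /* running total after the multiply-accumulate */
    float sum;   /* a + b */
    float diff;  /* a - b */
} step_result;

/* Model of one step: acc += x * y, alongside an add and a subtract
 * on two other operands. On the SHARC these happen in parallel; here
 * they are simply computed one after another. */
step_result mac_with_add_sub(float acc, float x, float y, float a, float b)
{
    step_result r;
    r.acc  = acc + x * y;
    r.sum  = a + b;
    r.diff = a - b;
    return r;
}
```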
11 Zero Overhead Looping on the SHARC A single instruction outside the loop performs loop set-up, informing the SHARC that a loop is approaching. The instruction also specifies the iteration count and termination condition. This keeps the pipeline full during loop execution and allows the termination condition to be tested in parallel.
12 DAGs on the SHARC Data Address Generators (DAGs) are integer computation units that manage the index registers used for memory addressing. They allow the SHARC to fetch a value and update the index value. If the updated value exceeds a limit, the DAG adjusts the index so that it wraps. This occurs in the same clock cycle as the read or write.
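A software model of the fetch-then-update behavior might look like the sketch below. The `dag_model` struct and its field names are assumptions made for illustration; on the SHARC the access and the index update happen in hardware within one cycle, whereas this code performs them as separate steps.

```c
#include <stddef.h>

/* Hypothetical model of one address-generator register set. */
typedef struct {
    size_t index;    /* current position within the buffer */
    size_t modify;   /* step added to the index after each access */
    size_t length;   /* buffer length; the index wraps at this limit */
} dag_model;

/* Read buf[index], then post-modify the index with wraparound. */
float dag_read(dag_model *d, const float *buf)
{
    float value = buf[d->index];
    d->index = (d->index + d->modify) % d->length;
    return value;
}
```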
13 DAG Capabilities Circular buffering: rather than actually moving data in and out of a vector, circular buffers are used. By updating the index modulo the buffer length, the oldest entry is conveniently replaced by the newest. Bit-reverse addressing: the bit pattern of a vector index is reversed. Done automatically by the SHARC. Required for the Fast Fourier Transform (FFT), which is often critical to DSP applications.
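Both capabilities can be sketched in software to show what the hardware saves. The function names `circ_push` and `bit_reverse` are illustrative assumptions, and the code does explicitly what the slides say the SHARC does automatically.

```c
#include <stddef.h>

/* Overwrite the oldest sample with the newest and advance the write
 * index modulo the buffer length; no samples are shifted. */
void circ_push(float *buf, size_t len, size_t *write_idx, float sample)
{
    buf[*write_idx] = sample;
    *write_idx = (*write_idx + 1) % len;
}

/* Reverse the low `bits` bits of index i (for 3 bits: 001 -> 100). */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* shift in the lowest bit of i */
        i >>= 1;
    }
    return r;
}
```

For an 8-point FFT the indices use 3 bits, so index 1 maps to 4 and index 3 maps to 6.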
14 SHARC DSP What makes the SHARC unique? – It also has some features not directly related to optimizing numeric computations: pipelining and branch handling. Why has this not emerged sooner? – Technology has only recently become available to make it economical to integrate general single computing devices.
15 SHARC's Pipeline Three stages: (1) instruction fetch, (2) decode, (3) execute. An instruction takes three clock cycles to propagate through the pipeline. The processor's execution rate is one instruction per clock cycle even though each instruction requires three clock cycles.
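The apparent contradiction between "three cycles per instruction" and "one instruction per cycle" resolves once the stages overlap. A minimal sketch of the arithmetic, assuming an ideal pipeline with no stalls or branches (the function name is hypothetical):

```c
/* Cycles to complete n instructions on an ideal pipeline with no
 * stalls: the first instruction needs `stages` cycles, and each later
 * instruction finishes one cycle after the previous one. */
unsigned long pipeline_cycles(unsigned long n, unsigned long stages)
{
    if (n == 0) {
        return 0;
    }
    return stages + (n - 1);
}
```

With three stages, one instruction takes 3 cycles but 100 instructions take 102, which approaches one instruction per cycle as the count grows.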
16 SHARC's Handling of Branches Delayed branching: when a branch instruction is encountered, the two instructions already in the pipeline behind it are executed before the branch takes effect. This keeps the pipeline full and avoids discarding those two instructions and reloading the pipeline. Beneficial in situations such as loops of only a few instructions, where the ratio of wasted clock cycles to instructions is significant.
17 SHARC's Handling of Branches Non-delayed branching: traditional branching. If the instructions cannot be reordered to use delayed branching, non-delayed branching saves space, using only one word of storage. However, it takes three cycles because the pipeline is reloaded.
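The trade-off between the two branch styles can be sketched as a per-trip cycle count for a short loop. This is an idealized model, not a description of actual SHARC timing beyond what the slides state: it takes the three-cycle cost of a non-delayed branch from the slide above, and assumes that with a delayed branch two useful body instructions can be moved in after the branch so that the branch itself costs one cycle. The function names are hypothetical.

```c
/* Cycles per loop trip with a non-delayed branch: the body
 * instructions plus 3 cycles for the branch (pipeline reload). */
unsigned long trip_cycles_nondelayed(unsigned long body_instructions)
{
    return body_instructions + 3UL;
}

/* Cycles per loop trip with a delayed branch, assuming two of the body
 * instructions (body_instructions >= 2) are placed after the branch
 * and still do useful work, so the branch costs a single cycle. */
unsigned long trip_cycles_delayed(unsigned long body_instructions)
{
    return body_instructions + 1UL;
}
```

For a four-instruction body this model gives 7 cycles per trip against 5, which illustrates why the saving matters most for very short loops.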
18 Multi-processing The SHARC is uniquely equipped for multi-processing. Link ports are a very powerful multi-processing capability. Two main program models, depending on the application. Adapts well to different multi-processing architectures.
19 Multi-processing SHARC Links The SHARC has 6 link ports that can transport data at rates up to 40 Mbytes/sec. Links are designed for point-to-point connections. Data can be transmitted in either direction, but not in both simultaneously.
20 Multi-processing Program Model MIMD: multiple instruction, multiple data. Good for applications that require multiple instruction threads to execute concurrently. Processors operate individually. Each processor executes different code. Typically used for image reconstruction and multi-channel DSP.
21 Multi-processing Program Model SIMD: single instruction, multiple data. Works best when all processors execute identical instruction sequences. Does not require overhead for inter-processor synchronization. Typically used for synthetic aperture radar and automatic target recognition.
22 Multi-processing Architectures Cluster design: groups of up to 6 SHARCs in a cluster. The most common way of joining multiple SHARCs. All processors, global I/O, and global memory are connected to a common cluster bus. Each SHARC can drive the bus.
23 Multi-processing Architectures Mesh design: all SHARCs are joined by their link ports and connected to a common bus. In SIMD mode, a single master SHARC drives the bus. In MIMD mode, the mesh architecture cannot function if the data is larger than the available on-chip memory. Advantageous scalability over a wider range of applications.
24 Summary of What Makes the SHARC Super It performs excellently for DSP applications. It employs a Harvard architecture with very large on-chip memory. Respectable megaflop rating. Its multi-processing capabilities.
25 How Optimal Is the SHARC for Non-DSP Applications? It is clearly geared toward DSP applications. While it may fare better than other processors, it still trails those designed specifically for non-DSP applications.