Design, Development and Validation Testing of a Versatile PLD Implementable Single-Chip Heterogeneous, Hybrid and Reconfigurable Multiprocessor Architecture (HDCA)
By J. Robert (Bob) Heath**, Sridhar Hegde, Kanchan Bhide, Paul Maxwell, Xiaohui Zhao and Venugopal Duvvuri
Department of Electrical and Computer Engineering
University of Kentucky
Lexington, Kentucky
Heath MAPLD 2005/247
Abstract
There appear to be an increasing number of real-time and non-real-time computer applications that may be described by process and/or data-flow graphs (from here on we use the term “process flow graphs”). Such applications include radar signal processing, sonar signal processing, various system simulation environments utilized within Computer Aided Design (CAD) software systems, communications signal processing, routing, and the collection, processing and storage of data from multiple sensors/instruments.
For such applications, a first goal is the availability of a computer system/architecture platform which will allow an application described by a process flow graph of any topology to be mapped to and executed on the computer system/architecture. The application process flow graph could be single or multiple input/output and cyclic or acyclic. Processes are represented by nodes of the graphs. Further, it would be desirable for the computer system/architecture to be able to continue execution of the application with minimum interruption if the application process flow graph topology were to change dynamically during application execution. This goal is referred to as application level reconfigurability.
A second goal for the same computer system/architecture would be the ability to dynamically, on the fly, configure, move, or assign processors or other physical resources to application processes that may need them at any time (and/or vice versa, to assign additional copies of a process to additional processors). This goal is referred to as node level reconfigurability.
A third goal for the same computer system/architecture would be that it be a single-chip heterogeneous multiprocessor system with the capability to dynamically, on the fly, configure and reconfigure, if and when needed, single processor architectures within the overall multiprocessor architecture. We refer to this goal as processor architecture level reconfigurability.
With proper Operating System (OS) and other system software support, a computer system/architecture platform which can meet these three goals should be able to execute, in a fault-tolerant manner, a wide range of non-real-time and real-time applications described by process flow graphs of any topology. The contribution of this paper is a description of the research and development, and of the current status of testing and evaluation, of such a computer system architecture. HDL “virtual prototype” functional and performance simulation testing results are shown for the architecture executing simple hypothetical applications. Future research, development and testing of the architecture are addressed. The described architecture paradigm and platform is known as a single-chip Hybrid Data/Command Driven Architecture (HDCA) system. A reconfigurable/dynamic production HDCA system would be implemented in Programmable Logic Devices (PLDs).
Goals, Objectives and Functionality of HDCA System
Applicable to a wide range of applications, especially those modeled by process flow graphs.
Heterogeneous Shared-Memory Model Multiprocessor Architecture.
Allows a mix of Simple and Complex Special-Purpose and General-Purpose Processors.
Single-Chip Architecture Implemented in Programmable Logic Device (PLD) Technology.
May be used for real-time or non-real-time applications.
Scalable architecture.
Fault-tolerant architecture.
May operate in a data-driven or command-driven environment at process level. The idea is for a small number of short control tokens to flow through the architecture rather than more voluminous data.
Dynamic/Reconfigurable at the “application level”.
Dynamic/Reconfigurable at the “node level”.
Dynamic/Reconfigurable at the “processor architecture level”.
Application Description via Process Flow Graphs and Illustration of Dynamic/Reconfigurability at the “Application Level”
Application Description via Process Flow Graphs and Illustration of Dynamic/Reconfigurability at the “Application Level” (continued)
Another Process Flow Graph Describing an Application With a Different Topology.
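As a rough illustration of what "a process flow graph of any topology" can mean in software terms, the sketch below stores a graph as an adjacency mapping and checks whether it is cyclic or acyclic. The process names, the two example graphs and the `is_cyclic` helper are assumptions made for this illustration; they are not taken from the slides' figures.

```python
from typing import Dict, List

# Hypothetical process flow graph: each key is a process (node), each value
# lists the processes that receive its output. Not the graph on the slide.
ACYCLIC_GRAPH: Dict[str, List[str]] = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P4"],
    "P4": [],
}

# Same processes with an assumed feedback arc P4 -> P2, which makes it cyclic.
CYCLIC_GRAPH: Dict[str, List[str]] = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P4"],
    "P4": ["P2"],
}


def is_cyclic(graph: Dict[str, List[str]]) -> bool:
    """Return True if the graph contains a directed cycle (depth-first search)."""
    unvisited, in_progress, done = 0, 1, 2
    state = {node: unvisited for node in graph}

    def visit(node: str) -> bool:
        state[node] = in_progress
        for successor in graph[node]:
            if state[successor] == in_progress:
                return True  # back edge: the successor is still on the DFS path
            if state[successor] == unvisited and visit(successor):
                return True
        state[node] = done
        return False

    return any(state[node] == unvisited and visit(node) for node in graph)
```

Swapping one mapping for the other while tokens are in flight is, loosely, what a change of topology at the application level amounts to in this model.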
Illustration of Dynamic/Reconfigurability at the “Node Level”
(Dynamic assignment of a process running on an overloaded Computing Element (CE) processor to additional CE processors, to help out the overloaded CE processor)
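One way to picture node level reconfigurability is as a table update: when every CE holding a copy of a process is overloaded, a further CE is given a copy. The sketch below is only a behavioral guess at such a policy. The function name, the use of queue depth against a threshold as the overload test, and the assumption that any spare CE can host the process (which need not hold in a heterogeneous system) are all hypothetical.

```python
from typing import Dict, Optional, Set


def relieve_overload(process: str,
                     process_table: Dict[str, Set[str]],
                     queue_depth: Dict[str, int],
                     threshold: int) -> Optional[str]:
    """Assign `process` to one more CE if all current holders are overloaded.

    Returns the name of the helper CE, or None if no change was made.
    """
    holders = process_table[process]
    # Assumed overload test: a CE is overloaded once its queue depth
    # reaches the threshold.
    if any(queue_depth[ce] < threshold for ce in holders):
        return None
    spare = sorted(ce for ce in queue_depth if ce not in holders)
    if not spare:
        return None
    # Pick the least-loaded spare CE; ties go to the lowest CE name.
    helper = min(spare, key=lambda ce: queue_depth[ce])
    holders.add(helper)
    return helper
```

For example, with process P2 held only by CE1 at depth 7 and a threshold of 4, the call adds the shallowest other CE as a second holder of P2.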
Dynamic/Reconfigurability at the “Processor Architecture Level”
Goal: while an application is running, be able to dynamically reconfigure (restructure) a Processor Architecture to enhance performance as dynamic changes occur in application data and process algorithmic structure.
HDCA System Organization and Architecture (High-Level Functional View)
Architectural View Of a Current Single-Chip HDCA System Instantiation
A Functional Level View of the CE Controller.
Brief Overview of HDCA Functional Units
Process Request Token (PRT) Mapper.
–A Hardware Dynamic Load-Balancing System.
–For a Process Requested by a Control Token, It Determines the CE Containing a Copy of the Process Where Wait-Time to Execute the Requested Process is Minimum. CE Input Queue Depth is Used as the Parameter to Determine Minimum Wait Time (Least Depth) to Execution. CE Queue Depth is Directly Proportional to Wait Time via Utilization of “Dummy Tokens”.
–Detects Some Faults and System Failures.
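The selection rule described for the PRT Mapper (among the CEs that hold a copy of the requested process, pick the one with the least input queue depth) can be sketched behaviorally as follows. This is a software illustration only, not the hardware design: `map_process_request`, the table layout and the tie-breaking rule are assumptions.

```python
from typing import Dict, Optional, Set


def map_process_request(process: str,
                        process_table: Dict[str, Set[str]],
                        queue_depth: Dict[str, int]) -> Optional[str]:
    """Return the CE holding `process` whose input queue is shallowest.

    process_table maps a process name to the CEs containing a copy of it;
    queue_depth maps a CE name to its current input queue depth.
    Returns None when no CE holds the process.
    """
    candidates = process_table.get(process, set())
    if not candidates:
        return None
    # Sorting makes ties deterministic (lowest CE name wins); the actual
    # tie-breaking rule of the hardware is not stated in the slides.
    return min(sorted(candidates), key=lambda ce: queue_depth[ce])
```

With copies of a process on CE0, CE2 and CE4 at depths 5, 2 and 2, the sketch selects CE2.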
High Level Architectural Diagram of the Process Request Token (PRT) Mapper
(Diagram labels: PRT_IN, RAM TABLES, AVAILABILITY, ROUTER, COMPARATORS)
Multifunctional Queue (Functionality: FIFO queue, simultaneous R/W, queue depth indication, signal when a programmable queue threshold depth is reached, switch order of any two entries, report input rate over a programmable time-interval, and report change in input rate over a programmable time-interval)
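The listed queue functions can be mimicked in a small behavioral model, shown below as a sketch. The class and method names, the integer time stamps and the way the rate is counted (number of writes in a half-open interval) are assumptions for illustration; simultaneous read/write is a hardware property and is not modeled.

```python
from collections import deque
from typing import Any, Deque, List


class MultifunctionalQueue:
    """Behavioral sketch of a FIFO with depth, threshold, swap and rate reports."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold            # programmable threshold depth
        self._entries: Deque[Any] = deque()
        self._write_times: List[int] = []     # time stamp of every write

    def write(self, token: Any, now: int) -> None:
        self._entries.append(token)
        self._write_times.append(now)

    def read(self) -> Any:
        return self._entries.popleft()        # FIFO order

    @property
    def depth(self) -> int:
        return len(self._entries)

    @property
    def threshold_reached(self) -> bool:
        return self.depth >= self.threshold

    def swap(self, i: int, j: int) -> None:
        """Switch the order of the entries at positions i and j (0 = head)."""
        self._entries[i], self._entries[j] = self._entries[j], self._entries[i]

    def input_count(self, start: int, end: int) -> int:
        """Number of writes with start <= time < end."""
        return sum(start <= t < end for t in self._write_times)

    def input_rate_change(self, now: int, interval: int) -> int:
        """Writes in the latest interval minus writes in the one before it."""
        recent = self.input_count(now - interval, now)
        previous = self.input_count(now - 2 * interval, now - interval)
        return recent - previous
```

A threshold signal of this kind is the sort of event that could trigger the node level reassignment described earlier, although the slides do not spell out that connection.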
Crossbar Interconnect Network (Variable-Priority Memory Contention Resolution Protocol: Priority Based on CE Queue Depths. Deepest Queue Depth Indicates “Most-Behind”.)
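The variable-priority rule (when several CEs contend for the same memory, the CE with the deepest queue, the "most-behind" one, wins) can be sketched as a simple arbitration function. The function name, the idea of numbered memory blocks and the tie-breaking rule are assumptions for this illustration.

```python
from typing import Dict


def arbitrate(requests: Dict[str, int],
              queue_depth: Dict[str, int]) -> Dict[int, str]:
    """Resolve memory contention in favor of the most-behind CE.

    requests maps a CE name to the memory block it wants this cycle;
    the result maps each requested block to the CE that is granted access.
    """
    grants: Dict[int, str] = {}
    for ce in sorted(requests):
        block = requests[ce]
        holder = grants.get(block)
        # A deeper queue means higher priority; on equal depth the earlier
        # (lowest-named) CE keeps the grant, which is an assumed rule.
        if holder is None or queue_depth[ce] > queue_depth[holder]:
            grants[block] = ce
    return grants
```

Because the priority follows queue depth rather than a fixed CE order, a CE that keeps losing falls further behind and eventually gains priority in this model.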
HDCA System CEs (Processors) for Previously Shown Instantiation
Memory Register Computer Architecture CE
–For ALU Instructions, one operand in Memory and another in Register.
–16-Bit Wide Words/Operands.
–16 and 32-Bit Wide Instructions.
–Sixteen Assembly Language Instructions.
–I/O Structure.
–Hardware Vectored Priority Interrupt System, etc.
Memory Register Computer Architecture CE Organization
(Diagram labels: DIVIDEND, DIVISOR, FROM SHARED DATA MEMORY)
Multiplier CE Organization/Architecture
(Diagram labels: Pipelined Multiplier, Multiplicand Reg, Multiplier Reg, Mux, Instruction Memory 8x16, Instr Mar, Data loc1, Data loc2, Controller, R2, adder)
Control-Token Formats
Important token formats for the HDCA
Interface Controller State Diagram
(There is an Interface Controller Within the CE Controller Module of Each CE, Responsible for Control of HDCA)
Hardware Description Language (HDL) Description of HDCA System
VHDL Used as HDL.
Mostly Behavioral and RTL Level Coding Style Used.
Top-Down HDCA System Architecture Development and Design Style Used.
Structural Bottom-Up Coding and Testing Style Used (Lower Level Functional Units First Described and Tested Before Being Integrated Into Higher Level Functional Units).
Generic and Parameterized Coding Style Used When Applicable.
Approximately 150 Pages (8.5” x 11”) of Single-Spaced 10-Point Font VHDL Code for Shown 5 CE Configuration.
CAD Systems Used in Development and Testing of Single-Chip HDCA System (VHDL System Capture, Synthesis, Post-Synthesis Simulation Testing, Implementation, Post-Implementation Simulation Testing and Evaluation (Virtual Prototyping))
Xilinx ISE CAD software tool set used for system capture, synthesis and implementation to FPGA technology (Xilinx Virtex-II XC2V8000 FPGA chip).
ModelSim PE 5.7g was used as the HDL simulator.
The host PC for the Xilinx and ModelSim CAD software had an AMD Athlon processor at 2.16 GHz and 2 GB of RAM, running Windows XP (32-bit edition).
Input stimuli were added through the HDL Bencher, where timing constraints could also be specified.
Post-Implementation simulation (after Map, Place and Route) was carried out using ModelSim with test vector sets developed for different applications, after the Input ROM and the Instruction Memories of the Memory/Register Architecture CEs of an HDCA system had been initialized using the Memory Editor tool provided by Xilinx.
HDCA System Testing, Evaluation and Validation via HDL Virtual Prototyping
Example Simple Applications (All Successfully Executed by HDCA)
1. Acyclic Integer Manipulation Algorithm.
2. Acyclic Matrix Multiplication Algorithm 1.
3. Acyclic Matrix Multiplication Algorithm 2.
4. Acyclic Pipelined Integer Manipulation Algorithm. (Will View in Some Detail; Uses All Heterogeneous CEs of an Experimental HDCA System)
5. Cyclic Non-Deterministic Value Swap Application.
6. Other Applications.
Acyclic Pipelined Integer Manipulation Algorithm (Will simultaneously execute two copies of algorithm, each with a different set of data)
Process Flow Graph for the Algorithm
Input of the first five of the ten values for the first copy of the application: Process P1
(Waveform annotation: 5 values of x"02" being input into shared data memory at consecutive locations starting from x"03")
Process P7 for Copy 1 of Application: Displays Final Result at Address Location x"0F"
(Waveform annotations: Unsigned 15 at x"0F"; Last Instruction, Copy 1)
Conclusions and Future Research
Conclusions
–Validation of the Concept of an HDCA Accomplished via Virtual Prototyping: Parallel Single-Chip Multiprocessor System, Hybrid, Heterogeneous, Dynamic/Reconfigurable at Application and Node Levels, Implementable in PLD Technology, etc.
–Scalable Architecture/Design That is at the Same Time Also a SoC.
–Can Simultaneously Execute Multiple Copies of an Application, Each With a Different Set of Data.
–Potential for Execution of a Wide Range of Applications (radar signal processing; communications (packet driven) processing; image (pixel driven) processing; satellite data-stream processing; embedded computing applications including control applications; collection, processing and storage of data from multiple sensors/instruments, etc.)
–Can Execute More Complex Applications.
Future Research
–Include More Complex Processors in the Experimental Model of HDCA, in Addition to an Operating System (Linux, etc.?).
–Further Research Into Development and Refinement of the Concept of “Reconfigurability at the Processor Architecture Level”.
–Identification of and Adaptation to Several “Real Applications”!!