Architecture Design of a Scalable Single-Chip Multi-Processor. B.D. Theelen, Technische Universiteit Eindhoven, 4 September 2002, www.ics.ele.tue.nl/~btheelen

1 Architecture Design of a Scalable Single-Chip Multi-Processor
B.D. Theelen

2 Overview
Introduction
MµP Features
System Architecture
Hardware RTOS
Example Configuration
Experimental Results
Conclusions

3 Introduction
Architecture Platforms for Real-Time Embedded Systems
Scalable, Customisable, Reusable
Parallel Execution of Various Tasks
Customisability + Parallel + Scalable + Reusable → Configurable Set of Application-Dedicated Processor Cores
Flexibility + (Parallel + Scalable) + Reusable → (Scalable Number of Identical) General-Purpose Processor Core(s)
SoC technology enables embedding both on Single-Chip
Involves flexible and scalable Interconnects and Memory Architecture
Examples: TriMedia, SpaceCake

4 Real-Time Environment
Architecture Platforms for Real-Time Embedded Systems
Deadlines, Task Priorities, Impact of Overhead
Involves fast Interconnects and Memory Architecture capable of dealing with task priorities
Multi-Micro Processor (MµP)
Combines Scalable Number of Identical General-Purpose Master Processors with Configurable Set of Shared Application-Dedicated Co-processors and a Hardware RTOS Kernel to reduce task switching overhead
Task switching overhead:
    Software RTOS Kernel: 100+ clocks
    MµP Hardware RTOS Kernel: 16 clocks
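As a rough illustration of the overhead figures on this slide, the Python sketch below compares the two kernels. The clock counts (100+ and 16) are taken from the slide, with "100+" treated as a lower bound. The 1000-clock time slice and the helper name `switch_overhead_fraction` are assumptions made only for this example.

```python
# Clock counts quoted on the slide; "100+" is treated as a lower bound.
SOFTWARE_RTOS_SWITCH_CLOCKS = 100
HARDWARE_RTOS_SWITCH_CLOCKS = 16


def switch_overhead_fraction(switch_clocks: int, slice_clocks: int) -> float:
    """Fraction of time lost to one task switch per time slice (simplified model)."""
    return switch_clocks / (switch_clocks + slice_clocks)


# Lower bound on how much faster a single task switch becomes.
min_speedup = SOFTWARE_RTOS_SWITCH_CLOCKS / HARDWARE_RTOS_SWITCH_CLOCKS

# Assumed workload: a task runs for 1000 clocks between two switches.
ASSUMED_SLICE_CLOCKS = 1000
software_overhead = switch_overhead_fraction(SOFTWARE_RTOS_SWITCH_CLOCKS, ASSUMED_SLICE_CLOCKS)
hardware_overhead = switch_overhead_fraction(HARDWARE_RTOS_SWITCH_CLOCKS, ASSUMED_SLICE_CLOCKS)

print(f"task switch is at least {min_speedup:.2f}x faster")
print(f"overhead: software {software_overhead:.1%}, hardware {hardware_overhead:.1%}")
```

Under this simplified model, the shorter the time slice, the larger the gap between the two kernels becomes.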

5 MµP Features
True parallel execution of tasks
–Master Processors execute tasks independently
–Instruction Set is extendable
    Only 1/16th of instruction space is executed by Master Processors
    Remainder is split over up to 15 different Co-processor types
    Co-processor type determines actual use of instruction space
–Number of Co-processors of certain type is scalable
On-chip RTOS Kernel
–Transparent priority-based multi-tasking over Master Processors
–Hardware support for fast task switches
–Communication and synchronisation between (local and remote) tasks
    (Counting) semaphores, mailboxes, pipes
–Extended event handling mechanism instead of interrupts
    Uses counting semaphores
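One way to picture the extendable instruction set is as a selector field that splits the instruction space into 16 equal slices. The slides do not give the encoding, so this sketch makes several assumptions: a 16-bit instruction word, a 4-bit selector in the top bits, and slice 0 reserved for the Master Processor. The function names are hypothetical.

```python
INSTRUCTION_BITS = 16   # assumed word width, not stated in the slides
SELECTOR_BITS = 4       # 2**4 = 16 equal slices of the instruction space
MASTER_SELECTOR = 0     # assumed: slice 0 is executed by the Master Processor


def selector(instruction: int) -> int:
    """Return the slice (0..15) that an instruction word falls into."""
    if not 0 <= instruction < (1 << INSTRUCTION_BITS):
        raise ValueError("instruction word out of range")
    return instruction >> (INSTRUCTION_BITS - SELECTOR_BITS)


def dispatch(instruction: int) -> str:
    """Name the unit that would handle the instruction word."""
    sel = selector(instruction)
    if sel == MASTER_SELECTOR:
        return "master"
    # Slices 1..15 map onto up to 15 co-processor types.
    return f"co-processor type {sel}"


print(dispatch(0x0123))
print(dispatch(0x2ABC))
```

With this layout the Master Processor owns exactly one slice in sixteen, and each co-processor type is free to define the meaning of the remaining bits in its own slice.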

6 System Architecture
[Block diagram. Labels: Master Processors (n); Shared Co-Processors (1 TCU; 2.1 LSU to 2.x LSU; m.1 FPU to m.y FPU); Function Switch; Result Switch; Task Assignment; Event Inputs; Chip Boundary; L1 I$; L2 I$; Arbiter; Memory; MµP Network; MultiPort D$; Register D$; Task Control Unit; Hardware RTOS Kernel]

7 Design Issues
On-Chip Interconnects
–Cyclic path of instructions and results
    Interconnects are non-blocking
    Master processors accept results at all times and implement scoreboarding
–Function Switch routes on co-processor type number
    Fair arbitration with high/low priority based on task priority and request age
–Result Switch routes on task number
    FCFS arbitration without priorities
–Perform routing functionality in one clock
Memory Architecture
–Separated instruction and data path
    Two-level instruction cache architecture with round-robin arbitration
    Shared multi port data cache = data cache with statistically multiplexed banks
    Round-robin arbitration between accesses for different paths
    No real cache coherency problems
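The Function Switch arbitration described above can be sketched in Python as follows. The slides only say that requests fall into a high or low class based on task priority and request age. The thresholds `PRIORITY_THRESHOLD` and `AGE_LIMIT`, the "larger number is more urgent" convention, and the oldest-first tie-break are assumptions for illustration, not the actual hardware rule.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    task: int           # task number issuing the instruction
    coproc_type: int    # co-processor type number the Function Switch routes on
    task_priority: int  # larger means more urgent (assumed convention)
    age: int            # clocks the request has been waiting


PRIORITY_THRESHOLD = 8  # hypothetical: at or above this, a request is "high"
AGE_LIMIT = 4           # hypothetical: waiting this long also makes it "high"


def is_high(req: Request) -> bool:
    """Classify a request as high priority by task priority or by age."""
    return req.task_priority >= PRIORITY_THRESHOLD or req.age >= AGE_LIMIT


def arbitrate(requests: List[Request], coproc_type: int) -> Optional[Request]:
    """Pick one request for a co-processor type: high class first, then oldest."""
    candidates = [r for r in requests if r.coproc_type == coproc_type]
    if not candidates:
        return None
    return max(candidates, key=lambda r: (is_high(r), r.age))


pending = [
    Request(task=1, coproc_type=2, task_priority=3, age=1),
    Request(task=2, coproc_type=2, task_priority=9, age=0),
    Request(task=3, coproc_type=2, task_priority=1, age=5),
    Request(task=4, coproc_type=3, task_priority=15, age=9),
]
winner = arbitrate(pending, coproc_type=2)
print(winner.task)
```

Promoting old requests into the high class is one simple way to keep the arbitration fair: a low-priority task cannot be starved indefinitely by a stream of urgent requests.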

8 Hardware RTOS
[Block diagram. Labels: TCU Core Executive; Function Switch; Function Rx; Result Switch; Result Tx; Master Processors; Task Scheduler; Sorted Task List; Control Space; Event Inputs; Event Detect; Timers; TCU Network Management; Link; Network Switch; MultiPort D$; Arbiter; Task Admin; Resource Admin; Resource Data]

9 Design Issues
Task Management
–Commands for creating, terminating, delaying, suspending and restarting tasks and for changing priority
–Tasks of equal priority time share master processors available to them
–Task switching accelerated by specialised cache storing volatile contexts
Transparent Communication
–Commands for activating, deactivating, reading and writing resources
–Counting semaphores, mailboxes and pipes in hardware
–Network Manager shields tasks from MµP network
–Tasks can access any resource in the MµP network
Extended Event Handling
–Commands for activating and deactivating event inputs
–Event inputs are coupled to counting semaphores
–Involved semaphore might not be in same MµP where the task resides
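Two of the ideas on this slide can be sketched in software terms: event inputs coupled to counting semaphores, and equal-priority tasks time sharing the available master processors. This is a minimal Python model under stated assumptions, not the hardware design. The class and function names, the FIFO wake-up order, and the tick-based rotation in `pick_running` are all hypothetical.

```python
from collections import deque


class CountingSemaphore:
    """Counts pending signals; queues tasks that wait while the count is zero."""

    def __init__(self):
        self.count = 0
        self.waiting = deque()  # task numbers blocked on this semaphore

    def signal(self):
        """Register one signal; return the task it wakes, or None."""
        if self.waiting:
            return self.waiting.popleft()
        self.count += 1
        return None

    def wait(self, task):
        """Return True if the task may continue, False if it is now blocked."""
        if self.count > 0:
            self.count -= 1
            return True
        self.waiting.append(task)
        return False


class EventInput:
    """An event input that signals its coupled semaphore while activated."""

    def __init__(self, semaphore):
        self.semaphore = semaphore
        self.active = False

    def pulse(self):
        return self.semaphore.signal() if self.active else None


def pick_running(ready, n_masters, tick):
    """Choose tasks for n_masters processors from (priority, task) pairs.

    Higher priority wins; a priority level that does not fit entirely
    is rotated by tick so its tasks take turns.
    """
    by_priority = {}
    for priority, task in ready:
        by_priority.setdefault(priority, []).append(task)
    running = []
    for priority in sorted(by_priority, reverse=True):
        free = n_masters - len(running)
        if free <= 0:
            break
        group = by_priority[priority]
        if len(group) > free:
            start = tick % len(group)
            group = group[start:] + group[:start]
        running.extend(group[:free])
    return running
```

Because the semaphore counts, three events that arrive before the task waits are all remembered, which a single interrupt flag would not guarantee. In `pick_running`, a task of priority 5 keeps its processor while three tasks of priority 3 rotate through the remaining one.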

10 Example Configuration (Mini MµP)
By V.R. Suárez
Two 8048 ISA compatible Master Processors
8048 compatible I/O and Timers in Co-Processors
1 clock Function Switch and Result Switch
On-chip 2kB Instruction ROM and 1kB Data RAM
1 clock Register D$ enabling Task Switches in 1 clock
TCU Co-Processor
-15 user-definable tasks
-32 binary semaphores
-Timers and Interrupts supported as events for predefined tasks
-all commands executed in 1 clock

11 Experimental Results (Mini MµP)
Mini MµP designed using IDaSS
–Interactive Design and Simulation System
–Automatic generation of synthesisable VHDL or Verilog
Mini MµP implemented in Xilinx Spartan-II 200 FPGA
–Uses 42% of memory area and 83% of gate area
–Total gate count of 141k
–Runs at 25 MHz (expect over 30 MHz for optimised version)
–Critical path is 14 gates (in Master Processor core)
–Next critical path in TCU Co-Processor
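The timing figures above translate into clock periods as shown in this short Python calculation. The 25 MHz, 30 MHz and 14-gate values come from the slide. Dividing the period evenly over the critical path is a simplifying assumption that ignores routing and register setup time, so the per-gate figure is only an average budget.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


GATES_ON_CRITICAL_PATH = 14  # from the slide (Master Processor core)

period_now = clock_period_ns(25)     # measured implementation
period_target = clock_period_ns(30)  # expected optimised version

# Assumed: the critical path fills the whole clock period.
avg_delay_per_gate_now = period_now / GATES_ON_CRITICAL_PATH
avg_delay_per_gate_target = period_target / GATES_ON_CRITICAL_PATH

print(f"period at 25 MHz: {period_now:.1f} ns")
print(f"period at 30 MHz: {period_target:.1f} ns")
print(f"average budget per gate level: {avg_delay_per_gate_now:.2f} ns now, "
      f"{avg_delay_per_gate_target:.2f} ns at 30 MHz")
```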

12 Conclusions
Multi Micro Processor (MµP) Architecture
–Scalable Single-Chip Multi-Processor
–Intended for Real-Time Embedded Systems
–On-chip RTOS Kernel with hardware support for fast Task Switches
Design issues
–On-chip Interconnects
–Memory Architecture
–Hardware RTOS
    Task Management
    Transparent Communication
    Extended Event Handling
Results
–Mini version of MµP with two 8048 ISA compatible Master Processors

