1
Introduction to Multiprocessor System-on-Chip
Prof. Jan Madsen, Informatics and Mathematical Modeling, Technical University of Denmark, Richard Petersens Plads, Building 321, DK-2800 Lyngby, Denmark
2
Embedded systems
[Figure: a program (if ... then ... else ... for { ... }) becomes a bit pattern stored in the ROM of a system with a CPU, memory, and I/O]
(c) Jan Madsen
3
Embedded systems
Systems that use a computer to perform a specific function but are neither used nor perceived as a computer. They are embedded within larger electronic devices, repeatedly carry out a particular function, and often go completely unrecognized by the device's user.
4
Embedded systems design
Separate design groups develop the hardware and the software, each with its own model, validation, and prototype. Because the validations are separated, problems arise at a very late point in the design process: prototype realization.
5
Principles of Codesign
[Figure: the same specification is mapped by SW synthesis onto a CPU and by HW synthesis onto an ASIC, with interface synthesis connecting the two]
Example specification:

    /* up, down, open: outputs; req, floor: inputs */
    void UnitControl() {
        up = down = 0;
        open = 1;
        while (1) {
            while (req == floor);      /* wait for a request to another floor */
            open = 0;
            if (req > floor) { up = 1; }
            else { down = 1; }
            while (req != floor);      /* wait until the requested floor is reached */
            up = down = 0;             /* stop before opening the door */
            open = 1;
            delay(10);
        }
    }
6
Overview
- Technology: processors, IC fabrics
- Codesign for speed-up: component execution timing (SW and HW)
- Building sub-systems: hardware/software partitioning
- Building systems: system-level issues of codesign
7
Elements of computation
[Figure: a software function (if ... then ... else ... for { ... }) executing on a processing element (pe)]
Store data, transform data, move data.
8
Processor: architecture components
- Processing elements: transform data
- Memories: store data
- Interconnect: move data
9
Processor: General Purpose
[Figure: controller (pc, ir, cu) with instruction memory; datapath (registers, +/-, *) with data memory]
- Availability
- Low cost (mass production)
- Simple design flow
- High flexibility
10
Processor: General Purpose - example
[Figure: the general-purpose processor executing x = x + A[i] * p1]
x = x + A[i] * p1 takes 5 cycles.
11
Processor: Custom (ASIC)
[Figure: custom controller (cu) and datapath (+/-, *, +) with memory]
- High performance
- Low power
- Complex design flow
- No flexibility
12
Processor: Custom (ASIC) – example
[Figure: the ASIC executing x = x + A[i] * p1]
x = x + A[i] * p1 takes 1 cycle.
13
Processor: Semicustom (ASIP)
[Figure: general-purpose structure with an extra adder in the datapath (+/-, +, *)]
- Customized datapath: 16, 8, or 4 bits wide
- Optimized for a particular class of programs, e.g., with a multiply-accumulate (MACC) unit
- "Simple" design flow
- High flexibility
14
Processor: Semicustom - example
[Figure: the ASIP executing x = x + A[i] * p1]
x = x + A[i] * p1 takes 2 cycles.
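The three examples can be compared with simple arithmetic. Only the per-iteration cycle counts (5, 2, and 1) come from the slides; the sketch assumes, hypothetically, that the loop body runs n times and that all three processors use the same clock frequency, which in practice they rarely do.

```c
#include <assert.h>

/* Per-iteration cycle counts for x = x + A[i] * p1, taken from the examples. */
enum { GPP_CYCLES = 5, ASIP_CYCLES = 2, ASIC_CYCLES = 1 };

/* Total cycles for n iterations of the loop body. */
long total_cycles(long n, int cycles_per_iter) {
    assert(n >= 0 && cycles_per_iter > 0);
    return n * cycles_per_iter;
}

/* Speed-up of implementation b over implementation a, assuming equal clocks. */
double speedup(int cycles_a, int cycles_b) {
    assert(cycles_a > 0 && cycles_b > 0);
    return (double)cycles_a / cycles_b;
}
```

For 1000 iterations this gives 5000, 2000, and 1000 cycles, i.e., the ASIP is 2.5 times and the ASIC 5 times faster than the general-purpose processor under the equal-clock assumption.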
15
IC fabrics
An IC is an interconnection of transistors following one of several possible styles, called fabrics. The fabric defines how and when transistors are composed: it is "the material of processors". IC fabrics differ in terms of customizability and generality.
16
IC fabrics: Custom
Exact implementation of processor components.
High NRE (non-recurring engineering) cost: a mask set costs roughly $1M.
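Why a ~$1M mask set matters can be seen by amortizing the NRE over the production volume: cost per unit = NRE / volume + recurring cost per unit. Only the $1M figure comes from the slide; the recurring costs and volumes below are hypothetical, and the model ignores yield, test, and packaging.

```c
#include <assert.h>

/* Amortized cost per chip: one-time NRE spread over the volume plus recurring cost. */
double unit_cost(double nre, double recurring, double volume) {
    assert(volume > 0.0);
    return nre / volume + recurring;
}

/* Volume above which fabric A (higher NRE, lower recurring cost) becomes cheaper
 * than fabric B. Assumes nre_a > nre_b and recurring_a < recurring_b. */
double break_even_volume(double nre_a, double recurring_a,
                         double nre_b, double recurring_b) {
    assert(nre_a > nre_b && recurring_a < recurring_b);
    return (nre_a - nre_b) / (recurring_b - recurring_a);
}
```

With a $1M mask set and a hypothetical $2 recurring cost, a custom chip costs $3 per unit at one million units but $1002 at one thousand; against a hypothetical $12 programmable part with no NRE, it only pays off above 100,000 units.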
17
IC fabrics: Semicustom
Several semicustom fabrics:
- Library of standard cells
- Cell arrays (sea-of-gates)
Most processing steps are pre-manufactured (high volume).
18
IC fabrics: Programmable
A set of interconnected modules that can be programmed to implement different components.
FPGA: programmable logic modules, storage, and interconnect.
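A common kind of programmable logic module in FPGAs is the look-up table (LUT): a k-input LUT stores a 2^k-entry truth table in configuration memory, and the inputs select one bit of it. The C sketch below models a 4-input LUT; the `Lut4` type, `lut4_eval`, `lut4_program`, and the 16-bit configuration encoding are assumptions for illustration, not any vendor's format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 4-input LUT: bit k of `config` is the output when the inputs encode k. */
typedef struct { uint16_t config; } Lut4;

int lut4_eval(Lut4 lut, int a, int b, int c, int d) {
    assert((a | b | c | d) >= 0 && (a | b | c | d) <= 1);   /* inputs are 0 or 1 */
    unsigned idx = (unsigned)(a | (b << 1) | (c << 2) | (d << 3));
    return (int)((lut.config >> idx) & 1u);
}

/* "Programming" the LUT: store the truth table of any 4-input Boolean function. */
Lut4 lut4_program(int (*f)(int, int, int, int)) {
    Lut4 lut = { 0 };
    for (unsigned k = 0; k < 16; k++) {
        if (f(k & 1, (k >> 1) & 1, (k >> 2) & 1, (k >> 3) & 1))
            lut.config |= (uint16_t)(1u << k);
    }
    return lut;
}
```

The same hardware implements a 4-input AND with configuration 0x8000 (only entry 15 set) or an XOR of the first two inputs with 0x6666; only the stored bits change.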
19
Chips: Implementing IC fabric
20
Hardware/software codesign?
[Figure: a function (if ... then ... else ... for { ... }) with many possible mappings onto processing elements]
- Many possible mappings
- The processor may not exist yet!
- Exploring the design space requires estimation
21
Hardware/Software Codesign
Optimizing:
- Timing (high performance, hard deadlines)
- Area (cost)
- Power consumption
- Flexibility
- Reliability
- ...
We will focus on timing.
22
Processing element timing
Execution time depends on:
- Execution path: control-data dependent, input-data dependent
- Function implementation: component architecture, compiler or synthesis
23
Formal execution path timing analysis
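- $b_i$: basic block or program segment
- $t_{pe}(b_i, pe_j)$: execution time of $b_i$ on processing element $pe_j$
- $c(b_i)$: execution frequency of $b_i$
The execution time of function $F$ on $pe_j$, where $I$ indexes the basic blocks of $F$, is

$$t(F, pe_j) = \sum_{i \in I} t_{pe}(b_i, pe_j) \cdot c(b_i)$$

Worst-case and best-case timing bounds follow from the corresponding bounds on $t_{pe}$ and $c$.
[Figure: function F split into basic blocks b1 (if ...), b2 and b3 (the then and else branches), and b4 (for loop)]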
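The execution-time formula t(F, pe_j) = Σ_i t_pe(b_i, pe_j) · c(b_i) can be evaluated directly once per-block times and frequencies are known. In this sketch the per-block times and the two frequency vectors are hypothetical; the bounds are obtained by plugging in the frequencies of the longest and shortest paths, which is only valid for a small, hand-analyzed example like the four-block function in the figure.

```c
#include <assert.h>
#include <stddef.h>

/* t(F, pe_j) = sum over blocks i of t_pe(b_i, pe_j) * c(b_i). */
long exec_time(const long t_pe[], const long c[], size_t nblocks) {
    long total = 0;
    for (size_t i = 0; i < nblocks; i++) {
        assert(t_pe[i] >= 0 && c[i] >= 0);
        total += t_pe[i] * c[i];
    }
    return total;
}
```

With hypothetical block times {2, 10, 4, 6} for b1 to b4, a worst-case path taking the 10-cycle branch with 100 loop iterations (frequencies {1, 1, 0, 100}) gives 612 cycles, and a best-case path taking the 4-cycle branch with 10 iterations ({1, 0, 1, 10}) gives 66 cycles.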
24
Formal execution path timing analysis
[Figure: $t_{pe}(b_i, pe_j)$ is estimated from a model of the block, here b2 (then ...) as a data-flow graph of +, -, and * operations, implemented either in software or in hardware]
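One simple way to obtain t_pe(b_i, pe_j) from such a data-flow model, under strong assumptions: on a single-issue processor the operations execute one after another, so the software time is the sum of operation latencies; in hardware with one functional unit per operation, independent operations run in parallel, so the time is the length of the critical path. The `Op` structure and the latencies are hypothetical, and real estimators also account for resource sharing, registers, and control.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PRED 2
#define MAX_OPS  64

/* One operation of a data-flow graph; ops must be listed in topological order. */
typedef struct {
    int latency;          /* cycles for this operation (hypothetical values) */
    int npred;            /* number of predecessors, 0..MAX_PRED */
    int pred[MAX_PRED];   /* indices of predecessor operations */
} Op;

/* Software estimate: sequential execution, so latencies add up. */
int sw_time(const Op ops[], size_t n) {
    int total = 0;
    for (size_t i = 0; i < n; i++) total += ops[i].latency;
    return total;
}

/* Hardware estimate: unlimited units, so time = critical path (ASAP finish times). */
int hw_time(const Op ops[], size_t n) {
    int finish[MAX_OPS];
    int longest = 0;
    assert(n <= MAX_OPS);
    for (size_t i = 0; i < n; i++) {
        int start = 0;
        for (int k = 0; k < ops[i].npred; k++) {
            int p = ops[i].pred[k];
            assert(p >= 0 && (size_t)p < i);        /* topological order required */
            if (finish[p] > start) start = finish[p];
        }
        finish[i] = start + ops[i].latency;
        if (finish[i] > longest) longest = finish[i];
    }
    return longest;
}
```

For (a + b) * (c - d) + e with 1-cycle add/subtract and a 2-cycle multiply, the software estimate is 5 cycles, while the hardware estimate is 4 cycles because the addition and subtraction run in parallel.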
25
Memory models
- Access time
- Control overhead
- Burst access (packets)
- Cache: hit/miss time overhead, based on execution history
[Figure: PE with data cache (D$) and instruction cache (I$), connected to Flash, RAM, and SDRAM]
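Two of these effects can be captured with simple textbook-style formulas: the average access time of a cache (hit time plus miss rate times miss penalty) and the cost of a burst, in which one setup overhead is shared by several words. The sketch assumes fixed latencies and a constant miss rate; a history-based cache analysis would replace that constant, and all parameter values in the usage are hypothetical.

```c
#include <assert.h>

/* Average memory access time: hit time + miss rate * miss penalty (cycles). */
double avg_access_time(double hit_time, double miss_rate, double miss_penalty) {
    assert(miss_rate >= 0.0 && miss_rate <= 1.0);
    return hit_time + miss_rate * miss_penalty;
}

/* n separate accesses: each pays the control/setup overhead. */
long single_access_time(long n, long setup, long per_word) {
    return n * (setup + per_word);
}

/* One burst of n words: the setup overhead is paid once. */
long burst_access_time(long n, long setup, long per_word) {
    return setup + n * per_word;
}
```

With a 2-cycle hit time, a 25% miss rate, and a 4-cycle miss penalty, the average access takes 3 cycles; moving 8 words with a 10-cycle setup and 1 cycle per word takes 88 cycles as single accesses but 18 cycles as one burst.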
26
Advanced architectures
Modern high-performance processors include architectural features that complicate timing analysis:
- Dynamic instruction scheduling
- Speculative execution
Though fast, such processors are very power hungry, tight timing bounds are very difficult to obtain, and computation is less predictable. These issues are important for embedded systems.
27
Building sub-systems
[Figure: a function (if ... then ... else ... for { ... }) to be split between a processor and an ASIC]
Initial codesign problem: hardware/software partitioning.
The LYCOS cosynthesis tool, developed at DTU, performs automatic partitioning from C (subset) and VHDL (single process).
28
Hardware/Software partitioning
[Figure: the basic blocks b1 to b4 of a function are mapped to a CPU or an ASIC; two alternative mappings are shown]
29
Architectural choices
Which processor should be selected and how fast should it be? Which ASIC technology should be chosen and how fast should the ASIC be? How large an ASIC can we afford and which functions should it execute? How should the processor and ASIC communicate?
30
Partitioning model
[Figure: the specification is translated into a model of basic blocks (BB), each mapped to SW or HW]
The model determines the granularity and the simplifying assumptions with respect to communication, HW sharing, etc.
31
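Estimation
[Figure: an SW estimator (using an SW library), an HW estimator (using an HW library), and a communication estimator provide time (t) and area (a) estimates for the SW, HW, and communication parts of the model]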
32
Process communication
- $s(b_i)$: data sent in $b_i$
- $r(b_i)$: data received in $b_i$
- $c(b_i)$: execution frequency of $b_i$
The communication time depends on $s(b_i)$ and $r(b_i)$, which are determined by the data volume, the data encoding, and the communication protocol.
[Figure: basic blocks b1 to b4, where one branch contains send(...) and receive(...) calls]
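By analogy with the execution-time formula, the communication time of a function can be estimated by weighting each block's transfers with its execution frequency: t_comm = Σ_i c(b_i) · (τ(s(b_i)) + τ(r(b_i))). The transfer cost τ used here is a hypothetical bus model (a fixed setup time per transfer plus a fixed time per word); a real estimator would derive it from the data encoding and protocol, as noted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bus model: cost of moving `words` words in one transfer. */
static long transfer_time(long words, long setup, long per_word) {
    return words > 0 ? setup + words * per_word : 0;
}

/* t_comm = sum_i c(b_i) * (transfer(s(b_i)) + transfer(r(b_i))). */
long comm_time(const long s[], const long r[], const long c[], size_t n,
               long setup, long per_word) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(s[i] >= 0 && r[i] >= 0 && c[i] >= 0);
        total += c[i] * (transfer_time(s[i], setup, per_word) +
                         transfer_time(r[i], setup, per_word));
    }
    return total;
}
```

If only b3 communicates, sending 4 words and receiving 2 per execution, and it executes 10 times with a 5-cycle setup and 1 cycle per word, the estimate is 10 · (9 + 7) = 160 cycles.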
33
Solving the Partitioning Problem
[Figure: six blocks (1 to 6) assigned to SW or HW]
Just try all combinations...
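Trying all combinations means enumerating all 2^n SW/HW assignments. This sketch does that for the simplest model (no communication cost, sequential execution, additive areas) and keeps the cheapest assignment that meets a deadline. The `Block` fields, block data, and deadline are hypothetical; the exponential outer loop is exactly why this approach does not scale.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    long sw_time;   /* execution time if mapped to the CPU */
    long hw_time;   /* execution time if mapped to the ASIC */
    long hw_area;   /* ASIC area if mapped to hardware */
} Block;

/* Try all 2^n assignments (bit i set = block i in HW). Returns the minimum HW
 * area that meets `deadline`, or -1 if none does; *best_mask receives the
 * chosen assignment. Assumes no communication cost and additive areas. */
long partition_exhaustive(const Block b[], size_t n, long deadline,
                          unsigned *best_mask) {
    long best_area = -1;
    assert(n < 8 * sizeof(unsigned));
    for (unsigned mask = 0; mask < (1u << n); mask++) {
        long time = 0, area = 0;
        for (size_t i = 0; i < n; i++) {
            if (mask & (1u << i)) { time += b[i].hw_time; area += b[i].hw_area; }
            else                  { time += b[i].sw_time; }
        }
        if (time <= deadline && (best_area < 0 || area < best_area)) {
            best_area = area;
            *best_mask = mask;
        }
    }
    return best_area;
}
```

For six hypothetical blocks with (sw_time, hw_time, hw_area) = (10,2,4), (20,5,6), (15,3,5), (8,4,3), (12,6,2), (25,10,7) and a deadline of 60 (all-software time is 90), the minimum area is 13, reached by moving blocks 2, 3, and 5 to hardware (time 57).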
34
Solving the Partitioning Problem
Three models of increasing complexity:
- No communication, interleaved execution: additive areas
- Interleaved communication: additive areas
- Parallel execution: non-additive areas
[Figure: the same blocks partitioned between SW and HW under each model]
The resulting formulations range from knapsack and stuffing problems to large-scale linear/nonlinear integer programming. Heuristics are needed!
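In the first model (interleaved execution, no communication, additive areas), moving block i to hardware saves sw_time - hw_time and costs hw_area, so maximizing the total saving under an area budget is a 0/1 knapsack problem. Below is a sketch of the standard dynamic-programming solution; it is pseudo-polynomial in the area budget, covers only this simplest model, and the `MAX_AREA` limit and block data are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_AREA 1024   /* hypothetical upper bound on the area budget */

/* 0/1 knapsack: choose blocks for HW to maximize the time saved
 * (sw_time[i] - hw_time[i]) while keeping the total HW area <= budget. */
long max_time_saved(const long sw_time[], const long hw_time[],
                    const long hw_area[], size_t n, long budget) {
    long best[MAX_AREA + 1] = { 0 };   /* best[a] = max saving using area <= a */
    assert(budget >= 0 && budget <= MAX_AREA);
    for (size_t i = 0; i < n; i++) {
        long gain = sw_time[i] - hw_time[i];
        assert(hw_area[i] >= 0);
        if (gain <= 0) continue;        /* moving this block to HW never helps */
        /* Iterate area downwards so that each block is used at most once. */
        for (long a = budget; a >= hw_area[i]; a--) {
            long cand = best[a - hw_area[i]] + gain;
            if (cand > best[a]) best[a] = cand;
        }
    }
    return best[budget];
}
```

With six hypothetical blocks whose software times are {10, 20, 15, 8, 12, 25}, hardware times {2, 5, 3, 4, 6, 10}, and areas {4, 6, 5, 3, 2, 7}, an area budget of 12 allows a saving of at most 29 time units, and a budget of 13 allows 33.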
35
LYCOS design flow
[Figure: the functional specification and requirements are translated into a CDFG and analyzed; SW, HW, and communication estimation models drive partitioning; the partitioned CDFG model then goes through SW synthesis (assembler), communication synthesis, and HW synthesis (netlist)]
36
Building systems
Platform architectures are heterogeneous:
- Different processing element types
- Different interconnection networks and communication protocols
- Different memory types
- Different scheduling and synchronization strategies
[Figure: platform with memory (M), coprocessor (CoP), processor (P), and DSP]
37
Managing HW platform complexity
Complex multi-level HW/SW architecture:
- Development of APIs to hide complexity from the application programmer and improve portability
- Specialized RTOS to control resource sharing and interfaces
38
Software architecture
[Figure: HW/SW platform layers. Software: private and shared application code, RTOS APIs, RTOS, and private drivers. Hardware: processing element pe1 (CPU, I/O, interrupt controller, bus controller, timer, cache), memory, peripherals, and the bus ce1]
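A common way to realize this layering in C is a driver interface made of function pointers: application code calls a uniform API, and each driver supplies the hardware-specific implementation behind it. Everything below (`CharDriver`, `api_write`, and the fake UART that writes to a buffer instead of a register) is hypothetical and only illustrates how an API hides the hardware from the application.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Uniform driver interface exposed to applications through the API layer. */
typedef struct {
    const char *name;
    int (*write)(void *dev, const char *buf, size_t len);
    void *dev;                       /* driver-private device state */
} CharDriver;

/* API layer: applications call this, never the device registers directly. */
int api_write(const CharDriver *drv, const char *msg) {
    assert(drv != NULL && drv->write != NULL && msg != NULL);
    return drv->write(drv->dev, msg, strlen(msg));
}

/* Fake "UART" driver: stores bytes in a buffer instead of a hardware register. */
typedef struct { char tx[64]; size_t used; } FakeUart;

static int fake_uart_write(void *dev, const char *buf, size_t len) {
    FakeUart *u = dev;
    if (u->used + len >= sizeof u->tx) return -1;   /* buffer full */
    memcpy(u->tx + u->used, buf, len);
    u->used += len;
    u->tx[u->used] = '\0';
    return (int)len;
}
```

Porting the application to another platform then means supplying a different `write` implementation; the `api_write` calls in the application stay unchanged.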
39
Platform design challenges
- Integration: design process integration; heterogeneous component and language integration
- Design space exploration and optimization
- Verification
40
Complex run-time interdependencies
[Figure: a coprocessor (CoP) and other components sharing communication resources]
Independent components become dependent at run time through communication, which influences timing and power. Resource sharing must be handled through:
- Process/task scheduling
- Communication scheduling
Scheduling strategies may be static or dynamic, and time driven or priority driven.
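For priority-driven scheduling on a single processor, one classical check is fixed-priority response-time analysis: with tasks sorted by decreasing priority, task i's worst-case response time is the smallest R satisfying R = C_i + Σ_{j<i} ⌈R / T_j⌉ · C_j. The sketch assumes independent periodic tasks with deadlines equal to periods, and the task sets in the usage are hypothetical. It deliberately ignores the communication interdependencies described above, which is exactly what makes system-level analysis harder.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { long C; long T; } Task;   /* worst-case execution time, period (= deadline) */

/* Worst-case response time of task i (tasks sorted by decreasing priority),
 * or -1 if it exceeds the deadline T_i. */
long response_time(const Task t[], size_t i) {
    long R = t[i].C, prev = -1;
    assert(t[i].C > 0 && t[i].T > 0);
    while (R != prev) {                       /* fixed-point iteration */
        prev = R;
        R = t[i].C;
        for (size_t j = 0; j < i; j++)        /* interference from higher priorities */
            R += ((prev + t[j].T - 1) / t[j].T) * t[j].C;   /* ceil(prev / T_j) * C_j */
        if (R > t[i].T) return -1;
    }
    return R;
}
```

For the task set (C, T) = (1, 4), (2, 6), (3, 12) the response times are 1, 3, and 10, so all deadlines are met; for (2, 4), (3, 5) the second task misses its deadline.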
41
Interdependency example
Complex non-functional interdependencies:
- A periodic task executes on a PE and writes to the bus at the end of each periodic execution
- Short execution time leads to high bus load; long execution time leads to low bus load
A local decision to improve performance may therefore affect global system performance.
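The trend can be quantified under a simple hypothetical model: the task is re-activated as soon as it finishes, so each activation takes its execution time plus a bus write, and the bus is busy for the write time out of every (execution + write) time units. The model and the numbers are assumptions chosen only to reproduce the effect described on the slide.

```c
#include <assert.h>

/* Fraction of time the bus is busy when a task with execution time `exec`
 * writes for `write` time units after every execution, back to back. */
double bus_load(double exec, double write) {
    assert(exec > 0.0 && write > 0.0);
    return write / (exec + write);
}
```

Speeding up the task from 3 to 1 time unit, with a 1-unit write, raises the bus load from 25% to 50%, which may slow down other components that share the bus.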
42
System-on-Chip challenge
[Figure: processors, memories, and I/O blocks connected through a network of routers]
43
Network-on-Chip
- Multi-hop: segmented communication
- Concurrency: multiple simultaneous communications
[Figure: communications a, b, c, and d in the network, with memory M]
44
Network-on-Chip
- Multi-hop: segmented communication
- Concurrency: multiple simultaneous communications
- Sharing: quasi-simultaneous resource usage, where multiple communication events occupy some or all resources in an interleaved fashion
[Figure: communications a, b, c, and d in the network, with memory M]
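Multi-hop, segmented communication can be illustrated with a 2D mesh and dimension-ordered (XY) routing, a common NoC choice in which a packet first travels along X and then along Y, passing one router per hop. The mesh topology and routing rule are assumptions for illustration, not part of the slide; `xy_route` is a hypothetical helper that records every router on the path, so overlapping paths show where two communications share resources.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int x, y; } Node;

/* XY route from src to dst in a 2D mesh: writes the routers visited
 * (excluding src) into path[] and returns the number of hops. */
int xy_route(Node src, Node dst, Node path[], int max_hops) {
    int hops = 0;
    Node cur = src;
    assert(abs(dst.x - src.x) + abs(dst.y - src.y) <= max_hops);
    while (cur.x != dst.x) {                 /* route along X first */
        cur.x += (dst.x > cur.x) ? 1 : -1;
        path[hops++] = cur;
    }
    while (cur.y != dst.y) {                 /* then along Y */
        cur.y += (dst.y > cur.y) ? 1 : -1;
        path[hops++] = cur;
    }
    return hops;
}
```

A packet from router (0,0) to router (2,1) takes three hops, through (1,0) and (2,0) before arriving at (2,1).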
45
System-on-Chip design
[Figure: tasks 1 to 4 and an OS mapped onto processing elements; communications a, b, and c mapped onto links L1 to L3 and routers R1 to R3]
46
New design paradigm: platform-based design
[Figure: a platform is designed from IP; the specification is mapped onto the platform, with IP re-design and platform re-configuration as feedback loops]
47
Thank you!