Adaptive System on a Chip (aSoC) for Low-Power Signal Processing Andrew Laffely, Jian Liang, Prashant Jain, Ning Weng, Wayne Burleson, Russell Tessier.


1 Adaptive System on a Chip (aSoC) for Low-Power Signal Processing. Andrew Laffely, Jian Liang, Prashant Jain, Ning Weng, Wayne Burleson, Russell Tessier. Department of Electrical and Computer Engineering, University of Massachusetts, Amherst. {alaffely, jliang, pjain, nweng, burleson, tessier}@ecs.umass.edu. This material is based upon work supported by the National Science Foundation under Grant No. 9988238. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

2 Overview. Motivation: video processing. Architecture. Dynamic power management: core, interconnect, and clock.

3 Problem. Wireless video processing requires high throughput, low power, and flexibility.

4 System on a Chip Solutions. Take advantage of parallelism. Possibly improved performance. Allow use and reuse of existing integrated components. These benefits hold if the application can be partitioned and the appropriate architecture is used.

5 Proposed Architecture: aSoC. High throughput: heterogeneous processor elements (use the right tool for the job) and a fast, predictable interconnect. Flexible: runtime reconfiguration of cores and interconnect. Power consumption: implement power-saving features in both cores and interconnect, and use reconfiguration to dynamically control power consumption.

6 aSoC: adaptive System on a Chip. Tiled SoC architecture. [Figure: tile array with cores DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, and Motion Estimation and Compensation.]

7 aSoC: adaptive System on a Chip. Tiled SoC architecture. Supports the use of independently developed heterogeneous cores: pick and place the cores that best perform the given application, to increase performance and save power. Cores may be any number of tiles in size. [Figure: tile array with cores DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, and Motion Estimation and Compensation.]

8 aSoC: adaptive System on a Chip. Tiled SoC architecture. Supports the use of independently developed heterogeneous cores. Connected with an interconnect mesh that is restricted to near-neighbor communications, which creates a pipeline and decreases cycle time. [Figure: tile array with cores DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, and Motion Estimation and Compensation.]

9 aSoC: adaptive System on a Chip. Tiled SoC architecture. Supports the use of independently developed heterogeneous cores. Connected with a fixed interconnect mesh, using a communication interface (CI) to manage data. Each core has a network port (coreport). Each CI uses a memory and an FSM to repetitively process a predefined schedule of communications. [Figure: tile array with crossbar; cores DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, and Motion Estimation and Compensation.]

10 Stream Control. Instruction memory: holds the predetermined schedule of communications. PC: selects and synchronizes the communications. Decoder: sets the crossbar. Controller: sets the PC and interprets incoming configuration commands. Crossbar: connects any input to any set of outputs. [Figure: CI block diagram with inputs and outputs for North, South, East, West, and Core; Decoder/Controller; PC; Instruction Memory; Local Config.]
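
The stream control described on this slide can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class name `CommunicationInterface`, the dictionary encoding of an instruction, and the port names are hypothetical and are not taken from the aSoC design. It shows an instruction memory holding a schedule, a PC that selects one instruction per cycle and loops back, and a crossbar that can route any input to any set of outputs.

```python
from dataclasses import dataclass


@dataclass
class CommunicationInterface:
    """Toy model of one tile's communication interface (hypothetical encoding)."""

    # Instruction memory: one crossbar setting per cycle of the schedule.
    # Each setting maps an input port to the tuple of output ports it drives.
    schedule: list
    pc: int = 0  # program counter selecting the current instruction

    def step(self, inputs):
        """Route this cycle's inputs through the crossbar, then advance the PC."""
        routes = self.schedule[self.pc]  # decoder: instruction -> crossbar setting
        outputs = {}
        for src, dests in routes.items():
            if src in inputs:
                for dst in dests:  # any input may drive any set of outputs
                    outputs[dst] = inputs[src]
        # The schedule repeats: wrap the PC at the end of instruction memory.
        self.pc = (self.pc + 1) % len(self.schedule)
        return outputs


ci = CommunicationInterface(schedule=[
    {"core": ("east",)},          # cycle 0: core drives the east output
    {"west": ("east", "core")},   # cycle 1: west input fans out to east and core
])
print(ci.step({"core": 7}))  # {'east': 7}
print(ci.step({"west": 9}))  # {'east': 9, 'core': 9}
```

Because the schedule is fixed ahead of time, each cycle's routing needs no arbitration; the model simply indexes the instruction memory with the PC.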

11 Example: Communication Stream. A given application requires periodic communications from Core A to Core C. aSoC uses a prescheduled communication STREAM: Core A places the data in a dedicated STREAM between the two tiles, and Core C pulls the data from that STREAM. The tile-to-tile communication uses 3 cycles. [Figure: Stream A-D across Core A, Core B, and Core C.]

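12 Example: Stream. [Figure: tiles A, B, C.] Step 1: Core to East.

13 Example: Stream. [Figure: Stream A-D across tiles A, B, C.] Step 2: West to East.

14 Example: Stream. [Figure: tiles A, B, C.] Step 3: West to Core.

15 Example: Stream. [Figure: Stream A-D across tiles A, B, C.] Schedule: step 1, Core to East; step 2, West to East; step 3, West to Core; then loop back.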
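
The three-step stream in this example can be traced with a short simulation. This is a sketch, not the aSoC implementation: the table `SCHEDULES`, the assumption that tile B lies east of A and tile C east of B, and the single register per west input are illustrative choices. Each tile performs one scheduled hop in its slot of a repeating 3-cycle schedule.

```python
SCHEDULE_LEN = 3

# Per-tile schedule (hypothetical encoding): slot -> (input port, output port).
SCHEDULES = {
    "A": {0: ("core", "east")},  # step 1: Core to East
    "B": {1: ("west", "east")},  # step 2: West to East
    "C": {2: ("west", "core")},  # step 3: West to Core
}
CHAIN = ["A", "B", "C"]  # assumed layout: B is east of A, C is east of B


def run_stream(words):
    """Send each word from core A to core C; return (cycle, word) deliveries."""
    west_reg = {tile: None for tile in CHAIN}  # value latched on each west input
    pending = list(words)
    delivered = []
    cycle = 0
    while len(delivered) < len(words):
        slot = cycle % SCHEDULE_LEN  # the schedule loops back every 3 cycles
        next_west = dict(west_reg)
        # Visit consumers before producers so a register is read before rewritten.
        for i in reversed(range(len(CHAIN))):
            tile = CHAIN[i]
            hop = SCHEDULES[tile].get(slot)
            if hop is None:
                continue
            src, dst = hop
            if src == "core":
                value = pending.pop(0) if pending else None
            else:
                value = west_reg[tile]
                next_west[tile] = None  # the word leaves this register
            if value is None:
                continue
            if dst == "east":
                next_west[CHAIN[i + 1]] = value  # latch at the east neighbor
            else:
                delivered.append((cycle, value))  # handed to the local core
        west_reg = next_west
        cycle += 1
    return delivered


print(run_stream([10, 20]))  # [(2, 10), (5, 20)]
```

Cycles are numbered from 0, so a word injected in cycle 0 reaches core C in cycle 2, which is the third cycle of the schedule.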

16 Static Scheduled Communications. Creates system scalability by “eliminating” network congestion. Many interconnect segments, managed with time-division multiplexing, provide high bandwidth. Improves SoC performance by up to a factor of 8. [Figure: tile array with cores DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, and Motion Estimation and Compensation.]

17 Power Consumption? Provide reconfiguration methods for cores and CI. Develop programmable clocking systems at each tile.

18 Power Aware Core. Custom motion estimation core with a selectable search method. Full search: 960-600 mW (depending on bit width and pel sub-sampling). Spiral search: 76 mW. Three step search: 25 mW. Data taken with Synopsys Power Compiler at the RTL level.
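
To show why the choice of search method changes the work done per block, here is a minimal software sketch of two of the named methods: exhaustive full search and the textbook three step search with step sizes 4, 2, 1. The slides do not describe the core's internals, so the block size, the +/-7 search range, the sum-of-absolute-differences (SAD) cost, and the synthetic frames from `make_frames` are all assumptions for illustration. The sketch counts SAD evaluations rather than power.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between a block and a displaced reference block."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def full_search(cur, ref, bx, by, n=8, r=7):
    """Evaluate every displacement in [-r, r] x [-r, r]."""
    best = None
    evals = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            cost = sad(cur, ref, bx, by, dx, dy, n)
            evals += 1
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return (best[1], best[2]), evals


def three_step_search(cur, ref, bx, by, n=8):
    """Textbook three step search: refine around the best point with steps 4, 2, 1."""
    cx, cy = 0, 0
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    evals = 1
    for step in (4, 2, 1):
        best_dx, best_dy = cx, cy
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue  # the center was already evaluated
                dx, dy = cx + ox, cy + oy
                cost = sad(cur, ref, bx, by, dx, dy, n)
                evals += 1
                if cost < best_cost:
                    best_cost, best_dx, best_dy = cost, dx, dy
        cx, cy = best_dx, best_dy
    return (cx, cy), evals


def make_frames(mvx, mvy, size=32):
    """Synthetic frames (assumption): cur is ref displaced by (mvx, mvy)."""
    f = lambda x, y: x * x + 3 * y * y
    ref = [[f(x, y) for x in range(size)] for y in range(size)]
    cur = [[f(x + mvx, y + mvy) for x in range(size)] for y in range(size)]
    return cur, ref


cur, ref = make_frames(4, -4)
print(full_search(cur, ref, 12, 12))        # ((4, -4), 225)
print(three_step_search(cur, ref, 12, 12))  # ((4, -4), 25)
```

In this sketch the three step search evaluates 25 candidate positions where the full search evaluates 225. On real video the three step search is not guaranteed to find the same vector as full search; the test frame here is chosen so that both agree.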

19 aSoC Support. Multiple streams in and out through dedicated coreports, which are easy to manage on both sides of the port. Configuration streams are scheduled in with the data. Stream A: input frame. Stream B: configuration (choose search mode and size). Stream C: motion vectors. [Figure: Motion Estimation Core with coreports in1, in2, out1, and out2 carrying Streams A, B, and C.]

20 Reconfigurable Interconnect. [Figure: P-frame path with Input Frame, ME/MC, summation (+/-), and DCT; I-frame path with Input Frame and DCT.]

21 aSoC Support. ME, MC, and summation are lumped into one double core. [Figure: DCT; Motion Estimation & Compensation.]

22 aSoC Support: P-Frame. [Figure labels: Input Frame (Stream A); DCT; Motion Estimation & Compensation; Difference Frame (Stream B).]

23 aSoC Support: Schedule Change. [Figure labels: Input Frame (Stream A); DCT; Motion Estimation & Compensation; Difference Frame (Stream B); Configuration Streams (C & D).]

24 aSoC Support: Schedule Change. [Figure labels: Input Frame (Stream A); DCT; Motion Estimation & Compensation; Difference Frame (Stream B); Configuration (Stream C); Schedule 1; Schedule 2; PC.]

25 aSoC Support: Schedule Change. [Figure labels: Input Frame (Stream A); DCT; Motion Estimation & Compensation; Difference Frame (Stream B); Configuration (Stream C); Schedule 1; Schedule 2; PC.]

26 aSoC Support: Schedule Change. [Figure labels: Input Frame (Stream A); DCT; Motion Estimation & Compensation; Configuration (Stream D); Schedule 1; Schedule 2; PC.]

27 aSoC Support: Schedule Change. [Figure labels: Input Frame (Stream A’); DCT; Motion Estimation & Compensation; Configuration (Stream D); Schedule 1; Schedule 2; PC.]

28 aSoC Support: I-Frame. [Figure labels: Input Frame (Stream A’); DCT; Motion Estimation & Compensation: OFF.]
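
The schedule change in slides 23 to 28 can be sketched as a PC jump between two schedules stored in the same instruction memory. This is an illustrative model only: the class `TileCI`, the schedule names `p_frame` and `i_frame`, and the string instructions are hypothetical stand-ins for the real crossbar settings, which the slides do not spell out.

```python
class TileCI:
    """Toy CI whose instruction memory holds several schedules back to back."""

    def __init__(self, schedules):
        self.memory = []   # instruction memory
        self.start = {}    # first address of each schedule
        self.length = {}   # number of instructions in each schedule
        for name, instrs in schedules.items():
            self.start[name] = len(self.memory)
            self.length[name] = len(instrs)
            self.memory.extend(instrs)
        self.active = next(iter(schedules))  # begin with the first schedule
        self.pc = self.start[self.active]

    def configure(self, name):
        """Handle a configuration command: point the PC at another schedule."""
        self.active = name
        self.pc = self.start[name]

    def step(self):
        """Return the current instruction and advance, looping within the schedule."""
        instr = self.memory[self.pc]
        base = self.start[self.active]
        self.pc = base + (self.pc - base + 1) % self.length[self.active]
        return instr


ci = TileCI({
    "p_frame": ["input -> ME/MC", "difference -> DCT"],  # Schedule 1 (assumed)
    "i_frame": ["input -> DCT"],                         # Schedule 2 (assumed)
})
print(ci.step())         # input -> ME/MC
print(ci.step())         # difference -> DCT
ci.configure("i_frame")  # configuration stream requests the schedule change
print(ci.step())         # input -> DCT
```

Keeping both schedules resident means the switch costs only a PC update, which fits the slides' picture of a configuration stream moving the PC from Schedule 1 to Schedule 2.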

29 Operating Frequency? The interconnect is synchronized with H-tree clock distribution. Core frequencies depend on the critical path. The tile provides the clock reference, and the coreport provides an asynchronous boundary. Dynamic core configuration requires dynamic clock configuration: the aSoC clock reference provides multiples of the interconnect clock (… 4x, 2x, 1x, 0.5x, 0.25x, …), configured through the tile controller.
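
One way to picture the per-tile clock choice is a lookup that picks the slowest available multiple of the interconnect clock that still meets a core's required frequency. This is a hedged sketch: the function `pick_multiplier`, the five-entry `MULTIPLIERS` table (the slide's list is open-ended), and the 100 MHz interconnect clock in the usage lines are assumptions, not values from the aSoC design. The motivation is that dynamic CMOS power scales roughly linearly with clock frequency at a fixed supply voltage, so running a core no faster than needed saves power.

```python
# Subset of the power-of-two multiples named on the slide, in ascending order.
MULTIPLIERS = (0.25, 0.5, 1.0, 2.0, 4.0)


def pick_multiplier(interconnect_mhz, required_core_mhz):
    """Return the smallest multiplier whose core clock meets the requirement."""
    for m in MULTIPLIERS:
        if interconnect_mhz * m >= required_core_mhz:
            return m
    raise ValueError("required core frequency exceeds the fastest available clock")


# Hypothetical 100 MHz interconnect clock.
print(pick_multiplier(100, 30))   # 0.5
print(pick_multiplier(100, 180))  # 2.0
```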

30 Mixed vs. Fixed Core Frequencies. Cores are not designed with clock gating. Core power is from Synopsys RTL simulation; interconnect power is from SPICE. Assumes a 10-cycle schedule and 4 pixels/word.

31 Current Density and Clocking. Red: fixed worst-case clocking, with short spikes of high current. Green: optimal independent clocking, slow and low. Optimal clocking eliminates current spikes (improved battery life). [Figure: current versus time between process start and deadline for ME: Full Search, ME: Spiral, ME: Three Step Search, and DCT.]

32 Configuration Overhead. Configuration adds up to 2 streams per tile; only 2 are required for data. Total BW = 5 x T x N, where 5 is the number of streams per cycle per tile, T is the number of tiles, and N is the number of cycles in the schedule. A single tile can support up to 50 different streams in a 10-cycle schedule. [Figure: DCT tile with Transform Frame (Stream D), Input Frame (Stream B), and Configuration Streams.]
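
The bandwidth formula on this slide can be checked with a few lines of arithmetic. The function name `total_stream_slots` and the 16-tile example are hypothetical; the factor of 5 streams per cycle per tile and the single-tile, 10-cycle case come from the slide.

```python
STREAMS_PER_CYCLE_PER_TILE = 5  # from the slide: 5 streams/(cycle, tile)


def total_stream_slots(tiles, schedule_cycles):
    """Total BW = 5 x T x N, counted in stream slots per schedule."""
    return STREAMS_PER_CYCLE_PER_TILE * tiles * schedule_cycles


print(total_stream_slots(1, 10))   # 50, the single-tile figure on the slide
print(total_stream_slots(16, 10))  # 800, for a hypothetical 16-tile array
```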

33 Configuration Power Overhead. Configuration streams are used infrequently: once per macroblock or once per frame. The architecture disables unused streams (the data valid bit is already used for flow control). Only 4-9% of interconnect power is due to configuration streams.

34 Conclusion. aSoC supports dynamic power management with reconfiguration of cores, interconnect, and clocks. Configuration overhead is low in both communication bandwidth and power.

35 Future Work. Add reconfigurable voltage supplies at each tile. Finish the test chip. Import larger applications.

36 Questions

37 aSoC: adaptive System on a Chip. [Figure: tile array with cores DCT, VLE, Memory, Viterbi, FIR, Encrypt, Control, and Motion Estimation and Compensation; labels: Cores, Interconnect, Interface, Tile.]

38 Example: Stream. [Figure: Stream A-D across tiles A, B, C.]

39 Partitioning. Automated partitioning is a nontrivial problem. For small signal processing systems, user-defined partitioning may be possible. Key: perfectly partitioning the system may not be possible. How can the SoC mitigate the penalty?

