Optimization of Parallel Task Execution on the Adaptive Reconfigurable Group Organized Computing System Presenter: Lev Kirischian Department of Electrical.


1 Optimization of Parallel Task Execution on the Adaptive Reconfigurable Group Organized Computing System Presenter: Lev Kirischian Department of Electrical and Computer Engineering RYERSON Polytechnic University Toronto, Ontario, CANADA

2 Application of parallel computing systems for data-flow tasks
Digital signal processing (DSP);
High-performance control and data acquisition;
Digital communication and broadcasting;
Cryptography and data security;
Process modeling and simulation.

3 Presentation of a data-flow task in the form of a data-flow graph. [Figure: data-flow graph from Data In to Data Out through nodes MO 1 ... MO n.] MO 1 ... MO n are macro-operators, e.g. digital filtering, FFT, matrix scaling, etc.
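As a rough illustration of the data-flow graph idea, the sketch below runs a small graph of macro-operators in dependency order. The three operators (`scale`, `offset`, `combine`) and the graph layout are hypothetical stand-ins, not taken from the slides, which name digital filtering, FFT and matrix scaling only as examples.

```python
from graphlib import TopologicalSorter

# Hypothetical macro-operators (placeholders for filtering, FFT, scaling, ...).
def scale(xs):
    return [2 * x for x in xs]

def offset(xs):
    return [x + 1 for x in xs]

def combine(a, b):
    return [x + y for x, y in zip(a, b)]

# node -> (macro-operator, predecessor nodes); "in" is the Data In source.
GRAPH = {
    "MO1": (scale, ["in"]),
    "MO2": (offset, ["in"]),
    "MO3": (combine, ["MO1", "MO2"]),
}

def run(graph, data_in):
    """Evaluate every macro-operator once all of its inputs are available."""
    values = {"in": data_in}
    deps = {node: preds for node, (_, preds) in graph.items()}
    for node in TopologicalSorter(deps).static_order():
        if node in graph:  # skip the "in" source, which has no operator
            op, preds = graph[node]
            values[node] = op(*(values[p] for p in preds))
    return values

data_out = run(GRAPH, [1, 2, 3])["MO3"]
```

MO1 and MO2 have no dependency on each other, which is the kind of parallelism a data-flow architecture can exploit.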

4 Correspondence between task and computing system architecture
If the data-flow task is processed on a conventional SISD architecture, the processing time often cannot satisfy the specification requirements.
If the task is processed on SIMD or MIMD architectures, the cost-effectiveness of these parallel computers strongly depends on the task algorithm or data structure.
One possible way to meet the required cost-performance is to develop a custom computing system whose architecture covers the data-flow graph of the task.

5 Limitations of custom computing systems with fixed architecture
1. Decrease of performance if the task algorithm or data structure changes.
2. No possibility for further modernization.
3. High cost for multi-task or multi-mode custom computing systems.

6 One possible solution: reconfigurable parallel computing systems
1. Ability to custom-configure each processing (functional) unit for a specific macro-operator;
2. Ability to custom-configure the information links between functional units.
These features allow hardware customization for any data-flow graph, and reconfiguration when task processing is completed.

7 Example of FPGA-based system with architecture configured for the data-flow task

8 Concept of Group Processor in the reconfigurable computing system. Group Processor (GP): a group of computing resources dedicated to the task and configured to reflect the task requirements.

9 Group processor life cycle
1. In the GP, links and functional units are configured before task processing.
2. The GP performs the task for as long as necessary, without interruption or time sharing with any other task.
3. After task completion, all resources included in the GP can be reconfigured for any other task.

10 The concept of the Reconfigurable Group Organized computing system. [Figure: block diagram with the labels Host PC, Configuration Bus, Virtual Bus, Reconfigurable Interface Module (RIM), Functional Unit (FU), Data Stream I/O, Input / Output data bus.]

11 Parallel processing of different tasks on separate Group Processors. [Figure: FU 1, FU 2, FU 3, FU 4 and I/O on the Virtual Bus, grouped into GP 1 (for Task 1), GP 2 and GP 3, with Data in #1, Data in #2, Data out #1, Data out #2 and Data out #3.]

12 Concept of adaptation of the Group Processor architecture to the task. [Figure: timing diagram over T0, T1, T2 on a TIME axis, with the labels Data in, Memory, Multiplier, Adder, Filter.]
Architecture-to-task adaptation for the GP = selection of the resource configuration which:
satisfies all requirements for task processing (e.g. performance, data throughput, reliability, etc.);
requires minimal hardware (i.e. logic gates).

13 Virtual Hardware Objects: the resource base of a reconfigurable computing system. For FPGA-based systems, all architecture components (resources) can be presented as Virtual Hardware Objects (VHOs) described in one of the hardware description languages (for example VHDL or AHDL). Each resource can be presented in different variants R_{i,j}, where i indicates the type of resource (adder, multiplier, interface module, etc.) and j indicates the variant of the resource's presentation in the architecture (for example: 8-bit adder, 16-bit adder, etc.).
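One way to picture the R_{i,j} indexing is a small library keyed by resource type, with an ordered list of variants under each type. This is only a sketch: the `VHO` class, its fields and the file names are hypothetical and are not described in the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VHO:
    resource_type: str  # index i: "adder", "multiplier", ...
    variant: int        # index j within that type, starting at 1
    description: str    # e.g. "8-bit adder"
    hdl_file: str       # hypothetical path to the VHDL/AHDL source

library = {}  # resource type -> list of variants, in registration order

def register(resource_type, description, hdl_file):
    """Add a new variant R_{i,j} and return it; j is assigned sequentially."""
    variants = library.setdefault(resource_type, [])
    vho = VHO(resource_type, len(variants) + 1, description, hdl_file)
    variants.append(vho)
    return vho

register("adder", "8-bit adder", "adder8.vhd")
register("adder", "16-bit adder", "adder16.vhd")
register("multiplier", "8-bit multiplier", "mult8.vhd")

m = {rtype: len(variants) for rtype, variants in library.items()}
```

Here `m` holds the number of variants per resource type, the quantity written m_i elsewhere in the presentation.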

14 Concept of Architecture Configuration Graph (ACG). [Figure: graph with levels labelled Adder, Multiplier, Adder, Bus and configurations numbered 1 to 12.]

15 Architecture Configurations Graph arrangement
Architecture graph partial arrangement requires two procedures:
1. Local arrangement;
2. Hierarchic arrangement.
Local arrangement of variants for each type of system resource. [Figure: Adder variants ordered by processing time, 40 ns and 20 ns.]

16 Hierarchical arrangement of system resources
[Figure: two arrangements of configurations 1 to 6. Multiplier (80 ns, 40 ns, 20 ns) above Adder (40 ns, 20 ns): 120, 100, 80, 60, 60, 40. Adder (40 ns, 20 ns) above Multiplier (80 ns, 40 ns, 20 ns): 120, 80, 60, 100, 60, 40.]
Arrangement criterion:
\[ K(R_i) = \frac{T_{\max}(R_i) - T_{\min}(R_i)}{m_i - 1} \]
\[ K(\mathrm{Mult}) = \frac{120 - 60}{3 - 1} = 30 \;>\; K(\mathrm{Adder}) = \frac{120 - 100}{2 - 1} = 20 \]
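The arrangement criterion can be checked numerically. The sketch below assumes that the task time is the sum of the selected unit times, and that T_max and T_min for a resource are measured with every other resource held at its slowest variant. Both assumptions are inferred from the numbers on the slide rather than stated there, and placing the resource with the larger K higher in the hierarchy is likewise read off the example.

```python
# Variant processing times in ns, from the slide's example.
VARIANTS = {"Multiplier": [80, 40, 20], "Adder": [40, 20]}

def task_time(choice):
    """Assumed timing model: task time is the sum of the chosen unit times."""
    return sum(choice.values())

def arrangement_criterion(resource, variants=VARIANTS):
    """K(R_i) = (T_max(R_i) - T_min(R_i)) / (m_i - 1)."""
    # Hold every other resource at its slowest variant (assumption).
    slowest = {name: max(times) for name, times in variants.items()}
    t_max = task_time({**slowest, resource: max(variants[resource])})
    t_min = task_time({**slowest, resource: min(variants[resource])})
    return (t_max - t_min) / (len(variants[resource]) - 1)

k = {name: arrangement_criterion(name) for name in VARIANTS}

# Resource with the larger K goes higher in the hierarchy (assumed rule).
hierarchy = sorted(VARIANTS, key=k.get, reverse=True)
```

Under these assumptions `k` reproduces the slide's values, 30 for the multiplier and 20 for the adder.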

17 Selection of Group Processor architecture based on the arranged ACG. [Figure: arranged ACG with Multiplier (80 ns, 40 ns, 20 ns) above Adder (40 ns, 20 ns); configurations 1 to 6 with values 120, 100, 80, 60, 60, 40.] Required performance: the processing time for the task Y = A*X + B is T < 80 ns. GP-architecture = Multiplier (#2) + Adder (#1).
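Because the arranged ACG lists configurations from slowest to fastest, the selection can be pictured as a binary search in which each probe is one trial run of the task. The sketch below is an illustration, not the presenter's algorithm: `measured_time` stands in for a real run on the configured hardware, the sum timing model is an assumption, and so is the idea that the slowest configuration meeting the limit is the one with minimal hardware. It reports variant times rather than the slide's variant numbers, whose numbering cannot be recovered from the transcript.

```python
from itertools import product

MULTIPLIER_NS = [80, 40, 20]  # arranged slowest to fastest
ADDER_NS = [40, 20]

# Leaves of the arranged ACG, Multiplier above Adder:
# task times 120, 100, 80, 60, 60, 40 ns under the sum model.
leaves = list(product(MULTIPLIER_NS, ADDER_NS))

def measured_time(config):
    """Stand-in for one experiment: run the task on the configured GP."""
    return sum(config)

def select(leaves, limit_ns):
    """Binary search for the slowest leaf whose time is below limit_ns."""
    experiments = 0
    lo, hi = 0, len(leaves)
    while lo < hi:
        mid = (lo + hi) // 2
        experiments += 1
        if measured_time(leaves[mid]) < limit_ns:
            hi = mid        # fast enough: try slower (cheaper) leaves
        else:
            lo = mid + 1    # too slow: move toward faster leaves
    chosen = leaves[lo] if lo < len(leaves) else None
    return chosen, experiments

config, experiments = select(leaves, 80)
```

For the limit T < 80 ns this settles on a 40 ns multiplier with a 20 ns adder after three trial runs.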

18 Number of experiments for GP-architecture selection
\[ N(GP_{opt}) = (n + 1) + \log_2(m_1 \cdot m_2 \cdots m_n) \]
n: number of resources (VHOs) included in the architecture of the Group Processor
m_i: number of variants of each type of resource
Example: if n = 16 and m_1 = m_2 = ... = m_n = 32, the total number of experiments (task runs on the estimated GP-architecture) is N(GP_opt) = 16 + 1 + 16 * 5 = 97.
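The example count can be reproduced directly from the formula. The helper names below are illustrative; the comparison with the total number of configurations is added here only to show the scale of the search space and is not a figure from the slides.

```python
import math

def experiments_needed(variant_counts):
    """N(GP_opt) = (n + 1) + log2(m_1 * m_2 * ... * m_n)."""
    n = len(variant_counts)
    # log2 of a product equals the sum of the log2 terms.
    return (n + 1) + sum(math.log2(m) for m in variant_counts)

def total_configurations(variant_counts):
    """Size of the full configuration space, m_1 * m_2 * ... * m_n."""
    return math.prod(variant_counts)

counts = [32] * 16
n_experiments = experiments_needed(counts)
n_configurations = total_configurations(counts)
```

With 16 resources of 32 variants each, the formula gives 97 experiments against a configuration space of 32^16 = 2^80 combinations.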

19 Self-adaptation mechanism for FPGA-based reconfigurable data-flow computing systems. [Figure: block diagram with the labels Data Source, Reconfigurable platform, Performance Analyzer, Host PC, Architecture Selector, Architecture generator, Library of Virtual Hardware Objects, Configuration Bus.]

20 First prototype of Adaptive Reconfigurable Group Organized (ARGO) computing platform

21 Data Flow Graph for DVB MPEG2 processing. [Figure: data-flow graph with the labels Input Data Stream (MPEG 2), Synchro-Signal Detect, PCR detection, Null-packet analysis & removing, Output frequency adjustment, PCR re-stamping, Reference Frequency, Output MPEG 2 data stream.]

22 Architecture selection time for 6-mode DVB MPEG 2 stream processor
1. Average time for each architecture configuration: 7.18 ms
2. Average time for GP-architecture selection (for the specific mode): 175.6 ms
3. Total time for architecture selections for all modes: 1.054 s
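The three timing figures are mutually consistent, as the short check below shows. Reading the ratio of selection time to configuration time as the approximate number of configurations tried per mode is an interpretation made here, not a statement from the slides.

```python
CONFIG_TIME_MS = 7.18      # average time per architecture configuration
SELECTION_TIME_MS = 175.6  # average GP-architecture selection time per mode
MODES = 6

# Total selection time over all modes, in seconds.
total_s = MODES * SELECTION_TIME_MS / 1000

# Rough number of configurations per selection (interpretation, see above).
configs_per_selection = SELECTION_TIME_MS / CONFIG_TIME_MS
```

Six selections at 175.6 ms each come to about 1.054 s, matching the reported total, and each selection corresponds to roughly 24 to 25 configuration times.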

23 Hardware implementation of DVB MPEG 2 stream processor for modes 1 and 4. [Figure: the DVB MPEG 2 data-flow graph (Input Data MPEG 2 stream, Synchro-Signal Detect, PCR detection, Null-packet analysis & removing, Output frequency adjustment, PCR re-stamping, Reference Frequency, Output MPEG 2 data stream) mapped onto FU #1 (8-bit In-port) and FU #2 (Out-port), connected by the Virtual bus (16 lines).]

24 Hardware implementation of DVB MPEG stream processor for modes 2, 3, 5 and 6. [Figure: the DVB MPEG 2 data-flow graph (Input Data MPEG 2 stream, Synchro-Signal Detect, PCR detection, Null-packet analysis & removing, Output frequency adjustment, PCR re-stamping, Reference Frequency, Output MPEG 2 data stream) mapped onto FU #1 (8-bit In-port), FU #2 and FU #3 (Out-port), connected by the Virtual bus (16 lines).]

25 Summary
1. The Adaptive Reconfigurable Group Organized (ARGO) parallel computing system is an FPGA-based configurable system able to adapt to the task algorithm / data structure.
2. The ARGO system allows parallel processing of different data-flow tasks on dynamically configured Group Processors (GPs), where each GP-architecture configuration corresponds to the algorithm / data specifics of the task assigned to that processor.
3. These principles allow the development of cost-effective parallel computing systems with programmable performance and reliability, at minimum cost in hardware components and development time.

