Design and Implementation of a NoC-Based Cellular Computational System. By: Shervin Vakili. Supervisors: Dr. Sied Mehdi Fakhraie, Dr. Siamak Mohammadi. February 09, 2009.

Design and Implementation of a NoC-Based Cellular Computational System By: Shervin Vakili Supervisors: Dr. Sied Mehdi Fakhraie Dr. Siamak Mohammadi February 09, 2009

2 Outline
- Introduction and Motivations
- Basics of the Evolvable Multiprocessor System (EvoMP)
- EvoMP Operational View
- EvoMP Architectural View
- Simulation and Synthesis Results
- Summary

3 Introduction and Motivations Basics of Evolvable Multiprocessor System (EvoMP) EvoMP Operational View EvoMP Architectural View Simulation and Synthesis Results Summary

4 Introduction and Motivations (1)
- Computing systems have played an important role in the advances of human life over the last four decades.
- The number and complexity of applications are continuously increasing, so more computational power is required.
- Three main hardware design approaches, trading flexibility against performance:
  - ASIC (hardware realization)
  - Reconfigurable computing
  - Processor-based designs (software realization)

5 Introduction and Motivations (2)
- Microprocessors are the most popular approach.
  - Flexibility and reprogrammability
  - Low performance
- Architectural techniques to improve processor performance:
  - Pipelining, out-of-order execution, superscalar, VLIW, etc.
  - The gains from these techniques seem to have saturated in recent years.

6 Introduction and Motivations (3)
- Emerging trends aim to achieve:
  - Higher performance
  - while preserving the classical software development process. [1]

7 Why Multi-Processor?
- One of the main trends is to increase the number of processors.
- Exploits thread-level parallelism (TLP).
- Similarity to single-processor design brings:
  - Short time-to-market
  - Post-fabrication reusability
  - Flexibility and programmability
- The trend is moving toward a large number of simple processors on a chip.

8 Number of Processing Cores in Different Products [3]

9 MPSoC Development Challenges (1)
- MP systems face some major challenges.
- Programming models:
  - MP systems require concurrent software.
  - Concurrent software development requires two operations:
    - Decomposition of the program into tasks
    - Scheduling of the tasks among the cooperating processors
  - Both are NP-complete problems.
  - Both strongly affect performance.

10 MPSoC Development Challenges (2)
- Two main solutions:
  1. Software development using parallel programming libraries, e.g. MPI and OpenMP.
     - Parallelization is done manually by the programmer.
     - Requires a huge investment to re-develop existing software.
  2. Automatic parallelization at compile time.
     - Does not require reprogramming, but does require re-compilation.
     - The compiler performs both task decomposition and scheduling.

11 MPSoC Development Challenges (3)
- Control and synchronization
  - Needed to address inter-processor data dependencies.
- Debugging
  - Tracking concurrent execution is difficult,
  - particularly in heterogeneous architectures with different-ISA processors.

12 MPSoC Development Challenges (4)
- All MPSoCs can be divided into two categories:
  - Static scheduling
    - Task scheduling is performed before execution.
    - Assumes a predetermined number of contributing processors.
    - The scheduler has access to the entire program.
  - Dynamic scheduling
    - A run-time scheduler (in hardware or in the OS) performs task scheduling.
    - Does not depend on the number of processors.
    - Only has access to the pending tasks and the available resources.

13 Introduction and Motivations Basics of Evolvable Multiprocessor System EvoMP Operational View EvoMP Architectural View Simulation and Synthesis Results Summary

14 Proposal of the Evolvable Multi-processor System (1)
- This thesis introduces a novel MPSoC, called EvoMP (Evolvable Multi-Processor system), which uses evolutionary strategies for run-time task decomposition and scheduling.
- Features:
  - Can directly execute classical sequential code on an MP platform.
  - Uses a hardware evolutionary-algorithm core to perform run-time task decomposition and scheduling.
  - Distributed control and computing
  - Flexibility
  - NoC-based, 2D mesh, homogeneous

15 Proposal of the Evolvable Multi-processor System (2)
- All computational units hold one copy of the entire program.
- The EvoMP architecture exploits a hardware evolutionary core
  - to generate a bit-string (chromosome);
  - this bit-string determines which processor is in charge of executing each instruction.
- The primary version of EvoMP uses a genetic algorithm core.

16 Target Applications
- Applications that perform a fixed computation on a stream of data, e.g.:
  - Digital signal processing
  - Packet processing in network applications
  - Processing of large volumes of sensory data
  - ...

17 Streaming Applications Code Style
- Streaming programs have two main parts:
  - Initialization
  - An infinite (or semi-infinite) loop

Two-tap FIR filter:

    ;Initial
    1-  MOV R1, 0
    2-  MOV R2, 0
    L1: ;Loop
    3-  MOV R1, Input
    4-  MUL R3, R1, Coe1
    5-  MUL R4, R2, Coe2
    6-  ADD R1, R3, R4
    7-  MOV Output, R1
    8-  MOV R1, R2
    9-  GENETIC
    10- JUMP L1
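As a sanity check on what this loop computes, here is a minimal Python sketch of the same two-tap FIR. The function name and the coefficient values are illustrative (not from the slides); the sketch assumes Coe1 multiplies the current sample and Coe2 the previous one.

```python
def fir2(samples, coe1, coe2):
    """Two-tap FIR: y[n] = coe1*x[n] + coe2*x[n-1], with x[-1] = 0.

    Mirrors the slide's loop: one register holds the current input
    sample and another the previous one (cleared in the initialization
    part before the loop).
    """
    prev = 0            # the delay register, zeroed before the loop
    out = []
    for x in samples:   # the (semi-)infinite loop, bounded here for testing
        y = coe1 * x + coe2 * prev
        out.append(y)   # the MOV Output step
        prev = x        # delay-line update at the end of the loop body
    return out
```

For example, `fir2([1, 2, 3], 2, 3)` yields `[2, 7, 12]`: each output mixes the current and previous input samples.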

18 Introduction and Motivations Basics of Evolvable Multiprocessor System (EvoMP) EvoMP Operational View EvoMP Architectural View Simulation and Synthesis Results Summary

19 EvoMP Top View
- The genetic core produces a bit-string (chromosome) that determines where each instruction is executed.
- (Figure: a 2x2 mesh of processors P-00 to P-11, connected through switches SW00 to SW11 to the genetic core; every processor holds its own copy of the 10-line FIR program from slide 17; the example chromosome ends in ...11.)

20 How Does EvoMP Work? (1)
- The following process is repeated in each iteration:
  - At the beginning of the iteration, the genetic core generates the bit-string (chromosome) and sends it to all processors.
  - The processors execute the iteration with the decomposition and scheduling scheme the chromosome determines.
  - A counter in the genetic core counts the number of clock cycles spent.
  - When all processors have reached the end of the loop, the genetic core uses the output of this counter as the fitness value.

21 How Does EvoMP Work? (2)
- Three main working states:
  - Initialize:
    - Used only for the first population.
    - The genetic core generates random chromosomes.
  - Evolution:
    - Uses recombination to produce new populations.
    - When the termination condition is met, the system goes to the final state.
  - Final:
    - The best chromosome is used as the constant output of the genetic core.
    - When one of the processors becomes faulty, the system returns to the evolution state.
- (State diagram: Initialize -> Evolution -> Final; "Terminate" moves Evolution to Final, "Fault detected" moves Final back to Evolution.)
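The three states can be sketched in software. The following Python model is hypothetical: the fitness function, population size, mutation rate, and termination rule are stand-ins, whereas in EvoMP the fitness is the hardware cycle counter and recombination runs inside the genetic core.

```python
import random

def evolve(fitness, genes=8, pop_size=16, generations=30, elite=2, seed=1):
    """Minimal generational GA: random initialization, elitism of two,
    one-point crossover plus mutation; minimizes `fitness`."""
    rng = random.Random(seed)
    # Initialize state: the first population is random
    pop = [[rng.randint(0, 3) for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):          # Evolution state
        pop.sort(key=fitness)
        nxt = pop[:elite]                 # elite count fixed at two
        while len(nxt) < pop_size:
            a, b = rng.sample(pop[:pop_size // 2], 2)
            cut = rng.randrange(1, genes)
            child = a[:cut] + b[cut:]     # one-point crossover
            if rng.random() < 0.2:        # mutation
                child[rng.randrange(genes)] = rng.randint(0, 3)
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)          # Final state: best chromosome wins

# toy stand-in fitness: pretend the cycle count is lowest near a target sum
best = evolve(lambda c: abs(sum(c) - 12))
```

Elitism guarantees the best chromosome is never lost between generations, which matches the slide's point that the final state simply replays the best chromosome found.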

22 How the Chromosome Encodes the Scheduling Data (1)
- Each chromosome consists of several small words (genes).
- Each word contains two fields:
  - A processor number
  - A number of instructions

23 How the Chromosome Encodes the Scheduling Data (2)
- Assume that we have a 2x2 mesh.
- (Figure: the 10-instruction FIR program is partitioned by four chromosome words, Word1 to Word4; each word pairs a processor number with the count of consecutive instructions assigned to that processor.)
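Under this gene format, decoding a chromosome into a per-instruction processor assignment is a simple expansion. The Python sketch below is illustrative: the field widths and the example gene values are assumptions, not taken from the slides.

```python
def decode(chromosome, n_instructions):
    """Expand (processor, count) genes into one processor id per instruction.

    Each gene says: the next `count` instructions run on `processor`.
    """
    assign = []
    for processor, count in chromosome:
        assign.extend([processor] * count)
    return assign[:n_instructions]   # ignore any surplus in the last gene

# four hypothetical words covering a 10-instruction loop on a 2x2 mesh;
# mapping[i] is the mesh processor that executes instruction i+1
mapping = decode([(0, 2), (1, 3), (2, 2), (3, 3)], 10)
```

Here `decode([(0, 2), (1, 3), (2, 2), (3, 3)], 10)` yields `[0, 0, 1, 1, 1, 2, 2, 3, 3, 3]`: instructions 1-2 on processor 0, 3-5 on processor 1, and so on.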

24 Data Dependency Problem
- Data dependencies are the main challenge.
- They must be detected dynamically at run-time.
- This is addressed using:
  - A particular machine-code style
  - Architectural techniques

25 EvoMP Machine Code Style
- Each source operand is replaced by the line number (ID) of the most recent instruction that changed it.
- This enormously simplifies dependency detection.

    10. ADD R1, R2, R3   ; R3 = R1 + R2
    11. AND R2, R6, R7   ; R7 = R2 & R6
    12. SUB R7, R3, R4   ; R4 = R7 - R3

    becomes

    12. SUB (11), (10), R4
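This renaming rule can be modeled in a few lines. The Python sketch below is hypothetical: the tuple instruction format `(line, op, src1, src2, dest)` and the function name are assumptions for illustration, not EvoMP's actual encoding.

```python
def rename(program):
    """Replace each register source with the line number of its last writer.

    Sources whose register has no earlier writer keep their register name,
    matching the slide's example where only produced values become IDs.
    """
    last_writer = {}            # register -> line of most recent write
    out = []
    for line, op, src1, src2, dest in program:
        s1 = last_writer.get(src1, src1)
        s2 = last_writer.get(src2, src2)
        out.append((line, op, s1, s2, dest))
        last_writer[dest] = line
    return out

# the slide's three-instruction example (destination is the last operand)
prog = [(10, "ADD", "R1", "R2", "R3"),
        (11, "AND", "R2", "R6", "R7"),
        (12, "SUB", "R7", "R3", "R4")]
```

After `rename(prog)`, instruction 12 reads the results of lines 11 and 10 directly, so a consumer only has to match producer IDs instead of tracking register state.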

26 Introduction and Motivations Basics of Evolvable Multiprocessor System (EvoMP) EvoMP Operational View EvoMP Architectural View Simulation and Synthesis Results Summary

27 Architecture of Each Processor
- The number of FUs is configurable.
- Either a homogeneous or a heterogeneous policy can be used for the FUs.
- Supports out-of-order execution.
- The first free FU grabs the instruction from the Instr bus (daisy chain).

28 Fetch_Issue Unit
- The PC1-Instr bus is used for executable instructions.
- The PC2-Invalidate_Instr bus is used for data-dependency detection.

29 Functional Unit
- Can be configured to execute different operations:
  - Arithmetic operations
    - Add
    - Sub
    - Shift/rotate right/left
    - Multiply (add-and-shift)
  - Logical operations
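The multiply entry refers to the classic serial add-and-shift scheme. A minimal Python model of that scheme for unsigned operands follows; the function name and the 16-bit default width are illustrative, not taken from the slides.

```python
def shift_add_mul(a, b, width=16):
    """Unsigned shift-and-add multiplication, one partial product per bit.

    Each iteration (one cycle in a serial FU): if the low bit of b is
    set, accumulate the shifted copy of a; then shift a left and b right.
    """
    acc = 0
    for _ in range(width):
        if b & 1:
            acc += a
        a <<= 1
        b >>= 1
    return acc & ((1 << (2 * width)) - 1)   # product fits in 2*width bits
```

For example, `shift_add_mul(13, 11)` returns `143`, taking `width` iterations instead of dedicated multiplier hardware, which is why a small FU can afford it.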

30 Genetic Core
- (Figure: a 2x2 mesh of cells Cell-00 to Cell-11 and switches SW00 to SW11, attached to the genetic core.)
- Population size and mutation rate are configurable.
- The elite count is fixed at two in order to reduce hardware complexity.

31 EvoMP Challenges
- The current version uses a centralized memory unit.
  - Located at address "00".
  - This node contains no computational circuits.
  - This is a major issue for scalability.
- The search space of the genetic algorithm is very large.
  - It grows exponentially with a linear increase in the number of processors.

32 PSO Core [8]

33 Introduction and Motivations Basics of Evolvable Multiprocessor System (EvoMP) EvoMP Operational View EvoMP Architectural View Simulation and Synthesis Results Summary

34 Configurable Parameters
- EvoMP has several configurable parameters:
  - Word length of the system
  - Size of the mesh (number of processors)
  - Flit length: bit width of the NoC switch links
  - Population size
  - Crossover rate

35 Simulation Results
- Two sets of applications are used for performance evaluation:
  - Some DSP programs
  - A sample neural network
- Two other decomposition and scheduling methods were implemented to enable comparison:
  - Static Decomposition, Genetic Scheduler (SDGS)
    - Decomposition is performed statically, i.e. the tasks are predetermined manually.
    - The genetic core only determines the scheduling scheme.
  - Static Decomposition, First-Free Scheduler (FF)
    - Assigns the first task in the job queue to the first free processor in the system.
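The FF baseline is a classic greedy list scheduler, which can be modeled compactly. In the Python sketch below the task durations are illustrative, and the makespan stands in for the fitness value that EvoMP measures in clock cycles on hardware.

```python
import heapq

def first_free_schedule(durations, n_procs):
    """Assign each task, in queue order, to the processor that frees up first.

    Returns (makespan, assignment) where assignment[i] is the processor
    that runs task i; the makespan plays the role of the fitness value.
    """
    # heap of (time_when_free, processor_id)
    free_at = [(0, p) for p in range(n_procs)]
    heapq.heapify(free_at)
    assignment = []
    makespan = 0
    for d in durations:
        t, p = heapq.heappop(free_at)     # first free processor
        finish = t + d
        heapq.heappush(free_at, (finish, p))
        assignment.append(p)
        makespan = max(makespan, finish)
    return makespan, assignment

# e.g. six independent tasks on two processors
span, who = first_free_schedule([4, 2, 3, 1, 2, 2], 2)
```

For this example the makespan is 7 cycles. FF ignores data dependencies and communication cost, which is exactly the gap the evolutionary decomposition and scheduling aims to close.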

36 16-Tap FIR Filter
- Parameters:
  - 16-bit mode
  - Population size = 16
  - Crossover rate = 8
  - NoC connection width = 16
- 74 instructions, 16 of them multiplications
- Best fitness is the number of clock cycles required to execute one iteration using the best solution found so far.

37 8-Point DCT
- Parameters:
  - 16-bit mode
  - Population size = 16
  - Crossover rate = 8
  - NoC connection width = 16
- 88 instructions, 32 of them multiplications

38 16-Point DCT
- Parameters:
  - 16-bit mode
  - Population size = 16
  - Crossover rate = 6
  - NoC connection width = 16
- 320 instructions, 128 of them multiplications

39 5x5 Matrix Multiplication
- Parameters:
  - 16-bit mode
  - Population size = 16
  - Crossover rate = 6
  - NoC connection width = 16
- 406 instructions, 125 of them multiplications

40 (Table: for FIR-16, DCT-8, DCT-16, and MATRIX-5x5 — number of instructions; number of multiply instructions; single-processor baseline fitness in clock cycles, common to all three schemes; and, for the main design, SDGS, and First-Free on 2x3 and 2x2 meshes, fitness in clock cycles, speed-up, and evolution time in us. The numeric values were lost in this transcript.)

41 (Table: the same comparison as slide 40; one main-design fitness entry is marked "Unevaluated"; the remaining numeric values were lost in this transcript.)

42 Neural Network Case Study
(Table: number of instructions and of multiplies, plus fitness, speed-up, and evolution time on 1x2, 1x3, 2x2, and 2x3 meshes; the numeric values were lost in this transcript.)

43 Fault Tolerance Results
- When a fault is detected in a processor, the evolutionary core removes that processor from contributing in subsequent iterations.
- The system also returns to the evolution state to find a suitable solution for the new situation.
- The best obtained fitness of a 2x3 EvoMP running the 16-point DCT program is evaluated.
- Faults are injected into processors 010, 001, and 101 in turn (the injection times, in us, were lost in this transcript).

44 Genetic vs. PSO
(Table: for FIR, DCT-8, DCT-16, and MAT-5x5 — number of instructions, number of multiplies, particle length in bits, and fitness/evolution time of the genetic and PSO cores on 1x2, 1x3, 2x2, and 2x3 meshes; one FIR entry is marked unevaluated; the remaining numeric values were lost in this transcript.)
- Population size in both experiments is 16.

45 Synthesis Results
- Synthesis results on a Virtex-II (XC2V3000) FPGA using Synplify Pro:

    Unit:              NoC switch | Genetic core | PSO core  | MMU        | Processor  | Total system
    Area (total LUTs): 729 (2%)   | 1864 (6%)    | 1642 (5%) | 3553 (12%) | 4433 (15%) | 20112 (70%)
    Max freq. (MHz):   (values lost in this transcript)

46 Introduction and Motivations Basics of Evolvable Multiprocessor System (EvoMP) EvoMP Operational View EvoMP Architectural View Simulation and Synthesis Results Summary

47 Summary
- EvoMP, a novel MPSoC system, was presented.
- EvoMP exploits evolutionary strategies to perform run-time task decomposition and scheduling.
- EvoMP does not require concurrent code, because it parallelizes sequential code at run-time.
- It exploits a particular, novel processor architecture to address the data dependency problem.
- Experimental results confirm the applicability of EvoMP's novel ideas.

48 Main References
[1] N. S. Voros and K. Masselos, System Level Design of Reconfigurable Systems-on-Chip. Netherlands: Springer.
[2] G. Martin, "Overview of the MPSoC design challenge," Proc. Design and Automation Conf., July 2005.
[3] S. Amarasinghe, "Multicore programming primer and programming competition," class notes for 6.189, Computer Architecture Group, Massachusetts Institute of Technology.
[4] M. Hubner, K. Paulsson, and J. Becker, "Parallel and flexible multiprocessor system-on-chip for adaptive automotive applications based on Xilinx MicroBlaze soft-cores," Proc. Intl. Symp. Parallel and Distributed Processing.
[5] D. Gohringer, M. Hubner, V. Schatz, and J. Becker, "Runtime adaptive multi-processor system-on-chip: RAMPSoC," Proc. Intl. Symp. Parallel and Distributed Processing, April 2008.
[6] A. Klimm, L. Braun, and J. Becker, "An adaptive and scalable multiprocessor system for Xilinx FPGAs using minimal sized processor cores," Proc. Symp. Parallel and Distributed Processing, April 2008.
[7] Z. Y. Wen and Y. J. Gang, "A genetic algorithm for tasks scheduling in parallel multiprocessor systems," Proc. Intl. Conf. Machine Learning and Cybernetics, Nov. 2003.
[8] A. Farmahini-Farahani, S. Vakili, S. M. Fakhraie, S. Safari, and C. Lucas, "Parallel scalable hardware implementation of asynchronous discrete particle swarm optimization," Elsevier J. of Engineering Applications of Artificial Intelligence, submitted for publication.

49 Main References (2)
[9] A. A. Jerraya and W. Wolf, Multiprocessor Systems-on-Chips. San Francisco: Morgan Kaufmann Publishers.
[10] A. J. Page and T. J. Naughton, "Dynamic task scheduling using genetic algorithms for heterogeneous distributed computing," Proc. Intl. Symp. Parallel and Distributed Processing, April 2005.
[11] E. Carvalho, N. Calazans, and F. Moraes, "Heuristics for dynamic task mapping in NoC-based heterogeneous MPSoCs," Proc. Int. Rapid System Prototyping Workshop.
[12] R. Canham and A. Tyrrell, "An embryonic array with improved efficiency and fault tolerance," Proc. NASA/DoD Conf. on Evolvable Hardware, July 2003.
[13] W. Barker, D. M. Halliday, Y. Thoma, E. Sanchez, G. Tempesti, and A. Tyrrell, "Fault tolerance using dynamic reconfiguration on the POEtic Tissue," IEEE Trans. Evolutionary Computing, vol. 11, no. 5, Oct. 2007.

50 Related Publications
- Journal:
  1. S. Vakili, S. M. Fakhraie, and S. Mohammadi, "EvoMP: a novel MPSoC architecture with evolvable task decomposition and scheduling," submitted to IET Computers & Digital Techniques (under revision).
  2. S. Vakili, S. M. Fakhraie, and S. Mohammadi, "Low-cost fault tolerance in evolvable multiprocessor system: a graceful degradation approach," submitted to Journal of Zhejiang University SCIENCE A (JZUS-A).
- Conference:
  1. S. Vakili, S. M. Fakhraie, and S. Mohammadi, "Designing an MPSoC architecture with run-time and evolvable task decomposition and scheduling," Proc. 5th IEEE Intl. Conf. Innovations in Information Technology, Dec.
  2. S. Vakili, S. M. Fakhraie, S. Mohammadi, and A. Ahmadi, "Particle swarm optimization for run-time task decomposition and scheduling in evolvable MPSoC," Proc. IEEE Intl. Conf. Computer Engineering and Technology, Jan.