
Development in hardware – Why?
Option: array of custom processing nodes
Step 1: analyze the application and extract the component tasks
Step 2: design the custom processors
Step 3: program the FPGA
Step 4: assign the tasks to the processors and set up the connection network
Biological analogues: multi-cellular organization; ???; growth (cellular division)

Development in hardware – Why?
Step 2: as a function of the tasks, design one (or more) custom processors.
[Figure: custom processors built from units such as ×, +, ÷, ≠, FFT and DCT, placed between IN and OUT]

Cellular differentiation
Cells adapt their physical structure to fit the “application”. Can circuits/processors do the same? Physically? No. Logically? Yes, but… Can they do it easily (dare we say, automatically)?

Cellular differentiation
Needed: an adaptable cellular architecture, that is, a processor architecture that is customizable, compact, powerful, easy to design and modify, and amenable to evolution and learning.
Possible solution: MOVE architectures.

The MOVE paradigm
One single instruction: move. Data displacements trigger operations. The architecture is built around data rather than being operation-centric. Regular structure: functional units + data network. Scalable and modular architecture.
Example: sum of two values
Conventional architecture:
add R1, R2, R3
MOVE architecture:
move O(Fxxx), I1(Fsum)
move O(Fyyy), I2(Fsum)
move O(Fsum), I(Fzzz)
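The three-move example above can be simulated in a few lines. This is a minimal sketch, not the authors' implementation: the port names follow the slide (O, I, I1, I2), but the rule that writing to I2 is what triggers the addition, and the Python classes `Register` and `AddUnit`, are assumptions made for illustration.

```python
# Minimal sketch of a MOVE (transport-triggered) machine.
# Assumptions, not from the slides: every unit exposes named ports; a write to
# an FU's trigger port (here I2) starts its operation; unit names follow the slide.

class Register:
    """Storage unit: a write to port I stores the value, port O reads it back."""
    def __init__(self, value=0):
        self.value = value

    def write(self, port, value):
        if port != "I":
            raise ValueError(f"register has no input port {port!r}")
        self.value = value

    def read(self, port):
        if port != "O":
            raise ValueError(f"register has no output port {port!r}")
        return self.value


class AddUnit:
    """Adder FU (hypothetical): I1 is an operand port, I2 the trigger port, O the sum."""
    def __init__(self):
        self.ports = {"I1": 0, "I2": 0, "O": 0}

    def write(self, port, value):
        self.ports[port] = value
        if port == "I2":  # the data movement itself triggers the addition
            self.ports["O"] = self.ports["I1"] + self.ports["I2"]

    def read(self, port):
        return self.ports[port]


def run(program, units):
    """Execute moves of the form ((src_unit, src_port), (dst_unit, dst_port))."""
    for (src, src_port), (dst, dst_port) in program:
        units[dst].write(dst_port, units[src].read(src_port))


units = {"Fxxx": Register(3), "Fyyy": Register(4), "Fsum": AddUnit(), "Fzzz": Register()}
program = [
    (("Fxxx", "O"), ("Fsum", "I1")),  # move O(Fxxx), I1(Fsum)
    (("Fyyy", "O"), ("Fsum", "I2")),  # move O(Fyyy), I2(Fsum): triggers the add
    (("Fsum", "O"), ("Fzzz", "I")),   # move O(Fsum), I(Fzzz)
]
run(program, units)
print(units["Fzzz"].read("O"))  # prints 7
```

In this sketch `run` only copies values between ports, so adding a new unit type (for example a multiplier class with the same `read`/`write` methods) leaves the instruction format unchanged.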

Cellular differentiation
Main features: a conventional fetch/decode mechanism, compatible with bio-inspired mechanisms; no pipeline: computation is carried out in specialized functional units (FUs); communication is carried out in specialized communication units (CUs); only one instruction, which MOVEs data to and from the CUs and FUs (dataflow architecture).

Cellular differentiation
Main advantages: the architecture can easily be customized by introducing application-specific functional and communication units. It perfectly fits the requirements of systolic arrays (arbitrarily complex communication patterns). The introduction of custom components does not affect the assembler language, the code structure, the fetch and decode units, or the transport bus.

Example – Automatic Synthesis
[Figure: Genotype, Mapping, and Phenotype layers; labels: genetic code, developmental algorithm, application-specific (parallel) functions]

Example – Automatic Synthesis
[Figure: Genotype, Mapping, and Phenotype layers; totipotent cell]

Example – Automatic Synthesis
[Figure: totipotent cell; programmable logic]

Example – Automatic Synthesis
[Figure: programmable logic; cellular array]

Implementation - The BioWall

Development in hardware – Why?
Option: array of custom processing nodes
Step 1: analyze the application and extract the component tasks
Step 2: design the custom processors
Step 3: program the FPGA
Step 4: assign the tasks to the processors and set up the connection network
Biological analogues: multi-cellular organization; ???; cell specialization; growth (cellular division)

Phenotype Layer – Cell design and specialization
[Figure label: application code (parallel)]
Within a MOVE framework, the specialization (differentiation) of a cell corresponds to the selection of the functional and communication units that can most efficiently implement the desired application.

FU extraction
Extracting the optimal FUs from the code is a complex problem!

FU extraction
How about having a quick peek at biology? Idea: let us use evolution! In fact, this approach is much closer to biology than simply evolving code: in nature, the hardware (the cell) and the software (the genome) have evolved together.

FU extraction
Idea: let us use evolution!

FU extraction
First step: profile the code (standard compilation technique).

FU extraction
Second step: transform the code into a tree (standard compilation technique).
Third step: represent the tree as a 1-D genome.
Fourth step: run the GA (with some fancy optimizations).
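The second and third steps (tree, then 1-D genome) can be illustrated with a prefix (pre-order) linearization, which stores a tree as a flat list that can be decoded again from the operator arities. This is only a sketch under that assumption: the slides do not say how the genome is encoded, and the tuple tree format and the `ARITY` table are hypothetical.

```python
# Hypothetical operator set with arities; leaf names must not reuse these symbols.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "neg": 1}

def to_genome(tree):
    """Flatten a tree of nested tuples (op, child, ...) into a prefix-order list."""
    if isinstance(tree, tuple):
        op, *children = tree
        genes = [op]
        for child in children:
            genes.extend(to_genome(child))
        return genes
    return [tree]  # leaf: variable name or constant

def from_genome(genes):
    """Rebuild the tree; each operator's arity says how many children follow it."""
    def build(i):
        gene = genes[i]
        if gene in ARITY:
            children = []
            i += 1
            for _ in range(ARITY[gene]):
                child, i = build(i)
                children.append(child)
            return (gene, *children), i
        return gene, i + 1

    tree, end = build(0)
    if end != len(genes):
        raise ValueError("genome has trailing genes")
    return tree

expr = ("+", ("*", "a", "b"), ("neg", "c"))  # a*b + (-c)
genome = to_genome(expr)
print(genome)  # prints ['+', '*', 'a', 'b', 'neg', 'c']
assert from_genome(genome) == expr
```

A flat list like this is convenient for standard GA operators, while the round trip guarantees that every genome can be mapped back to the program tree.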

Fitness evaluation
s = size of the new processor
t = execution time of the program on the new processor
α = execution time of the program on a minimal processor
β = hardware area needed to implement the minimal processor (which, by definition, has a fitness of 1)
hwLimit = maximum hardware allowed to implement the new processor
Notes: the fitness function is relative (to the minimal processor). When the new processor falls outside the allowed hardware range, the fitness decreases logarithmically. The hardware investment has to be small enough for the new processor to be retained.
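The transcript defines the variables but not the formula itself. The sketch below is one hypothetical form that satisfies the stated properties (relative to the minimal processor, exactly 1 for that processor, logarithmic decrease beyond hwLimit); it is an assumption for illustration, not the authors' fitness function.

```python
import math

def fitness(s, t, alpha, beta, hw_limit):
    """Hypothetical relative fitness (not the formula from the slides).

    s, t: size and execution time for the new processor
    alpha, beta: execution time and size for the minimal processor
    hw_limit: maximum allowed size, in the same units as s
    """
    # Speedup over the minimal processor, divided by the relative hardware cost;
    # equals exactly 1 for the minimal processor itself (s == beta, t == alpha).
    f = (alpha / t) * (beta / s)
    if s > hw_limit:
        # Outside the allowed hardware range: logarithmic decrease (assumed shape).
        f /= 1.0 + math.log(s / hw_limit)
    return f

print(fitness(s=100, t=1000, alpha=1000, beta=100, hw_limit=150))  # prints 1.0
```

With this form, extra hardware only raises the fitness if the speedup it buys is larger than the relative area it costs, which is one way to read "the hardware investment has to be small enough to be retained".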

Determining hardware size
How can the size of a new FU be estimated (an input to the hardware-size terms s and β of the fitness)?
The idea: determine the size of each basic building block (+, -, AND, …), then compute how many of them are used by a new FU. Open question: what to do with assignments or loops?
The characterization has to be done for every target platform.
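The size estimate described above (per-block area times the number of blocks used) can be sketched as a sum over the operator nodes of an FU's expression tree. The platform name `example_fpga`, the area values, and the tuple tree format are all hypothetical; a real characterization table would be measured for each target platform.

```python
# Hypothetical characterization table: estimated area of each basic building
# block, per target platform. The platform name and numbers are made up.
AREA = {
    "example_fpga": {"+": 32, "-": 32, "AND": 8, "*": 300},
}

def estimate_size(tree, platform):
    """Estimate FU area by summing the area of every operator node in its tree.

    Trees are nested tuples (op, child, ...); leaves are input names or constants
    and are assumed to cost nothing. Assignments and loops are not handled.
    """
    table = AREA[platform]
    if isinstance(tree, tuple):
        op, *children = tree
        return table[op] + sum(estimate_size(c, platform) for c in children)
    return 0

mac = ("+", ("*", "a", "b"), "c")          # a*b + c
print(estimate_size(mac, "example_fpga"))  # prints 332
```

Keeping the table keyed by platform mirrors the slide's point that the characterization must be redone for each target.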

Determining hardware execution time
Use the same idea as for size: compute the time needed by each elementary function, taking the targeted clock period as the basis. When the estimated time exceeds the clock period, add 1 (one clock cycle) to the total time, which creates small jumps in the fitness landscape.
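The timing rule above can be sketched by estimating a delay for the FU and rounding it up to whole clock periods. The per-block delays are hypothetical, and the choice to add delays along the critical path (the slowest child of each node) is an assumption; the slide only says that elementary-function times are combined against the clock period.

```python
import math

# Hypothetical per-block delays in nanoseconds for one target platform.
DELAY_NS = {"+": 2.0, "-": 2.0, "AND": 0.5, "*": 6.0}

def path_delay(tree):
    """Critical-path delay: a node's own delay plus that of its slowest child."""
    if isinstance(tree, tuple):
        op, *children = tree
        return DELAY_NS[op] + max(path_delay(c) for c in children)
    return 0.0  # inputs are assumed to be available at time 0

def cycles(tree, clock_period_ns):
    """Round the delay up to whole clock periods (at least one cycle)."""
    return max(1, math.ceil(path_delay(tree) / clock_period_ns))

mac = ("+", ("*", "a", "b"), "c")
print(path_delay(mac), cycles(mac, 10.0), cycles(mac, 5.0))  # prints 8.0 1 2
```

Because `cycles` returns an integer, a delay that crosses a multiple of the clock period costs a whole extra cycle; these are the small jumps in the fitness landscape.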

Pattern-matching optimization
How can reusable FUs be found? The GA behaves much like random mutation, so reusability is difficult to find this way. To help the GA, the whole tree is searched each time a new HW block is defined, and similar pieces of code are replaced by that block.
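The whole-tree search can be sketched as pattern matching on the expression tree: every subtree that matches the new HW block's pattern is replaced by a call to that block. The pattern syntax (leaves starting with `?` bind to any subtree), the block name `MAC`, and the tuple tree format are hypothetical choices for this sketch.

```python
def match(pattern, tree, bindings):
    """True if tree matches pattern; '?x' leaves in the pattern bind to any subtree."""
    if isinstance(pattern, str) and pattern.startswith("?"):
        if pattern in bindings:  # same variable must bind to the same subtree
            return bindings[pattern] == tree
        bindings[pattern] = tree
        return True
    if isinstance(pattern, tuple) and isinstance(tree, tuple):
        return len(pattern) == len(tree) and all(
            match(p, t, bindings) for p, t in zip(pattern, tree))
    return pattern == tree

def replace_all(tree, pattern, block_name, params):
    """Replace every match of pattern with (block_name, *args), searching the whole tree."""
    bindings = {}
    if match(pattern, tree, bindings):
        args = [replace_all(bindings[p], pattern, block_name, params) for p in params]
        return (block_name, *args)
    if isinstance(tree, tuple):
        op, *children = tree
        return (op, *(replace_all(c, pattern, block_name, params) for c in children))
    return tree

mac = ("+", ("*", "?a", "?b"), "?c")  # hypothetical HW block: a*b + c
code = ("-", ("+", ("*", "x", "y"), "z"), ("+", ("*", "u", "v"), "w"))
print(replace_all(code, mac, "MAC", ["?a", "?b", "?c"]))
# prints ('-', ('MAC', 'x', 'y', 'z'), ('MAC', 'u', 'v', 'w'))
```

Once one occurrence justifies the block, every other occurrence reuses it for free, which is exactly the kind of reuse that random mutation alone rarely finds.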

Non-optimal block pruning
A “cleaning” phase is performed at each step. It removes HW blocks that are non-optimal from the fitness point of view. To see whether a block is useful, compute the fitness with and without this block implemented in HW. If the software solution has a better fitness, the block is non-optimal and can be removed.
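The cleaning phase can be sketched as a loop that tries removing each HW block and keeps the removal whenever the software version scores strictly better. The `fitness` callback, the block names, and the toy gain values in the usage line are hypothetical; in the real flow the fitness would come from the size and time estimates.

```python
def prune(hw_blocks, fitness):
    """Remove HW blocks whose software implementation gives a better fitness.

    fitness: function mapping a set of HW block names to a score (higher is better).
    """
    kept = set(hw_blocks)
    changed = True
    while changed:
        changed = False
        for block in sorted(kept):  # sorted for a deterministic order
            without = kept - {block}
            if fitness(without) > fitness(kept):  # software version is better
                kept = without
                changed = True
                break  # re-check the remaining blocks after each removal
    return kept

# Toy fitness: each block adds a fixed (made-up) gain; SHIFT costs more than it saves.
GAINS = {"MAC": 0.5, "SHIFT": -0.1, "DIV": 0.2}
print(sorted(prune(GAINS, lambda blocks: 1 + sum(GAINS[b] for b in blocks))))
# prints ['DIV', 'MAC']
```

Restarting after each removal matters when blocks interact, since removing one block can change whether another is still worth its hardware.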

FU extraction – Interface
[Figure: interface]

FU extraction – Results
Example (functions from the FACT factorization algorithm):
Hardware increase (estimated): 10% (fixed)
Speedup (estimated): 2.27× (227% of the original speed)
Other results: all were obtained in a few seconds.
