
by Manuel Saldaña, Daniel Nunes, Emanuel Ramalho, and Paul Chow


1 Configuration and Programming of Heterogeneous Multiprocessors on a Multi-FPGA System Using TMD-MPI
by Manuel Saldaña, Daniel Nunes, Emanuel Ramalho, and Paul Chow University of Toronto Department of Electrical and Computer Engineering 3rd International Conference on ReConFigurable Computing and FPGAs (ReConFig06) San Luis Potosi, Mexico September, 2006

2 Agenda
Motivation
Background: TMD-MPI, Classes of HPC, Design Flow
New Developments
Example Application: Heterogeneity test, Scalability test
Conclusions
11/14/2018 Manuel Saldaña

3 Motivation: How Do We Program This?
64-MicroBlaze MPSoC (Ring and 2D-Mesh topologies) on an XC4VLX device, and not the largest one!

4 Motivation: How Do We Program This?
512-MicroBlaze Multiprocessor System connected by a network

5 Background: Classes of HPC Machines
Class 1 Machines: supercomputers or clusters of workstations joined by an interconnection network
06/09/2006 Connections 2006

6 Background: Classes of HPC Machines
Class 1 Machines: supercomputers or clusters of workstations
Class 2 Machines: hybrid network of CPU and FPGA hardware; the FPGA acts as an external co-processor to the CPU

7 Background: Classes of HPC Machines
Class 1 Machines: supercomputers or clusters of workstations
Class 2 Machines: hybrid network of CPU and FPGA hardware; the FPGA acts as an external co-processor to the CPU
Class 3 Machines: FPGA-based multiprocessor; a recent area of academic and industrial focus

8 Background: MPSoC and MPI
An MPSoC (Class 3) has many similarities to typical multiprocessor computers (Class 1), but also many special requirements: similar concepts, different implementations
MPI for MPSoC is desirable (TIMA labs, OpenFPGA, Berkeley BEE2, U. of Queensland, U. Rey Juan Carlos, UofT TMD, ...)
But MPI is a broad standard designed for big machines, and full MPI implementations are too big for embedded systems

9 Background: TMD-MPI
The same code runs on the MPSoC (with TMD-MPI) and on a Linux cluster (with MPICH)

10 Background: TMD-MPI
Use multiple chips to obtain massive resources; TMD-MPI hides the complexity of communicating across them

11 Background: TMD-MPI Implementation Layers
Application
MPI Application Interface: MPI_Send/MPI_Recv, MPI_Barrier
TMD-MPI: point-to-point communication functions (csend/send)
Hardware Access Functions: fsl_cput/fsl_put (macros), put/get (assembly instructions)
Hardware

12 Background: TMD-MPI
MPI Functions Implemented
Point-to-Point: MPI_Send, MPI_Recv
Miscellaneous: MPI_Init, MPI_Finalize, MPI_Comm_Rank, MPI_Comm_Size, MPI_Wtime
Collective Operations: MPI_Barrier, MPI_Bcast, MPI_Gather, MPI_Reduce

13 Background: Design Flow
Flexible Hardware-Software Co-design Flow
Previous work: Patel et al. [1] (FCCM 2006), Saldaña et al. [2] (FPL 2006); this work: ReConFig06

14 New Developments
TMD-MPI for MicroBlaze
TMD-MPI for PowerPC405
TMD-MPE for hardware engines

15 New Developments: TMD-MPE and TMD-MPI light
Hardware Engine With Message-Passing

16 New Developments
TMD-MPE uses the Rendezvous message-passing protocol

17 New Developments
TMD-MPE includes:
message queues to keep track of unexpected messages
packetizing/depacketizing logic to handle large messages

18 Heterogeneity Test
Heat Equation Application / Jacobi Iterations: observe the change of the temperature distribution over time

19 Heterogeneity Test
Heat Equation Application / Jacobi Iterations

20 Heterogeneity Test
MPSoC Heterogeneous Configurations (9 Processing Elements, single FPGA)

21 Heterogeneity Test
Execution Time: PPC405, Jacobi engines, MicroBlazes

22 Scalability Test
Heat Equation Application
5 FPGAs (XC2VP100), 7 MicroBlazes + 2 PPC405s per FPGA
45 Processing Elements total (35 MicroBlazes + 10 PPC405s)

23 Scalability Test
Fixed-size Speedup up to 45 Processors

24 UofT TMD Prototype

25 Conclusions
TMD-MPI and TMD-MPE enable parallel programming of heterogeneous MPSoCs across multiple FPGAs, including hardware engines
TMD-MPI hides the complexity of using heterogeneous links
The Heat equation application code was executed on a Linux cluster and on our multi-FPGA system with minimal changes
TMD-MPI can be adapted to a particular architecture
The TMD prototype is a good platform for further research on MPSoC

26 References
[1] Arun Patel, Christopher Madill, Manuel Saldaña, Christopher Comis, Régis Pomès, and Paul Chow. A Scalable FPGA-based Multiprocessor. In IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'06), April 2006.
[2] Manuel Saldaña and Paul Chow. TMD-MPI: An MPI Implementation for Multiple Processors Across Multiple FPGAs. In IEEE International Conference on Field-Programmable Logic and Applications (FPL 2006), August 2006.

27 Thank you! (¡Gracias!)

28 Rendezvous Synchronization Overhead

29 Testing the Functionality
TMD-MPIbench round-trip tests: on-chip and off-chip communication, with internal RAM (BRAM) and with external RAM (DDR)

30 TMD-MPI Implementation
TMD-MPI communication protocols

31 Communication Tests
TMD-MPIbench.c:
round trip
bisection bandwidth
round trips with congestion
worst-case traffic scenario
all-node broadcasts
synchronization performance (barriers/sec)

32 Communication Tests
Latency:
Testbed (internal link): 17 µs @ 40 MHz
Testbed (external link): 22 µs
P3-NOW (100 Mb/s Ethernet): 75 µs
P4-Cluster (1000 Mb/s Gigabit Ethernet): 92 µs

33 Communication Tests
MicroBlaze throughput limit with external RAM

34 Communication Tests
MicroBlaze throughput limits with internal RAM and with external RAM; memory access time

35 Communication Tests
Measured bandwidth @ 40 MHz; startup overhead and frequency effects; comparison against P4-Cluster and P3-NOW

36 Communication Tests

37 Many variables are involved…

38 Background: TMD-MPI
TMD-MPI provides a parallel programming model for MPSoC in FPGAs with the following features:
Portability: the application is unaffected by changes in HW
Flexibility: to move from generic to application-specific implementations
Scalability: for large-scale applications
Reusability: no need to learn a new API for similar applications

39 Testing the Functionality
Hardware Testbed

40 Testing the Functionality
Hardware Testbed

41 New Developments: TMD-MPE
TMD-MPE use and the network

42 Background: TMD-MPI
TMD-MPI:
is a lightweight subset of the MPI standard
is tailored to a particular application
does not require an Operating System
has a small memory footprint (~8.7 KB)
uses a simple protocol

43 New Developments: TMD-MPE and TMD-MPI light

