
Fast Communication for Multi – Core SOPC Technion – Israel Institute of Technology Department of Electrical Engineering High Speed Digital Systems Lab.


1 Fast Communication for Multi-Core SOPC. Technion – Israel Institute of Technology, Department of Electrical Engineering, High Speed Digital Systems Lab. Supervisor: Evgeny Fiksman. Performed by: Moshe Bino, Alex Tikh. Spring 2007, 1st Semester Presentation.

2 Table of Contents: Introduction, Hardware Design, Software Design, Debug Process, Time Table.

3 Table of Contents. Current section: Introduction.

4 Problem statement: A single CPU is reaching its technological limits, e.g. heat dissipation and cost/power ratio. Parallel computing therefore evolved, using the multi-core processor paradigm. The three major inter-processor communication techniques are message passing, shared memory, and remote procedure calls.

5 Project description: A multi-core system of four MicroBlaze processors is to be built on a Xilinx FPGA. The message-passing model is chosen for inter-processor communication and is implemented according to the MPI library specification. The Network-on-Chip (NoC) methodology is employed for the core interconnect, and a dedicated NoC router is implemented.

6 Project description

7 Project description: The project is a basic SoPC platform for programmable chips. The system can be configured either as a multi-core processor that efficiently handles designated tasks, or as a group of hardware accelerators that support the main processor unit. The system can be expanded into a larger network, depending on the device resources. It provides relatively high and flexible computation power on a small device or board.

8 Project goals: The following components are to be implemented: a quad-core system; a 4-port NoC router and the infrastructure for fast communication in a multi-core system; selected MPI functions written in C; a software application, written in C, demonstrating the advantages of a parallel system.

9 System specifications. Constraints: FPGA (V2P) maximum clock frequency 400 MHz; MicroBlaze core maximum frequency 100 MHz; processor memory size 64 kbyte (code + data); processor-to-FSL access time 3 clock cycles; maximum FSL buffer depth 128 words, equal to 0.5 kbyte; interrupt handling time 20 clock cycles (no interrupt nesting). Preferences: the router works at maximum frequency; the router is designed for relatively small messages, at most 1 kbyte, due to the processor memory size.
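The size figures above can be checked with a little arithmetic. The sketch below assumes the 32-bit word size used for messages elsewhere in the presentation; the constant and function names are hypothetical and do not come from the project code.

```c
#include <stdint.h>

enum {
    WORD_BYTES      = 4,    /* one message word is 32 bits */
    FSL_DEPTH_WORDS = 128,  /* maximum FSL buffer depth from the constraints */
    MAX_MSG_BYTES   = 1024  /* preferred maximum message size (1 kbyte) */
};

/* Bytes that one full FSL buffer can hold. */
uint32_t fsl_buffer_bytes(void)
{
    return (uint32_t)FSL_DEPTH_WORDS * WORD_BYTES;
}

/* Number of 32-bit words needed for a payload of n bytes, rounded up. */
uint32_t words_for_bytes(uint32_t n)
{
    return (n + WORD_BYTES - 1) / WORD_BYTES;
}

/* Number of full FSL buffers a payload of n bytes occupies, rounded up. */
uint32_t fsl_buffers_for_bytes(uint32_t n)
{
    return (words_for_bytes(n) + FSL_DEPTH_WORDS - 1) / FSL_DEPTH_WORDS;
}
```

Under these assumptions a full buffer holds 512 bytes, so a maximum-size 1 kbyte payload (256 words) spans two buffer depths, not counting the header and tail words.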

10 MPI - Message Passing Interface: MPI is a language-independent library specification for message passing, proposed as a standard by a broadly based committee of vendors, implementers, and users. It is designed for high performance on both massively parallel machines and workstation clusters. MPI is widely available, with both freely available and vendor-supplied implementations.

11 Message structure: The upper word is the Header, the lower word is the Tail, and the data is located in between. Each word is 32 bits.

12 Message payload. The Header consists of the fields:
Name      Size (bits)  Order  Description
H         1            0      Represents the Header
DST       4            1:4    The message destination in the COMM
COMM      4            5:8    The group of cores in the message destination
CMD       4            9:12   The command name for this message (Send, Bcast)
TYPE      4            13:16  The data type in this message
DATA CNT  10           17:26  The number of words in this message
The Tail consists of the fields:
Name      Size (bits)  Order  Description
T         1            0      Represents the Tail
SRC       4            1:4    The message source port in its SCOMM
SCOMM     4            5:8    The group of cores in the message source port
TAG       11           9:19   Message code; groups messages on the same topic/issue
* Empty fields were left to allow network and functionality extensions.
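As an illustration of the Header layout, the following sketch packs and unpacks the fields with shifts and masks. The slides do not say whether bit 0 is the most or least significant bit; this sketch assumes bit 0 is the least significant bit. The type and function names are hypothetical.

```c
#include <stdint.h>

/* Header fields as listed in the table (H itself is the constant flag bit). */
typedef struct {
    uint8_t  dst;    /* DST,      4 bits, order 1:4   */
    uint8_t  comm;   /* COMM,     4 bits, order 5:8   */
    uint8_t  cmd;    /* CMD,      4 bits, order 9:12  */
    uint8_t  type;   /* TYPE,     4 bits, order 13:16 */
    uint16_t count;  /* DATA CNT, 10 bits, order 17:26 */
} msg_header;

/* Place the low `width` bits of v at bit position `shift` (bit 0 = LSB, assumed). */
static uint32_t put_field(uint32_t v, unsigned shift, unsigned width)
{
    return (v & ((1u << width) - 1u)) << shift;
}

/* Extract `width` bits starting at bit position `shift`. */
static uint32_t get_field(uint32_t w, unsigned shift, unsigned width)
{
    return (w >> shift) & ((1u << width) - 1u);
}

/* Build the 32-bit Header word; the H flag (bit 0) is always set. */
uint32_t header_pack(const msg_header *h)
{
    return 1u
         | put_field(h->dst,   1, 4)
         | put_field(h->comm,  5, 4)
         | put_field(h->cmd,   9, 4)
         | put_field(h->type, 13, 4)
         | put_field(h->count, 17, 10);
}

/* Recover the fields from a Header word. */
msg_header header_unpack(uint32_t w)
{
    msg_header h;
    h.dst   = (uint8_t)get_field(w, 1, 4);
    h.comm  = (uint8_t)get_field(w, 5, 4);
    h.cmd   = (uint8_t)get_field(w, 9, 4);
    h.type  = (uint8_t)get_field(w, 13, 4);
    h.count = (uint16_t)get_field(w, 17, 10);
    return h;
}
```

The Tail could be handled the same way with SRC at 1:4, SCOMM at 5:8, and TAG at 9:19. The 10-bit DATA CNT field allows up to 1023 words, which comfortably covers the 1 kbyte (256-word) message size preferred in the system specifications.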

13 Block diagram

14 Table of Contents. Current section: Hardware Design.

15 Router implementation

16 Router specification: The router consists of one major block, the Cross Bar. The Cross Bar is a network switch configured for switching data across multiple ports. It uses an efficient arbiter based on a Round Robin mechanism. The Cross Bar supports port-to-port message passing and broadcasting (not simultaneously). The Cross Bar comprises two main units: 1. Permission unit. 2. Port FSM (one for each port).

17 CROSS – BAR

18 Permission process: Round Robin arbiter: ports are serviced in a fixed cyclic order. Check that the destination is not busy. Grant permission for one 'time slot'. If a port is not requesting, service the next requesting port. The BUSY ports and their LAST writing ports are saved.
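The permission process can be modelled in software as a round-robin scan. This is a minimal behavioural sketch, not the VHDL design: the function name, the bit-mask encoding of requests and busy destinations, and the `dest` array are all assumptions made for illustration.

```c
#define NUM_PORTS 4  /* the router in this project has four ports */

/*
 * Pick the next port to grant a time slot to.
 *   last      - port that held the previous slot (-1 if none yet)
 *   req_mask  - bit p set when port p is requesting
 *   dest      - dest[p] is the destination port requested by port p
 *   busy_mask - bit d set when destination port d is busy
 * Returns the granted port, or -1 when no port can be serviced.
 */
int rr_next_grant(int last, unsigned req_mask,
                  const int dest[NUM_PORTS], unsigned busy_mask)
{
    for (int i = 1; i <= NUM_PORTS; i++) {
        int p = (last + i) % NUM_PORTS;          /* cyclic service order */
        int requesting = (req_mask >> p) & 1u;
        int dest_busy  = (busy_mask >> dest[p]) & 1u;
        if (requesting && !dest_busy)
            return p;                            /* skip idle or blocked ports */
    }
    return -1;
}
```

The sketch leaves out one rule from the slide: in the real design the last writing port of a busy destination keeps its permission until its message ends, which would need the saved BUSY and LAST state.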

19 Timer Unit: A timing generator that enables each port for a constant 'time slot'. When the 'Permit' input is de-asserted, the present time slot is switched to the next requesting port. If all ports request permission, priority is given by port order. The unit selects the relevant Req signal for the Controller.

20 Controller: Checks whether the enabled port requests permission. Checks for busy ports together with their last writing port. Permits the last source port until message delivery ends. Updates the busy and last-writing-port signals.

21 Port FSM: The destination is extracted from the Header. Request is asserted high. Permission is checked before any state transition. When granted, the message is delivered to the destination until the tail is found. In BCAST, each word read is sent to every destination port in a loop, and the ports already written are saved. Request is de-asserted at the end.
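A software model of the unicast path of the Port FSM might look like the sketch below. It is only an approximation of the behaviour described on the slide: the state names, the step interface, and the assumption that the DST field sits at bits 1:4 counted from the least significant bit are all hypothetical, and the BCAST loop is omitted.

```c
#include <stdint.h>

typedef enum { PORT_IDLE, PORT_REQUEST, PORT_SEND } port_state;

typedef struct {
    port_state state;
    unsigned   dest;     /* destination extracted from the Header */
    int        request;  /* request line towards the permission unit */
} port_fsm;

/*
 * Advance the FSM by one step.
 *   word, ctrl - word at the head of the input FIFO and its control bit
 *                (the control bit marks the Header and the Tail)
 *   permit     - grant from the permission unit for this step
 * Returns 1 when the word was forwarded to the destination (the caller
 * should then present the next word), 0 when the word must be held.
 */
int port_fsm_step(port_fsm *f, uint32_t word, int ctrl, int permit)
{
    switch (f->state) {
    case PORT_IDLE:
        if (ctrl) {                         /* Header seen */
            f->dest = (word >> 1) & 0xFu;   /* DST field, bit 0 = LSB assumed */
            f->request = 1;                 /* assert request */
            f->state = PORT_REQUEST;
        }
        return 0;
    case PORT_REQUEST:
        if (!permit)
            return 0;                       /* wait for permission */
        f->state = PORT_SEND;
        return 1;                           /* Header forwarded */
    case PORT_SEND:
        if (!permit)
            return 0;                       /* time slot lost, hold the word */
        if (ctrl) {                         /* Tail found: message ends */
            f->request = 0;
            f->state = PORT_IDLE;
        }
        return 1;
    }
    return 0;
}
```

Checking `permit` before every transition mirrors the rule that permission is checked before any state change, so a message can be paused between time slots and resumed later.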

22 Control Path Arbiter: Connects the Dest and Permit signals to/from the control bus according to the PORT address. Tri-state buffers: unused Dest signals are driven with high Z. Unused Permit signals (in the Port FSM direction) are driven with '0'.

23 Data Path Arbiter: Connects the appropriate controls and data to the buses according to the PORT address. Connects the buses to the appropriate FSL according to the DEST address. In general, the buses allow the number of ports to be increased by adding bus interfaces with sequential port addresses.

24 Example 1: In each time slot, part of the message is sent to its destination as long as the destination port is not busy. When a port is busy, the next requesting port is serviced (no delay).

25 Example 2: If one port has no data (port 2), the other ports are serviced in order.

26 Example 3: Handling a BCAST command and port arbitration while two ports have the same destination.

27 Interrupt Handler: The FIFO control bit is "bubbled" through the FIFO, marking the message Header and Tail. In the MicroBlaze (MB) direction, this bit notifies the MB that a message is pending in the FSL pipe (interrupt). In the router direction, this bit notifies the router of the start and end of a message.

28 FSL – data & control: Message data and the FSL control bit are bubbled along the FSL channel.

29 Table of Contents. Current section: Software Design.

30 Software Layers: Application layer: MPI function interface. Network layer: hardware-independent implementation of these functions. Data layer: relies on command bit fields. Physical layer: designed for the FSL bus.

31 MPI Functions set: Every MPI function returns an error value. Some of the implemented functions are trivial and are present because the MPI standard requires them. MPI_Init( int *argc, char ***argv ); MPI_Comm_rank( MPI_Comm comm, int *rank ); MPI_Comm_size( MPI_Comm comm, int *size ); MPI_Finalize();

32 MPI Functions set: The non-trivial functions, used for inter-processor communication, are Send, Interrupt Vector, and Recv. Bcast is a combination of Send and Recv and differs only at the low design level. MPI_Send( void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm ); MPI_Bcast( void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm ); MPI_Recv( void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status );

33 MPI Functions set: Three additional complementary functions supply further information about the received message. MPI_Get_source( MPI_Status* status, MPI_Datatype datatype, int *source ); MPI_Get_count( MPI_Status* status, MPI_Datatype datatype, int *count ); MPI_Get_tag( MPI_Status* status, MPI_Datatype datatype, int *tag );

34 Sending the message: MPI_Send composes the header and tail and sends them with the message body.
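One way to picture what a send routine has to assemble is the framing sketch below: a Header word, the payload words, and a Tail word, with the FSL control bit set only on the Header and Tail. The `fsl_word` type and `compose_frame` function are hypothetical names for illustration; the project's actual MPI_Send internals are not shown in the slides.

```c
#include <stddef.h>
#include <stdint.h>

/* One word as it travels on the FSL link: 32 data bits plus the control bit. */
typedef struct {
    uint32_t word;
    int      ctrl;  /* 1 for Header and Tail, 0 for payload words */
} fsl_word;

/*
 * Lay out a complete message frame in `out`.
 * Returns the number of words written (count + 2), or 0 when `out`
 * is too small to hold the Header, the payload and the Tail.
 */
size_t compose_frame(uint32_t header, const uint32_t *payload, size_t count,
                     uint32_t tail, fsl_word *out, size_t out_cap)
{
    if (out_cap < 2 || count > out_cap - 2)
        return 0;

    size_t n = 0;
    out[n].word = header;              /* Header opens the message */
    out[n++].ctrl = 1;
    for (size_t i = 0; i < count; i++) {
        out[n].word = payload[i];      /* payload sits in the middle */
        out[n++].ctrl = 0;
    }
    out[n].word = tail;                /* Tail closes the message */
    out[n++].ctrl = 1;
    return n;
}
```

On the real hardware each entry would then be written to the FSL channel, using the control-bit variant of the write for the first and last words.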

35 Receiving the message: The Interrupt Vector receives incoming messages and stores them in a suitable linked list.

36 Return received message: MPI_Recv receives the message details from the user and looks for this message in the linked list of already received messages.
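The receive path described on the last two slides, where an interrupt routine queues incoming messages and the receive call later searches the queue, can be sketched as follows. All names here (`rx_list`, `rx_store`, `rx_take`) are hypothetical, and the use of `malloc` is a simplification; a system with 64 kbyte of memory might prefer a fixed pool of buffers.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct rx_msg {
    int            source;  /* sender, as carried in the Tail (SRC) */
    int            tag;     /* message tag, as carried in the Tail (TAG) */
    size_t         count;   /* payload length in words */
    uint32_t      *data;    /* copy of the payload */
    struct rx_msg *next;
} rx_msg;

typedef struct { rx_msg *head, *tail; } rx_list;

/* Interrupt side: copy an incoming message and append it to the list.
 * Returns 0 on success, -1 if memory could not be allocated. */
int rx_store(rx_list *l, int source, int tag, const uint32_t *data, size_t count)
{
    rx_msg *m = malloc(sizeof *m);
    if (!m)
        return -1;
    m->data = malloc(count ? count * sizeof *data : 1);
    if (!m->data) {
        free(m);
        return -1;
    }
    if (count)
        memcpy(m->data, data, count * sizeof *data);
    m->source = source;
    m->tag = tag;
    m->count = count;
    m->next = NULL;
    if (l->tail)
        l->tail->next = m;
    else
        l->head = m;
    l->tail = m;
    return 0;
}

/* Receive side: find the oldest message matching source and tag, copy up
 * to `max` words into buf, and unlink it from the list.
 * Returns the number of words copied, or -1 when no message matches. */
long rx_take(rx_list *l, int source, int tag, uint32_t *buf, size_t max)
{
    rx_msg *prev = NULL;
    for (rx_msg *m = l->head; m; prev = m, m = m->next) {
        if (m->source != source || m->tag != tag)
            continue;
        size_t n = m->count < max ? m->count : max;
        if (n)
            memcpy(buf, m->data, n * sizeof *buf);
        if (prev)
            prev->next = m->next;
        else
            l->head = m->next;
        if (l->tail == m)
            l->tail = prev;
        free(m->data);
        free(m);
        return (long)n;
    }
    return -1;
}
```

Keeping arrival order in the list means two messages with the same source and tag are returned in the order they arrived. In a real interrupt-driven system, access to the list would also need to be protected while the receive call walks it.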

37 Example application: Matrix - Vector multiplication. A typical example of a highly parallel application. Vector: the root processor broadcasts the vector. Matrix row: a selected matrix row is sent by the root to each processor. Each processor computes and returns its result. The computed results are combined into a vector by the root processor.
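The row-wise decomposition can be illustrated without any MPI calls by simulating the processors one after another. In this sketch the function names and the choice of one matrix row per core are assumptions; in the real application the vector would travel by broadcast and the rows and results by send and receive.

```c
#define NCORES 4  /* one matrix row per core in this sketch */

/* Work done by a single processor: dot product of its row with the vector. */
int row_times_vector(const int *row, const int *vec)
{
    int sum = 0;
    for (int j = 0; j < NCORES; j++)
        sum += row[j] * vec[j];
    return sum;
}

/*
 * Root-side view of the whole computation, simulated sequentially:
 * every "rank" gets the same vector and its own row, and the root
 * stores each returned scalar at the matching index of the result.
 */
void matvec_by_rows(int m[NCORES][NCORES], const int *vec, int *out)
{
    for (int rank = 0; rank < NCORES; rank++)
        out[rank] = row_times_vector(m[rank], vec);
}
```

Each row product is independent of the others, which is why the problem parallelizes well: the only communication is the initial vector and row distribution and one scalar result per processor.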

38 Example application: Matrix - Vector multiplication (figure; the root processor is labeled "Root").

39 Table of Contents. Current section: Debug Process.

40 Debug - structure

41 Debug – Test Bench: The Test Bench reads messages from a file and writes them into the FSL pipe (MB output side). It also reads messages from the pipe (MB input side). Signals can also be viewed in ModelSim.

42 Table of Contents. Current section: Time Table.

43 Semester 2 - Tasks:
Build a quad core system: Done
Implement router for the system (build a modular router in VHDL): 01-03-08
Test and debug the router (hardware) and the MPI API (software): 01-04-08
Run a test application (measure speed-up as a function of average message size and number of messages): 25-04-08

44 QUESTIONS ?

