EuroPVM/MPI 2003. Venice, September 29 – October 2 Porting P4 to Digital Signal Processing Platforms Juan Antonio Rico Gallego Juan Carlos Díaz Martín.


1 EuroPVM/MPI 2003. Venice, September 29 – October 2
Porting P4 to Digital Signal Processing Platforms
Juan Antonio Rico Gallego, Juan Carlos Díaz Martín, José Manuel Rodríguez García, Jesús María Álvarez Llorente, Juan Luis García Zapata
Departamento de Informática, Universidad de Extremadura, Spain

2 Index
I. Introduction and goals
II. IDSP: A Distributed Framework for DSPs
III. Implementing the P4 functionality upon IDSP
IV. Measuring the P4 Overhead
V. Conclusions
VI. Current and Future Work

3 Introduction and goals
DSP processors have specialized architectures for running real-time digital signal processing.
Fields of application: communications, voice and data compression, mobile telephony, speech processing, image and video processing, medical, and more.

4 Introduction and goals: target machines
Networks of DSP multicomputers such as those from Sundance, Motorola or Hunt Engineering.
Example: Sundance SMT310Q PCI carrier board with four TI C6201 DSPs.

5 Introduction and goals: target machines
The Texas Instruments C6000 family of DSPs:
150 MHz, capable of delivering 900 MFLOPS
16 or 32 MBytes of 100 MHz SDRAM
64 KBytes of cache / internal RAM
128 KBytes of flash programmable and erasable ROM
No MMU for virtual memory management
Very limited resources. Targeted to embedded systems.

6 Introduction and goals
DSP applications combine high computational complexity with real-time requirements.
Most applications can be decoupled and distributed among two or more processors.
Current DSP software poses a portability problem: it is platform specific, provides only low-level communication libraries, and gives poor support for building portable parallel applications.
A distributed programming standard like MPI is needed.

7 IDSP: A Distributed Framework for DSPs
DSP/BIOS: the Texas Instruments kernel for the C6000 family of DSP processors (21 Kb).
Thread management: TSK_create, TSK_delete
Thread synchronization: SEM_pend, SEM_post
Timing services: CLK_gethtime
Tracing and analysis

8 IDSP: A Distributed Framework for DSPs
IDSP is our own development. It extends DSP/BIOS with distributed facilities (30 Kb).
IDSP runs on the DSK (1 x C6000) and on the Sundance SMT310Q multicomputer (4 x C6000).
Thread P2P communication: COMM_send, COMM_recv, COMM_asend, COMM_arecv, COMM_wait, COMM_test...
Thread management: OPER_create, OPER_destroy, GROUP_create, GROUP_destroy

9 IDSP: A Distributed Framework for DSPs
An IDSP application is a group of operators communicating by message passing.
An operator is a thread that runs an algorithm: FFT, etc.
An IDSP address is made of four fields: Machine, Group, Operator, Port.
[Diagram: operators 1 to 5 connected by input stream 1, input stream 2 and an output stream.]
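The four-field address on this slide could be represented roughly as follows. This is only a sketch: the type name idsp_addr_t, the 16-bit field widths and the helper idsp_addr_equal are assumptions for illustration, since the slides name the fields but give no layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: the slides name the four fields, not their widths. */
typedef struct {
    uint16_t machine;  /* DSP that hosts the operator */
    uint16_t group;    /* group (application) the operator belongs to */
    uint16_t oper;     /* operator, i.e. the thread, inside the group */
    uint16_t port;     /* port on that operator */
} idsp_addr_t;

/* Two addresses name the same endpoint only if all four fields match. */
static bool idsp_addr_equal(idsp_addr_t a, idsp_addr_t b)
{
    return a.machine == b.machine && a.group == b.group &&
           a.oper == b.oper && a.port == b.port;
}
```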

10 IDSP: A Distributed Framework for DSPs
IDSP shows a microkernel architecture, built around a message passing kernel.
[Diagram: operators (an algorithm operator, the P4 address mapper) and system servers (I/O Server, Group Server, Operator Server) sit on a software bus. Operators reach the servers by RPC through the CIO_, GROUP_ and OPER_ interfaces; the kernel underneath exposes the COMM_ interface.]

11 Implementing the P4 functionality upon IDSP
MPICH is a portable implementation of MPI. It has a three-layer design:
1. MPI macros
2. Abstract Device Interface (ADI)
3. Channel Interface, P4 being a well-known example
We have put P4 on top of IDSP.
[Diagram: the MPI / ADI / P4 stack on top of IDSP, which spans three C6000 nodes, each running DSP/BIOS.]

12 Implementing the P4 functionality upon IDSP: the P4 re-entrancy problem
P4 is process based: processes use the P4 library on top of the operating system.
IDSP is thread based: threads use a modified P4 library on top of IDSP.
A thread-safe version of P4 has been built by:
putting the P4 global variables in the private zone of each IDSP thread;
using mutual exclusion mechanisms.
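The two techniques named on this slide can be sketched in portable C11. Everything here is an assumption made for illustration: the struct p4_private_t and its fields, the use of _Thread_local as a stand-in for the IDSP thread private zone, and a spin lock built on atomic_flag as a stand-in for whatever mutual exclusion the port actually uses (slide 7 lists SEM_pend and SEM_post as the DSP/BIOS synchronization calls).

```c
#include <stdatomic.h>

/* Hypothetical subset of the state that stock P4 keeps in process-wide globals. */
typedef struct {
    int my_id;      /* rank of this P4 "process" */
    int num_procs;  /* total number of ranks */
} p4_private_t;

/* One copy per thread: _Thread_local stands in for the thread private zone. */
static _Thread_local p4_private_t p4_private;

/* Accessor that replaces direct reads and writes of the former globals. */
static p4_private_t *p4_self(void)
{
    return &p4_private;
}

/* State that really is shared between threads must be locked instead. */
static atomic_flag table_lock = ATOMIC_FLAG_INIT;
static int shared_connections = 0;

/* Increments the shared counter under the lock and returns the new value. */
static int register_connection(void)
{
    while (atomic_flag_test_and_set(&table_lock)) {
        /* spin until the current holder clears the flag */
    }
    int n = ++shared_connections;
    atomic_flag_clear(&table_lock);
    return n;
}
```

Routing every access through an accessor such as p4_self keeps the rest of the library source almost unchanged, which is presumably the attraction of the private-zone approach.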

13 Implementing the P4 functionality upon IDSP: communication network
P4 is based upon TCP/IP Berkeley sockets, but IDSP provides its own addressing scheme.
We have built IDSP/Sockets, a thin and efficient implementation of Berkeley sockets atop IDSP.
[Diagram: P4 uses IP addresses over sockets; below the socket layer, IDSP uses IDSP addresses and spans three C6000 nodes, each running DSP/BIOS.]

14 Implementing the P4 functionality upon IDSP: the IP/IDSP mapping
Every user operator keeps a cache of (idsp_addr, ip_addr) pairs. An Address Mapping Server holds the full table.
Call chain on the sender: p4_send(rank,...) calls send(IP_address,...), which calls COMM_send(IDSP_address,...).
[Diagram, numbered steps between the receiver, the Address Mapping Server and the sender:
1. Register(idsp_addr, ip_addr)
2. Get(ip_addr)
3. The reply carries idsp_addr
4. COMM_send(IDSP_address,...)]
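A minimal sketch of the lookup path described on this slide: try the operator's local cache first, and only ask the mapping server on a miss. The table type, the fixed size, the names map_put, map_get and resolve, and the use of a plain int as the IDSP address are all hypothetical; a direct function call stands in for the Get(ip_addr) request to the server.

```c
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE  8
#define IDSP_NONE (-1)

typedef struct { uint32_t ip; int idsp; int used; } map_entry_t;
typedef struct { map_entry_t e[MAP_SIZE]; } addr_map_t;

/* Returns the IDSP address registered for ip, or IDSP_NONE. */
static int map_get(const addr_map_t *m, uint32_t ip)
{
    for (size_t i = 0; i < MAP_SIZE; i++)
        if (m->e[i].used && m->e[i].ip == ip)
            return m->e[i].idsp;
    return IDSP_NONE;
}

/* Registers (ip, idsp); returns 0 on success, -1 if the table is full. */
static int map_put(addr_map_t *m, uint32_t ip, int idsp)
{
    for (size_t i = 0; i < MAP_SIZE; i++) {
        if (!m->e[i].used || m->e[i].ip == ip) {
            m->e[i].ip = ip;
            m->e[i].idsp = idsp;
            m->e[i].used = 1;
            return 0;
        }
    }
    return -1;
}

static int server_lookups = 0;  /* counts requests that reached the server */

/* Resolves ip through the local cache, falling back to the server table. */
static int resolve(addr_map_t *cache, const addr_map_t *server, uint32_t ip)
{
    int idsp = map_get(cache, ip);
    if (idsp != IDSP_NONE)
        return idsp;                 /* cache hit: no server traffic */
    server_lookups++;
    idsp = map_get(server, ip);      /* stands in for Get(ip_addr) */
    if (idsp != IDSP_NONE)
        map_put(cache, ip, idsp);    /* remember the answer for next time */
    return idsp;
}
```

With the cache in place, only the first message to a given peer pays for a round trip to the mapping server.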

15 Implementing the P4 functionality upon IDSP: signals
P4 uses UNIX signals for time-outs and process management, but DSP/BIOS does not provide signals!
Threads involved in DSP work, however, interact quite frequently with the kernel for data I/O.
IDSP takes advantage of this to support the UNIX signal mechanism:
1. A special message is sent to the target thread.
2. The target thread receives this message on its next socket read.
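The two steps above amount to delivering a signal as an in-band message and running its handler inside the next read. A rough model follows, in which the message layout, the names emu_signal and emu_read, and the array used as a message queue are all assumptions for illustration, not the IDSP/Sockets code.

```c
#include <stddef.h>

enum { MSG_DATA = 0, MSG_SIGNAL = 1 };   /* hypothetical message kinds */

typedef struct { int kind; int signo; const char *payload; } msg_t;
typedef void (*sig_handler_t)(int signo);

#define MAX_SIG 32
static sig_handler_t handlers[MAX_SIG];

/* Installs a handler for signo; returns 0, or -1 if signo is out of range. */
static int emu_signal(int signo, sig_handler_t h)
{
    if (signo < 0 || signo >= MAX_SIG)
        return -1;
    handlers[signo] = h;
    return 0;
}

/* Models one socket read: signal messages met on the way run their handler
 * and are consumed; the first data message is returned, or NULL if none. */
static const msg_t *emu_read(const msg_t *queue, size_t len, size_t *pos)
{
    while (*pos < len) {
        const msg_t *m = &queue[(*pos)++];
        if (m->kind == MSG_SIGNAL) {
            if (m->signo >= 0 && m->signo < MAX_SIG && handlers[m->signo])
                handlers[m->signo](m->signo);
            continue;
        }
        return m;
    }
    return NULL;
}

/* Example handler: records the last signal number it was called with. */
static int last_signal = 0;
static void record_signal(int signo) { last_signal = signo; }
```

The cost of this design is latency: a signal is only noticed when the target thread next reads, which is acceptable here because such threads read often.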

16 Implementing the P4 functionality upon IDSP: the startup process
P4 uses a text file specifying program files and machines:
    Local 0
    Sun2 1 /home/user/P4pgms/sun/prog1
    Sun3 2 /home/user/P4pgms/sun/prog2
    rs6000 1 /home/user/P4pgms/rs6000/prog1
But embedded systems don't use disks! The IDSP approach is as follows:
1. Every operator has a well-known integer identifier.
2. A limited number of operators is linked.
3. GROUP_create takes an array of operator identifiers.
4. Currently, it assigns each operator to the least loaded machine.
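Step 4 can be illustrated with a simple greedy placement. The slides do not say how load is measured, so this sketch assumes it is just the number of operators already placed on each machine; the function names and signatures are hypothetical and are not the GROUP_create interface.

```c
#include <stddef.h>

/* Index of the machine with the smallest load; ties go to the lowest index. */
static size_t least_loaded(const int *load, size_t n_machines)
{
    size_t best = 0;
    for (size_t i = 1; i < n_machines; i++)
        if (load[i] < load[best])
            best = i;
    return best;
}

/* Places n_opers operators one by one. placement[i] receives the machine
 * chosen for the i-th operator and load[] is updated as operators land.
 * Returns 0 on success, -1 if there is no machine to place on. */
static int place_operators(size_t n_opers, int *load, size_t n_machines,
                           size_t *placement)
{
    if (n_machines == 0)
        return -1;
    for (size_t i = 0; i < n_opers; i++) {
        size_t m = least_loaded(load, n_machines);
        placement[i] = m;
        load[m]++;   /* assumed metric: one unit of load per operator */
    }
    return 0;
}
```

For example, starting from loads {2, 0, 1} on three machines, four operators land on machines 1, 1, 2 and 0 in that order, leaving loads {3, 2, 2}.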

17 Measuring the P4 Overhead: overhead of the socket interface on IDSP
[Chart: time to send short messages between two operators, send versus COMM_send.]

18 Measuring the P4 Overhead: overhead of the P4 interface on IDSP
[Chart: time to send short messages between two operators, P4_send versus COMM_send.]

19 Conclusions
IDSP, a message passing interface for DSPs, has been defined and implemented.
IDSP performance on the TI C6000 DSP architecture is currently reasonably good (50 µs for short messages).
We have been able to support P4 upon the small IDSP interface.
P4 performance upon IDSP is good, but not good enough for high-performance distributed digital signal processing.
A more tuned channel interface layer is needed for DSPs.

20 Current and Future Work
IDSP is currently being augmented with MPI-like p2p primitives such as COMM_waitany, etc.
A DSP-specific channel interface layer will be developed. The ADI and MPI will be supported by this layer.
The 64-bit C6400 family will be addressed soon.

21 Thank you very much!

23 Implementing the P4 functionality upon IDSP: groups
MPI implements the concept of group. IDSP has a different concept of group.
How is this managed? The groups and processes of an MPI application run in the context of an IDSP group.

24 Implementing the P4 functionality upon IDSP: listener process
P4 uses an auxiliary process for doing background work. IDSP does not have an auxiliary thread.
How does IDSP do this work? We use an asynchronous communicator for:
doing this background work;
sending the initial information that threads need to run (threads have no parameters at startup).
[Diagram: an operator with a Communication Port and an Additional Port; message labels SEND, RECEIVE, CONNECTION_REQ, DIE, INITIAL_INFO.]
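One way to picture the role of the additional port is a small dispatcher over the message labels shown in the diagram. This is a guess at the shape of the mechanism, not the authors' code: the enum values reuse the diagram's labels, while the state struct, the value field and the handling of each case are assumptions, and SEND and RECEIVE are left unmodelled because the slide does not say what they carry.

```c
/* Message labels taken from the diagram; the CTRL_ prefix is hypothetical. */
typedef enum {
    CTRL_SEND,
    CTRL_RECEIVE,
    CTRL_CONNECTION_REQ,
    CTRL_DIE,
    CTRL_INITIAL_INFO
} ctrl_kind_t;

typedef struct { ctrl_kind_t kind; int value; } ctrl_msg_t;

/* Hypothetical per-operator state driven by control messages. */
typedef struct {
    int rank;          /* filled in by INITIAL_INFO */
    int has_info;      /* 1 once startup information has arrived */
    int alive;         /* cleared by DIE */
    int pending_conn;  /* connection requests not yet served */
} oper_state_t;

/* Applies one message read from the additional port to the operator state. */
static void handle_ctrl(oper_state_t *st, ctrl_msg_t m)
{
    switch (m.kind) {
    case CTRL_INITIAL_INFO:   /* startup parameters the thread could not take */
        st->rank = m.value;
        st->has_info = 1;
        break;
    case CTRL_CONNECTION_REQ: /* background work a P4 listener would do */
        st->pending_conn++;
        break;
    case CTRL_DIE:
        st->alive = 0;
        break;
    case CTRL_SEND:
    case CTRL_RECEIVE:        /* not modelled in this sketch */
        break;
    }
}
```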

25 An IDSP thread runs an algorithm in a different sense than MPI/P4 processes do, since those all run the same program.

26 Implementing the P4 functionality upon IDSP: the IP/IDSP mapping
P4 maps process ranks into IP addresses.
IDSP/Sockets maps IP addresses into IDSP addresses. Every user operator keeps a cache of (idsp_addr, ip_addr) pairs.
[Diagram, numbered steps between the receiver, the Address Mapping Server and the sender:
1. Register(idsp_addr, ip_addr)
2. Get(ip_addr)
3. The reply carries idsp_addr]

