Inter-Processor Communication for Heterogeneous Dual Core Systems Chun-Ming Huang, Ph.D. National Chip Implementation Center (CIC) 2006/09/27.

Inter-Processor Communication for Heterogeneous Dual Core Systems Chun-Ming Huang, Ph.D. National Chip Implementation Center (CIC) cmhuang@cic.org.tw 2006/09/27

C. M. Huang / SLDC-IPC / 09.2006

Agenda
- IPC Overview
- IPC Schemes
- Nokia DSP Gateway
- TI DSP/BIOS Link
- IPC Hardware Architecture
- Conclusions

IPC Overview

What is IPC?
- Inter-Process Communication
- Inter-Processor Communication
- Figure labels: Single-Chip, Multi-Chip, Single-Core, Multi-Core
- How to provide inter-process communication services for multi-core systems?

Independent & Cooperating Process
- Processes executing concurrently in the multitasking environment may be either independent processes or cooperating processes
- A process is independent if it cannot affect or be affected by the other processes executing in the system; any process that does not share data with any other process is independent
- A process is cooperating if it can affect or be affected by the other processes executing in the system; any process that shares data with other processes is a cooperating process
Silberschatz, et al., Operating System Principles, Seventh Edition

Why Allow Process Cooperation?
- Information sharing
- Computation speedup
- Modularity
- Convenience
- Cooperating processes require an inter-process communication (IPC) mechanism that allows them to exchange data and information
Silberschatz, et al., Operating System Principles, Seventh Edition

IPC Example
- Unix pipe
  - ls -l / | grep 2005 | wc
  - Output: 2 19 98
- The grep utility searches text files for a pattern and prints all lines that contain that pattern.
- The wc utility displays a count of lines, words and characters in a text file.
- Data exchange
- Synchronization

Operating System Kernel Components
- Process scheduler
  - determines when and for how long a process executes on a processor
- Memory manager
  - determines when and how memory is allocated to processes and what to do when memory becomes full
- I/O manager
  - services input and output requests from and to hardware devices
- Inter-process communication (IPC) manager
  - allows processes to communicate with one another
- File system manager
  - organizes named collections of data on storage devices and provides an interface for accessing data on those devices
Deitel, et al., Operating Systems, Third Edition

Linux Kernel 2.6.17.11
- Top-level source directories (all drwxr-xr-x):
  arch  block  crypto  drivers  fs  include  init  ipc  kernel  lib  mm  net  scripts  security  sound  usr
- Files in ipc/ (all -rw-r--r--):
  Makefile  compat.c  compat_mq.c  mqueue.c  msg.c  msgutil.c  sem.c  shm.c  util.c  util.h
http://www.kernel.org

Machine-Independent SW in the FreeBSD Kernel

Category                          Lines of Code   Percentage of Kernel (%)
Headers                                  38,158     4.8
initialization                            1,663     0.2
kernel facilities                        53,805     6.7
generic interfaces                       22,191     2.8
interprocess communication               10,019     1.3
terminal handling                         5,798     0.7
virtual memory                           24,714     3.1
vnode memory                             22,764     2.9
local filesystem                         28,067     3.5
miscellaneous filesystems (19)           58,753     7.4
network filesystem                       22,436     2.8
network communication                    46,570     5.8
Internet V4 protocols                    41,220     5.2
Internet V6 protocols                    45,527     5.7
IPsec                                    17,956     2.2
netgraph                                 74,338     9.3
cryptographic support                     7,515     0.9
GEOM layer                               11,563     1.4
CAM layer                                41,805     5.2
ATA layer                                14,192     1.8
ISA bus                                  10,984     1.4
PCI bus                                  72,366     9.1
pccard bus                                6,916     0.9
Linux compatibility                      10,474     1.3
Total Machine Independent               689,794    86.4

McKusick & Neville-Neil, The Design and Implementation of the FreeBSD Operating System

Homogeneous vs. Heterogeneous
- Figure labels: TI OMAP 5910, Sun

Multiprocessor OS Organizations
- Can classify systems based on how processors share operating system responsibilities
- Three types
  - Master/slave
  - Separate kernels
  - Symmetrical organization
Deitel, et al., Operating Systems, Third Edition

Master/Slave
- Master/Slave organization
  - Master processor executes the operating system
  - Slaves execute only user processes
  - Hardware asymmetry
  - Low fault tolerance
  - Good for computationally intensive jobs
  - Example: nCUBE system
Deitel, et al., Operating Systems, Third Edition

Separate Kernels
- Separate kernels organization
  - Each processor executes its own operating system
  - Some globally shared operating system data
  - Loosely coupled
  - Catastrophic failure unlikely, but failure of one processor results in termination of processes on that processor
  - Little contention over resources
  - Example: Tandem system
Deitel, et al., Operating Systems, Third Edition

Symmetrical Organization
- Symmetrical organization
  - Operating system manages a pool of identical processors
  - High amount of resource sharing
  - Need for mutual exclusion
  - Highest degree of fault tolerance of any organization
  - Some contention for resources
  - Example: BBN Butterfly
Deitel, et al., Operating Systems, Third Edition

Memory Access Architectures
- Memory access
  - Can classify multiprocessors based on how processors share memory
  - Goal: Fast memory access from all processors to all memory
    - Contention in large systems makes this impractical
Deitel, et al., Operating Systems, Third Edition

Uniform Memory Access
- Uniform memory access (UMA) multiprocessor
  - All processors share all memory
  - Access to any memory page is nearly the same for all processors and all memory modules (disregarding cache hits)
  - Typically uses shared bus or crossbar-switch matrix
  - Also called symmetric multiprocessing (SMP)
  - Small multiprocessors (typically two to eight processors)
Deitel, et al., Operating Systems, Third Edition

Uniform Memory Access
Deitel, et al., Operating Systems, Third Edition

Non-Uniform Memory Access
- Non-uniform memory access (NUMA) multiprocessor
  - Each node contains a few processors and a portion of system memory, which is local to that node
  - Access to local memory faster than access to global memory (rest of memory)
  - More scalable than UMA (fewer bus collisions)
Deitel, et al., Operating Systems, Third Edition

Non-Uniform Memory Access
Deitel, et al., Operating Systems, Third Edition

Cache-Only Memory Architecture
- Cache-only memory architecture (COMA) multiprocessor
  - Physically interconnected as a NUMA is
    - Local memory vs. global memory
  - Main memory is viewed as a cache and called an attraction memory (AM)
    - Allows system to migrate data to node that most often accesses it at granularity of a memory line (more efficient than a memory page)
    - Reduces the number of cache misses serviced remotely
  - Overhead
    - Duplicated data items
    - Complex protocol to ensure all updates are received at all processors
Deitel, et al., Operating Systems, Third Edition

Cache-Only Memory Architecture
Deitel, et al., Operating Systems, Third Edition

No Remote Memory Access
- No-remote-memory-access (NORMA) multiprocessor
  - Does not share physical memory
  - Some implement the illusion of shared physical memory: shared virtual memory (SVM)
  - Loosely coupled
  - Communication through explicit messages
  - Distributed systems
  - Not networked system
Deitel, et al., Operating Systems, Third Edition

No Remote Memory Access
Deitel, et al., Operating Systems, Third Edition

Four Possible Cases
- Columns: Symmetrical OSs, Asymmetrical OSs
- Homogeneous Cores: CPU_A(OS_X) CPU_A(OS_Y)
- Heterogeneous Cores: CPU_A(OS_X) CPU_B(OS_X) (symmetrical OSs); CPU_A(OS_X) CPU_B(OS_Y) (asymmetrical OSs)

IPC Schemes

Communication via Files
- Communication via files is in fact the oldest way of exchanging data between programs. Program A writes data to a file and Program B reads it. In a system in which only one program can be run at any given time, this does not present any problem.
- In a multitasking system, however, both programs could be run as processes at least quasi-parallel to each other. Race conditions then usually produce inconsistencies in the file data, which result from one program reading a data area before the other has finished modifying it, or from both processes modifying the same area at the same time.

Communication via Files
- Locking entire files
  - lock file
  - fcntl() (POSIX), flock() (BSD 4.3)
- Locking file areas (record locking)
  - Deadlock
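
The record-locking bullet above can be sketched in C with POSIX fcntl(). This is a minimal sketch, not taken from the slides: lock_region() and demo_record_lock() are hypothetical helper names, the 16-byte record size is arbitrary, and the scratch file under /tmp is an assumption.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Apply a POSIX advisory lock to bytes [start, start + len) of fd.
 * type is F_RDLCK, F_WRLCK, or F_UNLCK; F_SETLKW waits for conflicts. */
int lock_region(int fd, short type, off_t start, off_t len)
{
    struct flock fl = {0};

    assert(len > 0);            /* l_len == 0 would mean "to end of file" */
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = start;
    fl.l_len = len;
    return fcntl(fd, F_SETLKW, &fl);
}

/* Demo: lock one 16-byte record, update it, unlock. Returns 0 on success. */
int demo_record_lock(void)
{
    char path[] = "/tmp/ipc_lock_XXXXXX";   /* hypothetical scratch file */
    int fd = mkstemp(path);
    int rc = -1;

    if (fd < 0)
        return -1;
    if (lock_region(fd, F_WRLCK, 0, 16) == 0) {
        /* critical section: only the lock holder touches this record */
        if (write(fd, "record 0 updated", 16) == 16)
            rc = lock_region(fd, F_UNLCK, 0, 16);
    }
    close(fd);
    unlink(path);
    return rc;
}
```

Because these locks are advisory, they only help if every cooperating process takes the lock before touching the record. Two processes that lock two records in opposite order can still deadlock, which is the hazard the slide names.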

Process Communication Models
- Message passing
- Shared memory
Silberschatz, et al., Operating System Principles, Seventh Edition

IPC for Linux
- Linux IPC
  - Many IPC mechanisms derived from traditional UNIX IPC
    - Allow processes to exchange information
  - Some are better suited for particular applications
    - For example, those that communicate over a network or exchange short messages with other local applications
Deitel, et al., Operating Systems, Third Edition

IPC for Linux
- Signal
- Pipe
- Message queue
- Shared memory
- System V Semaphores
- Sockets

Signals
- Signals
  - One of the first interprocess communication mechanisms available in UNIX systems
  - Kernel uses them to notify processes when certain events occur
  - Do not allow processes to specify more than a word of data to exchange with other processes
  - Created by the kernel in response to interrupts and exceptions, and sent to a process or thread
    - as a result of executing an instruction (such as a segmentation fault)
    - from another process (such as when one process terminates another)
    - from an asynchronous event
Deitel, et al., Operating Systems, Third Edition

POSIX Signals
Deitel, et al., Operating Systems, Third Edition

Signals
- A process/thread can handle a signal in one of three ways:
  1. Ignore the signal: processes can ignore all but the SIGSTOP and SIGKILL signals.
  2. Catch the signal: when a process catches a signal, it invokes its signal handler to respond to the signal.
  3. Execute the default action that the kernel defines for that signal.
- Default actions
  - Abort: terminate immediately
  - Memory dump: copies execution context before exiting
  - Ignore
  - Stop (i.e., suspend)
  - Continue (i.e., resume)
Deitel, et al., Operating Systems, Third Edition

Signals
- Signal blocking
  - A process or thread can block a signal
    - Signal is not delivered until process/thread stops blocking it
  - While a signal handler is running, signals of that type are blocked by default
    - Still possible to receive signals of a different type
  - Common signals are not queued
    - Real-time signals provide signal queuing
Deitel, et al., Operating Systems, Third Edition

Pipes
- Pipes
  - Producer process writes data to the pipe, after which the consumer process reads data from the pipe in first-in-first-out order
  - When a pipe is created, an inode that points to the pipe buffer (a page of data) is created
  - Access to pipes is controlled by file descriptors
    - Can be passed between related processes (e.g., parent and child)
  - Named pipes (FIFOs)
    - Can be accessed via the directory tree
  - Limitation: Fixed-size buffer
Deitel, et al., Operating Systems, Third Edition
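
The producer/consumer relationship above can be sketched with pipe() and fork(). This is an illustrative sketch rather than code from the slides; demo_pipe() is a hypothetical helper, and the child plays the producer while the parent consumes.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Child (producer) writes msg into a pipe; parent (consumer) reads it back
 * in FIFO order. Returns the number of bytes stored in out, or -1. */
ssize_t demo_pipe(const char *msg, char *out, size_t cap)
{
    int fds[2];                 /* fds[0] = read end, fds[1] = write end */
    ssize_t total = 0, n;
    pid_t pid;

    assert(msg != NULL && out != NULL && cap > 0);
    if (pipe(fds) != 0)
        return -1;

    pid = fork();               /* descriptors are inherited by the child */
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {             /* child: producer */
        ssize_t w;
        close(fds[0]);
        w = write(fds[1], msg, strlen(msg));
        close(fds[1]);
        _exit(w < 0 ? 1 : 0);
    }

    close(fds[1]);              /* parent: closing its write end lets read() see EOF */
    while ((size_t)total < cap - 1 &&
           (n = read(fds[0], out + total, cap - 1 - (size_t)total)) > 0)
        total += n;
    out[total] = '\0';
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return total;
}
```

Calling demo_pipe("hello", buf, sizeof buf) with a 32-byte buf fills buf with "hello". A shell pipeline such as ls | grep | wc chains the same mechanism, with each stage's standard output connected to the next stage's standard input.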

Message Queues
- Message queues
  - Allow processes to transmit information that is composed of a message type and a variable-length data area
    - Stored in message queues, remain until a process is ready to receive them
    - Related processes can search for a message queue identifier in a global array of message queue descriptors
  - Message queue descriptor contains
    - Queue of pending messages
    - Queue of processes waiting for messages
    - Queue of processes waiting to send messages
    - Data describing the size and contents of the message queue
Deitel, et al., Operating Systems, Third Edition
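
A short System V sketch shows what the message type buys: a receiver can ask for a type instead of taking messages in arrival order. The struct text_msg layout (a long type followed by the data area) is the conventional one; demo_msgq() and the 32-byte payload limit are assumptions made for this example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/msg.h>

struct text_msg {
    long mtype;         /* message type, must be > 0 */
    char mtext[32];     /* data area (up to 32 bytes in this sketch) */
};

/* Queue two messages of different types, then fetch by type rather than by
 * arrival order. Copies the type-1 text into out. Returns 0 on success. */
int demo_msgq(char *out, size_t cap)
{
    struct text_msg a = { 2, "second type" };
    struct text_msg b = { 1, "first type" };
    struct text_msg rx;
    int qid, rc = -1;

    assert(out != NULL && cap >= sizeof rx.mtext);
    qid = msgget(IPC_PRIVATE, IPC_CREAT | 0600);
    if (qid < 0)
        return -1;

    /* The size argument counts only the data area, not mtype. */
    if (msgsnd(qid, &a, strlen(a.mtext) + 1, 0) == 0 &&
        msgsnd(qid, &b, strlen(b.mtext) + 1, 0) == 0 &&
        msgrcv(qid, &rx, sizeof rx.mtext, 1, 0) > 0) {   /* msgtyp = 1 */
        strcpy(out, rx.mtext);
        rc = 0;
    }
    msgctl(qid, IPC_RMID, NULL);    /* queues persist until removed */
    return rc;
}
```

Although the type-2 message was sent first, the receive call asks for type 1, so out ends up holding "first type". Unrelated processes would share a queue through a key from ftok() rather than IPC_PRIVATE.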

Shared Memory
- Shared memory [protection schemes]
  - Advantages
    - Improves performance for processes that frequently access shared data
    - Processes can share as much data as they can address
  - Standard interfaces
    - System V shared memory
    - POSIX shared memory
  - Does not allow processes to change privileges for a segment of shared memory
Deitel, et al., Operating Systems, Third Edition

System V Shared Memory System Calls
Deitel, et al., Operating Systems, Third Edition
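
The four System V shared memory calls (shmget, shmat, shmdt, shmctl) can be exercised in a few lines. This is a sketch under simple assumptions: demo_shm() is a hypothetical helper, the segment holds a single int, and waiting for the child to exit stands in for real synchronization.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child attach the same segment; the child stores `value` and
 * the parent reads it after the child exits. Returns the value seen, or -1. */
int demo_shm(int value)
{
    int shmid, seen = -1;
    int *shared;
    pid_t pid;

    assert(value >= 0);
    shmid = shmget(IPC_PRIVATE, sizeof *shared, IPC_CREAT | 0600);  /* create */
    if (shmid < 0)
        return -1;
    shared = shmat(shmid, NULL, 0);                                 /* attach */
    if (shared == (void *)-1) {
        shmctl(shmid, IPC_RMID, NULL);
        return -1;
    }
    *shared = -1;

    pid = fork();                   /* attached segments are inherited */
    if (pid == 0) {
        *shared = value;            /* child writes in place, no copying */
        _exit(0);
    }
    if (pid > 0) {
        waitpid(pid, NULL, 0);      /* crude synchronization: wait for exit */
        seen = *shared;
    }
    shmdt(shared);                                                  /* detach */
    shmctl(shmid, IPC_RMID, NULL);                                  /* remove */
    return seen;
}
```

Shared memory itself provides no synchronization, so real code pairs it with a semaphore or similar mechanism; here the parent simply waits for the child to finish before reading.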

Shared Memory
- Shared memory implementation
  - Treats region of shared memory as a file
  - Shared memory page frames are freed when file is deleted
  - Tmpfs (temporary file system) stores such files
    - Tmpfs pages are swappable
    - Permissions can be set
    - File system does not require formatting
Deitel, et al., Operating Systems, Third Edition

System V Semaphores
- System V semaphores
  - Designed for user processes to access via the system call interface
- Semaphore arrays
  - Protect a group of related resources
  - Before a process can access resources protected by a semaphore array, the kernel requires that there be sufficient available resources to satisfy the process's request
  - Otherwise, kernel blocks requesting process until resources become available
- Preventing deadlock
  - When a process exits, the kernel reverses all the semaphore operations it performed to allocate its resources
Deitel, et al., Operating Systems, Third Edition
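
The resource-counting behavior described above can be sketched with semget(), semctl(), and semop(). demo_sem() is a hypothetical helper. The sketch uses IPC_NOWAIT so that an unsatisfiable request fails instead of blocking, which differs from the default blocking behavior the slide describes, and SEM_UNDO asks the kernel to reverse the operation if the process exits.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* semctl() expects the caller to define this union. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Create a one-element semaphore array holding `units` resources, try to
 * acquire `want` of them, and return how many remain (-1 on failure). */
int demo_sem(int units, int want)
{
    union semun arg;
    struct sembuf acquire = { .sem_num = 0, .sem_flg = IPC_NOWAIT | SEM_UNDO };
    int semid, left = -1;

    assert(units >= 0 && units <= 32767 && want > 0 && want <= 32767);
    semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    arg.val = units;
    acquire.sem_op = (short)-want;      /* negative op = take resources */
    if (semctl(semid, 0, SETVAL, arg) == 0 &&
        semop(semid, &acquire, 1) == 0)
        left = semctl(semid, 0, GETVAL);

    semctl(semid, 0, IPC_RMID);         /* remove the array */
    return left;
}
```

With three units available, taking two leaves one, so demo_sem(3, 2) returns 1. demo_sem(1, 2) returns -1 because the request cannot be satisfied; without IPC_NOWAIT the caller would sleep until another process released enough units.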

Sockets
- Sockets
  - Allow pairs of processes to exchange data by establishing direct bidirectional communication channels
  - Primarily used for bidirectional communication between multiple processes on different systems, but can be used for processes on the same system
  - Stored internally as files
  - File name used as socket's address, accessed via the VFS
Deitel, et al., Operating Systems, Third Edition

Sockets
- Stream sockets
  - Implement the traditional client/server model
  - Data is transferred as a stream of bytes
  - Use TCP to communicate, so they are more appropriate for reliable communication
- Datagram sockets
  - Faster, but less reliable communication
  - Data is transferred using datagram packets
- Socketpairs
  - Pair of connected, unnamed sockets
  - Limited to use by processes that share file descriptors
Deitel, et al., Operating Systems, Third Edition
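
A socketpair is the easiest of these to try on one machine. In the sketch below, demo_socketpair() is a hypothetical helper and the 64-byte message limit is arbitrary. The parent sends a short string, and the child upper-cases it and replies over the same descriptor, which a pipe could not do.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ctype.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent sends text over a connected, unnamed socket pair; the child
 * upper-cases it and replies on the same bidirectional channel.
 * Returns 0 and fills out on success, -1 on error. */
int demo_socketpair(const char *text, char *out, size_t cap)
{
    int sv[2];
    size_t len, got = 0;
    ssize_t n;
    pid_t pid;

    assert(text != NULL && out != NULL);
    len = strlen(text);
    assert(len > 0 && len < cap && len <= 64);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {                     /* child keeps sv[1] */
        char buf[64];
        size_t have = 0;
        close(sv[0]);
        while (have < len && (n = read(sv[1], buf + have, len - have)) > 0)
            have += (size_t)n;
        for (size_t i = 0; i < have; i++)
            buf[i] = (char)toupper((unsigned char)buf[i]);
        n = write(sv[1], buf, have);    /* reply on the same socket */
        _exit(n == (ssize_t)have ? 0 : 1);
    }

    close(sv[1]);                       /* parent keeps sv[0] */
    if (write(sv[0], text, len) == (ssize_t)len)
        while (got < len && (n = read(sv[0], out + got, len - got)) > 0)
            got += (size_t)n;
    out[got] = '\0';
    close(sv[0]);
    waitpid(pid, NULL, 0);
    return got == len ? 0 : -1;
}
```

Sending "ping" this way leaves "PING" in out. The read loops are there because a stream socket may return fewer bytes than requested.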

Example: ipcs output

    sf01a:cmhuang[/] ipcs
    IPC status from as of Thu Sep 21 14:35:30 CST 2006
    T         ID      KEY        MODE        OWNER    GROUP
    Message Queues:
    Shared Memory:
    m          1   0x50000d1d --rw-r--r--     root     root
    m          2   0xabbaca01 --rw-rw-rw-     pc62       TR
    m       3103   0          --rw-rw-rw-  cmhuang      DSD
    m       1404   0          --rw-rw-rw-     root     root
    Semaphores:
    s          0   0x1        --ra-ra-ra-     root     root
    s    2031617   0          --ra-ra-ra-  cmhuang      DSD
    s     917506   0          --ra-ra-ra-  cmhuang      DSD

IPC for WinXP
- Data oriented
  - Pipes
  - Mailslots (message queues)
  - Shared memory
- Procedure oriented / object oriented
  - Remote procedure calls
  - Microsoft COM objects
  - Clipboard
  - GUI drag-and-drop capability
Deitel, et al., Operating Systems, Third Edition

Pipes
- Manipulated with file system calls
  - Read
  - Write
  - Open
- Pipe server
  - Process that creates pipe
- Pipe clients
  - Processes that connect to pipe
- Modes
  - Read: pipe server receives data from pipe clients
  - Write: pipe server sends data to pipe clients
  - Duplex: pipe server sends and receives data
Deitel, et al., Operating Systems, Third Edition

Pipes
- Anonymous Pipes
  - Unidirectional
  - Between local processes
  - Synchronous
  - Pipe handles, usually passed through inheritance
- Named Pipes
  - Unidirectional or bidirectional
  - Between local or remote processes
  - Synchronous or asynchronous
  - Opened by name
  - Byte stream vs. message stream
  - Default mode vs. write-through mode
Deitel, et al., Operating Systems, Third Edition

Mailslots
- Mailslot server: creates mailslot
- Mailslot clients: send messages to mailslot
- Communication
  - Unidirectional
  - No acknowledgement of receipt
  - Local or remote communication
  - Implemented as files
  - Two modes
    - Datagram: for small messages
    - Server Message Block (SMB): for large messages
Deitel, et al., Operating Systems, Third Edition

Shared Memory
- File mapping
  - Processes map their virtual memory to same page frames in physical memory
  - Multiple processes access same file
  - No synchronization guaranteed
- File mapping object
  - Maps file to main memory
- File view
  - Maps a process's virtual memory to main memory mapped by file mapping object
Deitel, et al., Operating Systems, Third Edition

Nokia DSP Gateway

Nokia DSP Gateway Overview
- Supports TI OMAP1510, 1610, 5910, 5912, 2410, and 2412.
- GPP side
  - Linux kernel 2.6.6
  - Linux device driver
  - Access DSP through normal system calls such as read() and write()
- DSP side
  - TI DSP/BIOS
  - DSP kernel library (tokliBIOS) and API
http://dspgateway.sourceforge.net/pub/index.php

Nokia DSP Gateway Overview
- Current version: 3.3.1 (2006-09-13)
- Open source software
- Current license state:

  Release   License
  1.0       GPL
  2.X       GPL
  3.X       ARM pack: GPL; DSP pack: BSD

TI OMAP 1610

Summary of changes from v2.6.5 to v2.6.6
============================================
[ARM PATCH] 1777/1: Add TI OMAP support to ARM core files

Patch from Tony Lindgren

This patch updates the ARM Linux core files to add support for Texas Instruments OMAP-1510, 1610, and 730 processors. OMAP is an embedded ARM processor with integrated DSP. OMAP-1610 has hardware support for USB OTG, which might be of interest to Linux developers. OMAP-1610 could be easily be used as development platform to add USB OTG support to Linux.

This patch is an updated version of an earlier patch 1767/1 with the dummy Kconfig added for OMAP as suggested by Russell King here:
http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=1767/1

This patch is brought to you by various linux-omap developers.
http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.6

TI DSP/BIOS
- Scalable real-time kernel
- Real-time scheduling and synchronization
- Host-to-target communication
- Real-time instrumentation
- Preemptive multi-threading
- Hardware abstraction
- Real-time analysis and configuration tools
- Application programs use DSP/BIOS by making calls to the API
- All DSP/BIOS modules provide C-callable interfaces

DSP Gateway System Architecture

Mailbox in OMAP1
- Each set of mailbox registers consists of two 16-bit registers and a 1-bit flag register.
- The interrupting processor can use one 16-bit register to pass a data word to the interrupted processor and the other 16-bit register to pass a command word.

Mailbox in OMAP2
- Six sets of mailbox registers; each message register can carry a 32-bit data word
- Two mailbox queues are reserved: MAILBOX_0 for the ARM-to-DSP direction and MAILBOX_1 for the DSP-to-ARM direction

Mailbox Command and Data Register
- Command register bit definitions
- Data register bit definitions

Mailbox Command Definition

Mailbox Command Sequence
- Configuration sequence
  - System configuration
  - Task configuration
  - Task add/delete
- Data transfer sequence
  - ARM to DSP transfer
  - DSP to ARM transfer
  - Task control
  - Read/write DSP register
  - Read/write DSP system parameters

System Configuration Sequence

DSPCFG Command

ARM to DSP Passive Word Receiving

ARM to DSP Active Word Receiving

ARM to DSP Passive Block Receiving

IPC Buffer
- It is unrealistic to transfer a large amount of data between two processors with only mailbox registers. Therefore, IPBUF (Inter-Processor Buffer) is introduced for the large block data transfer.
- There are three types of IPBUFs:
  - Global IPBUF
  - Private IPBUF
  - System IPBUF

Global IPBUF
- The Global IPBUFs are defined for the block data transfer between ARM and DSP.
- The Global IPBUF lines are identified with BID (Buffer ID), and all tasks can use them commonly.
- The maximum line size is 64k words (128k bytes).

Global IPBUF

DSP Gateway Linux Device Interfaces

DSP Gateway Linux APIs

Passive Receiving Task

Active Receiving Task

TI DSP/BIOS Link

TI DSP/BIOS Link
- For TI OMAP5910/5912, DaVinci, and DM642 devices.
- DSP/BIOS Link is a no-charge, royalty-free product and is provided in C source code form.
- Current version: 1.30.06 (Nov. 22, 2005)
- Portable across different operating systems.
- OS (GPP) + DSP/BIOS (DSP)
http://focus.ti.com/dsp/docs/dspsupportatn.tsp?sectionId=3&tabId=477&familyId=44&toolTypeId=5

DSP/BIOS Link Supported Platforms
- DaVinci running MontaVista Linux Pro 4.0 or PrKernel v4.1 on ARM
- OMAP5912 running MontaVista Linux Pro 3.1 on ARM
- DA300 running PrKernel v4.1 on ARM
- DM642 connected to a PC running Red Hat Linux 9.0 or Red Hat Enterprise Linux 4.0

Software Architecture of DSP/BIOS Link

On the GPP Side
- The OS ADAPTATION LAYER encapsulates the generic OS services that are required by the other components of DSP/BIOS LINK. This component exports a generic API that insulates the other components from the specifics of an OS. All other components use this API instead of direct OS calls. This makes DSP/BIOS LINK portable across different operating systems.
- The LINK DRIVER encapsulates the low-level control operations on the physical link between the GPP and DSP. This module is responsible for controlling the execution of the DSP and for data transfer across the GPP-DSP boundary using a defined protocol.

On the GPP Side
- The PROCESSOR MANAGER maintains book-keeping information for all components. It also allows different boot-loaders to be plugged into the system. It exposes the control operations provided by the LINK DRIVER to the user through the API layer.
- The DSP/BIOS LINK API is the interface for all clients on the GPP side. This is a very thin component and usually does no more processing than parameter validation. The API layer can be considered as 'skin' on the 'muscle' mass contained in the PROCESSOR MANAGER and LINK DRIVER.

On the DSP Side
- The LINK DRIVER is one of the drivers in DSP/BIOS. This driver specializes in communicating with the GPP over the physical link.
- There is no specific DSP/BIOS LINK API on the DSP. The communication (data/message transfer) is done using the DSP/BIOS modules SIO, GIO, and MSGQ.

DSP/BIOS Link Key Components
- PROC
  - This component represents the DSP processor in the application space.
  - This component provides services to:
    - Initialize the DSP & make it available for access from the GPP.
    - Load code on the DSP.
    - Start execution from the run address specified in the executable.
    - Read from or write to DSP memory.
    - Stop execution.
    - Additional platform-specific control actions.
  - In the current version, only one processor is supported. However, the APIs are designed to support multiple DSPs and hence they accept a processorID argument to support this future enhancement.

DSP/BIOS Link Key Components
- CHNL
  - This component represents a logical data transfer channel in the application space.
  - CHNL is responsible for the data transfer across the GPP and DSP.
  - CHNL is an acronym for 'channel'.
  - A channel (when referred in context of DSP/BIOS LINK) is:
    - A means of transferring data across GPP and DSP.
    - A logical entity mapped over a physical connectivity between the GPP and DSP.
    - Uniquely identified by a number within the range of channels for a specific physical link towards a DSP.
    - Unidirectional. The direction of a channel is decided at run time based on the attributes passed to the corresponding API.

DSP/BIOS Link Key Components
- MSGQ
  - This component represents queue based messaging.
  - This component is responsible for exchanging short messages of variable length between the GPP and DSP clients. It is based on the MSGQ module in DSP/BIOS.
  - The messages are sent and received through message queues.
  - A reader gets the message from the queue and a writer puts the message on a queue. A message queue can have only one reader and many writers. A task may read from and write to multiple message queues.

DSP/BIOS Link Key Components
- POOL
  - This component provides APIs to open and close memory pools, which are used by the CHNL and MSGQ component for allocating the buffers used in data transfer and messaging respectively.
  - This component is responsible for providing a uniform view of different memory pool implementations, which may be specific to the hardware architecture or OS on which DSP/BIOS LINK is ported. This component is based on the POOL interface in DSP/BIOS.

Initialization Phase API
- PROC
  - PROC_Setup()
  - PROC_Attach()
  - PROC_Load()
- CHNL
  - CHNL_Create()
  - CHNL_AllocateBuffer()
- MSGQ
  - MSGQ_TransportOpen()
  - MSGQ_Open()
  - MSGQ_SetErrorHandler()
  - MSGQ_Locate()
- POOL
  - POOL_Open()

Execution Phase API
- PROC
  - PROC_Start()
  - PROC_Read()
  - PROC_Write()
  - PROC_Stop()
- CHNL
  - CHNL_Issue()
  - CHNL_Reclaim()
- MSGQ
  - MSGQ_Alloc()
  - MSGQ_Put()
  - MSGQ_Get()
  - MSGQ_GetSrcQueue()
  - MSGQ_Free()

Finalization Phase API
- PROC
  - PROC_Detach()
  - PROC_Destroy()
- CHNL
  - CHNL_FreeBuffer()
  - CHNL_Delete()
- MSGQ
  - MSGQ_Release()
  - MSGQ_TransportClose()
  - MSGQ_Close()
- POOL
  - POOL_Close()

IPC Hardware Architecture

Tightly Coupled vs. Loosely Coupled Systems
- Tightly coupled systems
  - Processors share most resources including memory
  - Communicate over shared buses using shared physical memory
- Loosely coupled systems
  - Processors do not share most resources
  - Most communication through explicit messages or shared virtual memory (although not shared physical memory)
- Comparison
  - Loosely coupled systems: more flexible, fault tolerant, scalable
  - Tightly coupled systems: more efficient, less burden to operating system programmers
Deitel, et al., Operating Systems, Third Edition

Tightly Coupled Systems
Deitel, et al., Operating Systems, Third Edition

Loosely Coupled Systems
Deitel, et al., Operating Systems, Third Edition

Processor Interconnection Schemes
- Interconnection scheme
  - Describes how the system's components, such as processors and memory modules, are connected
  - Consists of nodes (components or switches) and links (connections)
  - Parameters used to evaluate interconnection schemes
    - Node degree
    - Bisection width
    - Network diameter
    - Cost of the interconnection scheme
Deitel, et al., Operating Systems, Third Edition

Processor Interconnection Schemes
- Figure: Shared bus multiprocessor organization.
Deitel, et al., Operating Systems, Third Edition

Processor Interconnection Schemes
- Figure: Crossbar-switch matrix multiprocessor organization.
Deitel, et al., Operating Systems, Third Edition

Processor Interconnection Schemes
- Figure: 4-connected 2-D mesh network.
Deitel, et al., Operating Systems, Third Edition

Processor Interconnection Schemes
- Figure: 3- and 4-dimensional hypercubes.
Deitel, et al., Operating Systems, Third Edition
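
The evaluation parameters listed earlier (node degree, bisection width, network diameter) have simple closed forms for a hypercube, which makes it a convenient worked case. The sketch below uses the standard results for an n-dimensional hypercube; struct net_params and hypercube() are hypothetical names introduced here.

```c
#include <assert.h>

struct net_params {
    unsigned nodes;      /* number of nodes */
    unsigned degree;     /* links per node */
    unsigned diameter;   /* hops between the two most distant nodes */
    unsigned bisection;  /* links cut to split the network into equal halves */
};

/* n-dimensional hypercube: 2^n nodes, each linked to the n nodes whose
 * binary address differs from its own in exactly one bit. */
struct net_params hypercube(unsigned n)
{
    struct net_params p;

    assert(n >= 1 && n < 32);
    p.nodes = 1u << n;
    p.degree = n;                   /* one neighbour per address bit */
    p.diameter = n;                 /* worst case: every bit differs */
    p.bisection = 1u << (n - 1);    /* cut along one dimension */
    return p;
}
```

For the 3-dimensional cube this gives 8 nodes, degree 3, diameter 3, and bisection width 4; the 4-dimensional cube has 16 nodes, degree 4, diameter 4, and bisection width 8. Degree and diameter grow only with log2 of the node count, at the price of more links per node than a mesh.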

Processor Interconnection Schemes
- Figure: Multistage baseline network.
Deitel, et al., Operating Systems, Third Edition

A Simple IPC Architecture
- ARM writes command in shared memory
- ARM interrupts DSP
- DSP responds to interrupt and reads command in shared memory
- DSP executes a task based on the command
- DSP interrupts ARM upon completion of the task
TMS320DM644x DMSoC ARM Subsystem Reference Guide (SPRUE14)

TI OMAP5910

OMAP5910 IPC Architecture
- Mailbox registers
  - Each direction 32 bit x 2
  - Interrupt occurrence
- MPU interface (MPUI)
  - MPU accesses DSP memory space directly
- Shared memory
  - Arrangement with the Traffic Controller
  - 3 types of memory
  - Best suited to sharing large amounts of data

Traffic Controller (TC)
- The IMIF allows access to the 192K bytes of on-chip SRAM.
- The EMIFS interface provides 16-bit-wide access to asynchronous or synchronous memories.
- The EMIFF interface provides 16-bit-wide access to standard SDRAM memories.
- The TC provides the functions of
  - arbitrating contending accesses to the same memory interface from different initiators (MPU, DSP, System DMA, Local Bus),
  - synchronizing accesses, since the initiators and the memory interfaces run at different clock rates,
  - buffering data, allowing burst access for more efficient multiplexing of transfers from multiple initiators to the memory interfaces.
- The TC's architecture allows simultaneous transfers between initiators and different memory interfaces without penalty. For instance, if the MPU is accessing the EMIFF at the same time as the DSP is accessing the IMIF, the transfers may occur simultaneously since there is no contention for resources.

ARM IPCM Module
- The IPCM provides up to 32 mailboxes with control logic and interrupt generation to support inter-processor communication.
- An AHB interface enables access from source and destination cores.
- The IPCM:
  - sends interrupts to other cores
  - passes small amounts of data to other cores.
- A source core can have multiple mailboxes and send messages in parallel (multitasking).
PrimeCell Inter-Processor Communications Module Technical Reference Manual

IPCM Components
- 1-32 programmable mailboxes, each comprising:
  - a single 1-32-bit Mailbox Source Register
  - a single 1-32-bit Mailbox Destination Register
  - a single 2-bit Mailbox Mode Register
  - a single 1-32-bit Mailbox Mask Register
  - a single 2-bit Mailbox Send Register
  - 0-7 32-bit data registers to store the message.
- 1-32 sets of read-only interrupt status registers, one for each interrupt, each comprising:
  - 1-32-bit Raw Interrupt Status Register (each bit corresponds to each mailbox)
  - 1-32-bit Masked Interrupt Status Register (each bit corresponds to each mailbox).
- A 32-bit Configuration Status Register

IPCM Functional Block
PrimeCell Inter-Processor Communications Module Technical Reference Manual

IPCM Example

IPCM Example
- Core0 has a message to send to Core1. Core0 claims the mailbox by setting bit 0 in the Mailbox Source Register. Core0 then sets bit 1 in the Mailbox Destination Register, enables the interrupts and programs the message into the Mailbox Data Registers. Finally, Core0 sends the message by writing 01 to the Mailbox Send Register. This asserts the interrupt to Core1.
- When Core1 is interrupted, it reads the Masked Interrupt Status Register for IPCMINT[1] to determine which mailbox contains the message. Core1 reads the message in that mailbox, then clears the interrupt and asserts the acknowledge interrupt by writing 10 to the Mailbox Send Register.
- Core0 is interrupted with the acknowledge message, completing the operation. Core0 then decides whether to retain the mailbox to send another message or release the mailbox, freeing it up for other cores in the system to use it.
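
The send/acknowledge handshake above can be traced with a simplified software model of a single mailbox. Everything in this sketch is an assumption made for illustration: the struct fields only mirror the register names on the slide, the helper functions are hypothetical, and the interrupt rule (destination interrupted on 01, source on 10, each gated by the mask) is a simplification of the real module.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified model of one IPCM mailbox; not the hardware register map. */
struct ipcm_mailbox {
    uint32_t source;    /* one bit set: the core that owns the mailbox */
    uint32_t dest;      /* one bit per destination core */
    uint32_t mask;      /* one bit per core whose interrupt is enabled */
    uint32_t send;      /* 0 = idle, 1 (01) = message, 2 (10) = acknowledge */
    uint32_t data[7];   /* message payload */
};

enum { SEND_IDLE = 0, SEND_MSG = 1, SEND_ACK = 2 };

/* Returns 1 if this mailbox currently interrupts `core`, else 0. */
int ipcm_irq(const struct ipcm_mailbox *mb, unsigned core)
{
    uint32_t bit = 1u << core;

    if (!(mb->mask & bit))
        return 0;
    if (mb->send == SEND_MSG)
        return (mb->dest & bit) != 0;       /* message goes to destination */
    if (mb->send == SEND_ACK)
        return (mb->source & bit) != 0;     /* acknowledge goes to source */
    return 0;
}

/* Claim a free mailbox for `core`. Returns 0 on success, -1 if owned. */
int ipcm_claim(struct ipcm_mailbox *mb, unsigned core)
{
    if (mb->source != 0)
        return -1;
    mb->source = 1u << core;
    return 0;
}

/* Source side: set destination, enable interrupts, load data, send. */
void ipcm_post(struct ipcm_mailbox *mb, unsigned dest_core, uint32_t word)
{
    mb->dest = 1u << dest_core;
    mb->mask = mb->source | mb->dest;
    mb->data[0] = word;
    mb->send = SEND_MSG;
}

/* Destination side: read the message and acknowledge it. */
uint32_t ipcm_take(struct ipcm_mailbox *mb)
{
    uint32_t word = mb->data[0];
    mb->send = SEND_ACK;
    return word;
}

/* Source side: release the mailbox for other cores. */
void ipcm_release(struct ipcm_mailbox *mb)
{
    memset(mb, 0, sizeof *mb);
}
```

Walking the slide's scenario through this model: after Core0 claims the mailbox and posts a word to Core1, ipcm_irq() is true only for core 1; after Core1 calls ipcm_take(), it is true only for core 0, which can then call ipcm_release() or post again.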

Conclusions

Conclusions
- IPC schemes for supporting many cores
- Performance and power consumption analysis for different IPC schemes
- IPC API schemes

Thanks for Your Attention!
