
1 Philips Research ECLIPSE Extended CPU Local Irregular Processing Structure IST E. van Utteren, IPA W.J. Lippmann, PROMMPT J.T.J. v. Eijndhoven, DD&T C. Niessen, ESAS A. van der Werf, ViPs G. Depovere, IT E. Dijkstra, DS & PC A. van Gorkum, IC Design G. Beenker, AV & MS Th. Brouste, LEP, HVE T. Doyle. CRB 1992-412. Jos.van.Eijndhoven@philips.com

2 Philips Research 2 DVP: design problem Nexperia media processors

3 Philips Research 3 DVP: application domain High volume consumer electronics products future TV, home theatre, set-top box, etc. Media processing: audio, video, graphics, communication

4 Philips Research 4 DVP: SoC platform Nexperia line of media processors for mid- to high-end consumer media processing systems is based on DVP DVP provides template for System-on-a-Chip DVP supports families of evolving products DVP is part of corporate HVE strategy

5 Philips Research 5 DVP: system requirements High degree of flexibility, extendability and scalability – unknown applications – new standards – new hardware blocks High level of media processing power – hardware coprocessor support

6 Philips Research 6 DVP: architecture philosophy High degree of flexibility is achieved by supporting media processing in software High performance is achieved by providing specialized hardware coprocessors Problem: How to mix & match hardware based and software based media processing?

7 Philips Research 7 DVP: model of computation Process FIFO Read Write A C B Execute Model of computation is Kahn Process Networks: The Kahn model allows ‘plug and play’: Parallel execution of many tasks Configures different applications by instantiating and connecting tasks Maintains functional correctness independent of task scheduling issues TSSA: API to transform C programs into Kahn models

8 Philips Research 8 DVP: model of computation CPUcoproc1coproc2 Application - parallel tasks - streams Mapping - static Architecture - programmable graph

9 Philips Research 9 DVP: architecture philosophy Kahn processes (nodes) are mapped onto (co)processors Communication channels (graph edges) are mapped onto buffers in centralized memory Scheduling and synchronization (notification & handling of empty or full buffers) is performed by control software Communication pattern between modules (data flow graph) is freely programmable

10 Philips Research 10 DVP: generic architecture Shared, single address space, memory model Flexible access Transparent programming model Physically centralized random access memory Flexible buffer allocation Fits well with stream processing Single memory-bus for communication Simple and cost effective

11 Philips Research 11 DVP: example architecture instantiation VLIW cpu I$ video-in video-out audio-in SDRAM audio-out PCI bridge Serial I/O timersI 2 C I/O D$ MIPS cpu I$ D$ Image scaler

12 Philips Research 12 DVP: TSSA abstraction layer TM-CPU software Traditional coarse-grain TM co-processors TSSA stream data, buffered in off-chip SDRAM, synchronization with CPU interrupts TSSA-OS TSSA-Appl1TSSA-Appl2

13 Philips Research 13 DVP: TSSA abstraction layer Hides implementation details: graph setup buffer synchronization Runs on pSOS (and other RTKs) Provides standard API Defines standard data formats

14 Philips Research 14 Outline DVP Eclipse DVP subsystem Eclipse architecture Eclipse application programming Simulator Status

15 Philips Research 15 Eclipse DVP subsystem Objective Increase flexibility of DVP systems, while maintaining cost-performance. Customer Semiconductors: Consumer Systems (Transfer to TTI) Consumer Electronics: Domain 2 (BG-TV Brugge) Research Products Mid- to high-end DVP / TSSA systems: DTVs and STBs

16 Philips Research 16 Eclipse DVP subsystem: design problem Increase application flexibility through re-use of medium-grain function blocks, in HW and SW Keep streaming data on-chip But: more bandwidth visible, limited memory size, high synchronization rate, CPU unfriendly (diagram: SDRAM, HDVO, condor, MPEG, CPU) DVP/TSSA system: Coarse-grain 'solid' function blocks (reuse, HW ↔ SW?) Stream data buffered in off-chip memory (bandwidth, power?)

17 Philips Research 17 Design problem: new DVP subsystem VO MPEG2 decode CPU 1394 DVD decode MPEG2 encode CPU Eclipse external memory

18 Philips Research 18 Eclipse DVP subsystem: application domain Now, target for 1st instance: Dual MPEG2 full HD decode (1920 x 1080 @ 60i) MPEG2 SD transcoding and HD decoding Anticipate: Range of formats (DV, MJPEG, MPEG4) 3D-graphics acceleration Motion-compensated video processing

19 Philips Research 19 Application domain: MPEG2 decoding (HD)

20 Philips Research 20 Application domain: MPEG2 encoding (SD)

21 Philips Research 21 Application domain: MPEG-4 video decoding (block diagram of Variable Length Decoding, Inverse Scan, Inverse Quantization, DC & AC Prediction, IDCT, MV Decoder, Motion Comp., Picture Reconst., Reference Pictures, Context Arithmetic Decoding, Shape Motion Compensation, and Shape MV Prediction, with per-edge bandwidth annotations)

22 Philips Research 22 MPEG-4: system level application partitioning (diagram: network layer → de-multiplex → decompression of video, 3D Gfx, and audio objects plus scene description → composition and rendering, partitioned over Sandra, Eclipse, and the CPU)

23 Philips Research 23 MPEG-4: partitioning Eclipse - SANDRA (block diagram: SDRAM, MMI, Media CPU with I$ and D$, SRAM; Eclipse coprocessors VLD, DCT, MC; VI, MBS, VO on SANDRA)

24 Philips Research 24 Eclipse DVP subsystem: current TSSA style TM-CPU software Traditional coarse-grain TM co-processors TSSA stream data, buffered in off-chip SDRAM, synchronization with CPU interrupts TSSA TSSA-Appl1TSSA-Appl2

25 Philips Research 25 Eclipse DVP subsystem: Eclipse tasks embedded in TSSA TSSA TSSA-Appl1TSSA-Appl2 Eclipse Driver Eclipse task on HW Eclipse task in SW Eclipse data stream via on-chip memory TSSA task on Eclipse TSSA task in SW TSSA task on DVP HW TSSA data stream via off-chip memory

26 Philips Research 26 Eclipse DVP subsystem: scale down Hierarchy in the DVP system: Computational model which fits neatly inside DVP & TSSA Scale down from SoC to subsystem: Limited internal distances High data bandwidth and local storage Fast inter-task synchronization

27 Philips Research 27 Outline DVP Eclipse DVP subsystem Eclipse architecture Model of computation Generic architecture Eclipse application programming Simulator Status

28 Philips Research 28 Eclipse architecture: model of computation CPUcoproc1coproc2 Application - parallel tasks - streams Mapping - static Architecture - programmable - medium grain - multitasking

29 Philips Research 29 Model of computation: architecture philosophy The Kahn model allows ‘plug and play’: Parallel execution of many tasks Application configuration by instantiating and connecting tasks. Functional correctness independent of task scheduling issues. Eclipse is designed to accomplish this with: A mixture of HW and SW tasks. High data rates (GB/s) and medium buffer sizes (KB). Re-use of co-processors over applications through multi-tasking Runtime application reconfiguration.

30 Philips Research 30 Allow proper balance in HW/SW combination (chart: application flexibility of given silicon vs. energy efficiency; function-specific engines are energy efficient but inflexible, DSP-CPUs are flexible but less efficient, Eclipse sits in between)

31 Philips Research 31 Previous Kahn style architectures in PRLE CPA: data driven HW synchronization, multitasking coprocs; but: dynamic applications? CPU in media processing? C-Heap: explicit synchronization, shared memory model, mixed HW/SW; but: high performance? variable packet sizes? Eclipse combines the strengths of both.

32 Philips Research 32 Outline DVP Eclipse DVP subsystem Eclipse architecture Model of computation Generic architecture Coprocessor shell interface Shell communication interface Architecture instantiation Eclipse application programming Simulator Status

33 Philips Research 33 Generic architecture: inter-processor communication On-chip, dedicated network for inter-processor communication: Medium grain functions  High bandwidth (up to several GB/s)  Keep data transport on-chip Use DVP-bus for off-chip communication only

34 Philips Research 34 Generic architecture: communication network Coprocessor CPU Communication network

35 Philips Research 35 Generic architecture: memory Shared, single address space, memory model Flexible access Software programming model Centralized wide memory Flexible buffer allocation Fits well with stream processing Single wide memory-bus for communication Simple and cost effective

36 Philips Research 36 Generic architecture: shared on-chip memory Coprocessor CPU Communication network Memory

37 Philips Research 37 Generic architecture: task level interface Partition functionality between application-dependent core and generic support.  Introduce the (co-)processor shell: Shell is responsible for application configuration, task scheduling, data transport and synchronization Shell (parameterized) micro-architecture is re-used for each coprocessor instance Allow future updates of communication network while re-using (co-)processor core design Implementations in HW or SW

38 Philips Research 38 Communication network layer Generic support layer Computation layer Generic architecture: layering Coprocessor CPU Shell-HW Shell-SW Shell-HW Task-level interface Communication interface Communication network Memory

39 Philips Research 39 Task level interface: five primitives Multitasking, synchronization, and data transport:
int GetTask( location, blocked, error, &task_info )
bool GetSpace( port_id, n_bytes )
Read( port_id, offset, n_bytes, &byte_vector )
Write( port_id, offset, n_bytes, &byte_vector )
PutSpace( port_id, n_bytes )
GetSpace is used for both get_data and get_room calls. PutSpace is used for both put_data and put_room calls. The processor has the initiative; the shell answers.

40 Philips Research 40 Task level interface: port IO a: Initial situation of 'data tape' with current access point. b: Inquiry action provides window on requested space. c: Read/Write actions on contents. d: Commit action moves access point ahead.

41 Philips Research 41 Task level interface: communication through streams Task ATask B Space filled with data Empty space AB Granted window for writer Granted window for reader Kahn model: Implementation with shared circular buffer: The shell takes care that the access windows have no overlap

42 Philips Research 42 Task level interface: multicast Forked streams: The task implementations are fixed (HW or SW). Application configuration is a shell responsibility. Task A Task C Task B Space filled with data Empty space AB Granted window for writer Granted window for reader B C Granted window for reader C

43 Philips Research 43 Task level interface: characteristics Linear (fifo) synchronization order is enforced Random access read/write inside the acquired window through the offset argument Shells operate on unformatted sequences of bytes Any semantic interpretation is left to the processor A task is not aware of where its streams connect to, or of other tasks sharing the same processor The shell maintains the application graph structure The shell takes care of: fifo size, fifo memory location, wrap-around addressing, caching, cache coherency, bus alignment

44 Philips Research 44 Task level interface: multi-tasking int GetTask( location, blocked, error, &task_info) Non-preemptive task scheduling Coprocessor provides explicit task-switch moments Task switches separate ‘processing steps’ (Granularity: tens or hundreds of clock cycles) Shell is responsible for task selection and administration Coprocessor provides feedback to the shell on task progress

45 Philips Research 45 Generic support layer Communication network layer Computation layer Generic architecture: generic support Coprocessor CPU Shell-HW Shell-SW Shell-HW Task-level interface Communication interface Communication network Memory

46 Philips Research 46 Generic support: the Shell The shell takes care of: The application graph structure, supporting run-time reconfiguration The local memory map and data transport (fifo size, fifo memory location, wrap-around addressing, caching, cache coherency, bus alignment) Task scheduling and synchronization The distributed implementation: Allows fast interaction with local coprocessor Creates a scalable solution

47 Philips Research 47 Generic support: synchronization When coprocessor A calls PutSpace( port, n ), its shell updates space -= n and sends a putspace( gsid, n ) message over the communication network; on receipt, coprocessor B's shell updates space += n. GetSpace( port, m ) is a local inquiry: m ≤ space? PutSpace and GetSpace return after the local update or inquiry. Delay in messaging does not affect functional correctness.

48 Philips Research 48 Generic support: application configuration Shell tables are accessible through a PI-bus interface (diagram: stream table indexed by Stream_id with addr, size, space, gsid fields; task table indexed by Task_id with info, budget, str_id fields)

49 Philips Research 49 Generic support: data transport caching Translate the byte-oriented coprocessor interface to wide and aligned bus transfers. Separate caches for read and write. Direct mapped: two adjacent words per port Coherency is enforced as a side-effect of GetSpace and PutSpace Support automatic prefetching and preflushing

50 Philips Research 50 Generic support: cache coherency

51 Philips Research 51 Generic support: task scheduling A simple task scheduler runs locally in each shell: Observes empty/full states of fifos and task blocking Round-Robin selection of ‘runnable’ tasks Parameterized ‘compute resource’ budgets per task Temporary disabling of tasks for reconfiguration at specified locations in the data stream

52 Philips Research 52 Task scheduling: computation budget Computation budget = maximum number of time slices allowed per task selection – Relative budget value controls compute resource partitioning over tasks – Absolute budget value controls task switch frequency, influencing overhead of state save & restore Running budget is set to the computation budget each time the task is selected in round-robin order The running budget is decremented with a fixed clock period, once every time slice

53 Philips Research 53 Task scheduling algorithm (flow chart) GetTask: if RunningBudget > 0 and the current task is runnable, return TaskId; otherwise repeat TaskId++ mod NrTasks until Runnable[TaskId], then set RunningBudget = Budget[TaskId] and return TaskId. A clock event decrements RunningBudget.

54 Philips Research 54 Task scheduling algorithm: dynamic workload Shell does not interpret media data but performs a best guess Space: the amount of available data/room in the stream buffer Blocked flag: true if insufficient space on the last inquiry Schedule flag: If false, a task may be selected even when Space = 0 (data dependent stream selection) Task Enable flag: true if the task is configured to be active

55 Philips Research 55 Task scheduling algorithm: Runnable criterion

56 Philips Research 56 Task scheduling: parallel implementation Task selection background process: 1. For each task, check if it is runnable, based on available space in the stream buffers 2. Select a new task from the list of runnable tasks, round-robin Provide an immediate answer to a GetTask inquiry: – Continue the current task if its computation budget is not depleted – Otherwise, start the pre-selected next task. Selection of the next task may lag behind the actual buffer status: – Only the active task decreases space in the stream buffer – All incoming PutSpace messages increase space in the buffer

57 Philips Research 57 Task scheduler implementation (diagram: coprocessor shell containing a task table with Enable, Runnable, Budget fields and a stream table with Space, Blocked, Schedule, TaskId fields; background processes for the runnable check and task selection, a running-budget decrement on clock events, and handling of GetTask, GetSpace, and PutSpace from the coprocessor)

58 Philips Research 58 Generic support: internal view Shell Sync DTWDTRSSTS Coprocessor Communication network

59 Philips Research 59 Generic support layer Communication network layer Computation layer Generic architecture: communication network Coprocessor CPU Shell-HW Shell-SW Shell-HW Task-level interface Communication interface Communication network Memory

60 Philips Research 60 Communication network: characteristics Synchronization messages are passed through a token ring, allowing one message per clock cycle Fifos are mapped in a shared on-chip memory, allowing flexible application configuration. Data transport is implemented with a wide data bus: DTL based bus protocol Separately arbitrated busses for read and write Independently pipelined for efficient single-word transfers All communication paths are uni-directional and pipelined, allowing the insertion of clock-domain bridges

61 Philips Research 61 Communication network Shell Arbiter SRAM Shell Token ring Dual DTL bus Communication network

62 Philips Research 62 Communication network: clock domains VLIW CPU wants low and fixed latency for memory access. CPU and memory can run at high clock rate. Synthesized coprocessors and long bus must run at lower clock rate.

63 Philips Research 63 Example Eclipse instantiation 2 x 128 bits @ 150MHz Local bus 32 Kbyte, 128 bit words 128 bits @ 300MHz 64 bits @ 150MHz32 bits @ 75MHz Shell Coproc Shell Coproc Shell Coproc Arbiter Local Memory Shell CPU64 I$D$ EB DVP hub PI bridge PI busDVP bus 300 MHz clock domain150 MHz clock domain

64 Philips Research 64 Outline DVP Eclipse DVP subsystem Eclipse architecture Eclipse application programming Coprocessor definition System software Simulator Status

65 Philips Research 65 Coprocessor definition: starting point Process FIFO Read Write A C B Execute Model of computation: Kahn Process Networks YAPI: simple API to transform C programs into Kahn models Expose parallelism and communication Decisions on grain sizes for processes and data Adopted by various groups in Philips for application modeling

66 Philips Research 66 Application C code Generic YAPI Coprocessor definition: process Eclipse Tailored YAPI Function Control Function Control Function Control Coproc

67 Philips Research 67 Coprocessor definition: control Define processing steps by inserting GetTask, breaking up process iterations. Choose explicit synchronization moments. Implement state saving around GetTask calls. Discern different data types that share a stream. Discern different functions to handle the data.

68 Philips Research 68 Coprocessor definition: packets Packets wrap data; packet headers indicate data type TypePayload NBytesPayload 0 Type1 Byte 0Byte 1Byte 2

69 Philips Research 69 Coprocessor definition: location packets Packets of type ‘location’: Payload holds unique identifier denoting location in the stream. Used for application reconfiguration at specified points in the data processing. All tasks forward location packets to output streams. Location identifiers are passed to the shell via GetTask. The shell compares a location identifier with its corresponding field in the task table. When these match: The task is disabled. The shell sends an interrupt to the cpu. Location identifiers also serve as debug breakpoints.

70 Philips Research 70 Coprocessor definition: example

while( true ) {
    tid = GetTask( location, blocked, error, &task_info );
    if (!tid) return;
    blocked = !GetSpace( IN, 2 ) || !GetSpace( OUT, 2 );
    if (blocked) return;
    // handle location packets
    Read( IN, 0, 2, &packet );
    if (IsLocation( packet )) {
        location = PayLoad( packet );
        Write( OUT, 0, 2, packet );
        PutSpace( IN, 2 );
        PutSpace( OUT, 2 );
        return;
    }
    // handle real data
    ...

71 Philips Research 71 Coprocessor definition: example (continued)

    // handle real data
    size = NBytes( packet );
    blocked = !GetSpace( IN, 2 + size ) || !GetSpace( OUT, 2 + OUTSIZE );
    if (blocked) return;
    Read( IN, 2, size, &in_data );
    PutSpace( IN, 2 + size );
    error = Compute( task_info, in_data, &out_data );
    Write( OUT, 0, 2 + OUTSIZE, Packet( TYPE, OUTSIZE, out_data ) );
    PutSpace( OUT, 2 + OUTSIZE );
}

72 Philips Research 72 System software Different types of software: Media processing software kernels: TM-CPU software with media operations and communication/synchronization primitives. Runtime support: Task scheduler, Quality-of-service control. System re-configuration: Network programming, memory management.

73 Philips Research 73 Outline DVP Eclipse DVP subsystem Eclipse architecture Eclipse application programming Simulator Software architecture Retargetability Flexibility Performance metrics Status

74 Philips Research 74 Simulation objective Verification and validation of the Eclipse architecture Architecture design space exploration Application development platform Starting point for hardware development Collaboration with LEP (Sandra) Transfer to PS-DVI (Dr. Evil)

75 Philips Research 75 Simulator toolchain Create Vld Create Dct Create Mc Dct{ NTasks: 2 Shell{ NStreams: 2 Dtr.NPorts : 1 } Application setup Architecture setup Performance metrics Wave forms Debug traces eclipse_sim -d2 -c1000 -l1 -DTHREADLEVEL=2 Simulation mode

76 Philips Research 76 Simulator flexibility: simulation modes Modes of execution Sequential execution  Application development with functional verification Timed execution  System level performance analysis TSS execution  Hardware development All execution modes are implemented in one code base. Only the interfaces differentiate between these modes.

77 Philips Research 77 Simulator: modeled hardware architecture Coprocessor DtwSsTs Sync Dtr Shell Transport network Sync network ReadWrite GetSpace PutSpace GetTask

78 Philips Research 78 Simulator software architecture Coprocessor DtwSsTs Sync Dtr Shell Transport network Sync network IF m mmm s ss s mm ss s m

79 Philips Research 79 Simulator software architecture: shell Dtw SsTs Sync Dtr IF s sm m s m m s m s Shell

80 Philips Research 80 Simulator components

81 Philips Research 81 Simulator: sequential execution Very fast functional verification One single thread of control Communication through function calls Statistics, e.g. number of reads, cache misses, … Compiles and runs without TSS

82 Philips Research 82 Simulator: sequential execution implementation

83 Philips Research 83 Simulator: timed execution Performance metrics Full communication protocols Sequential C-code via multi-threading Run time definition of threads Compiles and runs without TSS

84 Philips Research 84 Simulator: timed execution implementation

85 Philips Research 85 Timed execution: Execute()

void Dct::Thread() {
    while( 1 ) {
        Dct();
        Execute( 64 );
    }
}

void Execute( int delay ) {
    while( delay > 0 ) {
        delay--;
        JumpMain();
    }
}

void MainScheduler() {
    for (int cycle = 0; cycle < NrCycles; cycle++) {  // loop bound lost in transcription
        Dct->JumpThread();
        Vld->JumpThread();
        Mc->JumpThread();
    }
}

86 Philips Research 86 Timed execution: Read()

void DtrInterface::Read(int port, int offset, int size, DataT &data) {
    PortOut.Set( port );
    OffsetOut.Set( offset );
    SizeOut.Set( size );
    RequestOut.Set( !RequestOut );
    while ( AckIn.Get() != RequestOut ) JumpMain();
    data = DataIn.Get();
}

void Dct::Thread() {
    while( 1 ) {
        ...
        DtrInterface->Read( 0, 0, 8, data );
        ...
    }
}

void DtrInterface::Poll() {
    if ( RequestIn.Get() != AckOut ) {
        int port = PortIn.Get();
        int offset = OffsetIn.Get();
        int size = SizeIn.Get();
        DataT data[size];
        Dtr->Read( port, offset, size, data );
        DataOut.Set( data );
        AckOut.Set( RequestIn.Get() );
    }
}

void Dtr::Read(int port, int offset, int size, DataT &data) {
    ... // Get data from cache
    data = ...
}

87 Philips Research 87 Simulator: TSS execution Dynamic binding of TSS code to the simulator Run time definition of TSS module boundaries Thread model inside TSS module TSS port creation Automatic Netlist generation

88 Philips Research 88 Simulator: TSS execution implementation

89 Philips Research 89 Shell TSS: module boundaries Vld Transport Network DtrDtwSsTs Sync Mc DtrDtwSsTs Sync Vld.ModuleName : Vld Vld.Shell.ModuleName : VldShell Mc.ModuleName : Mc Mc.Shell{ Dtr.ModuleName : McShellDtr Dtw.ModuleName : McShellDtw Ss.ModuleName : McShellSs Ts.ModuleName : McShellTs Sync.ModuleName : McShellSync } Transport.ModuleName : Transport

90 Philips Research 90 TSS: module boundaries Vld Shell Transport Network DtrDtwSsTs Sync Vld Shell DtrDtwSsTs Sync ModuleName : Eclipse

91 Philips Research 91 TSS: co-simulation TSS-Verilog Ts Vld Shell Transport Network DtrDtwSsTs Sync Mc Shell DtrDtwSsTs Sync

92 Philips Research 92 Simulator retargetability Create Vld Create Dct Create Mc Dct{ NTasks: 2 Shell{ NStreams: 2 Dtr.NPorts : 1 } Application setup Architecture setup Performance metrics Wave forms Debug traces eclipse_sim -d2 -c1000 -l1 -DTHREADLEVEL=2 Simulation mode

93 Philips Research 93 Simulator retargetability: Eclipse instantiation Create Vld Create Dct Create Mc Dct{ NTasks: 2 Shell{ NStreams: 2 Dtr.NPorts : 1 } VldDct Shell Transport Network Mc Shell

94 Philips Research 94 Coprocessor instantiation Create Vld Create Vld Create Mc Shell

95 Philips Research 95 Architecture setup

96 Philips Research 96 Retargetability: application configuration Dct.Shell{ Ss.StreamTable{ TASK_ID: 1 BUF_SPACE : 0x100 } }

97 Philips Research 97 Application setup

98 Philips Research 98 Simulator output Create Vld Create Dct Create Mc Dct{ NTasks: 2 Shell{ NStreams: 2 Dtr.NPorts : 1 } Application setup Architecture setup Performance metrics Wave forms Debug traces eclipse_sim -d2 -c1000 -l1 -DTHREADLEVEL=2 Simulation mode

99 Philips Research 99 Simulation output: wave forms

100 Philips Research 100 Simulator output: performance data collection Collection of critical performance indicators Subset of performance indicators implemented in HW in stream and task tables Used for: Architecture evaluation at silicon design time Application tuning at application design time QoS resource management at run-time

101 Philips Research 101 Viewing performance data

102 Philips Research 102 Viewing performance data: processor dynamics

103 Philips Research 103 Viewing performance data: processor metrics

104 Philips Research 104 Viewing performance data: buffer filling

105 Philips Research 105 Outline DVP Eclipse DVP subsystem Eclipse architecture Eclipse application programming Simulator Status

106 Philips Research 106 Status (design funnel: abstraction from high to low, cost from low to high, narrowing the alternative realizations) Initial architecture study (1997) Feasibility study (October 1998) Generic architecture definition (August 1999) Specific architecture definition (February 2000) Specific architecture implementation (July 2000)

107 Philips Research 107 Current status Eclipse documentation Concepts Design path Implementation Applications: Coprocessor functional models for MPEG2 HD/SD decoding (Vld, Mc, Idct, Rlsq) supporting downscaling MPEG2 encoder generic Yapi MPEG4, 3D Gfx scheduled for 2001 Natural Motion anticipated

108 Philips Research 108 Simulator status Simulator framework: Retargetable and flexible through design patterns Re-use of methodology, design patterns, implementation (Sandra, QoS, TSSA-2) Simulator hardware model: Functional, bit-level accurate model of shells Abstract model of transport network and coprocessors Simulator toolchain: Approx. 25,000 lines of C++ code, 250 files (CVS version management, multi-platform makefile structure, automatic source documentation) Integration testing phase Submitted to CRE 2001

109 Philips Research 109 Conclusion Eclipse fits neatly in DVP system level architecture Flexibility through: Application (re-)configuration Medium-grain HW / SW interaction Co-processor multi-tasking (without runtime CPU control) Cost-effectiveness through: HW / SW balancing Time-shared co-processor use Tools for application configuration, simulation, and performance analysis are alive

110 Philips Research 110 Acknowledgements Persons from several groups in PRLE: IPA (Lippmann): Evert-Jan Pol, Jos van Eijndhoven, Martijn Rutten, Anup Gangwar ESAS (van Utteren): Pieter van der Wolf, Om Prakash Gangwal, Gerben Essink IT (Dijkstra): Koen Meinds Video processing & Visual Perception (Depovere): Gerben Hekstra, Egbert Jaspers, Erik van der Tol, Martijn van Balen Digital Design & Test (Niessen): Manish Garg

