Notes on an actor language. Jörn W. Janneck, Xilinx Inc. 13 February 2007 – 7th Ptolemy Miniconference.


1 Notes on an actor language
Jörn W. Janneck, Xilinx Inc.
13 February 2007 – 7th Ptolemy Miniconference

2 CAL Actor Language
- scripting actor specifications – make it easier to write atomic actors
- experimenting with domain polymorphism (code generation)
Outline:
- CAL @ Ptolemy: the language, domain-dependent interpretation
- CAL @ Xilinx: overview, application

3 actors in CAL
- encapsulated state
- guarded atomic actions
[Diagram: an actor containing Actions and State.]

4 simple actors
    actor Sum () Input ==> Output:
        sum := 0;
        action [a] ==> [sum]
        do
            sum := sum + a;
        end
    end
    actor SumAbs () Input ==> Output:
        sum := 0;
        action [a] ==> [sum]
        guard a >= 0
        do
            sum := sum + a;
        end
        action [a] ==> [sum]
        guard a < 0
        do
            sum := sum - a;
        end
    end
[Diagrams: Sum; SumAbs with ports Input and Output.]
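The firing behavior of SumAbs can be modeled outside CAL. The following is a minimal Python sketch, not CAL tooling: the class name `SumAbs`, the `fire` method, and the `run` helper are illustrative names, and it assumes that the output expression `[sum]` sees the state after the action body has run.

```python
from collections import deque


class SumAbs:
    """Python model of the SumAbs actor: one state variable, two guarded actions."""

    def __init__(self):
        self.sum = 0  # corresponds to `sum := 0;`

    def fire(self, inputs, outputs):
        """Fire one action if an input token is available; return True if fired."""
        if not inputs:
            return False          # no action is enabled without an input token
        a = inputs.popleft()      # input pattern [a] consumes one token
        if a >= 0:                # guard of the first action
            self.sum += a
        else:                     # guard of the second action (a < 0)
            self.sum -= a
        outputs.append(self.sum)  # assumed: [sum] is evaluated after the body
        return True


def run(actor, tokens):
    """Fire the actor until no action is enabled and return the output tokens."""
    inputs, outputs = deque(tokens), []
    while actor.fire(inputs, outputs):
        pass
    return outputs
```

Because the two guards are exhaustive, consuming the token before testing the guard is safe here; an actor with non-exhaustive guards would have to inspect the token without consuming it.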

5 nondeterminism
    actor NDMerge () Input1, Input2 ==> Output:
        action Input1: [x] ==> [x] end
        action Input2: [x] ==> [x] end
    end
[Diagram: NDMerge with ports Input1, Input2 and Output.]
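A small Python sketch can make the nondeterminism concrete. It is an illustration only: the function name `nd_merge` and the use of a random choice to pick among enabled actions are assumptions, since the slide does not say how a choice is made when both actions are enabled.

```python
import random
from collections import deque


def nd_merge(input1, input2, rng=None):
    """Model NDMerge: whenever both actions are enabled, either one may fire."""
    rng = rng or random.Random()
    q1, q2, out = deque(input1), deque(input2), []
    while q1 or q2:
        # An action is enabled when its input port holds at least one token.
        enabled = [q for q in (q1, q2) if q]
        # Assumed policy: pick any enabled action at random.
        out.append(rng.choice(enabled).popleft())
    return out
```

Every run preserves the order of tokens within each input, but the interleaving of the two inputs can differ from run to run.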

6 data-dependent token flow
    actor Select () S, A, B ==> Output:
        action S: [sel], A: [v] ==> [v]
        guard sel
        end
        action S: [sel], B: [v] ==> [v]
        guard not sel
        end
    end
[Diagram: Select with ports S, A, B and Output.]
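The Select actor consumes from A or B depending on the value read from S. The Python sketch below models that under one assumption: guards are evaluated on tokens that are inspected but not yet consumed, so an action that cannot fire leaves all queues untouched. The names `fire_select` and `run_select` are illustrative.

```python
from collections import deque


def fire_select(s, a, b, output):
    """Try to fire one Select action; return True if an action fired."""
    if not s:
        return False
    sel = s[0]                # peek at the control token without consuming it
    data = a if sel else b    # `guard sel` reads A, `guard not sel` reads B
    if not data:
        return False          # the selected data port is empty: nothing is enabled
    s.popleft()
    output.append(data.popleft())
    return True


def run_select(s_tokens, a_tokens, b_tokens):
    """Fire until no action is enabled; return the output and the leftover tokens."""
    s, a, b, output = deque(s_tokens), deque(a_tokens), deque(b_tokens), []
    while fire_select(s, a, b, output):
        pass
    return output, list(s), list(a), list(b)
```

For example, `run_select([True, False, False], [1, 2], [10])` stops after two firings, because the third control token selects B and B is empty by then.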

7 CAL and domain polymorphism
two fundamental questions:
1. Can an actor be interpreted/used in a given MoC?
2. What is its interpretation?
→ domain-specific interpretation

8 Example: SDF
    actor Add () Input1, Input2 ==> Output:
        action [a], [b] ==> [a + b] end
    end
    actor AddSeq () Input ==> Output:
        action [a, b] ==> [a + b] end
    end
[Diagram: Add consumes 1 token on Input1 and 1 on Input2 and produces 1 on Output; AddSeq consumes 2 tokens on Input and produces 1 on Output.]

9 Example: SDF (cont’d)
    actor NDMerge () Input1, Input2 ==> Output:
        action Input1: [x] ==> [x] end
        action Input2: [x] ==> [x] end
    end
    actor Merge () Input1, Input2 ==> Output:
        action [x1], [x2] ==> [x1, x2] end
    end
[Diagram: NDMerge with ports Input1, Input2 and Output, shown without rates; Merge consumes 1 token on Input1 and 1 on Input2 and produces 2 on Output.]
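The first question, whether an actor can be used under SDF, can be illustrated with a deliberately conservative check. The rule in this Python sketch is an assumption and not the rule used by the Ptolemy or Xilinx tools: an actor is accepted only if every action is unguarded and all actions consume and produce the same number of tokens per port. The action descriptions are transcribed by hand from the four actors shown on the SDF slides.

```python
def sdf_rates(actions):
    """Return (consumption, production) rates if the actor passes the check, else None.

    Each action is a tuple (consumes, produces, guarded), where the first two
    entries map port names to token counts.
    """
    first_in, first_out, _ = actions[0]
    for consumes, produces, guarded in actions:
        if guarded or consumes != first_in or produces != first_out:
            return None  # rates may depend on data or on which action fires
    return first_in, first_out


ADD = [({"Input1": 1, "Input2": 1}, {"Output": 1}, False)]
ADD_SEQ = [({"Input": 2}, {"Output": 1}, False)]
MERGE = [({"Input1": 1, "Input2": 1}, {"Output": 2}, False)]
ND_MERGE = [
    ({"Input1": 1}, {"Output": 1}, False),
    ({"Input2": 1}, {"Output": 1}, False),
]
```

Under this check Add, AddSeq and Merge yield fixed rates, while NDMerge is rejected because its two actions read different ports. The check is stricter than necessary: an actor such as SumAbs, whose guarded actions share the same rates and have exhaustive guards, would also be rejected.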

10 Some kind of “synchronous”...
[Diagram residue: NDMerge, A, F and Merge blocks annotated with token rates; the layout is not recoverable from the transcript.]

11 Example: CSP
    actor NDMerge () Input1, Input2 ==> Output:
        action Input1: [x] ==> [x] end
        action Input2: [x] ==> [x] end
    end
NDMerge in CSP:
    [ Input1 ? x -> Output ! x || Input2 ? x -> Output ! x ]
    actor Add () Input1, Input2 ==> Output:
        action [a], [b] ==> [a + b] end
    end
Add in CSP, two candidate interpretations:
    Input1 ? a -> Input2 ? b -> Output ! a + b
    [ Input1 ? a -> Input2 ? b || Input2 ? b -> Input1 ? a ] ; Output ! a + b

12 Example: CSP (cont’d)
    actor Select () S, A, B ==> Output:
        action S: [sel], A: [v] ==> [v]
        guard sel
        end
        action S: [sel], B: [v] ==> [v]
        guard not sel
        end
    end
Select in CSP:
    S ? sel; [ sel -> A ? v -> Output ! v || not sel -> B ? v -> Output ! v ]
    actor A () X, Y ==> Z:
        action X: [x1, x2] ==> [f(x1, x2)]
        guard P(x1, x2)
        end
        action Y: [y1, y2] ==> [f(y1, y2)]
        guard P(y1, y2)
        end
    end
A in CSP: ?

13 CAL and dataflow at Xilinx
new FPGA programming model & tools
- hardware code generation
- software (& mixed) code generation
[Diagram labels: actor source + network; simulation; high-level synthesis; software; hardware; class MyActor { schedule(); readPort( portNum ); writePort( portNum ); }]
driver application
- MPEG4 Simple Profile Decoder
MPEG standardization effort
- ISO/IEC 23001-4 (working draft): Codec Configuration Representation
- ISO/IEC 23002-4 (working draft): Video Tool Library

14 FPGA Programming In Practice
Networked MPEG-4 Viewer
[Diagram labels: Remote Video Stream Server; UDP over Ethernet; XUP Board (2VP30); Ethernet; UDP; Microblaze running LWIP protocol stack; Decoder Actor Network; Raster Scan Actor; Memory Controller; VGA Display IP; Local VGA Monitor.]

15 MPEG-4 SP Decoder
quality of compiled code
    Version                     Area                                    Performance
                                Slice  LUT   FF    BRAM    MULT
    VHDL IP [1] (15000 lines)   4637   7923  2637  26 [2]  34       4-CIF image size; 180K macroblock/s @ 100MHz; requires ZBT SRAM framebuf
    CAL decoder (4000 lines)    3872   7720  3576  22 [3]  7        HD image size; 243K macroblock/s @ 120MHz; interfaces to DRAM framebuf
I-frame parsing: 50 Mbit/s
[1] http://www.xilinx.com/bvdocs/ipcenter/data_sheet/ds520_prod_brf.pdf
[2] BRAM-limited to 4-CIF image size.
[3] Supports HD image size. Reduces to 16 BRAMs for 4-CIF image size.

16 comparing decoder solutions
[Chart: axes labeled throughput (macroblocks/sec x1000) and relative area efficiency; tick labels 1, 2, 5, 10, 100, 1000; CIF, SD and HD marked.]
a: TI64xx MPEG-4 (CPU + L1 cache only)
b: ISSCC’06 H.264 capable (includes periphery)
c: FPGA MPEG-4 using traditional HDL flow (12 MM effort)
d: FPGA MPEG-4 using actor/dataflow synthesis (3 MM effort)

17 Thank You.
CAL actor language: embedded.eecs.berkeley.edu/caltrop
Credits: Dave B. Parlour, Ian D. Miller, Johan Eker, Edward A. Lee, and many others.

19 BACKUP

20 programming language adoption
    Name        TPCI     TPCI cum.  Year
    C           17.66%   17.66%     1973
    C++         11.06%   28.73%     1985
    Perl         5.48%   34.20%     1987
    Python       3.47%   37.67%     1990
    VB           9.73%   47.40%     1991
    Delphi       2.15%   49.54%     1994
    Java        21.17%   70.72%     1995
    PHP          9.86%   80.58%     1995
    JavaScript   2.20%   82.78%     1995
    C#           3.07%   85.85%     2002
source: TIOBE Programming Community Index, TPCI, October 2006, http://www.tiobe.com/tpci.htm
[Chart: cumulative TPCI by language creation date (for top 10 languages); x-axis 1970 to 2005, y-axis marked at 50 and 100.]

21 Smaller, Faster, Easier
Too good to be true?
- This is what happens when design effort is constrained.
- The key is enabling architectural exploration with rapid turn-around time.
- New decoder architecture incorporates many improvements over original design in motion compensation, AC/DC reconstruction, parser, 2-d IDCT.
Approximate manpower numbers:
– VHDL decoder: 12 months
– Dataflow decoder: 3 months

22 Architectural Exploration
[Diagram labels: MPEG4 Motion Compensator; video stream feedback video; frame buffer (off-chip DRAM).]
PROBLEM! Memory latency for random access reads and writes prevents real-world operation at HD rates.

23 First Step: Try on-chip cache
- Break the address and data streams, insert a cache placeholder.
- Insert different policies, see what happens.
  - policy1: Pass-through just to make sure model is OK.
  - policy2: Insert a cache actor in the read path and monitor statistics.

24 Simulation result with policy2
Monitor console:
    Frame 1 OK
    time: 28111ms
    Frame 2 OK
    time: 23834ms
    Requests: 49456, Hits: 45360
    Miss rate: 8.28%
    Frame 3 OK
    time: 27369ms
    Requests: 98704, Hits: 90512
    Miss rate: 8.30%
Memory controller performance:
- 133MHz clock
- 32 pixel cache line fill in ~18 cycles
- Worst case compensation is 81 reads for an 8x8 block.
- 8.3% miss rate implies average read is ~2.4 cycles
- Rate limit is 44 Mpixel/s
- HD (1920x1080, 4:2:0, 30fps) rate target is 93.3 Mpixel/s
Options for improvement:
- more expensive controller
- much better cache policy
- application-aware prefetch
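The figures on this slide can be reproduced with a few lines of arithmetic. The Python sketch below is one reading of the numbers, not a derivation given in the talk. It assumes that a cache hit costs 1 cycle and a miss costs the full 18-cycle line fill, that the worst case of 81 reads per 64-pixel block limits the pixel rate, and that 4:2:0 HD video means 1920 x 1080 luma pixels plus half as many chroma samples per frame. All function names are illustrative.

```python
def miss_rate(requests, hits):
    """Fraction of cache requests that missed."""
    return (requests - hits) / requests


def average_read_cycles(miss, fill_cycles, hit_cycles=1):
    """Expected cycles per read; hit_cycles=1 is an assumption, not from the slide."""
    return (1 - miss) * hit_cycles + miss * fill_cycles


def pixel_rate_limit(clock_hz, cycles_per_read, reads_per_block=81, pixels_per_block=64):
    """Pixels per second when every 8x8 block needs the worst-case number of reads."""
    return clock_hz / cycles_per_read * pixels_per_block / reads_per_block


def hd_rate_target(width=1920, height=1080, fps=30, samples_per_pixel=1.5):
    """Samples per second for 4:2:0 video (1.5 samples per luma pixel, assumed)."""
    return width * height * fps * samples_per_pixel
```

With the console numbers, `miss_rate(98704, 90512)` is about 8.3%, `average_read_cycles(0.083, 18)` is about 2.4, `pixel_rate_limit(133e6, 2.4)` is about 44 Mpixel/s, and `hd_rate_target()` is about 93.3 Mpixel/s, which matches the slide under these assumptions.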

25 Step 2: Application-aware prefetch
- replace cache with “search window”
- compensation addresses now relative to search window
- search window senses block type
[Diagram labels: prefetch requests to frame buffer; prefetch data.]

26 Results of prefetch strategy
Better performance
– prefetch needs to operate at 3x pixel rate
– exploits longer burst read with application-awareness (longer cache line did not help policy2 significantly)
– 64 pixels in 26 cycles → average read is ~0.4 cycles
– peak theoretical performance is 111 Mpixel/s
– exceeds HD rate target with cheap DRAM
Substantial change to overall model behavior, but
– impact limited to two actors
– no refactoring of control in other actors needed
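The slide does not show how the 111 Mpixel/s figure is obtained. One reading that reproduces it, offered here as an assumption, divides the 133 MHz memory clock from the earlier simulation slide by the rounded 0.4 cycles per read and then by the 3x prefetch factor. The Python sketch below encodes only that reading; the function names are illustrative.

```python
def cycles_per_pixel(pixels, cycles):
    """Average cycles per pixel for one burst read."""
    return cycles / pixels


def peak_pixel_rate(clock_hz, cycles_per_read, prefetch_factor):
    """Assumed model: read bandwidth divided by the prefetch overhead factor."""
    return clock_hz / cycles_per_read / prefetch_factor


HD_TARGET = 93.3e6  # Mpixel/s target quoted for HD in the simulation results
```

Here `cycles_per_pixel(64, 26)` is about 0.41, which the slide rounds to 0.4, and `peak_pixel_rate(133e6, 0.4, 3)` is about 111 Mpixel/s, above the 93.3 Mpixel/s HD target. Using the unrounded 0.41 gives roughly 109 Mpixel/s, still above the target.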

27 The FPGA programming problem
- Big, heterogeneous chips
  - 1985: 128 4-LUTs
  - 2006: [V5-LX] 207360 6-LUTs, 10Mbit BRAM, 192 ALUs
- circuit-design programming (+ C, Simulink,...)

