QUIZ 1. Question 1) According to the study on “Simultaneous Timing Driven Clustering and Placement for FPGAs”, what is a fragment level move and which.

1 QUIZ 1

2 Question 1) According to the study on “Simultaneous Timing Driven Clustering and Placement for FPGAs”, what is a fragment level move and which drawbacks of the traditional FPGA CAD flow are targeted with the fragment level moves? 2

3 BSPlace: A BLE Swapping Technique for Placement. 04.11.2014. Minsik Hong, George Hwang, Hemayamini Kurra, Minjun Seo

4 Outline. SCPlace: Introduction; Algorithm flowchart; Net Counting Algorithm; Results. BSPlace: Algorithm; Demo. Backup Slides (if you ask minimal questions we can cover more): Net Weighting; VPR Datastructures

5 Rajavel, Senthilkumar Thoravi, and Ali Akoglu. "MO-Pack: Many-objective clustering for FPGA CAD." Proceedings of the 48th Design Automation Conference. ACM, 2011.

6 Simultaneous timing driven clustering and placement for FPGAs. Chen, Gang, and Jason Cong. Field Programmable Logic and Application. Springer Berlin Heidelberg, 2004. 158-167. 6

7 Key concept. Fragment level move: move a BLE to a new CLB; check that the resulting CLB configuration is valid (feasibility: number of BLEs and input pins); update the cost function. Block level move: CLB to CLB.
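The feasibility check behind a fragment level move can be sketched as follows. This is a minimal illustration, not the SCPlace or VPR implementation: the `Ble` and `Clb` classes and the `max_bles` / `max_inputs` parameters are hypothetical names, and the pin count assumes that a net driven by a BLE inside the cluster does not consume a CLB input pin.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Ble:
    """Hypothetical BLE record: the nets it reads and the net it drives."""
    name: str
    inputs: frozenset
    output: str


@dataclass
class Clb:
    """Hypothetical cluster: just the list of BLEs packed into it."""
    bles: list = field(default_factory=list)


def external_inputs(bles):
    """Nets that must enter the cluster through CLB input pins."""
    used = set().union(*(b.inputs for b in bles))
    driven = {b.output for b in bles}
    return used - driven  # assumption: internally driven nets need no pin


def move_is_feasible(ble, target, max_bles, max_inputs):
    """Check the BLE count and the input pin count after adding `ble`."""
    candidate = target.bles + [ble]
    return (len(candidate) <= max_bles
            and len(external_inputs(candidate)) <= max_inputs)


def fragment_move(ble, source, target, max_bles, max_inputs):
    """Move `ble` from `source` to `target` if the result is a valid CLB."""
    if not move_is_feasible(ble, target, max_bles, max_inputs):
        return False
    source.bles.remove(ble)
    target.bles.append(ble)
    return True  # a real placer would update its cost function here


b1 = Ble("b1", frozenset({"n1", "n2"}), "n3")
b2 = Ble("b2", frozenset({"n3", "n4"}), "n5")
b3 = Ble("b3", frozenset({"n6", "n7"}), "n8")
src, dst = Clb([b1, b3]), Clb([b2])
moved = fragment_move(b1, src, dst, max_bles=2, max_inputs=3)
```

In this made-up netlist, moving `b1` next to `b2` is accepted because net `n3` becomes internal to the cluster, leaving three external inputs. Moving `b3` into a cluster holding only `b2` would need four input pins and is rejected.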

8 BLE Level Swapping. Advantages: fixes packing issues during simulated annealing; better congestion mitigation; better routability. Disadvantages: speed; complexity.

9 SCPlace Algorithm 9

10 10

11 Additional feature of Journal version SCPlace 11

12 Use Novel net weighting 12

13 A novel net weighting algorithm for timing-driven placement. Kong, Tim Tianming. Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design. ACM, 2002.

14 Accurate All Path Counting 14

15 Calculate F(t) (a=2, T: the longest path delay). Nodes a, b, c, d, e, f with ARR/REQ = 0/0, 0/2, 7/7, 8/8, 13/13, 11/13; edge delays d(a, c) = 7, d(b, c) = 5, d(c, d) = 1, d(d, e) = 5, d(d, f) = 3. Fs(a, c) = 7 – 0 – 7 = 0; Fs(b, c) = 7 – 0 – 5 = 2. Initially F(a) = F(b) = 1 and F = 0 for the other nodes. F(c) = F(c) + D{Fs(a, c), T} x F(a) + D{Fs(b, c), T} x F(b) = 0 + 1x1 + 0.88x1 = 1.88.

16 Calculate B(s) (a=2, T: the longest path delay). Nodes a, b, c, d, e, f with ARR/REQ = 0/0, 0/2, 7/7, 8/8, 13/13, 11/13. Bs(d, e) = 13 – 8 – 5 = 0; Bs(d, f) = 13 – 8 – 3 = 2. D{Bs(a, c), T} = D{0,13} = 1; D{Bs(b, c), T} = D{0,13} = 1; D{Bs(c, d), T} = D{0,13} = 1; D{Bs(d, e), T} = D{0,13} = 1; D{Bs(d, f), T} = D{2,13} = 0.88. Initially B(e) = B(f) = 1 and B = 0 for the other nodes. B(d) = B(d) + D{Bs(d, e), T} x B(e) + D{Bs(d, f), T} x B(f) = 0 + 1x1 + 0.88x1 = 1.88.

17 Calculate AP(s, t) (a=2). D{slack(a, c), T} = D{0,13} = 1; D{slack(b, c), T} = D{2,13} = 0.88; D{slack(c, d), T} = D{0,13} = 1; D{slack(d, e), T} = D{0,13} = 1; D{slack(d, f), T} = D{2,13} = 0.88. F/B per node: a, b = 1/1.88; c, d = 1.88/1.88; e, f = 1.88/1. AP(a, c) = F(a) x B(c) x D{slack(a, c), T} = 1 x 1.88 x 1 = 1.88; AP(b, c) = F(b) x B(c) x D{slack(b, c), T} = 1 x 1.88 x 0.88 = 1.65. AP for edges (a, c), (b, c), (c, d), (d, e), (d, f): 1.88, 1.65, 3.53, 1.88, 1.65.
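The F, B and AP numbers from this worked example can be reproduced with a short script. It is only a sketch under stated assumptions: the closed form of the discount function is not reproduced in this transcript, so the code uses a lookup table holding the two values printed on the slides (D{0,13} = 1 and D{2,13} = 0.88), and the function and variable names are hypothetical.

```python
# Edges of the example circuit, listed in topological order.
EDGES = [("a", "c"), ("b", "c"), ("c", "d"), ("d", "e"), ("d", "f")]

# Discount values as printed on the slides for a=2, T=13 (not a formula).
DISCOUNT = {0: 1.0, 2: 0.88}

# Per-edge Fs, Bs and slack values taken from the slides.
FS = {("a", "c"): 0, ("b", "c"): 2, ("c", "d"): 0, ("d", "e"): 0, ("d", "f"): 0}
BS = {("a", "c"): 0, ("b", "c"): 0, ("c", "d"): 0, ("d", "e"): 0, ("d", "f"): 2}
SLACK = {("a", "c"): 0, ("b", "c"): 2, ("c", "d"): 0, ("d", "e"): 0, ("d", "f"): 2}


def all_path_counts(edges, fs, bs, slack, discount):
    nodes = {n for edge in edges for n in edge}
    inputs = nodes - {t for _, t in edges}   # nodes with no incoming edge
    outputs = nodes - {s for s, _ in edges}  # nodes with no outgoing edge

    # F starts at 1 on primary inputs, B starts at 1 on primary outputs.
    F = {n: 1.0 if n in inputs else 0.0 for n in nodes}
    B = {n: 1.0 if n in outputs else 0.0 for n in nodes}

    # Forward pass: F(t) += D{Fs(s, t), T} x F(s)
    for s, t in edges:
        F[t] += discount[fs[(s, t)]] * F[s]

    # Backward pass: B(s) += D{Bs(s, t), T} x B(t)
    for s, t in reversed(edges):
        B[s] += discount[bs[(s, t)]] * B[t]

    # AP(s, t) = F(s) x B(t) x D{slack(s, t), T}
    ap = {(s, t): F[s] * B[t] * discount[slack[(s, t)]] for s, t in edges}
    return F, B, ap


F, B, ap = all_path_counts(EDGES, FS, BS, SLACK, DISCOUNT)
for edge in EDGES:
    print(edge, round(ap[edge], 2))
```

Both passes rely on `EDGES` being in topological order, so that a node's count is final before it is propagated further.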

18 Results (Only use BLE swapping). CLB = 4

19 Results (Only use BLE swapping) 19

20 Results (BLE + CLB swapping) 20

21 Results (BLE + CLB swapping) T-Vpack+VPR vs SCPlace (α=0.5) 21

22 BSPlace 22

23 BSPlace: BLE level swapping within simulated annealing with Rent’s Rule. Advantages: fixes packing issues as they occur; potentially better routability; potentially lower congestion due to the combination of placement and packing. Disadvantages: execution time (we need to allocate and deallocate memory for every BLE swap); code complexity (VPR is complex, so we spend a lot of time on debugging and testing instead of on algorithms).

24 Rent’s Rule Threshold Value. (1) Calculate the k value to get the threshold. (2) Enter the simulated annealing process (outer loop, inner loop). (3) In the inner loop, choose a random CLB to move from its current position to another position and check the Rent’s Rule threshold: if we get a better result for the swap, queue a BLE swap; otherwise do CLB swapping using T-VPlace. (4) Loop through the queued BLE swaps, doing each BLE swap after checking whether it overlaps with a previous swap. (5) Reallocate memory and return to the outer loop.
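One possible reading of the overlap check on queued BLE swaps is sketched below. The slides do not define "overlaps", so this assumes that two swaps overlap when they touch the same BLE. The function name and the tuple representation of a swap are hypothetical and are not taken from the BSPlace code.

```python
def filter_overlapping_swaps(queued):
    """Keep queued swaps in order, dropping any that reuse an already swapped BLE."""
    touched = set()   # BLEs that an accepted swap has already moved
    accepted = []
    for ble_a, ble_b in queued:
        if ble_a in touched or ble_b in touched:
            continue  # overlaps with a previous swap, so skip it
        accepted.append((ble_a, ble_b))
        touched.update((ble_a, ble_b))
    return accepted


queue = [("ble1", "ble4"), ("ble4", "ble7"), ("ble2", "ble3")]
print(filter_overlapping_swaps(queue))
```

Here the second swap is dropped because `ble4` was already moved by the first one. Earlier queue entries win, which is a design choice of this sketch rather than something the slides specify.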

25 Current Status. Code: we created our own BLE swapping mechanism using the VPR data structures; we have a whole suite of test fixtures to test the code; testing is still continuing, but we are finding minimal issues; we have done a swap within placement; we have started to integrate our cost function. Validation: we intend to run the VPR benchmarks; our BLE swapping solution should be better than or the same as T-VPlace; our VPR benchmark results should also be comparable to IRAC.

26 Demo. The circuit below abstracts away the MUXes, switch boxes, and connection boxes. The connections represent the direct connections between BLEs in CLBs. Optimize this circuit by performing one BLE swap. Explain why your optimization will result in better performance. Architecture parameters: K = 2, I = 3, N = 2. Measurement: critical path delay = 1.182 ns.

27 Demo http://www.screenr.com/gJdN 27

28 Demo 28

29 Thanks. 29

30 Backup Slides 30

31 Impact of duplication on placement Delay = 2 Delay = 1 31

32 A novel net weighting algorithm for timing-driven placement. Kong, Tim Tianming. Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design. ACM, 2002.

33 A Novel Net Weighting Algorithm. Accurate path counting algorithm: the first known accurate path counting algorithm that considers all paths. Due to the exponential number of paths present in the circuit, accurate all path counting has been considered very difficult. Significant performance improvement; little loss in total wirelength; no runtime overhead.

34 A Novel Net Weighting Algorithm: consider the path sharing effect. If two critical paths share a common segment, the edges in the common segment should receive higher weights. Define two variables. Forward path F(p): the number of different critical paths starting from PI (primary input) elements and terminating at p. Backward path B(p): the number of different critical paths starting from PO (primary output) elements and terminating at p, if we reverse all signal flow directions.

35 Background 35

36 Background 36

37 Example: timing of a circuit. Nodes a, b, c, d, e, f; edge delays 5, 7, 1, 5, 3. [Figure: the circuit graph annotated with ARR(t) and then with REQ(s); the longest path delay (T) is 13.]

38 Example: Slack(s, t). ARR/REQ for nodes a, b, c, d, e, f: 0/0, 0/2, 7/7, 8/8, 13/13, 11/13. Slack for edges (a, c), (b, c), (c, d), (d, e), (d, f): 0, 2, 0, 0, 2.
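The ARR, REQ and slack values of this example can be reproduced with a standard forward and backward timing pass. The sketch below makes two assumptions: the assignment of the delays 5, 7, 1, 5, 3 to individual edges is inferred from the ARR/REQ labels, and the edge slack uses the usual definition slack(s, t) = REQ(t) – ARR(s) – d(s, t). The function name `timing` is hypothetical.

```python
# Edge delays (source, sink) -> delay, listed in topological order.
# The per-edge assignment is inferred from the ARR/REQ labels on the slides.
DELAYS = {("a", "c"): 7, ("b", "c"): 5, ("c", "d"): 1,
          ("d", "e"): 5, ("d", "f"): 3}


def timing(delays):
    nodes = sorted({n for edge in delays for n in edge})
    edges = list(delays)  # relies on the dict keys being in topological order

    # Forward pass: arrival time is the longest delay from any primary input.
    arr = {n: 0 for n in nodes}
    for s, t in edges:
        arr[t] = max(arr[t], arr[s] + delays[(s, t)])

    # T is the longest path delay; primary outputs are required by time T.
    T = max(arr.values())

    # Backward pass: required time is the latest arrival that still meets T.
    req = {n: T for n in nodes}
    for s, t in reversed(edges):
        req[s] = min(req[s], req[t] - delays[(s, t)])

    # Edge slack: how much delay the edge could gain without exceeding T.
    slack = {(s, t): req[t] - arr[s] - delays[(s, t)] for s, t in edges}
    return arr, req, slack, T


arr, req, slack, T = timing(DELAYS)
for node in sorted(arr):
    print(node, f"{arr[node]}/{req[node]}")
```

The printed ARR/REQ pairs match the labels on the slide, and the two edges with nonzero slack are (b, c) and (d, f).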

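39 Example: path delays and slacks. Path a, c, d, e (delays 7, 1, 5; node slacks 0, 0, 0, 0): d(π) = 13, slack(π) = 0. Path b, c, d, f (delays 5, 1, 3; node slacks 2, 0, 0, 2): d(π) = 9, slack(π) = 4. Path a, c, d, f (delays 7, 1, 3; node slacks 0, 0, 0, 2): d(π) = 11, slack(π) = 2. Path b, c, d, e (delays 5, 1, 5; node slacks 2, 0, 0, 0): d(π) = 11, slack(π) = 2.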

40 Critical Path counting 40

41 Calculate F(p). Values for nodes a, b, c, d, e, f: start with 0, 0, 0, 0, 0, 0; set the primary inputs to get 1, 1, 0, 0, 0, 0; propagate forward to get 1, 1, 2, 2, 2, 2.

42 Calculate B(p). Values for nodes a, b, c, d, e, f: start with 0, 0, 0, 0, 0, 0; set the primary outputs to get 0, 0, 0, 0, 1, 1; propagate backward to get 2, 2, 2, 2, 1, 1.

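43 Calculate GP(s,t). B(p) for nodes a, b, c, d, e, f: 2, 2, 2, 2, 1, 1. F(p) for nodes a, b, c, d, e, f: 1, 1, 2, 2, 2, 2. GP for edges (a, c), (b, c), (c, d), (d, e), (d, f): 2, 2, 4, 2, 2.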

44 Accurate All Path Counting. Use a discount function D{x, y} to get an accurate counting result, where ‘a’ is a positive constant, x is Fs(s,t) = ARR(t) – ARR(s) – d(s,t) or Bs(s,t) = REQ(t) – REQ(s) – d(s,t), and y is the longest path delay (T).

45 Accurate All Path Counting 45

46 Ex. Calculate F(t) (a=2). ARR/REQ for nodes a, b, c, d, e, f: 0/0, 0/2, 7/7, 8/8, 13/13, 11/13. D{Fs(a, c), T} = D{0,13} = 1; D{Fs(b, c), T} = D{2,13} = 0.88; D{Fs(c, d), T} = D{0,13} = 1; D{Fs(d, e), T} = D{0,13} = 1; D{Fs(d, f), T} = D{0,13} = 1. Result: F(a) = F(b) = 1, F(c) = 1 + 0.88 = 1.88.

47 Ex. Calculate B(s) (a=2). ARR/REQ for nodes a, b, c, d, e, f: 0/0, 0/2, 7/7, 8/8, 13/13, 11/13. D{Bs(a, c), T} = D{0,13} = 1; D{Bs(b, c), T} = D{0,13} = 1; D{Bs(c, d), T} = D{0,13} = 1; D{Bs(d, e), T} = D{0,13} = 1; D{Bs(d, f), T} = D{2,13} = 0.88. Result: B(e) = B(f) = 1, B(d) = 1 + 0.88 = 1.88.

48 Ex. Calculate AP(s,t) (a=2). D{slack(a, c), T} = D{0,13} = 1; D{slack(b, c), T} = D{2,13} = 0.88; D{slack(c, d), T} = D{0,13} = 1; D{slack(d, e), T} = D{0,13} = 1; D{slack(d, f), T} = D{2,13} = 0.88. AP(a, c) = 1*1.88*1 = 1.88; AP(b, c) = 1*1.88*0.88 = 1.65; AP(c, d) = 1.88*1.88*1 = 3.53; AP(d, e) = 1.88*1*1 = 1.88; AP(d, f) = 1.88*1*0.88 = 1.65.

49 Compare results. Values for edges (a, c), (b, c), (c, d), (d, e), (d, f). Proposed algorithm (AP): 1.88, 1.65, 3.53, 1.88, 1.65. Critical path counting (GPATH): 2, 2, 4, 2, 2. With the critical path counting method (GPATH), it is difficult to get an accurate result; the proposed algorithm gives a more accurate result.

50 VPR Datastructures: Resource Routing Graph; Physical Block Graph; Netlist (Global CLB Netlist, Global Atom Netlist); Blocks

51 Blocks. Contains the CLB. Contains the input/output. Contains the Resource Routing Graph. Contains the Physical Blocks. Physical Blocks represent the BLE. Physical Blocks represent the flip-flop. Physical Blocks also contain the LUTs.

52 Resource Routing Graph. Nodes are pins. Edges are architectural connections. Each pin is associated with a net number. Prev Nodes and Edges represent the actual connections per BLE.

53 Global Netlist 53

54 Atom Netlist 54

