May
Mike Drob, Grant Furgiuele, Ben Winters
Advisor: Dr. Chris Chu
Client: IBM
IBM Contact – Karl Erickson
Project Overview
  The circuit placement problem is a bottleneck of physical design
  FastPlace is currently single-core only, with no threads
  Will attempt to parallelize some functions of the FastPlace algorithm using the Linux pthreads library
  Implement the RQL idea (IBM) in FastPlace
Project Plan
  Start with the existing serial FastPlace algorithm
  Parallelize the FastPlace algorithm to decrease run-time
    Hope for a speedup as close to N times (N = number of cores) as possible
    Realistically, expect 0.75N or 0.5N
  End goal is mostly a proof of concept
    IBM uses an in-house algorithm
    It contains proprietary circuit processing
Project Design
  Written in C
  Runs under Linux using the POSIX thread library
  Consider scalability: 2, 4, 8, etc. cores
  RQL implementation
    IBM concept
    Netlist optimization for placement
Implementation – Overall
  Using data parallelism as the scheme
    Assigning loop iterations to threads
    Localizing variable usage
    Using thread synchronization (mutex, etc.) only where absolutely necessary
  To maximize the speed improvement from threads, minimize the total number of tasks the threads must accomplish
    Have individual threads do as much as possible
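The scheme above can be illustrated with a minimal pthreads sketch. It is not FastPlace code: `parallel_sum`, `slice_t`, and `MAX_THREADS` are hypothetical names, and summing an array stands in for whatever per-cell work a real loop would do. Each thread gets one contiguous range of iterations, accumulates into a local variable, and takes the mutex only once to publish its result.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 64   /* arbitrary cap chosen for this sketch */

/* Hypothetical work description: one contiguous slice per thread. */
typedef struct {
    const double *values;    /* shared, read-only input */
    size_t begin, end;       /* half-open iteration range for this thread */
    double *total;           /* shared result, protected by lock */
    pthread_mutex_t *lock;
} slice_t;

static void *sum_slice(void *arg)
{
    slice_t *s = arg;
    double local = 0.0;               /* localized variable: no sharing in the loop */
    for (size_t i = s->begin; i < s->end; i++)
        local += s->values[i];
    pthread_mutex_lock(s->lock);      /* one short critical section per thread */
    *s->total += local;
    pthread_mutex_unlock(s->lock);
    return NULL;
}

/* Sums values[0..n) using num_threads threads, one large task per thread. */
double parallel_sum(const double *values, size_t n, int num_threads)
{
    assert(num_threads > 0 && num_threads <= MAX_THREADS);
    pthread_t tid[MAX_THREADS];
    slice_t slice[MAX_THREADS];
    pthread_mutex_t lock;
    double total = 0.0;

    pthread_mutex_init(&lock, NULL);
    for (int t = 0; t < num_threads; t++) {
        slice[t].values = values;
        slice[t].begin = n * (size_t)t / (size_t)num_threads;
        slice[t].end = n * (size_t)(t + 1) / (size_t)num_threads;
        slice[t].total = &total;
        slice[t].lock = &lock;
        int rc = pthread_create(&tid[t], NULL, sum_slice, &slice[t]);
        assert(rc == 0);
        (void)rc;
    }
    for (int t = 0; t < num_threads; t++)
        pthread_join(tid[t], NULL);
    pthread_mutex_destroy(&lock);
    return total;
}
```

Giving each thread a single large slice, rather than many small tasks, keeps synchronization down to one lock acquisition per thread.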
Implementation – Thread Pool
  Threads are created once at start
  Various benefits:
    Minimizes overhead from thread creation
    Increases cache performance
    Allows core scalability: the number of running threads can equal the number of cores available
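A thread pool of this kind might look like the sketch below. This is an assumed design, not the project's actual pool: all names (`pool_t`, `pool_init`, `pool_run`, `pool_destroy`, `square_job`) are hypothetical. Workers are created once, sleep on a condition variable, and every call to `pool_run` hands the same job to all workers and waits until the last one finishes.

```c
#include <assert.h>
#include <pthread.h>

#define POOL_MAX 64   /* arbitrary cap chosen for this sketch */
typedef void (*pool_job_fn)(int tid, int nthreads, void *arg);
typedef struct pool pool_t;
typedef struct { pool_t *pool; int tid; } worker_t;

struct pool {
    pthread_t threads[POOL_MAX];
    worker_t workers[POOL_MAX];
    int nthreads;
    pthread_mutex_t lock;
    pthread_cond_t start, done;  /* new job posted / last worker finished */
    pool_job_fn job;
    void *arg;
    unsigned generation;         /* incremented once per posted job */
    int pending, shutdown;       /* workers still running / pool is closing */
};

static void *worker_main(void *p) {
    worker_t *w = p;
    pool_t *pool = w->pool;
    unsigned seen = 0;           /* last generation this worker executed */
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (!pool->shutdown && pool->generation == seen)
            pthread_cond_wait(&pool->start, &pool->lock);
        if (pool->shutdown) {
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        seen = pool->generation;
        pool_job_fn job = pool->job;
        void *arg = pool->arg;
        pthread_mutex_unlock(&pool->lock);

        job(w->tid, pool->nthreads, arg);   /* run this thread's share */

        pthread_mutex_lock(&pool->lock);
        if (--pool->pending == 0)
            pthread_cond_signal(&pool->done);
        pthread_mutex_unlock(&pool->lock);
    }
}

/* Creates the workers once. Returns 0 on success (error cleanup omitted). */
int pool_init(pool_t *pool, int nthreads) {
    assert(nthreads > 0 && nthreads <= POOL_MAX);
    pool->nthreads = nthreads;
    pool->job = NULL;
    pool->arg = NULL;
    pool->generation = 0;
    pool->pending = pool->shutdown = 0;
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->start, NULL);
    pthread_cond_init(&pool->done, NULL);
    for (int t = 0; t < nthreads; t++) {
        pool->workers[t] = (worker_t){ pool, t };
        if (pthread_create(&pool->threads[t], NULL, worker_main, &pool->workers[t]) != 0)
            return -1;
    }
    return 0;
}

/* Runs job on every worker and blocks until all of them have finished. */
void pool_run(pool_t *pool, pool_job_fn job, void *arg) {
    pthread_mutex_lock(&pool->lock);
    pool->job = job;
    pool->arg = arg;
    pool->pending = pool->nthreads;
    pool->generation++;
    pthread_cond_broadcast(&pool->start);
    while (pool->pending > 0)
        pthread_cond_wait(&pool->done, &pool->lock);
    pthread_mutex_unlock(&pool->lock);
}

/* Wakes the idle workers, joins them, and releases the primitives. */
void pool_destroy(pool_t *pool) {
    pthread_mutex_lock(&pool->lock);
    pool->shutdown = 1;
    pthread_cond_broadcast(&pool->start);
    pthread_mutex_unlock(&pool->lock);
    for (int t = 0; t < pool->nthreads; t++)
        pthread_join(pool->threads[t], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->start);
    pthread_cond_destroy(&pool->done);
}

/* Hypothetical example job: thread tid fills every nthreads-th entry. */
typedef struct { int *out; int n; } square_job_t;
void square_job(int tid, int nthreads, void *arg) {
    square_job_t *j = arg;
    for (int i = tid; i < j->n; i += nthreads)
        j->out[i] = i * i;
}
```

Typical use would be one `pool_init` at program start, one `pool_run` per parallel phase, and one `pool_destroy` at exit, so thread creation cost is paid only once.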
Implementation – RQL
  Force-vector modulation
  Forces acting upon cells
    Forces are modeled as a spring potential energy problem
    The native force in the algorithm tries to reduce wire length by bringing connected cells closer to each other
    The spreading force tries to move cells into sparse areas within the placement region
    A balance of the two is needed to meet placement and wire length objectives
  Modulate the spreading forces
    A high spreading force means the connection belongs to a fan-out net or boundary
    Therefore, cells with connections in the top 5 percent of spreading forces are skipped in quadratic placement
    Skipping these keeps the cell’s other connections minimized instead of degrading them
    Results in placing cells in their overall optimal location
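As a rough illustration of the spring model, the sketch below computes the potential energy of a single two-pin connection and the force it exerts on one cell, using the standard Hooke's-law form. The types and function names are hypothetical, and the net model FastPlace actually uses is not shown in these slides.

```c
#include <assert.h>

typedef struct { double x, y; } point_t;

/* Potential energy of one spring of weight w between cells a and b:
 * E = 0.5 * w * (dx^2 + dy^2). */
double spring_energy(point_t a, point_t b, double w)
{
    assert(w >= 0.0);
    double dx = a.x - b.x, dy = a.y - b.y;
    return 0.5 * w * (dx * dx + dy * dy);
}

/* Force the spring exerts on cell a: the negative gradient of the energy,
 * which pulls a toward b. */
point_t spring_force_on(point_t a, point_t b, double w)
{
    assert(w >= 0.0);
    point_t f = { -w * (a.x - b.x), -w * (a.y - b.y) };
    return f;
}
```

Minimizing the sum of such energies over all connections pulls connected cells together, which is the wire-length-reducing behavior the slide attributes to the native force.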
Implementation – RQL
  During quadratic placement (the global placement process):
    Calculate the magnitude of spreading forces for all cells in each iteration
    Calculate the force on the current cell
    If the current cell’s force is above the top 5 percent threshold, skip its placement
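One way to realize these steps is sketched below, under assumptions: forces are given as separate x and y arrays, the threshold is found by sorting a copy of the magnitudes, and `mark_high_force_cells` is a hypothetical name. Squared magnitudes are compared because they preserve the ordering and avoid a square root.

```c
#include <assert.h>
#include <stdlib.h>

static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Sets skip[i] = 1 for cells whose spreading-force magnitude falls in the
 * top `fraction` of all cells (0.05 for the 5% rule), 0 otherwise.
 * Returns the number of cells marked; ties at the threshold are all marked. */
size_t mark_high_force_cells(const double *fx, const double *fy, size_t n,
                             double fraction, int *skip)
{
    assert(fraction >= 0.0 && fraction <= 1.0);
    size_t num_skip = (size_t)(fraction * (double)n);
    if (num_skip == 0) {                  /* too few cells: nothing is skipped */
        for (size_t i = 0; i < n; i++)
            skip[i] = 0;
        return 0;
    }

    double *mag2 = malloc(n * sizeof *mag2);
    double *sorted = malloc(n * sizeof *sorted);
    assert(mag2 != NULL && sorted != NULL);
    for (size_t i = 0; i < n; i++)
        sorted[i] = mag2[i] = fx[i] * fx[i] + fy[i] * fy[i];
    qsort(sorted, n, sizeof *sorted, cmp_double);

    double threshold = sorted[n - num_skip];   /* smallest value in the top group */
    size_t marked = 0;
    for (size_t i = 0; i < n; i++) {
        skip[i] = mag2[i] >= threshold;
        marked += (size_t)skip[i];
    }
    free(mag2);
    free(sorted);
    return marked;
}
```

The placement loop would then test `skip[i]` before moving cell `i`. Sorting every iteration costs O(n log n), which may account for part of a run-time increase; a selection algorithm could lower that cost.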
Implementation – Functions
  Move_8pt family
    move_8pt, move_8pt_withMap, move_8pt_mixedMode, move_8pt_mixedMode_withMap, move_8pt_clustering, move_8pt_clustering_withMap
    Calculates a score based on cell coordinates and bin utilization
    Does not lend itself well to parallelization
  The fix?
    If a new cell is within the 3x3 box of the cell currently being calculated, the new cell is skipped
    Helps remove significant wirelength degradation
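The 3x3 rule could be checked as in the sketch below. It assumes the box is measured in bin-grid coordinates and is centered on the cell another thread is currently processing; the slides do not spell this out, and `bin_t`, `within_3x3`, and `conflicts_with_active` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int col, row; } bin_t;   /* assumed bin-grid coordinates */

/* True when candidate lies in the 3x3 block of bins centered on active. */
int within_3x3(bin_t active, bin_t candidate)
{
    return abs(active.col - candidate.col) <= 1 &&
           abs(active.row - candidate.row) <= 1;
}

/* True when candidate is too close to any cell currently being processed,
 * in which case the caller would skip it. */
int conflicts_with_active(const bin_t *active, int num_active, bin_t candidate)
{
    assert(num_active >= 0);
    for (int i = 0; i < num_active; i++)
        if (within_3x3(active[i], candidate))
            return 1;
    return 0;
}
```

Skipping nearby cells keeps two threads from scoring moves against bin utilization that the other thread is changing at the same moment.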
Implementation – Functions
  Swap_move family
    swap_move_FM, vswap_move, local_order3_FM, flipAllCells
    Row-based data processing
    Break up the matrix into segments based on the number of threads
    Assign each thread X rows
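A simple way to split rows among threads is shown below. `row_range` is a hypothetical helper; it hands each thread a contiguous block of rows and gives the leftover rows, one each, to the lowest-numbered threads so block sizes differ by at most one.

```c
#include <assert.h>

/* Computes the half-open row range [*begin, *end) owned by thread tid. */
void row_range(int num_rows, int num_threads, int tid, int *begin, int *end)
{
    assert(num_rows >= 0 && num_threads > 0 && tid >= 0 && tid < num_threads);
    int base = num_rows / num_threads;    /* rows every thread receives */
    int extra = num_rows % num_threads;   /* leftover rows, one per low tid */
    *begin = tid * base + (tid < extra ? tid : extra);
    *end = *begin + base + (tid < extra ? 1 : 0);
}
```

With 10 rows and 4 threads, the ranges come out as [0, 3), [3, 6), [6, 8), and [8, 10).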
Testing
  Profiled the original FastPlace algorithm
    gprof gives CPU time per function
  Profiling parallel FastPlace
    Valgrind
    FastPlace code outputs actual time elapsed
      Can be used to compare performance
      Not 100% consistent
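CPU time adds up across threads, so a parallel run can report as much CPU time as the serial one even when it finishes sooner; elapsed wall-clock time is the fairer comparison. The sketch below shows one way to measure it with the POSIX monotonic clock. How FastPlace itself measures elapsed time is not shown in the slides, and both function names are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L   /* exposes clock_gettime under strict -std flags */
#include <assert.h>
#include <time.h>

/* Seconds between two timestamps taken with clock_gettime. */
double elapsed_seconds(const struct timespec *start, const struct timespec *end)
{
    assert(start != NULL && end != NULL);
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Reads the monotonic clock, which is unaffected by system time changes. */
struct timespec wall_clock_now(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return ts;
}
```

Calling `wall_clock_now` before and after a placement phase and passing both values to `elapsed_seconds` gives the phase's wall-clock duration. Because timings vary from run to run, averaging several runs is a common precaution.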
Testing & Results
  Test results for correctness
    Compare “wire length” results
    Average total wirelength no worse than 1% greater
    Thread pool is tested and working
  Test results for speedup
    Compared actual run-time
    See slides on next page
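The two acceptance checks can be written down as small helpers. This is a sketch with hypothetical function names; the 1% tolerance is the figure from the slide, and efficiency is speedup divided by core count, so the 0.75N and 0.5N expectations correspond to efficiencies of 0.75 and 0.5.

```c
#include <assert.h>

/* True when the parallel wirelength is no more than `tolerance` (0.01 = 1%)
 * worse than the serial wirelength. */
int wirelength_within_tolerance(double serial_wl, double parallel_wl,
                                double tolerance)
{
    assert(serial_wl > 0.0 && tolerance >= 0.0);
    return parallel_wl <= serial_wl * (1.0 + tolerance);
}

/* Ratio of serial to parallel elapsed time; N would be ideal on N cores. */
double speedup(double serial_seconds, double parallel_seconds)
{
    assert(parallel_seconds > 0.0);
    return serial_seconds / parallel_seconds;
}

/* Speedup per core; 1.0 is ideal. */
double parallel_efficiency(double serial_seconds, double parallel_seconds,
                           int cores)
{
    assert(cores > 0);
    return speedup(serial_seconds, parallel_seconds) / (double)cores;
}
```

For example, a run that drops from 10 s to 2.5 s on 8 cores has a speedup of 4 and an efficiency of 0.5.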
Test Results – RQL Implementation
  Wire length results
    Between 0.12% and …% decreased wire length on ISPD98 benchmarks, with an average of 0.98%
    Between 0.11% and …% decreased wire length on ISPD2005 benchmarks, with an average of 1.39%
  Run-time results
    Some run-time slowdown
    Average increase of 3.36% on ISPD98
    Average increase of 4.02% on ISPD2005
Test Results – Global Placement
Test Results – Detailed Placement
Project Impact
  Shows that threads can be used to speed up the placement process
  With the availability of multi-core CPUs and the scalability of the thread implementation, speed improvement could continue
  Reduces a bottleneck in the development process
Questions?