RAPID Standard Cell Library Evaluation by David Artz & Cory Krug Oracle Labs November 2011
2 Introduction Standard Cell (Logic, ECO, Power) Library Evaluation Comparison Criteria Performance Power Layout Architecture (routability, area, power rails, tapless, etc.) Features Drive Strengths Supported Views (CCS/NLDM, APL, DFT, etc.) Documentation, Support
3 vs. Standard Cell Comparisons Two libraries were compared, and. Both vendors offer what they call a general purpose (or low power) variant built around a 9 track layout architecture and a high performance (12 track) variant. Both vendors supply a typical mix of combinational (simple & complex) cells, sequential cells (latches/flops), I/Os, ECO cells, power management cells (header/footer switches, level shifters, isolation cells), etc. This is where the similarities end. has a much richer library in terms of drive strengths, beta ratios, and device lengths, whereas has only a Vt mix.
4 vs. Richness (Vt, Leff, General Purpose and High Performance)
5 vs. Richness (Functionality & Drive Strengths/Beta Ratios) For the purpose of comparison the standard cell libraries are categorized as follows:
6 vs. Richness (Functionality & Drive Strengths/Beta Ratios Cont.) An entry of 6/20 indicates 6 functions with 20 drive strengths in total. 9 track standard cell library comparison
7 vs. Richness (Functionality & Drive Strengths/Beta Ratios Cont.) Previous chart shows 22% more functions in the library over. This can be misleading, as in my opinion many of these functions are of little use (e.g., many dubiously useful flavors of scannable flip-flops with both Q and QN outputs, etc.) Despite the richer feature set of the library, the library has 29% additional drive strengths, which consist of differing beta ratios and finer drive granularity. Beta Ratios: device P/N sizing to adjust timing arc performance, e.g. – Max finger size which fits in cell – Minimize the average delay – Equalize the delays – Equalize the output slews (e.g., on clock cells) – Minimize the maximum delay – Minimize the delay for rising output libraries use all of the above beta ratios (where they make sense). uses only minimize(max(tplh, tphl)). The finer granularity and sizings allow optimization approaches to better fine tune for power, performance, and area goals. The “multi-channel” libraries from afford even more optimization opportunities for improving leakage and minimizing processing variance (especially important in clocking).
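To make the beta ratio objectives listed above concrete, here is a minimal sketch that picks a P/N width ratio under four of them. The delay model is a toy assumption, not taken from either library: total device width is held fixed, the PMOS is assumed to be `r` times more resistive per unit width than the NMOS, and the names `delays`, `best_beta`, `OBJECTIVES`, and `BETAS` are hypothetical.

```python
# Toy model (assumption): fixed total width split as Wp/Wn = beta.
# Rise delay scales with PMOS resistance, fall delay with NMOS resistance.

def delays(beta, r=2.0, k=1.0):
    """Return (t_rise, t_fall) for a P/N width ratio beta.

    r is a hypothetical PMOS/NMOS resistance ratio per unit width,
    k is an arbitrary delay scale factor.
    """
    t_rise = k * r * (1.0 + beta) / beta  # PMOS pulls the output high
    t_fall = k * (1.0 + beta)             # NMOS pulls the output low
    return t_rise, t_fall

# Cost functions for four of the sizing objectives named on the slide.
OBJECTIVES = {
    "min_average": lambda tr, tf: 0.5 * (tr + tf),
    "equalize":    lambda tr, tf: abs(tr - tf),
    "min_max":     lambda tr, tf: max(tr, tf),   # minimize(max(tplh, tphl))
    "min_rise":    lambda tr, tf: tr,
}

def best_beta(objective, betas, r=2.0):
    """Grid-search the beta that minimizes the chosen objective."""
    cost = OBJECTIVES[objective]
    return min(betas, key=lambda b: cost(*delays(b, r)))

# Candidate ratios from 0.5 to 4.0 in steps of 0.01.
BETAS = [round(0.5 + 0.01 * i, 2) for i in range(351)]

if __name__ == "__main__":
    for name in OBJECTIVES:
        print(name, best_beta(name, BETAS))
```

In this toy model the "equalize" and "min_max" objectives land on the same ratio (beta equal to `r`), while "min_average" prefers a smaller ratio near the square root of `r` and "min_rise" runs to the largest ratio allowed. That spread is the reason a library offering several beta ratios per function gives the optimizer more to work with than one sized only for minimize(max(tplh, tphl)).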
8 vs. Documentation documentation is more readable and concise. – Truth tables show “don’t care” conditions rather than explicitly listing out all permutations of input/output states. – Detailed descriptions of the operating conditions and constraints over which the cells were characterized (e.g., surrounding dummy metal included at representative densities, etc.) are given. – BKMs on routability, power strapping, etc. within commercial tools are documented. – Gate level schematic diagrams are included and not just a cell symbol.
9 vs. Layout cell pitch is 0.14um while is 0.18um. Power rails in are M2 while is classical M1. My experience has taught me that M2 affords better IR drop robustness with little if any impact on routability. All library offerings come with a standard tech.lef defining BEOL for various stackups. Both libraries are tapless, allowing for back biasing to reduce leakage.
10 vs. Views Both vendors offer all typical library views (schematic symbols, place & route LEF, Verilog, pre & post SPICE decks, DFT, etc.) has some pre-compiled views (e.g., Milkyway) whereas does not. Timing & Power Views – Synopsys .lib in NLDM and CCS are supplied. libraries elicit warnings when checked with the semantic checker, ’s do not. – The number of indices in NLDM tables is the same, but interestingly characterizes over a much broader range (e.g., on small inverters 50% wider range of input slews and 280% wider on loads) than. – appears to characterize more robustly for power than, e.g., internal nodal currents are captured for header/footer switches. – APL (Apache Power Libraries) are supposed to be available from both vendors (note, it appears only offers APL for 12 track libraries) – When comparing the closest matching cells across libraries (functions, drive strength, PVT, input slew rate, output loading, etc.) the cells appear to be on average 3% faster in performance than. I feel this is more of a characterization discrepancy (what conditions did assume in the neighborhood used for characterizing these cells, was the input an ideal voltage source or a properly shaped waveform, was the output a passive cap or another representative DUT, etc.) than an actual difference in performance.
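The point about characterization range can be illustrated with a small sketch of how an NLDM-style table is typically consumed: a delay is looked up by input slew and output load, interpolated between the nearest index points, and any query outside the characterized indices has to be extrapolated. The index values, the delay table, and the names `nldm_lookup`, `SLEWS`, `LOADS`, and `TABLE` below are hypothetical and do not come from either vendor's library; bilinear interpolation is assumed as the lookup scheme.

```python
from bisect import bisect_right

def _segment(axis, x):
    """Index i of the axis segment [axis[i], axis[i+1]] used for x.

    Queries outside the axis reuse the first or last segment,
    which amounts to linear extrapolation.
    """
    i = bisect_right(axis, x) - 1
    return min(max(i, 0), len(axis) - 2)

def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear lookup in a 2-D table indexed by [slew][load].

    Returns (value, extrapolated), where extrapolated is True when the
    query falls outside the characterized index range.
    """
    i = _segment(slews, slew)
    j = _segment(loads, load)
    u = (slew - slews[i]) / (slews[i + 1] - slews[i])
    v = (load - loads[j]) / (loads[j + 1] - loads[j])
    value = ((1 - u) * (1 - v) * table[i][j]
             + u * (1 - v) * table[i + 1][j]
             + (1 - u) * v * table[i][j + 1]
             + u * v * table[i + 1][j + 1])
    inside = (slews[0] <= slew <= slews[-1]) and (loads[0] <= load <= loads[-1])
    return value, not inside

# Hypothetical index points (ns, pF) and a made-up delay surface (ns).
SLEWS = [0.01, 0.05, 0.20]
LOADS = [0.001, 0.004, 0.016]
TABLE = [[0.02 + 0.5 * s + 10.0 * c for c in LOADS] for s in SLEWS]

if __name__ == "__main__":
    print(nldm_lookup(SLEWS, LOADS, TABLE, 0.03, 0.002))  # inside the range
    print(nldm_lookup(SLEWS, LOADS, TABLE, 0.40, 0.002))  # slew beyond range
```

With the same number of index points, a wider characterized range means fewer queries take the extrapolated path, at the cost of coarser spacing between points inside the range.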
11 vs. Misc. Observations offers thick gate oxide decoupling caps, appears not to. Thicker oxides reduce leakage (Note: I’m dubious about whether any of the decaps’ frequency response can supply instantaneous current at our higher frequency goals). The power saving library from (i.e., header/footer switches) appears to have more functionality in that the switches afford a pre-trickle charge phase signal before the final charge phase, thus supplying out of the box finer control over the ramp time of voltage islands.
12 vs. Synthesis Observations
13 vs. Recommendation offers a superior library in terms of performance, functionality, power, and integration. We saw no area penalty despite the difference in cell pitch. This was shown through a systematic comparison of individual library elements as well as on synthesized representative blocks, where implementation (with all things being equal, e.g., wire load model, constraints, etc.) on average outperformed by 3%-5%. The 9 track library gives us good power savings and reasonable performance that should meet RAPID targets. The high performance library (12 track) could be used in functional units requiring higher performance (at the cost of power and area).