1
A Billion Cycles a Day: Industrial Verification
Matthew Heath
Presentation to Synthesis & Verification Class, May 8, 2003
Based on “Validating the Intel Pentium 4 Microprocessor” by Bob Bentley, DAC 2001
2
How do you verify a design with...
- 42 million transistors
- 1 million lines of RTL code
- 600 – 1000 people working on it
- A 3-year design time
- Daily design changes
3
How do you verify a design which has bugs like this??
The FMUL instruction, when the rounding mode is set to “round up”, incorrectly sets the sticky bit when the source operands are:
  src1[67:0] = X*2^(i+15) + 1*2^i
  src2[67:0] = Y*2^(j+15) + 1*2^j
where i+j = 54 and {X,Y} are integers
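To get a feel for how narrow this operand pattern is, here is a minimal Python sketch that builds and recognizes operand pairs of the stated form. The function names and the range checks are assumptions made for illustration; the sketch models only the operand pattern from the slide, not the FMUL datapath or the rounding logic.

```python
WIDTH = 68                       # operands are src[67:0] on the slide
MAX_OPERAND = (1 << WIDTH) - 1


def fmul_corner_operands(x, y, i):
    """Build (src1, src2) of the form on the slide, with j = 54 - i.

    x, y and i are free parameters; the name of this helper is hypothetical.
    """
    if x < 0 or y < 0 or not 0 <= i <= 54:
        raise ValueError("need x, y >= 0 and 0 <= i <= 54")
    j = 54 - i
    src1 = (x << (i + 15)) + (1 << i)   # X*2^(i+15) + 1*2^i
    src2 = (y << (j + 15)) + (1 << j)   # Y*2^(j+15) + 1*2^j
    if src1 > MAX_OPERAND or src2 > MAX_OPERAND:
        raise ValueError("operand does not fit in 68 bits")
    return src1, src2


def _lowest_bit_and_gap(value):
    """Return (index of lowest set bit, True if the next 14 bits are zero)."""
    lowest = (value & -value).bit_length() - 1
    gap_clear = ((value >> (lowest + 1)) & 0x3FFF) == 0
    return lowest, gap_clear


def matches_pattern(src1, src2):
    """True if both operands have the slide's bit pattern and i + j == 54."""
    if src1 <= 0 or src2 <= 0:
        return False
    i, ok1 = _lowest_bit_and_gap(src1)
    j, ok2 = _lowest_bit_and_gap(src2)
    return ok1 and ok2 and i + j == 54


src1, src2 = fmul_corner_operands(x=3, y=5, i=20)
print(hex(src1), hex(src2), matches_pattern(src1, src2))
```

Each operand needs a single low set bit followed by 14 zero bits, and the two low-bit positions must sum to exactly 54, so randomly generated operands will rarely hit the pattern.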
4
And the answer is...
- Hire 70+ validation engineers
- Buy several thousand compute servers
- Write 12,000 validation tests
- Run up to 1 billion simulation cycles per day for 200 days
- Check 2,750,000 manually-defined properties
- Find, diagnose, track, and resolve 7,855 bugs
- Apply formal verification with 10,000 proofs to the instruction decoder and FP units
  - This found that obscure FMUL bug!
5
We know why validation is hard for tools. Why is it hard for people who run them?
To meet an aggressive tapeout schedule, design and validation must occur in parallel without one blocking the other.
- Validation starts before the design is done
- Design changes occur while validation tests are running
- Both design and validation must continue in the presence of known, unfixed bugs
6
The design team
- 300 designers write RTL code
- Refer to architectural spec, textbooks, research papers, conversations
- Start with basic functionality and progressively add features according to project staging plan
- Do simple self-checks along the way
7
The validation team
- 100 validators write RTL tests
- Refer to same sources as designers, plus the RTL implementation itself
- Write functional tests to exercise features as they’re implemented
- Run tests on RTL simulator
- Diagnose failures
- File bug reports in central database
8
The management
- Collect and analyze data
  - Pass/fail status of tests
  - Bug database statistics (counts, priority, age, discovery rate, fix rate, etc.)
  - RTL feature implementation progress
- Compare trends with project schedule
- Respond if necessary
  - Re-allocate resources to high-risk areas
  - Prioritize work
9
SRTL = “Structural” RTL
- Boolean equations; no behavioral syntax
- State-accurate
  - RTL state maps directly to schematic state
- High-level constructs supported
  - Macros, constants, loops, vectors
- Design hierarchy
  - Full-chip has 6 clusters
  - Each cluster has several units
  - Each unit has tens of functional blocks
  - Each block has O(10^4) transistors
  - Each designer owns several functional blocks
10
SRTL models
- Cluster and full-chip level
- Full-chip models consume ~1GB of disk space
- Compiled, executable SRTL code
- Source code
- Test environments
  - Include emulation of external logic
  - Direct control over interface signals
  - Pre-defined sets of signals commonly selected for tracing during test debug
  - Library of useful test fragments
11
Most design work at cluster level
- Decouples cluster and full-chip validation
- Designers “graft” to latest cluster models
  - Check-out and edit selected source files
  - Incremental model build
  - Run validation tests
- Revision control system
  - Designers check-in edited source files
  - Log messages include change descriptions, author, timestamp
12
Cluster model release process
- Designers periodically turn-in selected checked-in versions of source files
  - Coordinated turn-ins sometimes necessary
- Cluster model builders process turn-ins
  - Merge changes from different versions of the same source file included in multiple turn-ins
  - Compile an executable cluster SRTL model
  - Run tests provided by the validators
  - Report test failures to validators and designers for debug
- Acceptable models released to design team for future grafts
13
Full-chip model release process
- Same process, different hierarchy
- Cluster model builders don designer’s hat
  - Graft to full-chip model
  - Edit based on changes to recent cluster models
  - Incremental full-chip model build
  - Run full-chip validation tests
  - Debug failures, full-chip turn-in
- Now full-chip model builders take over...
  - Process turn-ins from all clusters
  - Run full-chip validation tests again!
  - Release full-chip models to design team
14
Netbatch
- 10^9 simulation cycles / day = 10 Hz * 10^5 sec/day * 10^3 computers
- Netbatch manages compute server workload
  - For a given SRTL model and set of tests, create a job file and send it to netbatch
  - Each sub-team has a netbatch allocation
  - Jobs exceeding allocation enter wait queue
  - Wait times of 24+ hrs not uncommon
- Test results
  - Pass/fail statistics
  - Failure time and meaningful error message
  - Traces of user-selected system state
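The throughput estimate on this slide is a back-of-the-envelope product, and it can be checked in a few lines of Python. The function name is hypothetical. The 10 Hz simulation speed, 10^3 machines, and 200-day campaign come from the slides; the only added fact is that a day has 86,400 seconds, which the slide rounds up to 10^5.

```python
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400; the slide rounds this to 10**5


def cycles_per_day(sim_hz, machines, seconds_per_day=SECONDS_PER_DAY):
    """Aggregate simulated cycles per day across a pool of machines."""
    return sim_hz * machines * seconds_per_day


# The slide's rounded estimate: 10 Hz * 10^5 sec/day * 10^3 computers.
slide_estimate = cycles_per_day(10, 10**3, seconds_per_day=10**5)

# The same estimate with the unrounded number of seconds in a day.
unrounded = cycles_per_day(10, 10**3)

# Upper bound for the campaign: "up to 1 billion cycles per day for 200 days".
campaign_upper_bound = slide_estimate * 200

print(f"{slide_estimate:.1e} {unrounded:.2e} {campaign_upper_bound:.1e}")
```

Without the rounding, the same pool delivers about 8.6 * 10^8 cycles per day, so "a billion cycles a day" is an order-of-magnitude figure.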
15
Efficiency improvements
- An SRTL change made by a designer...
  - Appears in a cluster model 1 week later
  - Appears in a full-chip model 2 weeks later
  - Validators find bugs in released models which the designer has already fixed
- “Onion peeling” vs. “whack-a-mole” debug
  - Temporarily disabling failing properties
  - Releasing models which fail some tests
- System state capture and restore
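One way to picture "releasing models which fail some tests" is a release gate that tolerates failures already tracked as known bugs and blocks only on new ones. The sketch below is an illustration under that assumption; the function name, the waiver set, and the test names are all hypothetical and are not taken from the presentation.

```python
def release_decision(results, known_failures):
    """Decide whether a model build is acceptable for release.

    results:        dict mapping test name -> True (pass) / False (fail)
    known_failures: set of test names expected to fail because of bugs
                    that are already filed (hypothetical waiver list)
    """
    failures = {name for name, passed in results.items() if not passed}
    new_failures = failures - known_failures
    # Waived tests that now pass: their waivers can be retired.
    stale_waivers = {name for name in known_failures if results.get(name) is True}
    return {
        "release": not new_failures,
        "new_failures": sorted(new_failures),
        "stale_waivers": sorted(stale_waivers),
    }


# Hypothetical test names, for illustration only.
results = {"fp_mul_basic": True, "fp_round_up": False, "decode_prefix": True}
print(release_decision(results, known_failures={"fp_round_up", "decode_prefix"}))
```

Under this scheme a known, unfixed bug does not hold up the release, while a regression in a previously passing test does.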
16
Central bug database
- Released model version
- Failing validation test & symptoms
- Root cause
- Requested design change
- Priority
- Log of discussion among designers, validators, and managers
- Status / disposition
  - New, ETA, test fixed, design fixed (& version), validated, dropped
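The fields listed on this slide map naturally onto a record type. The following Python sketch shows one possible shape. The class, field, and method names are hypothetical, and treating "validated" and "dropped" as the closed states is an assumption rather than something the slide states.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # Dispositions listed on the slide.
    NEW = "new"
    ETA = "ETA"
    TEST_FIXED = "test fixed"
    DESIGN_FIXED = "design fixed"
    VALIDATED = "validated"
    DROPPED = "dropped"


@dataclass
class BugReport:
    model_version: str            # released model in which the bug was seen
    failing_test: str
    symptoms: str
    priority: int
    root_cause: str = ""
    requested_change: str = ""
    status: Status = Status.NEW
    fixed_in_version: str = ""    # the "(& version)" attached to "design fixed"
    discussion: list = field(default_factory=list)

    def log(self, author, message):
        """Append an entry to the discussion log."""
        self.discussion.append((author, message))

    def mark_design_fixed(self, version):
        self.status = Status.DESIGN_FIXED
        self.fixed_in_version = version


def open_bugs(bugs):
    """Bugs not yet closed (assumes validated/dropped are the closed states)."""
    closed = {Status.VALIDATED, Status.DROPPED}
    return [bug for bug in bugs if bug.status not in closed]


# Hypothetical entry, for illustration only.
bug = BugReport("cluster-model-A", "fp_round_up", "sticky bit set incorrectly", priority=1)
bug.log("validator", "Fails only in round-up mode.")
bug.mark_design_fixed("cluster-model-B")
print(bug.status.value, bug.fixed_in_version, len(open_bugs([bug])))
```

Recording the fixed-in version alongside the status lets a validator tell whether a failure seen in an older released model has already been addressed.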
17
Bug root causes
18
Schematic formal verification
- Use formal techniques because schematic simulation takes too long
- Schematic design starts long before SRTL design is done
- Bottom-up
  - Verify SRTL macros vs. library cells first
  - Black-box macrocells & verify block
- Because SRTL is state-accurate, verification is combinational only!
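When every SRTL state element has a matching schematic state element, equivalence checking reduces to comparing the combinational logic that feeds each pair of matching state elements. The Python sketch below illustrates that idea with a brute-force truth-table comparison. This is a stand-in chosen for brevity, not the technique used in the presentation, and the function names and example logic are hypothetical.

```python
from itertools import product


def find_mismatch(spec, impl, num_inputs):
    """Compare two combinational functions over every input assignment.

    Returns None if they agree everywhere, otherwise the first input
    tuple on which they differ (a counterexample).
    """
    for bits in product((False, True), repeat=num_inputs):
        if spec(*bits) != impl(*bits):
            return bits
    return None


def srtl_cone(x, y):
    # SRTL view of the logic feeding a state element: Z = X & Y
    return x and y


def schematic_cone(x, y):
    # A hypothetical gate-level form of the same logic: NOR of inverted inputs
    return not ((not x) or (not y))


def buggy_cone(x, y):
    # A deliberately wrong implementation, to show a counterexample
    return x or y


print(find_mismatch(srtl_cone, schematic_cone, 2))   # None: equivalent
print(find_mismatch(srtl_cone, buggy_cone, 2))       # a counterexample tuple
```

Exhaustive enumeration grows as 2^n in the number of inputs, so it is only practical for small logic cones like this toy example.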
19
One SRTL state may map to multiple functionally equivalent schem states
  Z = X & Y
  MSFF (Z, W, CLK)
[Figure: three schematic variants built from D flip-flops; signal labels X, Y, W, W1, W2, Z, Z1, Z2, CLK]
20
Retiming must be back-annotated into SRTL
  Z = X & Y
  MSFF (Z, W, CLK)
  [Figure: schematic built from D flip-flops; signal labels X, Y, W, Z1, Z2, CLK]
Exception: Inverters
  MSFF (X, Y, CLK)
  Z = ~Y
  [Figure: schematic; signal labels X, Y, Z]
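The inverter exception can be illustrated with a small cycle-by-cycle simulation. The sketch assumes that MSFF (X, Y, CLK) captures X into state Y on each clock and that the exception refers to moving an inverter from one side of the flop to the other; the transcript does not spell out either point. The function names are hypothetical.

```python
def flop_then_invert(inputs, init_y):
    """SRTL form: state Y captures X each clock, output Z = ~Y."""
    y, outputs = init_y, []
    for x in inputs:
        outputs.append(not y)    # Z = ~Y, observed before the clock edge
        y = x                    # the flop captures X
    return outputs


def invert_then_flop(inputs, init_yb):
    """Retimed form: the inverter sits before the flop, so the state holds ~X."""
    yb, outputs = init_yb, []
    for x in inputs:
        outputs.append(yb)       # Z is read directly from the flop
        yb = not x               # the flop captures ~X
    return outputs


stimulus = [True, False, False, True, True, False]
for init in (False, True):
    # The retimed state is the complement of the SRTL state, cycle by cycle.
    assert flop_then_invert(stimulus, init) == invert_then_flop(stimulus, not init)
print("outputs match for both initial states")
```

Each state element in one form corresponds to exactly one state element in the other with inverted polarity, so the one-to-one state mapping survives and only the polarity has to be tracked.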
21
Conclusion
- Efficient verification of large-scale designs is a daunting management challenge
  - Design and validation are concurrent, not iterative
- Possible with adequate resources and powerful tools to use the resources efficiently
- Methodology constraints keep the problem tractable
- Clear communication among team
- Careful documentation
- Progress tracking is key to staying on schedule
- Motto: “If it hasn’t been verified, it doesn’t work.”
22
How NOT to do verification...
Arnold was unhappily aware that the complete Jurassic Park program contained more than half a million lines of code, most of it undocumented, without explanation...
“What are you doing, John?”
“Checking the code.”
“By inspection? That’ll take forever.”
- Michael Crichton, Jurassic Park