The Return of Synthetic Benchmarks


1 The Return of Synthetic Benchmarks
Ajay M. Joshi (UT Austin), Lieven Eeckhout (Ghent University), Lizy K. John (UT Austin)
Laboratory of Computer Architecture, Department of Electrical & Computer Engineering, The University of Texas at Austin
January 28, 2008

2 Outline
– The Need for Synthetic Benchmarks
– BenchMaker Framework for Benchmark Synthesis
– Workload Characteristics Used in Synthesis
– Synthetic Benchmark Construction
– Evaluation of BenchMaker
– Applications
– Summary

3 Benchmark Spectrum
– Complete Application Code
– Application Suites, e.g. SPEC CPU
– Kernel Codes, e.g. Livermore Loops
– Synthetic Benchmarks, e.g. Dhrystone, Whetstone
– Microbenchmarks, e.g. STREAM
– Toy Benchmarks, e.g. Heap sort
Toward toy benchmarks: Less Development Effort, More Scalable, More Maintainable, Less Representative
Toward complete applications: More Development Effort, Less Scalable, Less Maintainable, More Representative

4 Focus on Simulation Time Reduction
– Statistical Sampling: [Conte et al., ICCD’96], [Wunderlich et al., ISCA’03]
– Representative Sampling: [Sherwood et al., ASPLOS’02]
– Reduced Input Set: [KleinOsowski, CAN’04]
– Statistical Simulation & Synthetic Workloads: [Oskin et al., ISCA’00], [Eeckhout et al., ISPASS’00], [Nussbaum et al., PACT’01], [Bell et al., ICS’05]
– Benchmark Subsetting: [Eeckhout et al., PACT’02], [Vandierendonck et al., CAECW’04], [Phansalkar et al., ISPASS’05], [Eeckhout et al., IISWC’05]
– Analytical Modeling: [Noonburg et al., MICRO’94], [Karkhanis et al., ISCA’04]
– Speedup Simulation: [Schnarr et al., ASPLOS’98], [Loh et al., SIGMETRICS’01]

5 Motivation: Benchmarking Challenges
– Using Real-World Applications as Benchmarks
– Proprietary Nature of Real-World Applications
– Single-Point Performance Characterization
– Application Benchmarks are Rigid
– Applications Evolve Faster than Benchmarks
– Benchmark Suites are Costly to Develop, Maintain, and Upgrade
– Studying Commercial Workload Performance
– Early Design Stage Power/Performance Studies
– Usefulness of Synthetic Benchmarks Beyond Simulation Time Reduction

6 Resurgence of Synthetic Benchmarks…
IEEE Computer, August 2003

8 Workload Synthesis: Central Idea
Just 40 workload characteristics

9 Modeling Real-World Applications
– Microarchitecture-Independent Workload Profiling
– Modeling Workload Attributes into Synthetic Workload
[Figure: Real-World Proprietary Workload → Workload Profiler (Binary Instrumentation or Simulation) → Workload Profile (Attributes + Distribution of Attribute Values) → Workload Synthesizer → Synthetic Benchmark Clone → Experiment Environment (Real Hardware or Execution-Driven Simulator)]

11 Workload Characteristics as ‘Knobs’
Category (Num.): Characteristic
– Instruction mix (10): percentage of integer short latency; percentage of integer long latency; percentage of floating-point short latency; percentage of floating-point long latency; percentage of integer load; percentage of integer store; percentage of floating-point load; percentage of floating-point store; percentage of branches
– Instruction-level parallelism (8): register-dependency-distance, 8 distributions for register dependencies: the percentage of dependencies with a distance equal to 1 instruction, and the percentage of dependencies that have a distance of up to 2, 4, 6, 8, 16, 32, and greater than 32 instructions
– Data locality: data footprint (1); distribution of local stride values
– Instruction locality: instruction footprint
– Branch predictability: distribution of branch transition rate

12 Capturing The Essence of Workloads
Attributes to capture inherent workload behavior
– Data Locality: Dominant strides of static Load/Store
– Control Flow Predictability: Branch transition rate
Modeling Locality & Control Flow Predictability
– Data Locality of Integer, Scientific, and Embedded Workloads effectively modeled using circular streams
– Replicating transition-rate of static branches

13 Modeling Data Access Pattern
Identify streams of data references
A Stream?
– Sequence of memory addresses in an arithmetic progression
– Elements of arrays A, B, and C form 3 streams
for (ii = 0; ii < N; ii++) A[ii] = B[ii] C[ii]
Stream addresses: 200, 204, … ; 320, 324, … ; 404, 408, …
Issuing Sequence: 320, 404, 200, 324, 408, 204, …
Streams are interleaved and may contain noise
4, 8, 12, 16, 1, 3, 20, 24, 5, 7, 2, 9, 11, 28, …
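The interleaving on this slide can be reproduced with a short Python sketch. It assumes 4-byte elements (consistent with the 200, 204 progression), the base addresses read off the issuing sequence, and that the two loads issue before the store in each iteration; the function name `issue_sequence` is hypothetical and not part of BenchMaker.

```python
from itertools import islice

def issue_sequence(bases, stride, iterations):
    """Yield addresses in program order: one reference per stream per iteration."""
    for i in range(iterations):
        for base in bases:
            yield base + i * stride

# Assumed issue order per iteration: load B, load C, store A.
BASES = [320, 404, 200]

first_six = list(islice(issue_sequence(BASES, stride=4, iterations=2), 6))
print(first_six)
```

Each array contributes one arithmetic progression, but the address trace seen by the memory system is the round-robin merge of all three, which is why streams have to be separated before their strides can be measured.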

14 Extracting Streams
Reference pattern of static Load / Store Instructions
– PC-correlated spatial locality
  - Dependence on address referenced by nearby Ld / St
  - Programs with pointer chasing codes
– PC-correlated temporal locality
  - Dependence on previous address generated by same Ld / St
  - Programs with multidimensional arrays
Could static Load / Store instructions be natural sources of streams?
Profile every static Load / Store instruction
– Number of different strides with which it accesses data
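A minimal Python sketch of the per-instruction profiling step follows. It assumes the trace is available as (PC, address) pairs; the PC values, the trace itself, and the helper names `profile_strides` and `dominant_stride` are illustrative assumptions, not the authors' profiler.

```python
from collections import Counter, defaultdict

def profile_strides(trace):
    """Return {pc: Counter of strides} for a trace of (pc, address) pairs."""
    last_addr = {}
    strides = defaultdict(Counter)
    for pc, addr in trace:
        if pc in last_addr:
            # Stride = difference from the previous address issued by the same PC.
            strides[pc][addr - last_addr[pc]] += 1
        last_addr[pc] = addr
    return strides

def dominant_stride(counter):
    """Most frequently observed stride for one static load/store."""
    stride, _count = counter.most_common(1)[0]
    return stride

# Hypothetical PCs for the two loads and the store of a three-array loop.
trace = [(pc, base + 4 * i)
         for i in range(4)
         for pc, base in ((0x10, 320), (0x14, 404), (0x18, 200))]
profile = profile_strides(trace)
for pc, counter in sorted(profile.items()):
    print(hex(pc), dominant_stride(counter), len(counter))
```

Grouping by PC separates the interleaved streams without any search: each static load or store in this trace shows exactly one stride, so the number of distinct strides per instruction is 1.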

15 Modeling Instruction Level Parallelism
Dependency Distance
ADD R1, R3, R4
MUL R5, R3, R2
ADD R5, R3, R6
LD R4, (R1)
SUB R8, R2, R1
Read After Write Dependency Distance = 3 (LD reads R1, which the first ADD wrote three instructions earlier)
Measure Distribution of Dependency Distances
Up to 1, Up to 2, Up to 4, Up to 8, Up to 16, Up to 32, >32
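The measurement can be sketched in Python on the five instructions above. The (destination, sources) encoding and the helper names are assumptions made for illustration, and the bins are treated as non-overlapping ranges (1, 2, 3 to 4, 5 to 8, and so on), which is one possible reading of the slide.

```python
from collections import Counter

# (destination, sources) for the five instructions on the slide.
PROGRAM = [
    ("R1", ["R3", "R4"]),  # ADD R1, R3, R4
    ("R5", ["R3", "R2"]),  # MUL R5, R3, R2
    ("R5", ["R3", "R6"]),  # ADD R5, R3, R6
    ("R4", ["R1"]),        # LD  R4, (R1)
    ("R8", ["R2", "R1"]),  # SUB R8, R2, R1
]
BIN_LIMITS = [1, 2, 4, 8, 16, 32]

def raw_distances(program):
    """Distance from each source operand to the most recent writer of that register."""
    last_writer = {}
    distances = []
    for index, (dest, sources) in enumerate(program):
        for reg in sources:
            if reg in last_writer:  # registers live on entry have no producer here
                distances.append(index - last_writer[reg])
        last_writer[dest] = index
    return distances

def bucket(distance):
    """Map a distance to the first bin whose upper limit covers it."""
    for limit in BIN_LIMITS:
        if distance <= limit:
            return f"up to {limit}"
    return ">32"

distances = raw_distances(PROGRAM)
histogram = Counter(bucket(d) for d in distances)
print(distances, dict(histogram))
```

Only R1 has a producer inside this snippet, so the two observed distances come from LD (distance 3) and SUB (distance 4), and both land in the "up to 4" bin.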

16 Modeling Control Flow Predictability
Capture behavior of easy and difficult to predict branches
Inherent program feature that captures branch behavior
Transition Rate [Haungs et al., HPCA’00]
– Transition rate = # of Taken-Not Taken transitions / # of times executed
– Branches with low transition-rate (easier to predict): TTTTTTTTTN, NNNNNNNNNT
– Branches with high transition-rate (easier to predict): TNTNTNTNTN
– Branches with moderate transition-rate (tougher to predict)
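As a worked example, the transition rate of the outcome strings above can be computed with a few lines of Python. The string encoding ("T" for taken, "N" for not taken) and the function name `transition_rate` are assumptions for this sketch.

```python
def transition_rate(outcomes):
    """Fraction of executions where the branch outcome differs from the previous one."""
    if not outcomes:
        return 0.0
    transitions = sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)
    return transitions / len(outcomes)

for pattern in ("TTTTTTTTTN", "NNNNNNNNNT", "TNTNTNTNTN"):
    print(pattern, transition_rate(pattern))
```

The two biased patterns each have a single transition in ten executions, while the alternating pattern changes direction on nearly every execution; both extremes are regular, which is why only the moderate range is marked as tougher to predict.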

18 Workload Synthesis (1)
Workload Profile: Instruction Mix, Register Dependency Distance, Stride Pattern of Load/Store, Branch Transition Rate, Branch Transition Probabilities
Synthetic Clone Generation: 1 Big Loop
[Figure: basic blocks A, B, C, D ending in branches (BR), chained into one big loop; edges labeled with branch transition probabilities 0.8, 0.2, 1.0, 1.0, 0.1, 0.9]

19 Workload Synthesis (2)
Workload Profile: Instruction Mix, Register Dependency Distance, Stride Pattern of Load/Store, Branch Transition Rate, Branch Transition Probabilities
Synthetic Clone Generation: 1 Big Loop; Memory Access Model (Strides)
[Figure: basic blocks A, B, C, D ending in branches (BR), chained into one big loop; edges labeled with branch transition probabilities 0.8, 0.2, 1.0, 1.0, 0.1, 0.9]

20 Workload Synthesis (3)
Workload Profile: Instruction Mix, Register Dependency Distance, Stride Pattern of Load/Store, Branch Transition Rate, Branch Transition Probabilities
Synthetic Clone Generation: 1 Big Loop; Memory Access Model (Strides); Branching Model – Based on Transition Rate
[Figure: basic blocks A, B, C, D ending in branches (BR), chained into one big loop; edges labeled with branch transition probabilities 0.8, 0.2, 1.0, 1.0, 0.1, 0.9]

21 Workload Synthesis (4)
Workload Profile: Instruction Mix, Register Dependency Distance, Stride Pattern of Load/Store, Branch Transition Rate, Branch Transition Probabilities
Synthetic Clone Generation: 1 Big Loop; Memory Access Model (Strides); Branching Model – Based on Transition Rate; Register Assignment
Output: C code with asm & volatile constructs
[Figure: basic blocks A, B, C, D chained into one big loop; edges labeled with branch transition probabilities 0.8, 0.2, 1.0, 0.9, 0.1]
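The "1 Big Loop" step can be illustrated with a hedged Python sketch that walks a flow graph according to branch transition probabilities and records the resulting chain of basic blocks. The graph below is hypothetical: the block names and probability values appear on the slides, but which probability belongs to which edge cannot be recovered from the transcript, and `build_big_loop` is an illustrative name, not BenchMaker's generator.

```python
import random

# Hypothetical flow graph: block -> [(successor, transition probability), ...].
# The edge assignment is an assumption; only the values come from the slides.
FLOW_GRAPH = {
    "A": [("B", 0.8), ("C", 0.2)],
    "B": [("D", 1.0)],
    "C": [("D", 1.0)],
    "D": [("A", 0.1), ("B", 0.9)],
}

def build_big_loop(graph, start, length, seed=0):
    """Random walk over the graph; the visited blocks form the body of one big loop."""
    rng = random.Random(seed)  # fixed seed keeps the generated clone reproducible
    block = start
    loop_body = [block]
    while len(loop_body) < length:
        successors, weights = zip(*graph[block])
        block = rng.choices(successors, weights=weights, k=1)[0]
        loop_body.append(block)
    return loop_body

loop_body = build_big_loop(FLOW_GRAPH, "A", 12)
print(" -> ".join(loop_body))
```

In the full flow described on the slides, each block in this chain would then be populated from the instruction mix, given strided memory operands and a branching model, assigned registers, and emitted as C code with asm and volatile constructs; the sketch covers only the block-ordering step.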

23 Evaluation of BenchMaker
– SPEC CPU2000, SPECjbb2005, and DBT2 workloads
– Validated Sim-Alpha Performance Model of Alpha 21264
Benchmark, Input, SimPoint(s)
SPEC CPU2000 Integer
– bzip2, graphic, 553
– crafty, ref, 774
– eon, rushmeier, 403
– gcc, 166.i, 389
– gzip
– mcf
– perlbmk, perfect-ref, 5
– twolf, 1066
– vortex, lendian1, 271
– vpr, route, 476
– expr: 8, 24, 47, 51, 56, 73, 87, 99
SPEC CPU95 Integer
– 0, 3, 5, 6, 7, 8, 9, 10, 12

24 Performance Correlation
Trade Accuracy for Flexibility – Average Error of 11%

25 Energy/Power Correlation
Average Error of 13%

27 Altering Individual Program Characteristics

28 Interaction of Program Characteristics

29 Modeling Impact of Benchmark Drift
– Increase in Code Footprint (hypothetical)
– Increase in Data Footprint from SPEC CPU95 to SPEC CPU2000 for gcc (Model with 7% accuracy)

30 Summary
– Synthetic Benchmarks to Address Benchmarking Challenges
– Constructing Synthetic Benchmarks from Hardware-Independent Characteristics
– Applications of Synthetic Benchmarks
  - Altering Program Characteristics
  - Studying Interaction of Program Characteristics
  - Modeling Benchmark Drift

31 Questions? Ajay’s

