Sampling-based Program Locality Approximation. Yutao Zhong, Wentao Chang. Department of Computer Science, George Mason University. June 8th, 2008.


1 Sampling-based Program Locality Approximation
Yutao Zhong, Wentao Chang
Department of Computer Science, George Mason University
June 8th, 2008

2 Outline
Background information
Motivation
Our sampling approach
Experimental results

3 Reuse distance and reuse signature
Example trace: a b c a a c b
Reuse distance: the number of distinct data elements accessed between two consecutive uses of the same element.
Reuse signature: a histogram of reuse distances, showing how the reuse distances are distributed over different lengths.
(Figure: the example trace with the starting point and ending point of a reuse marked; distance labels: 2, 2.)
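
The two definitions above can be illustrated with a small sketch. It applies the definition literally (a quadratic-time scan, not the measurement method used in the talk), and the function names `reuse_distances` and `reuse_signature` are chosen here for illustration only.

```python
from collections import Counter

def reuse_distances(trace):
    """Return one reuse distance per access; None marks a first access."""
    last_pos = {}
    distances = []
    for i, x in enumerate(trace):
        if x in last_pos:
            # Count distinct elements strictly between the previous use and this one.
            distances.append(len(set(trace[last_pos[x] + 1:i])))
        else:
            distances.append(None)
        last_pos[x] = i
    return distances

def reuse_signature(trace):
    """Histogram mapping reuse distance -> number of reuses with that distance."""
    return Counter(d for d in reuse_distances(trace) if d is not None)

trace = list("abcaacb")          # the example trace from the slide
print(reuse_distances(trace))    # [None, None, None, 2, 0, 1, 2]
print(sorted(reuse_signature(trace).items()))  # [(0, 1), (1, 1), (2, 2)]
```

On the slide's trace, two reuses have distance 2, which agrees with the two distance labels in the figure.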

4 Reuse signature application
Relationship to cache behavior:
A capacity miss occurs when the reuse distance ≥ cache size.
Reducing reuse distance improves cache effectiveness.
Current applications:
Predict cache miss rate [Zhong+03] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]
Reorganize data [Zhong+04]
Provide caching hints [Beyls & D’Hollander 02]
Evaluate program optimizations [Beyls & D’Hollander 01] [Ding 00]
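
As a rough illustration of the relationship above, the sketch below estimates a miss rate from a reuse signature. It assumes a fully associative LRU cache whose size is counted in data elements, and it counts first accesses as (cold) misses; these assumptions and the name `predicted_miss_rate` are not from the slides, and the cited prediction methods are more elaborate.

```python
def predicted_miss_rate(signature, cold_accesses, cache_size):
    """Estimate the miss rate of a fully associative LRU cache.

    signature: dict mapping reuse distance -> number of reuses
    cold_accesses: number of first-time accesses (always misses)
    cache_size: capacity in data elements (an assumption of this sketch)
    """
    total = cold_accesses + sum(signature.values())
    if total == 0:
        return 0.0
    # A reuse misses when at least cache_size distinct elements intervened.
    capacity_misses = sum(n for d, n in signature.items() if d >= cache_size)
    return (cold_accesses + capacity_misses) / total

# Signature of the trace "a b c a a c b": distances 0, 1, 2, 2 plus 3 first accesses.
signature = {0: 1, 1: 1, 2: 2}
print(predicted_miss_rate(signature, cold_accesses=3, cache_size=2))  # 5 of 7 accesses miss
print(predicted_miss_rate(signature, cold_accesses=3, cache_size=3))  # 3 of 7 accesses miss
```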

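5 Reuse distance measurement
Data structures: access time table, access trace, distance histogram.
(Figure: for each access, get the accessed memory address; search the access time table for the address and update its last access time; search the access trace, count the distance, and update the trace; record the distance in the distance histogram.)
① Large space and a long counting time are required to store traces and count memory accesses.
② Enormous effort is needed for memory-intensive programs.
(Figure: example trace a c a b b a with the starting point and ending point of a reuse marked; distance label: 1.)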
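
The measurement flow on this slide can be sketched as follows. This is only an illustration: it stands in a sorted Python list for the access trace structure (the talk's reference implementation uses a splay tree), and the names `measure`, `last_time`, and `live_times` are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

def measure(trace):
    """Return the distance histogram of a trace of addresses."""
    last_time = {}    # access time table: address -> time of its last access
    live_times = []   # access trace: sorted last-access times, one per distinct address
    histogram = Counter()
    for now, addr in enumerate(trace):
        prev = last_time.get(addr)
        if prev is not None:
            # Every entry after prev is a distinct address accessed since prev.
            i = bisect_left(live_times, prev)
            histogram[len(live_times) - i - 1] += 1
            del live_times[i]          # drop the stale last-access time
        live_times.append(now)         # now is the largest time, so order is kept
        last_time[addr] = now
    return histogram

print(sorted(measure(list("acabba")).items()))  # [(0, 1), (1, 2)]
```

Deleting from a Python list takes linear time, which is why a balanced or self-adjusting tree is the usual choice for long traces.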

6 Motivation
Sampling is generally effective at reducing the overhead of program behavior profiling.
We aim to balance efficiency and accuracy:
Sample only 1% of memory accesses.
Improve measurement speed by 7.5 times on average.
Achieve over 99% accuracy.

7 Sampling algorithms
Utilize the common structure of bursty tracing [Hirzel & Chilimbi 01].
Sampling rate: $r = |I_S| / (|I_S| + |I_H|)$, where $I_S$ is a sampling interval and $I_H$ is a hibernating interval.
Naive sampling:
Turn off profiling during hibernating intervals.
No guarantee of accuracy.
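
A minimal sketch of the interval structure and the sampling rate formula above, assuming fixed interval lengths and that each period starts with its hibernating interval (the order shown in the trace figures). The function names and the parameters `len_s` and `len_h` are illustrative, not from the slides.

```python
def sampling_rate(len_s, len_h):
    """r = |I_S| / (|I_S| + |I_H|) for fixed interval lengths."""
    return len_s / (len_s + len_h)

def in_sampling_interval(t, len_s, len_h):
    """True if access number t (0-based) falls in a sampling interval.

    Assumes each period is a hibernating interval followed by a sampling one.
    """
    return t % (len_s + len_h) >= len_h

# One sampled access out of every 100 gives the 1% rate used in the experiments.
print(sampling_rate(1, 99))
print([t for t in range(12) if in_sampling_interval(t, len_s=2, len_h=4)])  # [4, 5, 10, 11]
```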

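8 Naive sampling
Memory access trace: ... c a b c a c a b c a c a b c d a ...
(Figure: the trace divided into hibernating intervals I_H and sampling intervals I_S, shown for the full trace and for naive sampling; accesses ① to ⑤ are marked, with distance labels 1 and 3 and the annotation "Inaccurate measurement".)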

9 Biased sampling
Ignore any datum that has been referenced within the current hibernating period.
The measured distance is always larger than or equal to the actual distance.
The probability of being sampled is not uniform.

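10 Biased sampling
Memory access trace: ... c a b c a f a b c a c a b f d a ...
(Figure: the trace divided into hibernating intervals I_H and sampling intervals I_S, shown for the full trace and for biased sampling; accesses ① to ⑤ are marked.)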

11 History-preserved representative sampling
Add an additional tag for each address in the access trace.
Mark references made within a sampling period as sampled in the tag.
A reuse is sampled only when its starting point is marked as sampled.
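
One possible reading of the three rules above is sketched below. It assumes the full access history is maintained in both kinds of interval, so every recorded distance is exact, and that only reuses whose starting point was tagged as sampled are recorded. Fixed interval lengths, the hibernate-first order, the sorted-list trace, and all names are assumptions of this sketch rather than details confirmed by the slides.

```python
from bisect import bisect_left
from collections import Counter

def representative_sampling(trace, len_s, len_h):
    """Histogram of exact distances for reuses that start in a sampling interval."""
    period = len_s + len_h
    last_time = {}     # address -> time of its last access
    sampled_tag = {}   # address -> was its last access inside a sampling interval?
    live_times = []    # sorted last-access times, one per distinct address
    histogram = Counter()
    for now, addr in enumerate(trace):
        in_sample = now % period >= len_h   # assumed: hibernate first, then sample
        prev = last_time.get(addr)
        if prev is not None:
            i = bisect_left(live_times, prev)
            if sampled_tag[addr]:
                # Starting point was sampled, so record this reuse.
                histogram[len(live_times) - i - 1] += 1
            del live_times[i]
        live_times.append(now)              # history is kept in every interval
        last_time[addr] = now
        sampled_tag[addr] = in_sample
    return histogram

print(sorted(representative_sampling(list("abcaacb"), len_s=2, len_h=2).items()))
# [(0, 1), (1, 1)]
```

With `len_h=0` every access is tagged as sampled and the result equals the full, unsampled histogram.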

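12 History-preserved representative sampling
Memory access trace: ... c a b c a f a b c a c a b f d a ...
(Figure: the trace divided into hibernating intervals I_H and sampling intervals I_S, shown for the full trace and for history-preserved representative sampling; accesses ① to ⑤ are marked.)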

13 Further improvements
Simplifying maintenance in hibernating intervals:
Reference trace implementation: splay tree [Ding & Zhong 03].
In a sampling period, the full tree is maintained.
In a hibernating period, instead of creating a new leaf node for each access, we construct a single node for the whole period, with a counter of the number of distinct accesses.
Fast sample tag marking and checking:
To save space, we fix the lengths of the sampling and hibernating periods, which avoids the additional tag.

14 Experiments
Benchmarks from SPEC 2006, Olden, and Chaos:
Floating point programs: CactusADM, Milc, Soplex, Apsi, MolDyn
Integer programs: Bzip2, Gcc, Libquantum, Perimeter, TSP
Instrumentation tool: Valgrind 3.2.3
Sampling rate: 1%
We run each benchmark with 3 to 6 different inputs and repeat each run three times per input.

15 Experiments cont’d
Comparison of accuracy and efficiency against:
Ding and Zhong’s approximation method [Ding & Zhong 03]
Time distance measurement [Shen+07]
Implementation of four algorithms: naive sampling, biased sampling, and basic and optimized representative sampling.

16 Accuracy

17 Efficiency
Sampling even outperforms the lower bound: time distance measurement.
Generally, the speedup is smaller when the input size is small.

18 Efficiency
Speedup of basic representative sampling: around 4-5 times for most cases.
Speedup of optimized representative sampling: around 7-10 times for most cases and up to 33 times; the geometric mean is 7.5.
Sampling rate effect (TSP):

19 Related work
Reuse signature collection: [Mattson+70] [Bennett & Kruskal 75] [Olken81] [Kim+91] [Sugumar & Abraham 93] [Almasi+02] [Ding & Zhong 03] [Shen+07]
Selective monitoring:
Time sampling: [Zagha+96] [Anderson+97] [Burrows+00] [Whaley 00] [Arnold & Sweeney 00] [Arnold & Ryder 01] [Hirzel & Chilimbi 01] [Chilimbi & Hirzel 02] [Itzkowitz+03] [Arnold & Grove 05]
Data sampling: [Larus 90] [Ding & Zhong 02] [Zhao+07]
Uses of efficient locality analysis: [Huang & Shen 96] [Li+96] [Ding 00] [Beyls & D’Hollander 01] [Almasi+02] [Beyls & D’Hollander 02] [Zhong+04] [Marin & Mellor-Crummey 04] [Fang+05] [Zhong+07]

20 Future work
Dynamically adjust the sampling and hibernating lengths.
Store references in a temporary buffer and then process them in batches.
Combine time sampling with data sampling.

21 Thank you! Questions?

