Workshop in Nizhny Novgorod State University Activity Report

1 Workshop in Nizhny Novgorod State University Activity Report
Alexey Iliasov, Kyrgyz Russian Slavic University

2 Goals of the project
Research:
- implementation approaches
- applicability
- targeting real-life applications
Implement:
- simple profiler
- analysis tool

3 Implementation Approaches
levels of abstraction:
- hardware level
- machine instructions level
- assembly language level
- compiler level
- source code level
- library level

4 GNU Family Compilers
- supports many languages
- supports many targets
- provides lots of optimisation techniques
- open source
- available under the terms of the GPL

5 GNU Family Compilers
- machine independent
- ports exist for more than 30 platforms
- high code generation quality
- intensive optimisation
- RTL - Register Transfer Language
- reusability
- ,000 lines of language- and platform-independent routines

6 GNU Family Compilers
- weird internal structure
- written in a mix of C and C++
- modularity problems
- lack of good documentation

7 GCC infrastructure
[Diagram; labels: source, parser, language front-end, tree optimisation, RTL, 25 optimization passes + assembler generation, target back end, debug info, binary]

8 Mudflap
C/C++ bounds checker
- based on tree transformation
- instruments program to detect memory access errors
- tracks calls to many library functions
- provides replacements for common C library functions

9 Mudzzi
memory profiler for GCC
- based on mudflap approach
development considerations:
- high performance
- language independent
- large-scale applications
- minimization of inlined code
- multi-threading support
- online or post-mortem analysis

10 Mudzzi
memory profiler for GCC
tracked events:
- read/write memory accesses
- object declarations
- object destructions (for stack-frame objects)
- calls to malloc, calloc, realloc, mmap and free
- timing

11 Mudzzi
records format
two record types:
- normal: prefix, record
- length prefixed: prefix, length, record
Memory Read/Write record:
- record type: 32 bits
- access address: 32 bits
- RDTSC CPU tick value: 64 bits
- source line number: 32 bits
- base pointer address: 32 bits
- size of accessed region: 32 bits
- coded source file and function name: 32 bits
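The field list above adds up to 256 bits per memory access. A minimal sketch of how such a record could be laid out in C follows; the struct and field names are hypothetical, and only the field widths and their order are taken from the slide. The helper `dump_bytes` is also an assumption, added to show why a dump made of such records grows quickly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: names are illustrative, widths and order follow the slide. */
typedef struct {
    uint32_t record_type;     /* record type: 32 bits */
    uint32_t access_addr;     /* access address: 32 bits */
    uint64_t cpu_ticks;       /* CPU tick counter value: 64 bits */
    uint32_t source_line;     /* source line number: 32 bits */
    uint32_t base_addr;       /* base pointer address: 32 bits */
    uint32_t access_size;     /* size of accessed region: 32 bits */
    uint32_t coded_location;  /* coded source file and function name: 32 bits */
} mem_access_record;

/* 6 x 32 bits + 64 bits = 256 bits = 32 bytes; this order needs no padding. */
_Static_assert(sizeof(mem_access_record) == 32, "record should be 32 bytes");

/* Lower bound on dump size for a given number of accesses,
 * ignoring the per-record prefix mentioned on the slide. */
uint64_t dump_bytes(uint64_t n_accesses)
{
    return n_accesses * (uint64_t)sizeof(mem_access_record);
}
```

Under these assumptions, one million traced accesses already need at least 32,000,000 bytes of dump space.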

12 Mudzzi
code transformation
original:
    void foo() {
        int a = 3;
        int b[100];
        b[a] = 10;
        return;
    }
instrumented:
    void foo() {
        int a = 3;
        mpf_vardecl(&a, sizeof(int), 0, "a", .., ..);
        int b[100];
        mpf_vardecl(b, sizeof(int)*100, 0, "b", .., ..);
        b[a] = 10;
        mpf_add(b+a, a, b, 1, .., ..);
        mpf_varundecl(&a, .., ..);
        mpf_varundecl(b, .., ..);
        return;
    }
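To make the idea of the transformation concrete, here is a small hand-written sketch in which declaration, access and undeclaration hooks append events to an in-memory log. The hook `record_event`, the `event` type and the log are hypothetical stand-ins; they are not the `mpf_*` functions from the slide, whose full argument lists are not shown there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event kinds, loosely mirroring the tracked events. */
enum ev_kind { EV_DECL, EV_ACCESS, EV_UNDECL };

typedef struct {
    enum ev_kind kind;
    uintptr_t    addr;   /* address of the object or access */
    size_t       size;   /* size in bytes */
} event;

#define MAX_EVENTS 64
static event  ev_log[MAX_EVENTS];
static size_t ev_count;

/* Illustrative hook: append one event to the in-memory log. */
static void record_event(enum ev_kind kind, const void *addr, size_t size)
{
    assert(ev_count < MAX_EVENTS);
    ev_log[ev_count++] = (event){ kind, (uintptr_t)addr, size };
}

/* Hand-instrumented variant of foo(): each declaration, the array
 * write, and the end of each object's lifetime produce one event. */
void foo_instrumented(void)
{
    int a = 3;
    record_event(EV_DECL, &a, sizeof a);
    int b[100];
    record_event(EV_DECL, b, sizeof b);
    b[a] = 10;
    record_event(EV_ACCESS, &b[a], sizeof b[a]);
    record_event(EV_UNDECL, b, sizeof b);
    record_event(EV_UNDECL, &a, sizeof a);
}
```

One call to `foo_instrumented()` leaves five events in `ev_log`, which illustrates why instrumented code runs slower and why the event dump grows with every executed statement.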

13 Mudzzi
profiled code performance: ~20% of original

14 Mudzzi
dump file size
problem: grows very fast

15 Visualization and analysis tool for memory profiler

16 features overview
1. Visualization of memory profiler dump
2. Loop (cycle) detection
3. Array access analysis inside detected loops
4. Reuse distance calculation for arrays
5. Cache hit/miss rate, analysis and explanations

17 address/time diagram
example array access pattern
[Diagram; axes: addresses, time; panels: by rows, by columns]
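The two access patterns named in the diagram, by rows and by columns, differ in the address stride between consecutive accesses. The sketch below generates the byte offsets touched by each traversal of a row-major array; the function names and parameters are illustrative assumptions, not part of the tool.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of element [i][j] in a row-major array with ncols columns. */
size_t elem_offset(size_t i, size_t j, size_t ncols, size_t elem_size)
{
    assert(j < ncols);
    return (i * ncols + j) * elem_size;
}

/* Write the offsets touched by a full traversal into out[], which must
 * hold nrows * ncols entries. by_rows != 0: the column index runs
 * fastest; otherwise the row index runs fastest. */
void traversal_offsets(size_t nrows, size_t ncols, size_t elem_size,
                       int by_rows, size_t *out)
{
    size_t n = 0;
    if (by_rows) {
        for (size_t i = 0; i < nrows; i++)
            for (size_t j = 0; j < ncols; j++)
                out[n++] = elem_offset(i, j, ncols, elem_size);
    } else {
        for (size_t j = 0; j < ncols; j++)
            for (size_t i = 0; i < nrows; i++)
                out[n++] = elem_offset(i, j, ncols, elem_size);
    }
}
```

For a 2 x 3 array of 4-byte elements, the row traversal advances by 4 bytes per access, while the column traversal jumps by a whole row (12 bytes) between consecutive accesses.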

18 address/time diagram array access pattern

19 cache config and report

20 Blocked Matrix Multiply
cache interference
    /* etype is the matrix element type, defined elsewhere */
    #define MIN(x, y) ((x) < (y) ? (x) : (y))
    /* Z += X * Y for N x N matrices, processed in blocks of size B */
    void BlkMatrixMultiply (etype *X, etype *Y, etype *Z, int N, int B)
    {
        int w, q, i, j, k;
        etype r;
        for (w = 0; w < N; w += B)
            for (q = 0; q < N; q += B)
                for (i = 0; i < N; i++)
                    for (k = w; k < MIN (w + B, N); k++) {
                        r = *(X + i * N + k);
                        for (j = q; j < MIN (q + B, N); j++)
                            *(Z + i * N + j) += *(Y + k * N + j) * r;
                    }
    }
where N is the matrix size and B is the block size; we use N = 128, B = 32, and the arrays are a[N][N], b[N][N], c[N][N]

21 Blocked Matrix Multiply cache interference
Full view of cache utilization report

22 Blocked Matrix Multiply cache interference
VARIABLE `a' hit rate:81%
  Replacement causes:
    `b' (0xbffe78d0:49152) replacements (62%)
    `c' (0xbffdb8d0:49152) - 57 replacements (3%)
VARIABLE `b' hit rate:81%
  Replacement causes:
    `a' (0xbfff38d0:49152) replacements (1%)
    `c' (0xbffdb8d0:49152) replacements (3%)
VARIABLE `c' hit rate:99%
  Replacement causes:
    `a' (0xbfff38d0:49152) replacements (7%)
    `b' (0xbffe78d0:49152) replacements (91%)
Callouts on the slide: interference with b; self interference
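Per-variable hit rates and replacement causes like those in the report can be gathered by replaying the access trace through a cache model. The sketch below assumes a tiny direct-mapped cache and integer variable ids; the cache geometry, type names and function names are all illustrative assumptions and say nothing about how the actual tool is built.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative geometry: 4 sets of 16-byte lines, up to 4 variables. */
enum { NUM_SETS = 4, LINE_SIZE = 16, MAX_VARS = 4 };

typedef struct {
    int      valid[NUM_SETS];
    uint32_t tag[NUM_SETS];
    int      owner[NUM_SETS];                 /* variable that loaded the line */
    unsigned accesses[MAX_VARS];
    unsigned hits[MAX_VARS];
    unsigned evicted_by[MAX_VARS][MAX_VARS];  /* [victim][cause] */
} cache_sim;

void cache_init(cache_sim *c)
{
    memset(c, 0, sizeof *c);
}

/* Simulate one access by variable `var` to byte address `addr`.
 * Returns 1 on a hit, 0 on a miss. On a miss that replaces a valid
 * line, the replacement is charged to the accessing variable. */
int cache_access(cache_sim *c, uint32_t addr, int var)
{
    assert(var >= 0 && var < MAX_VARS);
    uint32_t line = addr / LINE_SIZE;
    uint32_t set  = line % NUM_SETS;
    uint32_t tag  = line / NUM_SETS;

    c->accesses[var]++;
    if (c->valid[set] && c->tag[set] == tag) {
        c->hits[var]++;
        return 1;
    }
    if (c->valid[set])
        c->evicted_by[c->owner[set]][var]++;
    c->valid[set] = 1;
    c->tag[set]   = tag;
    c->owner[set] = var;
    return 0;
}
```

In this model `evicted_by[v][v]` counts self interference, and `evicted_by[v][w]` with `w != v` counts interference from another variable; the hit rate of variable `v` is `hits[v] / accesses[v]`.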

23 Reuse distance
number of distinct object references between two reuses
trace: a d f e c e a c a a c e d e a c d f a e a c
RD = 4 (d, f, e, c): between the first and second access to a
RD = 4 (e, c, a, d): between the first and second access to f
- a measure of address distance, not of time
- closely related to hit rate for LRU/FIFO caches
- leads to an effective and easy-to-apply optimisation
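The definition can be checked against the trace on this slide with a direct, quadratic-time sketch: walk backwards from an access until the previous access to the same symbol, counting distinct other symbols on the way. The function name and the convention of returning -1 for a first access are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Reuse distance of the access at position pos in trace[0..n-1]:
 * the number of distinct other symbols referenced since the previous
 * access to the same symbol. Returns -1 if there is no earlier access. */
int reuse_distance(const char *trace, size_t n, size_t pos)
{
    assert(pos < n);
    int seen[256] = {0};
    int distinct = 0;
    unsigned char target = (unsigned char)trace[pos];

    for (size_t i = pos; i-- > 0; ) {
        unsigned char s = (unsigned char)trace[i];
        if (s == target)
            return distinct;      /* found the previous use */
        if (!seen[s]) {
            seen[s] = 1;
            distinct++;
        }
    }
    return -1;                    /* first access to this symbol */
}
```

With the trace written as the string "adfeceacaacedeacdfaeac", position 6 (the second a) and position 17 (the second f) both give a reuse distance of 4, matching the two examples on the slide.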

24 Clustering
finding groups of variables commonly used together
trace: a d f e c e a c a a c e d e a c d f a e a c
pairwise counts:
      a    b    c    d    e    f
a               5    1    4    1
b
c                    1    3
d                         2    1
e                              1
f
after merging a and c into cluster a-c:
      a-c  b    d    e    f
a-c             1    3.5  0.5
b
d                    2    1
e                         1
f

25 Results of the project
profiler implementation (as GCC module)
Benefits:
- good analysis capabilities and binding to sources
- good performance
- ease of use
Problems:
- inefficient (full code coverage)
- part of another program

26 Results of the project
applicability
- instrumentation works effectively for large-scale applications
- reasonable performance penalty
- platform/OS independent
Problems:
- lack of remote analysis
- GCC-centric

27 Results of the project
analysis tool
- visual diagrams
- cache analysis
- binding to source-level
- flexible
Problems:
- poor representation for long-running large applications
- too few analysis tools
- some tests/tools get stuck on large dump files

28 Results of the project
profiling the profiler

29 Results of the project
glance at future
- consider DIOTA as instrumentation basis
- implement remote analysis
- multiple specific profilers within one analysis tool
- add support for HT/SMP architectures

30 That's all
Thank you!
Iliasov Alexey, Kyrgyz Russian Slavic University, Kyrgyzstan

