DCO1 Performance Measurement and Improvement Lecture 7.


1 DCO1 Performance Measurement and Improvement Lecture 7

2 Practical Hints
Practical methods for improving program performance. Topics: Hidden Trouble; Fast Allocation and Free.

3 Hidden Trouble
First, look at memory allocation with malloc(). Topics: malloc in printf; malloc; strings.

4 malloc in printf
In the usual implementation, printf causes malloc to be called. This can add an unexpected cost. String manipulation is expensive in general, whether it is formatting text as in printf, reading ASCII text and converting it to numbers, or performing string comparisons. Where no formatting is needed, it is better to use puts() than printf().
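A minimal sketch of the puts() suggestion, assuming the text to print is a fixed string with no conversions. The helper names report_with_printf and report_with_puts are hypothetical and only serve to put the two calls side by side.

```c
#include <stdio.h>

/* Hypothetical helper: prints a fixed message through printf, which has to
   scan the format string for conversion specifiers. */
int report_with_printf(void)
{
    return printf("done\n");
}

/* Same output through puts(), which appends the newline itself and does
   no format parsing. */
int report_with_puts(void)
{
    return puts("done");
}
```

Some compilers already make this substitution themselves for constant strings, so the gain is worth measuring rather than assuming.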

5 malloc
malloc (or new in C++) is expensive. A common solution is to use static or local variables to avoid allocating memory on the heap. Another solution is to keep a list of objects of a kind that is allocated often. Then allocation is just a matter of removing an object from the list, and freeing simply inserts the object back on the list. Where possible, avoid calling malloc() repeatedly.
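The free-list idea can be sketched roughly as follows. The node type and the names node_alloc and node_free are hypothetical, and the sketch assumes single-threaded use.

```c
#include <stdlib.h>

/* Hypothetical object type; the next link is only meaningful while the
   node sits on the free list. */
typedef struct node {
    struct node *next;
    int payload;
} node;

static node *free_list = NULL;   /* recycled nodes, most recent first */

/* Take a node from the free list, falling back to malloc when it is empty. */
node *node_alloc(void)
{
    node *n = free_list;
    if (n != NULL)
        free_list = n->next;     /* pop: no call into the allocator */
    else
        n = malloc(sizeof *n);   /* may return NULL */
    if (n != NULL) {
        n->next = NULL;
        n->payload = 0;
    }
    return n;
}

/* "Free" a node by pushing it onto the free list for later reuse. */
void node_free(node *n)
{
    if (n == NULL)
        return;
    n->next = free_list;
    free_list = n;
}
```

Memory on the list is never returned to the system in this sketch, which trades memory footprint for allocation speed.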

6 Strings
The Microsoft Foundation Class (MFC) CString class allocates dynamic memory. This is great if you want to avoid managing memory yourself and you want to avoid nasty bugs due to writing data beyond the end of allocated string memory. On the other hand, if you find that memory allocation is taking significant time in an inner loop, you might want to consider allocating a fixed-length character array as local or static data.

7 Fast Allocation and Free
One way to obtain faster performance is to take a large block of memory (from the heap) from which smaller chunks are allocated to compute some result. After the result is obtained, the entire block is freed. This is fast for two reasons. First, memory is allocated simply by incrementing the "free" pointer by the number of bytes you need to allocate. Second, there is no need to free each allocated object; you free all objects at once by freeing the entire pool.
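A rough sketch of such a pool is shown below. The arena type, its function names and the 16-byte rounding are assumptions made for illustration; the slide does not specify an interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_ALIGN 16u   /* assumed alignment, adequate for common types */

typedef struct {
    unsigned char *base;  /* start of the large block */
    size_t size;          /* total bytes in the block */
    size_t used;          /* offset of the "free" pointer */
} arena;

/* Obtain the large block once. Returns 1 on success, 0 on failure. */
int arena_init(arena *a, size_t size)
{
    a->base = malloc(size);
    a->size = (a->base != NULL) ? size : 0;
    a->used = 0;
    return a->base != NULL;
}

/* Allocate n bytes by bumping the free offset; NULL when the pool is full. */
void *arena_alloc(arena *a, size_t n)
{
    size_t rounded;
    void *p;

    if (a->base == NULL || n > SIZE_MAX - (ARENA_ALIGN - 1))
        return NULL;
    rounded = (n + (ARENA_ALIGN - 1)) & ~(size_t)(ARENA_ALIGN - 1);
    if (rounded > a->size - a->used)
        return NULL;
    p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Free every object at once by releasing the whole block. */
void arena_destroy(arena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

Individual objects cannot be released early with this design, so it suits computations whose temporary data all dies at the same time.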

8 Coding for Speed
There are many ways to speed up a program. The material here is mainly from this web site: http://www.abarnett.demon.co.uk/tutorial.html
Topics: Array Indices, Aliases, Registers, Integers, Loop Jamming, Dynamic Loop Unrolling, Faster for() loops, Switch, Pointers, Early loop breaking, Misc.

9 Array Indices
An example using switch and if-else:
    switch ( queue ) {
        case 0 : letter = 'W'; break;
        case 1 : letter = 'S'; break;
        case 2 : letter = 'U'; break;
    }
or maybe
    if ( queue == 0 )
        letter = 'W';
    else if ( queue == 1 )
        letter = 'S';
    else
        letter = 'U';

10 Array Indices
A quicker method is to simply use the value as an index into a character array, e.g.
    static char *classes = "WSU";
    letter = classes[queue];
In this case, classes[0] means W, classes[1] means S and classes[2] means U.
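Unlike the if-else chain, a bare table lookup reads past the end of the string when queue is larger than 2. A hedged sketch with a range check follows; the function name queue_letter is hypothetical, and mapping out-of-range values to 'U' is an assumption chosen to mirror the final else branch.

```c
/* Lookup table replacing the switch / if-else chain. */
static const char classes[] = "WSU";

/* Returns the letter for queue 0..2. Any other value maps to 'U' here,
   an assumption that mirrors the final else branch of the if-else form. */
char queue_letter(unsigned int queue)
{
    if (queue >= sizeof classes - 1)   /* exclude the trailing '\0' */
        return 'U';
    return classes[queue];
}
```

If the caller can guarantee that queue is always in range, the check can be dropped and the lookup becomes a single indexed load.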

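11 Aliases (1)
    void func1( int *data )
    {
        int i;
        for(i=0; i<10; i++)
        {
            somefunc2( *data, i);
        }
    }
Not very good: the compiler cannot be sure that *data is unchanged between calls, so it may reload the value from memory on every iteration.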

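12 Aliases (2)
It is better to change it to this:
    void func1( int *data )
    {
        int i;
        int localdata;
        localdata = *data;
        for(i=0; i<10; i++)
        {
            somefunc2( localdata, i);
        }
    }
This is the better way: the value is read once into a local variable, which the compiler knows nothing else can modify.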

13 Registers
Use the "register" declaration whenever you can, e.g.
    register float val;
    register double dval;
    register int ival;
This may be faster. Note that register is only a hint: compilers are generally good at register allocation and are free to ignore it.

14 Integers
Use unsigned ints instead of ints if you know the value will never be negative: unsigned int a; is better than int a;. Some processors can handle unsigned integer arithmetic considerably faster than signed. Integer arithmetic is also faster than floating-point arithmetic.

15 Loop Jamming
Never use two loops where one is enough:
    for(i=0; i<100; i++)
    {
        stuff();
    }
    for(i=0; i<100; i++)
    {
        morestuff();
    }
It is better to combine them.

16 Loop Jamming
It would be better to do:
    for(i=0; i<100; i++)
    {
        stuff();
        morestuff();
    }
This assumes morestuff() does not depend on every call to stuff() having already completed.

17 Example – three loops (0.36 ms)

18 Example – one loop (0.31 ms)

19 Loop Unrolling and Dynamic Loop Unrolling
    for(i=0; i<3; i++)
    {
        something(i);
    }
is less efficient than
    something(0);
    something(1);
    something(2);
This is because the loop version has to test and increment the value of i on every iteration.
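When the iteration count is not known in advance, the same idea can be applied by doing several steps per loop test and handling the leftover separately. The sketch below is one possible illustration; the function sum_unrolled and the unroll factor of 4 are assumptions, not taken from the slides.

```c
#include <stddef.h>

/* Sum n ints, handling four elements per loop test. */
long sum_unrolled(const int *v, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Main body: one comparison and one increment per four additions. */
    for (; i + 4 <= n; i += 4) {
        total += v[i];
        total += v[i + 1];
        total += v[i + 2];
        total += v[i + 3];
    }

    /* Remainder: the 0 to 3 elements left when n is not a multiple of 4. */
    for (; i < n; i++)
        total += v[i];

    return total;
}
```

Optimising compilers often unroll simple loops on their own, so a manual version like this is best justified by measurement.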

20 Example – two for loops (0.96 ms)

21 Example – one for loop (0.52 ms)

22 Faster for loop
Ordinarily, you would code a simple for() loop like this:
    for( i=0; i<10; i++){ ... }
i loops through the values 0,1,2,3,4,5,6,7,8,9. If you don't care about the order of the loop counter, you can do this instead:
    for( i=10; i--; ) { ... }
Inside the body i now takes the values 9,8,7,...,0. Counting down can be faster because the loop test is a simple comparison with zero.

23 Faster for loop
The syntax is a little strange, but is perfectly legal. A similar effect could also be gained by coding:
    for(i=10; i; i--){ ... }
or (to expand it further)
    for(i=10; i!=0; i--){ ... }
These two forms also run ten times, but the body sees i as 10,9,...,1 rather than 9,8,...,0.
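The difference between the countdown forms is easy to get wrong when i is used as an array index. The following sketch records the values each form produces; the function names are hypothetical and n is assumed to be non-negative.

```c
/* Each function stores the counter values seen by the loop body into
   out[] and returns the number of iterations. Assumes n >= 0 and that
   out[] has room for n entries. */

int count_down_post(int out[], int n)   /* for (i = n; i--; ) */
{
    int i, k = 0;
    for (i = n; i--; )
        out[k++] = i;                   /* sees n-1, n-2, ..., 0 */
    return k;
}

int count_down_test(int out[], int n)   /* for (i = n; i != 0; i--) */
{
    int i, k = 0;
    for (i = n; i != 0; i--)
        out[k++] = i;                   /* sees n, n-1, ..., 1 */
    return k;
}
```

Only the first form yields valid indices 0..n-1 for an array of n elements; with the second form the body would need to use i-1.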

24 Example – int and increment (1.51 ms)

25 Example – unsigned int, decrement (1.29 ms)

26 Use switch() instead of if...else...
For large decisions involving if...else...else..., like this:
    if( val == 1)
        dostuff1();
    else if (val == 2)
        dostuff2();
    else if (val == 3)
        dostuff3();
it may be faster to use a switch:
    switch( val )
    {
        case 1: dostuff1(); break;
        case 2: dostuff2(); break;
        case 3: dostuff3(); break;
    }

27 Pointers
Whenever possible, pass structures by reference (i.e. pass a pointer to the structure); otherwise the whole structure is copied each time the function is called.
    void print_data( const bigstruct *data_pointer )
    {
        ...printf contents of structure...
    }
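To make the cost difference concrete, here is a small sketch with both calling styles. The layout of bigstruct is hypothetical (the slides name the type but do not define it), as are the two mean_* functions.

```c
#include <stddef.h>

/* Hypothetical definition: the source names bigstruct but does not show it. */
typedef struct {
    double samples[1024];
    size_t count;          /* number of valid entries in samples[] */
} bigstruct;

/* By value: every call copies the whole structure (about 8 KB here). */
double mean_by_value(bigstruct data)
{
    double total = 0.0;
    size_t i;
    for (i = 0; i < data.count; i++)
        total += data.samples[i];
    return data.count ? total / (double)data.count : 0.0;
}

/* By pointer: only an address is passed; const documents read-only use. */
double mean_by_pointer(const bigstruct *data)
{
    double total = 0.0;
    size_t i;
    for (i = 0; i < data->count; i++)
        total += data->samples[i];
    return data->count ? total / (double)data->count : 0.0;
}
```

Both functions return the same result; the pointer version simply avoids the copy on each call.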

28 Early loop breaking
This loop searches a list of 10000 numbers to see if there is a -99 in it.
    found = FALSE;
    for(i=0; i<10000; i++)
    {
        if( list[i] == -99 )
        {
            found = TRUE;
        }
    }
    if( found ) printf("Yes, there is a -99. Hooray!\n");
This works well, but it always searches the whole list, even after the value has been found.

29 Early loop breaking
A better way is to abort the search as soon as the value is found.
    found = FALSE;
    for(i=0; i<10000; i++)
    {
        if( list[i] == -99 )
        {
            found = TRUE;
            break;
        }
    }
    if( found ) printf("Yes, there is a -99. Hooray!\n");

30 Suggestion (1)
Avoid using ++ and -- etc. within loop expressions, e.g. while(n--){}, as this can sometimes be harder to optimise. Minimize the use of global variables. Declare anything within a file (external to functions) as static, unless it is intended to be global. Use word-size variables if you can, as the machine can work with these better (instead of char, short, double, bitfields etc.).

31 Suggestion (2)
Don't use recursion. Recursion can be very elegant and neat, but it creates many more function calls, which can become a large overhead. Avoid the sqrt() square root function in loops; calculating square roots is very CPU intensive. Single-dimension arrays are faster than multi-dimensioned arrays (a[16] is better than a[4][4]). Compilers can often optimise a whole file, so avoid splitting closely related functions into separate files; the compiler will do better if it can see both of them together (it might be able to inline the code, for example).
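One common way to follow the sqrt() advice is to compare squared quantities instead of taking the root. The sketch below is an illustration under the assumption that only a distance comparison is needed and that radius is non-negative; the function name within_radius is hypothetical.

```c
/* Returns 1 if the point (x, y) lies within radius of the origin.
   Comparing squared values avoids calling sqrt() at all.
   Assumes radius >= 0. */
int within_radius(double x, double y, double radius)
{
    return x * x + y * y <= radius * radius;
}
```

This works because both sides are non-negative, so squaring preserves the ordering of the comparison.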

32 Example - without recursion

33 Example - with recursion (366 ms; the number of recursive calls had already been reduced)
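The code for these two example slides is not part of the transcript. As a stand-in, the following sketch contrasts a recursive and an iterative version of the same computation; factorial is chosen purely for illustration and is not the example from the lecture.

```c
/* Recursive form: one function call per step.
   Results fit in unsigned long long for n <= 20. */
unsigned long long factorial_recursive(unsigned int n)
{
    return (n <= 1) ? 1ULL : n * factorial_recursive(n - 1);
}

/* Iterative form: the same result with a single call and a loop. */
unsigned long long factorial_iterative(unsigned int n)
{
    unsigned long long result = 1ULL;
    while (n > 1) {
        result *= n;
        n--;
    }
    return result;
}
```

The iterative form avoids the call and return overhead of each recursive step and does not grow the stack.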

34 Suggestion (3)
Single precision maths may be faster than double precision, and there is often a compiler switch for this (float is better than double unless you really need the extra precision). Floating point multiplication is often faster than division: use val * 0.5 instead of val / 2.0. Addition is quicker than multiplication: use val + val + val instead of val * 3. puts() is quicker than printf(), although less flexible.

35 Example - float (4 bytes) and double (8 bytes)

36 Suggestion (4)
Use #defined macros instead of commonly used tiny functions. Sometimes the bulk of CPU usage can be tracked down to a small external function being called thousands of times in a tight loop. Replacing it with a macro to perform the same job will remove the overhead of all those function calls, and allow the compiler to be more aggressive in its optimisation. Binary/unformatted file access is faster than formatted access, as the machine does not have to convert between human-readable ASCII and machine-readable binary. If you don't actually need to read the data in a file yourself, consider making it a binary file.
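A minimal sketch of the macro suggestion follows. The names SQUARE, square_fn and sum_of_squares are hypothetical, chosen only to show a tiny function and its macro replacement inside a tight loop.

```c
/* Function-like macro: expanded in place, so there is no call overhead.
   The argument is parenthesised, but it is evaluated twice, so avoid
   passing expressions with side effects such as i++. */
#define SQUARE(x) ((x) * (x))

/* Equivalent small function, shown for comparison. */
double square_fn(double x)
{
    return x * x;
}

/* Tight loop using the macro instead of calling square_fn each time. */
double sum_of_squares(const double *v, int n)
{
    double total = 0.0;
    int i;
    for (i = 0; i < n; i++)
        total += SQUARE(v[i]);
    return total;
}
```

A compiler that can see the definition of square_fn may inline it anyway, in which case the two versions generate much the same code.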

37 Summary
It is better to write a simple but fast program. There are many ways to speed up a program.

