# 1 CS 410 / 510 Mastery in Programming Chapter 3 Program and Language Complexity Herbert G. Mayer, PSU CS Status 7/4/2013.


## 2 Syllabus

- Thoughts on Complexity
- Hard to Understand Code?
- Program Complexity
- Complex vs. Hard
- Halstead Program Metrics
- McCabe Cyclomatic Number
- Cyclomatic Number Samples
- References

## 3 Thoughts on Complexity

- "Complexity" as used in this class refers to:
  - the number of different paths of execution through a given program, dictated by its flow of control; synonym: convoluted
  - or the degree of difficulty of expressing some algorithm as a string of symbols, i.e. as the source program; synonym: hard
- Some functions for which an algorithm was hard to find are easy to code and understand once the algorithm has been invented, e.g. R. E. Tarjan's SCC algorithm [11] or Newton's square-root iteration
- Complexity, as used here, does not mean "intractable to compute", as with large instances of NP-complete problems, for which known algorithms need too much compute power to terminate in human time
- Nor does complexity mean "hard to understand", as may be the case with obfuscated programming styles or poorly written code; a better word for that kind of "complex" is: difficult to read
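Newton's square-root iteration illustrates the point above: the idea took real insight to invent, yet written down it is short and easy to read. The following is a minimal C sketch, not code from the slides; the function name `newton_sqrt`, the starting value max(s, 1) and the stopping rule (stop as soon as an iterate no longer decreases) are our own assumptions.

```c
#include <float.h>

/* Hypothetical helper: square root of s by Newton's iteration
   x_{k+1} = (x_k + s / x_k) / 2.
   Starting at or above the root, the iterates decrease toward sqrt(s),
   so the loop stops as soon as an iterate fails to decrease. */
double newton_sqrt(double s)
{
    if (!(s > 0.0)) {
        return 0.0;                    /* sketch: 0, negative and NaN inputs all yield 0.0 */
    }
    if (s > DBL_MAX) {
        return s;                      /* +infinity */
    }
    double x = (s > 1.0) ? s : 1.0;    /* max(s, 1) >= sqrt(s) */
    for (;;) {
        double next = 0.5 * (x + s / x);
        if (next >= x) {
            return x;                  /* no further progress: x is the root up to rounding */
        }
        x = next;
    }
}
```

The stopping rule avoids a hand-tuned tolerance: in exact arithmetic every step from above strictly decreases x, so in floating point the first non-decreasing step signals that rounding, not the method, now limits accuracy.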

## 4 Hard to Understand C Code?

```c
#include <stdio.h>

int a[ 1 ];                     // just to have an array to index

int p( char arg )
{ // p
   printf( "%c", arg );
   return 0;                    // no array bounds violation!
} //end p

int main( )
{ // main
   a[ p( 'a' ) ] = a[ p( 'b' ) ] =
      a[ p( 'c' ) ] = a[ p( 'd' ) ];
   printf( "\n" );
   return 0;
} //end main
```

## 5 Hard to Understand Code?

- Output using the PSU Unix C compiler is: abcd
- Is this correct? If not, what should the output be?
- Is the assignment-statement rule "execute the right-hand side first" respected in the C++ implementation used?
- Are other outputs feasible according to the rules of C++, Java, or C#?
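One way to take the compiler's evaluation order out of the question in the slide 4 program is to give each side effect its own statement, since statements execute in sequence. The C sketch below is an illustration under assumptions, not the slides' code: the hypothetical helper `p` records its argument in a `trace` buffer instead of printing it, so the call order can be checked, and `assign_in_fixed_order` is a made-up name. The indices are computed left to right, and the chained assignment is then carried out right to left, which is one possible reading of the slide's statement.

```c
#include <stddef.h>

int a[1];                      /* just to have an array to index */

/* Hypothetical trace buffer: records the order in which p() is called. */
static char   trace[8];
static size_t trace_len = 0;

int p(char arg)
{
    if (trace_len + 1 < sizeof trace) {
        trace[trace_len++] = arg;
        trace[trace_len] = '\0';
    }
    return 0;                  /* always index 0: no array bounds violation */
}

/* Each call to p() is its own statement, so the order is fixed: a, b, c, d. */
void assign_in_fixed_order(void)
{
    int ia = p('a');
    int ib = p('b');
    int ic = p('c');
    int id = p('d');
    int v  = a[id];            /* value of the rightmost operand */

    a[ic] = v;                 /* then store right to left, as a chained = does */
    a[ib] = v;
    a[ia] = v;
}
```

After one call of `assign_in_fixed_order()`, `trace` holds "abcd" on every conforming C compiler, because no two calls of `p` appear in the same expression.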

## 6 Hard to Understand, Not Complex

```c
#include <stdio.h>

#define MAX 7                   // 7 redundant? Discuss!

int a[ MAX ] = { 0, 1, 2, 3, 4, 5, 6 };

void p()
{ // p
   for( int i = 0; i < MAX; i++ ) {
      printf( " a[%d] = %d\n", i, a[ i ] );
   } //end for
   printf( "\n" );
} //end p

int main()
{ // main
   int x = 99;
   p();
   a[ x = 3 ] = a[ x = 5 ] = x = 6;
   p();
} //end main
```

## 7 Hard to Understand, Not Complex

Output of the first call of p(), before the assignment:
a[0] = 0, a[1] = 1, a[2] = 2, a[3] = 3, a[4] = 4, a[5] = 5, a[6] = 6

Output of the second call of p(), after the assignment (changed entries marked):
a[0] = 0, a[1] = 1, a[2] = 2, a[3] = 6 (changed), a[4] = 4, a[5] = 6 (changed), a[6] = 6

x ends up being 6 on most C++ run-time systems of the time, i.e. the left operands were evaluated before the right-hand side; the language rules do not guarantee this result.
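To make two candidate readings of `a[ x = 3 ] = a[ x = 5 ] = x = 6;` concrete, the sketch below spells each one out as a sequence of separate statements, whose order C does define. `left_operands_first` mimics the left-to-right evaluation that produces the output above; `right_operand_first` follows the C++17 rule that the right operand of = is sequenced before its left operand. Both function names are hypothetical, and each returns the final value of x.

```c
/* Left operands first: x = 3 selects a[3], x = 5 selects a[5],
   then x = 6 is stored right to left. */
int left_operands_first(int a[])
{
    int x;
    int *outer, *inner;

    x = 3; outer = &a[x];      /* left operand of the first = */
    x = 5; inner = &a[x];      /* left operand of the second = */
    x = 6;                     /* rightmost operand */
    *inner = x;                /* a[5] = 6 */
    *outer = *inner;           /* a[3] = 6 */
    return x;                  /* 6 */
}

/* Right operand first at every =, as C++17 specifies. */
int right_operand_first(int a[])
{
    int x;
    int v;

    x = 6; v = x;              /* innermost right operand: x = 6 */
    x = 5; a[x] = v;           /* its left operand: a[5] = 6 */
    x = 3; a[x] = v;           /* outermost left operand: a[3] = 6 */
    return x;                  /* 3 */
}
```

Both orders leave a[3] and a[5] equal to 6; only the final value of x differs (6 versus 3).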

## 8 Program Complexity

- Some computable problems are hard, NP-hard, complex, or hard to understand!
- Assuming an experienced designer and programmer:
  - Some problems are laborious to solve; they are "complex" due to the amount of work
  - Others are hard, due to the elusiveness of a solution; just try to find a better SCC algorithm!
  - Yet others are not solvable at all, e.g. non-computable functions such as the Halting Problem [10]
- What is program complexity?
  - Is a large program, i.e. one with many lines of code (LOC), complex?
  - More complicated code? Spaghetti code? Labels? Computable labels? Gotos? Poor naming conventions? Recursive functions?
- What unit of measure does complexity have?
  - Time to run?
  - Number of different paths through the control-flow graph?
  - Memory space needed to run?
  - Number of processors needed to solve the computation?
  - Number of iterations for a suitable solution? E.g. the number of digits of π
  - Degree of "mental hardness" to identify a solution? E.g. in the game of chess
- V(G) by McCabe is a stab at a unit of complexity. But will it be universally acceptable?

## 9 Program Complexity

- Is a programmatic solution for chess hard, complex, or both?
  - Safe to say: a complete and correct chess program is hard to code
  - Yet the rules are simple and relatively few
  - And chess has been programmed to grand-master level: Kasparov lost game 1 of his 1996 match against IBM's "Deep Blue" (he won that match), and Deep Blue won the 1997 rematch 3.5 to 2.5 [8]
- The degree of difficulty of finding a solution quantifies complexity! For example: solving Sudoku?
- Some problems seem not hard, yet the number of special cases renders a solution virtually intractable
  - E.g. the US tax code [9]: about 9,800 different sections; ~75,000 pages
  - It could be simpler and fairer, even equally applicable to all citizens
  - But instead it is highly complex due to "special cases", and it requires experts to give definitive answers; it even has exceptions for individual taxpayers!
- There are numerous CS attempts to formalize complexity, its unit, and computability
- We cover 2 very briefly: Halstead's and McCabe's

## 10 Complex vs. Hard

- Complex is to be interpreted as "mathematically difficult to find a correct algorithm!"
  - E.g. find an algorithm to identify all strongly connected components (SCCs) in a graph
- Hard is to be interpreted as "very much work to compute the solution", with the algorithm itself not being hard
  - E.g. compute the shortest round trip through a Travelling Salesman's n stops
  - It might take so long that we are no longer interested in the solution
  - Instead: use a heuristic that is provably no worse than x times the best solution
- An incorrect solution is always easy to compute
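Tarjan's algorithm [11] is a good example of "complex" in the sense above: finding it took real insight, yet once known it fits in a few dozen lines. The C sketch below is a textbook-style rendering for small graphs, not code from the slides or the paper; the adjacency-matrix representation, the limit `NMAX` and all names are assumptions, and the recursion depth grows with the number of nodes. To use it, set `n_nodes` and `adj`, then call `tarjan_scc`.

```c
#include <stdbool.h>

#define NMAX 16                       /* assumption: small graphs only */

static int  n_nodes;                  /* nodes are 0 .. n_nodes-1 */
static bool adj[NMAX][NMAX];          /* adj[v][w]: directed edge v -> w */
static int  index_of[NMAX], lowlink[NMAX], comp_of[NMAX];
static bool on_stack[NMAX];
static int  scc_stack[NMAX], sp, next_index, n_comps;

static void strongconnect(int v)
{
    index_of[v] = lowlink[v] = next_index++;
    scc_stack[sp++] = v;
    on_stack[v] = true;

    for (int w = 0; w < n_nodes; w++) {
        if (!adj[v][w])
            continue;
        if (index_of[w] < 0) {        /* w not yet visited: recurse */
            strongconnect(w);
            if (lowlink[w] < lowlink[v])
                lowlink[v] = lowlink[w];
        } else if (on_stack[w]) {     /* w belongs to an SCC still being built */
            if (index_of[w] < lowlink[v])
                lowlink[v] = index_of[w];
        }
    }

    if (lowlink[v] == index_of[v]) {  /* v is the root of an SCC: pop it */
        int w;
        do {
            w = scc_stack[--sp];
            on_stack[w] = false;
            comp_of[w] = n_comps;
        } while (w != v);
        n_comps++;
    }
}

/* Returns the number of SCCs; comp_of[v] receives v's component id.
   Components are numbered in the order they are completed, which is a
   reverse topological order of the component graph. */
int tarjan_scc(void)
{
    sp = next_index = n_comps = 0;
    for (int v = 0; v < n_nodes; v++) {
        index_of[v] = -1;
        on_stack[v] = false;
    }
    for (int v = 0; v < n_nodes; v++)
        if (index_of[v] < 0)
            strongconnect(v);
    return n_comps;
}
```

For the graph 0 → 1 → 2 → 0, 2 → 3 → 4, the sketch finds three SCCs: {0, 1, 2}, {3} and {4}. Each node and edge is visited once, which is the linear running time that makes the algorithm notable.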

## 11 Halstead Program Metrics

- Measure a specific program's complexity
- Metrics developed by the late Maurice Halstead
  - To directly quantify the complexity of any given source program
  - Solely from the operators and operands used in the source
- Halstead introduced the measures in 1977
  - Early formal program complexity measures [1], [2], [3]
- Not formally derived, but postulated
  - Halstead metrics carry an element of arbitrariness
  - They lack scientific proof! There is no formal derivation of the rules!

## 12 Halstead Program Metrics

- Halstead's metrics count operators and operands in the source code of the program being analyzed:
  - number of unique (distinct) operators ($n_1$)
  - number of unique (distinct) operands ($n_2$)
  - total number of operators ($N_1$)
  - total number of operands ($N_2$)
- The numbers of unique operators and operands ($n_1$ and $n_2$), as well as the total numbers of operators and operands ($N_1$ and $N_2$), are computed during lexical analysis of the source program
- Other Halstead measures are derived from these 4 counts, but without proof or scientific derivation!
  - The developer's intuition was used as the basis for deriving the measures
  - Halstead intended to provide formal proofs, but he died before he could do so
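Once a lexer has classified each token as an operator or an operand (the classification rules follow on slides 13 to 15), the four basic counts are straightforward. This is a minimal C sketch under that assumption; the `token` and `halstead_counts` types and the function names are hypothetical, and distinctness is checked by a simple quadratic search, which is adequate only for small inputs.

```c
#include <stddef.h>
#include <string.h>

enum token_kind { OPERATOR, OPERAND };

struct token {
    enum token_kind kind;
    const char *text;          /* spelling of the token, e.g. "+" or "x" */
};

struct halstead_counts {
    int n1, n2;                /* distinct operators, distinct operands */
    int N1, N2;                /* total operators, total operands */
};

/* Nonzero if a token of the same kind and spelling occurs before position upto. */
static int seen_before(const struct token *t, size_t upto)
{
    for (size_t i = 0; i < upto; i++) {
        if (t[i].kind == t[upto].kind && strcmp(t[i].text, t[upto].text) == 0) {
            return 1;
        }
    }
    return 0;
}

struct halstead_counts halstead_count(const struct token *t, size_t len)
{
    struct halstead_counts c = { 0, 0, 0, 0 };

    for (size_t i = 0; i < len; i++) {
        int is_new = !seen_before(t, i);
        if (t[i].kind == OPERATOR) {
            c.N1++;
            c.n1 += is_new;
        } else {
            c.N2++;
            c.n2 += is_new;
        }
    }
    return c;
}
```

For the statement x = a + b * a, with =, + and * classified as operators, x, a, b and a as operands, and the semicolon ignored (an assumption; counting conventions vary between tools), the sketch yields $n_1 = 3$, $N_1 = 3$, $n_2 = 3$, $N_2 = 4$.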

## 13 Halstead Program Metrics

Operands:

- Literals, AKA constants; e.g. `0`, `1000`, `"hello"`
- User-defined identifiers for values, AKA symbolic constants; e.g. `MAX` is an operand in: `#define MAX 5`
- Reserved keywords that denote a value, e.g. `NIL`
- Declarations like `#define MAX 5` are less obvious
- Depending on the language, some language-defined type specifiers are treated as operands, e.g. in C++: `char`, `int`, `double`

## 14 Halstead Program Metrics

Operators:

- Common arithmetic symbols, e.g. `+ - / * ^ %`
- Other arithmetic symbols, e.g. `(` and `)`
- Symbols for relational and boolean operations, e.g. `> >= < <= != && ||`
- Symbols for all kinds of operations, including `cat` for concatenation in some languages
- Reserved keywords, e.g. `or`, `or else`, `and`, `and then`, `xor`
- Function names, e.g. `add` in `add( a, 8 )`, `sin` in `sin( 45 )`, `sqrt` in `sqrt( 3 )`
- Reserved operations, e.g. `try`, `catch`, `throw`
- Type qualifiers, e.g. `const`, `volatile`
- Scope specifiers, e.g. `extern`, `static`

## 15 Halstead Program Metrics

Operators that are control constructs:

- `if ( ... )` plus then-clause and optional else-clause
- `while ( ... ) ...`
- `do ...`
- `for ( ; ; ) ...`
- `catch ( ) ...`
- `return ...`
- `switch { ... }`

## 16 Halstead Program Metrics

Program length N, vocabulary size n, program volume V:

- Program length N is the sum of the total numbers of operators and operands in the program analyzed: $N = N_1 + N_2$
- Vocabulary size n is the sum of the numbers of unique operators and operands: $n = n_1 + n_2$
- Program volume V, the information content of the program: $V = N \cdot \log_2 n$
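A worked example helps check these formulas. The counts are illustrative assumptions: they come from the single statement x = a + b * a, counting =, + and * as operators, x, a, b and a as operands, and ignoring the semicolon, which gives $n_1 = 3$, $n_2 = 3$, $N_1 = 3$, $N_2 = 4$:

$$
\begin{aligned}
N &= N_1 + N_2 = 3 + 4 = 7 \\
n &= n_1 + n_2 = 3 + 3 = 6 \\
V &= N \cdot \log_2 n = 7 \log_2 6 \approx 7 \times 2.585 \approx 18.09
\end{aligned}
$$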

## 17 Halstead Program Metrics

Difficulty level D, AKA degree of error-proneness:

- The difficulty level D of a program is proportional to the number of unique operators $n_1$ in the program
- And proportional to the total number of operands $N_2$
- But with scale factors applied to both
- D is postulated to be: $D = \frac{n_1}{2} \cdot \frac{N_2}{n_2}$
- Interestingly, the total number of operators $N_1$ is not part of the formula for the difficulty level D

## 18 Halstead Program Metrics

Program level L:

- Program level L is the inverse of error-proneness, i.e. a low-level program is more prone to errors than a corresponding high-level program for the same computable function
- $L = 1 / D$
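As a worked example of difficulty and level, take the illustrative counts $n_1 = 3$, $n_2 = 3$, $N_2 = 4$ (an assumption: the statement x = a + b * a, counting =, + and * as operators and ignoring the semicolon):

$$
\begin{aligned}
D &= \frac{n_1}{2} \cdot \frac{N_2}{n_2} = \frac{3}{2} \cdot \frac{4}{3} = 2 \\
L &= \frac{1}{D} = \frac{1}{2}
\end{aligned}
$$

The total number of operators $N_1 = 3$ enters neither result.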

## 19 Halstead Program Metrics

Other measures, for you to elaborate in your paper:

- Effort to implement
- Time to implement
- Number of bugs delivered
- Etc.

## 20 Cyclomatic Number

- Goal of McCabe's cyclomatic number:
  - To have a measure of source program complexity
  - To manage complexity, rather than dealing with an unknown
  - See [4], [6]
- Builds on graph theory
  - E.g. [7] Berge: "Graphs and Hypergraphs"
- Fundamental units:
  - Graph G, not necessarily connected!
  - Number of edges: e
  - Number of nodes: n
  - Number of connected components: p, i.e. if $p > 1$ then G is not connected

## 21 Cyclomatic Number V

The cyclomatic number V of a graph G is written V(G). If:

- e = number of edges
- n = number of nodes, AKA vertices in other literature
- p = number of connected components

then: $V(G) = e - n + 2p$
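The definition translates directly into code once p is known. The C sketch below is illustrative only: the name `cyclomatic_number`, the edge-list representation and the `MAX_NODES` limit are assumptions. It computes p with a union-find structure and ignores edge direction, since p counts the connected components of the graph regardless of which way its edges point.

```c
#define MAX_NODES 64                     /* assumption: small control-flow graphs, n <= MAX_NODES */

static int parent[MAX_NODES];

/* Union-find lookup with path halving. */
static int find(int v)
{
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

/* V(G) = e - n + 2p for nodes 0..n-1 and e edges given as (from, to) pairs. */
int cyclomatic_number(int n, int e, const int edges[][2])
{
    int p = n;                           /* every node starts as its own component */

    for (int v = 0; v < n; v++)
        parent[v] = v;
    for (int i = 0; i < e; i++) {
        int a = find(edges[i][0]);
        int b = find(edges[i][1]);
        if (a != b) {
            parent[a] = b;               /* merge the two components */
            p--;
        }
    }
    return e - n + 2 * p;
}
```

For the if-then-else graph on the next slide (condition, then, else and join nodes connected by four edges) it returns 4 - 4 + 2 = 2; for two separate two-node sequences (n = 4, e = 2, p = 2) it returns 2 - 4 + 4 = 2.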

## 22 Cyclomatic Number Samples

- Sequence of 2 statements: e = 1, n = 2, p = 1: $V(G) = 1 - 2 + 2 \cdot 1 = 1$
- If statement with then- and else-clause: e = 4, n = 4, p = 1: $V(G) = 4 - 4 + 2 \cdot 1 = 2$
- Sequence of 4 statements: e = 3, n = 4, p = 1: $V(G) = 3 - 4 + 2 \cdot 1 = 1$

## 23 Cyclomatic Number of While

- While loop: e = 3, n = 3, p = 1: $V(G) = 3 - 3 + 2 \cdot 1 = 2$

## 24 Cyclomatic Number of Program

Multiple-module program with no edges between modules, so each module is a separate connected component:

- Main program = M
- Module A = A()
- Module B = B()
- $V(G) = V(M \cup A \cup B) = V(M) + V(A) + V(B)$
- $V(M) = 2 - 3 + 2 = 1$, $V(A) = 4 - 4 + 2 = 2$, $V(B) = 6 - 5 + 2 = 3$
- $V(G) = 12 - 12 + 2 \cdot 3 = 6$
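The additivity used on this slide follows directly from the definition. If G consists of p connected components $G_i$ with $e_i$ edges and $n_i$ nodes, each component contributes $e_i - n_i + 2$, and summing gives back $e - n + 2p$. With $e = 2 + 4 + 6 = 12$ edges and $n = 3 + 4 + 5 = 12$ nodes for M, A and B, matching the totals above:

$$
\begin{aligned}
V(G) &= \sum_{i=1}^{p} \bigl(e_i - n_i + 2\bigr) = \sum_{i=1}^{p} e_i - \sum_{i=1}^{p} n_i + 2p = e - n + 2p \\
V(M) + V(A) + V(B) &= (2 - 3 + 2) + (4 - 4 + 2) + (6 - 5 + 2) = 1 + 2 + 3 = 6 = 12 - 12 + 2 \cdot 3
\end{aligned}
$$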

## 25 References

- [1] Halstead metrics: http://www.verifysoft.com/en_halstead_metrics.html
- [2] Maurice Halstead, "Elements of Software Science", Elsevier, 1977, ISBN 0444002057
- [3] Detail on Halstead: http://www.horst-zuse.homepage.t-online.de/halstead.html
- [4] Wiki page on cyclomatic numbers: http://en.wikipedia.org/wiki/Cyclomatic_complexity
- [5] Program complexity: http://www.acis.pamplin.vt.edu/faculty/tegarden/wrk-pap/DSS.PDF
- [6] Thomas J. McCabe, "A Complexity Measure", IEEE Transactions on Software Engineering, Vol. SE-2, No. 4, December 1976
- [7] C. Berge, "Graphs and Hypergraphs", North-Holland, Amsterdam, 1973
- [8] Deep Blue info: http://www.research.ibm.com/deepblue/
- [9] Tax code info: http://www.fourmilab.ch/ustax/ustax.html
- [10] Halting Problem: http://www.comp.nus.edu.sg/~cs5234/FAQ/halt.html
- [11] Robert E. Tarjan, "Depth-First Search and Linear Graph Algorithms", SIAM J. Computing, Vol. 1, No. 2, June 1972
