DATA AND FILE STRUCTURE USING C MCA110 M K Pachariya Id. Department of Computer Application, Galgotias.


1 DATA AND FILE STRUCTURE USING C MCA110 M K Pachariya Email Id. manoj.pachariya@galgotiasuniversity.edu.in Department of Computer Application, Galgotias University, Greater Noida www.sites.google.com/a/galgotiasuniversity.edu.in/manojkumarpachariya L T P C 3 0 2 4

2 Text Books 1. Data Structures, by Tenenbaum (PHI). 2. Fundamentals of Data Structures, by Horowitz and Sahni (Galgotia Publications). Reference Books 1. Data Structures, by Seymour Lipschutz, Tata McGraw-Hill. 2. Data Structure and Algorithm Using C, by R. S. Salaria, Khanna Publication. 3. Data Structures and Program Design in C, by R. L. Kruse, B. P. Leung, and C. L. Tondo. 4. Algorithms + Data Structures = Programs, by Niklaus Wirth, Prentice Hall International, 1976.

3 Topics to be covered Arrays and Abstract Data Types: Abstract Data Types, Big ‘O’ notation, time and space complexity of algorithms, elementary data structures and their applications. Array definition, single and multidimensional arrays, applications of arrays, string operations, ordered lists, sparse matrices, lower and upper triangular matrices, and tridiagonal matrices. Linked Lists: singly linked lists, circular linked lists, doubly linked lists, implementation of lists, polynomial representation and addition, generalized linked lists, header linked lists. Stacks: array representation and implementation of a stack, operations on stacks (push and pop), linked representation of a stack, applications of stacks: conversion of infix to prefix and postfix expressions, parenthesis checker, and evaluation of postfix expressions using a stack.

4 CS 103 Introduction (Outline) The Software Development Process Introduction to Data Structures Performance Analysis: the Big Oh. Abstract Data Types

5 The Software Development Process

6 Software Development Requirement analysis, leading to a specification of the problem Design of a solution Implementation of the solution (coding) Analysis of the solution Testing, debugging and integration Maintenance and evolution of the system.

7 CS 1037 Specification of a problem A precise statement/description of the problem. It involves describing the input, the expected output, and the relationship between the input and output. This is often done through preconditions and postconditions.

8 CS 1038 Design Formulation of a method, that is, of a sequence of steps, to solve the problem. The design “language” can be pseudo-code, flowcharts, natural language, any combinations of those, etc. A design so expressed is called an algorithm(s). A good design approach is a top-down design where the problem is decomposed into smaller, simpler pieces, where each piece is designed into a module.

9 CS 1039 Implementation Development of actual C/C++ code that will carry out the design and solve the problem. The design and implementation of data structures, abstract data types, and classes, are often a major part of design implementation.

10 CS 10310 Implementation (Good Principles) Code Re-use –Re-use of other people’s software –Write your software in a way that makes it (re)usable by others Hiding of implementation details: emphasis on the interface. Hiding is also called data encapsulation Data structures are a prime instance of data encapsulation and code re-use

11 CS 10311 Analysis of the Solution Estimation of how much time and memory an algorithm takes. The purpose is twofold: –to get a ballpark figure of the speed and memory requirements to see if they meet the target –to compare competing designs and thus choose the best before any further investment in the application (implementation, testing, etc.)

12 CS 10312 Testing and Debugging Testing a program for syntactical correctness (no compiler errors) Testing a program for semantic correctness, that is, checking if the program gives the correct output. This is done by –having sample input data and corresponding, known output data –running the programs against the sample input –comparing the program output to the known output –in case there is no match, modify the code to achieve a perfect match. One important tip for thorough testing: Fully exercise the code, that is, make sure each line of your code is executed.

13 CS 10313 Integration Gluing all the pieces (modules) together to create a cohesive whole system.

14 CS 10314 Maintenance and Evolution of a System Ongoing, on-the-job modifications and updates of the programs.

15 Introduction to Data Structures

16 Data Structures A data structure is a scheme for organizing data in the memory of a computer. Some of the more commonly used data structures include lists, arrays, stacks, queues, heaps, trees, and graphs. Binary Tree

17 Data Structures The way in which the data is organized affects the performance of a program for different tasks. Computer programmers decide which data structures to use based on the nature of the data and the processes that need to be performed on that data. Binary Tree

18 CS 10318 Data Structures A data structure is a user-defined abstract data type Examples: –Complex numbers: with operations +, -, /, *, magnitude, angle, etc. –Stack: with operations push, pop, peek, isempty –Queue: enqueue, dequeue, isempty … –Binary Search Tree: insert, delete, search. –Heap: insert, min, delete-min.

19 CS 10319 Data Structure Design Specification –A set of data –Specifications for a number of operations to be performed on the data Design –A lay-out organization of the data –Algorithms for the operations Goals of Design: fast operations

20 CS 10320 Implementation of a Data Structure Representation of the data using built-in data types of the programming language (such as int, double, char, strings, arrays, structs, classes, pointers, etc.) Language implementation (code) of the algorithms for the operations

21 Object-Oriented Programming (OOP) And Data Structures When implementing a data structure in non-OOP languages such as C, the data representation and the operations are separate. In OOP languages such as C++, both the data representation and the operations are aggregated together into what are called objects. The data types of such objects are called classes. Classes are blueprints; objects are instances.

22 Concept of Data Structure Suitable Representation Ease of Retrieval Operations allowed Performance of program depends:  Choice of right data structure for given problem. (60 Students Merit Example)  Design of a suitable Algorithm to work on the chosen data structure.

23 Types of Data Structure Linear data structure (a sequence of data items), e.g. queue, stack, chain. Non-linear data structure (represents objects that are not in sequence but are distributed in a plane), e.g. tree, graph, two-dimensional array.

24 Example: A Queue A queue is an example of commonly used simple data structure. A queue has beginning and end, called the front and back of the queue. Data enters the queue at one end and leaves at the other. Because of this, data exits the queue in the same order in which it enters the queue, like people in a checkout line at a supermarket.
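The first-in, first-out behaviour described above can be sketched in C as a small array-based queue. This is only an illustrative sketch: the names `Queue`, `queue_enqueue`, and `queue_dequeue`, the circular-buffer layout, and the fixed capacity of 8 are assumptions made for the example, not part of the slides.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 8  /* illustrative fixed capacity (assumption) */

/* A fixed-size circular-buffer queue of ints. */
typedef struct {
    int items[QUEUE_CAPACITY];
    size_t front;  /* index of the oldest element */
    size_t count;  /* number of stored elements */
} Queue;

void queue_init(Queue *q) { q->front = 0; q->count = 0; }

bool queue_is_empty(const Queue *q) { return q->count == 0; }

/* Add a value at the back; returns false if the queue is full. */
bool queue_enqueue(Queue *q, int value) {
    if (q->count == QUEUE_CAPACITY) return false;
    size_t back = (q->front + q->count) % QUEUE_CAPACITY;
    q->items[back] = value;
    q->count++;
    return true;
}

/* Remove the value at the front; returns false if the queue is empty. */
bool queue_dequeue(Queue *q, int *out) {
    if (q->count == 0) return false;
    *out = q->items[q->front];
    q->front = (q->front + 1) % QUEUE_CAPACITY;
    q->count--;
    return true;
}
```

Values come out of `queue_dequeue` in the same order in which they were passed to `queue_enqueue`, like the checkout line in the slide.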

25 Example: A Binary Tree A binary tree is another commonly used data structure. It is organized like an upside down tree. Each spot on the tree, called a node, holds an item of data along with a left pointer and a right pointer. Binary Tree

26 Example: A Binary Tree The pointers are lined up so that the structure forms the upside down tree, with a single node at the top, called the root node, and branches increasing on the left and right as you go down the tree. Binary Tree

27 Choosing Data Structures By comparing the queue with the binary tree, you can see how the structure of the data affects what can be done efficiently with the data.

28 Choosing Data Structures A queue is a good data structure to use for storing things that need to be kept in order, such as a set of documents waiting to be printed on a network printer..

29 Choosing Data Structures The jobs will be printed in the order in which they are received. Most network print servers maintain such a print queue..

30 Choosing Data Structures A binary tree is a good data structure to use for searching sorted data. The middle item from the list is stored in the root node, with lesser items to the left and greater items to the right.

31 Choosing Data Structures A search begins at the root. The computer either finds the data or moves left or right, depending on the value for which you are searching. When the tree is balanced, each move down the tree cuts the remaining data roughly in half.
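A minimal C sketch of the search just described, assuming integer keys and a plain (unbalanced) binary search tree. The names `Node`, `bst_insert`, `bst_search`, and `bst_free` are hypothetical and chosen only for this example.

```c
#include <stdbool.h>
#include <stdlib.h>

/* One node: a data item plus left and right pointers. */
typedef struct Node {
    int key;
    struct Node *left;
    struct Node *right;
} Node;

/* Insert key and return the (possibly new) root. Duplicates are ignored.
   If allocation fails, the tree is left unchanged. */
Node *bst_insert(Node *root, int key) {
    if (root == NULL) {
        Node *n = malloc(sizeof *n);
        if (n == NULL) return NULL;
        n->key = key;
        n->left = NULL;
        n->right = NULL;
        return n;
    }
    if (key < root->key) root->left = bst_insert(root->left, key);
    else if (key > root->key) root->right = bst_insert(root->right, key);
    return root;
}

/* Start at the root; move left for smaller keys, right for larger ones. */
bool bst_search(const Node *root, int key) {
    while (root != NULL) {
        if (key == root->key) return true;
        root = (key < root->key) ? root->left : root->right;
    }
    return false;
}

/* Release every node of the tree. */
void bst_free(Node *root) {
    if (root == NULL) return;
    bst_free(root->left);
    bst_free(root->right);
    free(root);
}
```

This sketch does no rebalancing, so the halving behaviour holds only when the insertion order happens to keep the tree balanced; inserting already sorted keys degrades it into a chain.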

32 Choosing Data Structures Items can be located very quickly in a tree. Telephone directory assistance information is stored in a tree, so that a name and phone number can be found quickly.

33 Choosing Data Structures For some applications, a queue is the best data structure to use. For others, a binary tree is better. Programmers choose from among many data structures based on how the data will be used by the program.

34 Data Structures in Alice Alice has two built-in data structures that can be used to organize data, or to create other data structures: Lists Arrays

35 Lists A list is an ordered set of data. It is often used to store objects that are to be processed sequentially. A list can be used to create a queue.

36 Arrays An array is an indexed set of variables, such as dancer [1], dancer [2], dancer [3],… It is like a set of boxes that hold things. A list is a set of items. An array is a set of variables that each store an item.

37 Arrays and Lists You can see the difference between arrays and lists when you delete items.

38 Arrays and Lists In a list, the missing spot is filled in when something is deleted.

39 Arrays and Lists In an array, an empty variable is left behind when something is deleted.

40 CS 10340 Abstract Data Types

41 Data abstraction, or abstract data types, is a programming methodology where one defines not only the data structure to be used, but the processes to manipulate the structure –like process abstraction, ADTs can be supported directly by programming languages To support it, there needs to be mechanisms for –defining data structures –encapsulation of data structures and their routines to manipulate the structures into one unit by placing all definitions in one unit, it can be compiled at one time –information hiding to protect the data structure from outside interference or manipulation the data structure should only be accessible from code encapsulated with it so that the structure is hidden and protected from the outside objects are one way to implement ADTs, but because objects have additional properties, we defer discussion of them until the next chapter

42 ADT Design Issues Encapsulation: it must be possible to define a unit that contains a data structure and the subprograms that access (manipulate) it Information hiding: controlling access to the data structure through some form of interface so that it cannot be directly manipulated by external code –this is often done by using two sections of an ADT definition public part (interface) constitutes those elements that can be accessed externally (often the interface permits only access to subprograms and constants) the private part, which remains secure because it is only accessible by subprograms of the ADT itself

43 CS 10343 Abstract Data Types An abstract data type is a mathematical set of data, along with operations defined on that kind of data. It concerns time and space efficiency. Specifying the mathematical and logical properties of data type. Examples: –int: it is the set of integers (up to a certain magnitude), with operations +, -, /, *, % –double: it’s the set of decimal numbers (up to a certain magnitude), with operations +, -, /, * –ADT integer is not universally implemented

44 CS 10344 Abstract Data Types (Contd.) The previous examples belong to what is called built-in data types That is, they are provided by the programming language But new abstract data types can be defined by users, using arrays, enum, structs, classes (if object oriented programming), etc.

45 ADT An abstract data type (ADT) is mathematical model for a certain class of data structures that have similar behavior; or for certain data types of one or more programming languages that have similar semantics. An abstract data type is defined indirectly, only by the operations that may be performed on it and by mathematical constraints on the effects For example, an abstract stack data structure could be defined by three operations: push, that inserts some data item onto the structure, pop, that extracts an item from it (with the constraint that each pop always returns the most recently pushed item that has not been popped yet), and peek, that allows data on top of the structure to be examined without removal. When analyzing the efficiency of algorithms that use stacks, one may also specify that all operations take the same time no matter how many items have been pushed into the stack, and that the stack uses a constant amount of storage for each element.
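As a sketch of the stack ADT above, here is one possible array-based realization in C in which push, pop, and peek each take constant time. The fixed capacity and the names `Stack`, `stack_push`, `stack_pop`, and `stack_peek` are assumptions for illustration; the ADT itself prescribes only the behaviour of the operations.

```c
#include <stdbool.h>
#include <stddef.h>

#define STACK_CAPACITY 16  /* illustrative fixed capacity (assumption) */

typedef struct {
    int items[STACK_CAPACITY];
    size_t size;  /* number of items currently on the stack */
} Stack;

void stack_init(Stack *s) { s->size = 0; }

bool stack_is_empty(const Stack *s) { return s->size == 0; }

/* push: insert an item on top; returns false if the stack is full. */
bool stack_push(Stack *s, int value) {
    if (s->size == STACK_CAPACITY) return false;
    s->items[s->size++] = value;
    return true;
}

/* pop: remove and return the most recently pushed item not yet popped. */
bool stack_pop(Stack *s, int *out) {
    if (s->size == 0) return false;
    *out = s->items[--s->size];
    return true;
}

/* peek: examine the top item without removing it. */
bool stack_peek(const Stack *s, int *out) {
    if (s->size == 0) return false;
    *out = s->items[s->size - 1];
    return true;
}
```

Callers use only these functions and never index `items` directly, which is the information-hiding discipline the ADT slides describe; C does not enforce it, so it is a convention here.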

46 ADT Example An ADT consists of a value definition and an operation definition. ADT Rational is a mathematical model of a rational number (p/q) with the operations addition, multiplication, and testing for equality. Value definition: abstract typedef <integer, integer> Rational; condition Rational[1] != 0;

47 Operation definition of ADT Rational abstract Rational makerational(a, b) int a, b; precondition b != 0; postcondition makerational[0] == a; makerational[1] == b; abstract Rational add(a, b) Rational a, b; postcondition add[1] == a[1] * b[1]; add[0] == a[0] * b[1] + a[1] * b[0];
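The Rational specification above could be realized in C roughly as follows. The struct layout and the function names `make_rational`, `rational_add`, `rational_mult`, and `rational_equal` are assumptions for this sketch; the results are not reduced to lowest terms, and integer overflow is not handled.

```c
#include <stdbool.h>

/* num plays the role of Rational[0], den the role of Rational[1]. */
typedef struct {
    int num;
    int den;  /* condition: den != 0 */
} Rational;

/* makerational: the precondition b != 0 is checked at run time. */
bool make_rational(int a, int b, Rational *out) {
    if (b == 0) return false;
    out->num = a;
    out->den = b;
    return true;
}

/* add: numerator a0*b1 + a1*b0, denominator a1*b1. */
Rational rational_add(Rational a, Rational b) {
    Rational r;
    r.num = a.num * b.den + a.den * b.num;
    r.den = a.den * b.den;
    return r;
}

/* mult: numerators and denominators are multiplied pairwise. */
Rational rational_mult(Rational a, Rational b) {
    Rational r;
    r.num = a.num * b.num;
    r.den = a.den * b.den;
    return r;
}

/* equal: a0/a1 == b0/b1 exactly when a0*b1 == a1*b0. */
bool rational_equal(Rational a, Rational b) {
    return (long long)a.num * b.den == (long long)a.den * b.num;
}
```

For example, adding 1/2 and 1/3 yields 5/6, which `rational_equal` also treats as equal to 10/12.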

48 48 Data types I We type data--classify it into various categories--such as int, float, char, boolean, String –A data type represents a set of possible values, such as {..., -2, -1, 0, 1, 2,... }, or { true, false } By typing our variables, we allow the computer to find some of our errors –Some operations only make sense when applied to certain kinds of data--multiplication, searching Typing simplifies internal representation –A String requires more and different storage than a boolean

49 49 Data types II A data type is characterized by: –a set of values –a data representation, which is common to all these values, and –a set of operations, which can be applied uniformly to all these values

50 Primitive types Java provides eight primitive types: –boolean, char, byte, short, int, long –float, double (C offers a similar set of basic types: char, short, int, long, float, double) Each primitive type has –a set of values –a data representation –a set of operations These are “set in stone”: there is nothing the programmer can do to change anything about them

51 51 Primitive types as data types

52 Algorithm An algorithm based on the chosen data structure needs to be designed. A set of rules that defines how a particular problem can be solved in a finite sequence of steps is called an algorithm. An algorithm can be defined as a finite sequence of instructions, each of which has a clear meaning and can be executed with a finite amount of effort in finite time.

53 Introduction to Algorithm What is Algorithm? –a clearly specified set of simple instructions to be followed to solve a problem Takes a set of values, as input and produces a value, or set of values, as output –May be specified In English As a computer program As a pseudo-code Data structures –Methods of organizing data Program = algorithms + data structures

54 Introduction to Algorithm Why need algorithm analysis ? –writing a working program is not good enough –The program may be inefficient! –If the program is run on a large data set, then the running time becomes an issue

55 Algorithm Desirable features: Each step should be simple It should be unambiguous (Crisp & Clear) It should be effective(Unique solution) Finite number of steps It should be as efficient as possible.

56 Characteristics of Algorithm Input: this part of the algorithm reads the data for the given problem. Process: this part of the algorithm performs the required computation. Finiteness: a finite number of steps. Effectiveness: each step of the algorithm should be accurate and precise, and executable within a definite period of time on the target machine. Output: it must produce the desired output.

57 Algorithm Design Develop the algorithm Refine the algorithm Use of control statements (sequence, selection, iteration) Analysis of algorithm: time and space complexity Asymptotic notations: Big-Oh (O), Omega (Ω), Theta (Θ), small-oh (o)

58 How to develop the Algorithm Understand the problem Identify the output of the problem Identify the inputs required by the problem and choose the associated data structure Design the logic that will produce the desired output Test the algorithm for different sets of input data Repeat the above steps until the desired output is produced for all input types

59 Identifying Inputs & Outputs Problem of sorting the given numbers Inputs: list of numbers, size of list, type of numbers sorted, order of sorting Outputs: sorted list, message for displaying the sorted list

60 Input & Output specifications Input specifications: the order and format in which the input values will be read; the upper and lower bounds of the input values (for example, the size of the list should not be less than zero); how to detect that there are no more inputs to be read (identifying the end of the list). Output specifications: the order and format in which the output values will be produced; the type of values to be produced; the headings and column headings to be printed in the output.

61 CS 10361 Performance Analysis and Big-O

62 CS 10362 Performance Analysis Determining an estimate of the time and memory requirement of the algorithm. Time estimation is called time complexity analysis Memory size estimation is called space complexity analysis. Because memory is cheap and abundant, we rarely do space complexity analysis Since time is “expensive”, analysis now defaults to time complexity analysis

63 Algorithm Analysis Space complexity –How much space is required Time complexity –How much time does it take to run the algorithm Often, we deal with estimates!

64 Space Complexity Space complexity = The amount of memory required by an algorithm to run to completion –[Core dumps = the most often encountered cause is “memory leaks” – the amount of memory required larger than the memory available on a given system] Some algorithms may be more efficient if data completely loaded into memory –Need to look also at system limitations –E.g. Classify 2GB of text in various categories [politics, tourism, sport, natural disasters, etc.] – can I afford to load the entire collection?

65 Space Complexity (cont’d) 1.Fixed part: The size required to store certain data/variables, that is independent of the size of the problem: - e.g. name of the data collection - same size for classifying 2GB or 1MB of texts 2.Variable part: Space needed by variables, whose size is dependent on the size of the problem: - e.g. actual text - load 2GB of text VS. load 1MB of text

66 Reasons for studying Space complexity If the program is to run on a multi-user system, the amount of space (RAM) to be allocated to the program must be specified. To know in advance whether sufficient memory is available to run the program. There may be several possible solutions with different space requirements. It can be used to estimate the size of the largest problem the program can solve.

67 Components of Space Instruction space: space needed to store the executable version of the program; it is fixed. Data space: space needed to store all constants and variable values; it has two components, fixed and variable. Fixed space: space required to store constants and simple variables. Variable space: space required to store structured variables such as arrays and structs, and dynamically allocated space.

68 Components of space complexity Environment stack: space needed to store the information for resuming a suspended (partially completed) function. Each time a function is invoked, the following data is saved on the environment stack: the return address (where execution resumes after the function completes), and the values of the local variables and formal parameters of the invoked function. Recursive stack space: the amount of space needed by recursive functions. It depends on the space for local variables and formal parameters and on the maximum depth of recursion (the maximum number of nested recursive calls).

69 Space Complexity (cont’d) S(P) = c + S(instance characteristics), where c is a constant. Example: float sum(float *a, int n) { float s = 0; for (int i = 0; i < n; i++) { s += a[i]; } return s; } Space? One word each for n, a [passed by reference!], i, and s ⇒ constant space!
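To contrast the constant-space loop with recursive stack space, the sketch below gives an iterative and a recursive version of the same sum. The function names are hypothetical. The recursive version makes n nested calls, so its environment stack grows in proportion to n, assuming the compiler does not transform the recursion into a loop.

```c
/* Iterative sum: a fixed number of local variables, so constant extra space. */
float sum_iterative(const float *a, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; i++) {
        s += a[i];
    }
    return s;
}

/* Recursive sum: each call saves a return address plus the parameters
   a and n, and the recursion depth is n, so stack space grows with n. */
float sum_recursive(const float *a, int n) {
    if (n <= 0) {
        return 0.0f;
    }
    return sum_recursive(a, n - 1) + a[n - 1];
}
```

Both functions compute the same value; only the space they need differs.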

70 Time complexity The amount of time an algorithm requires to run to completion. Reasons to study time complexity: to know in advance whether the program will provide a satisfactory real-time (user) response; there may be different solutions with different time requirements.

71 Factors that affect execution time Speed of the computer Structure of the program Quality of the compiler Current load on the computer system Input size and nature of the input (execution time generally grows with the input size)

72 Algorithm Analysis… Factors affecting the running time –computer –compiler –algorithm used –input to the algorithm The content of the input affects the running time; typically, the input size (number of items in the input) is the main consideration –E.g. sorting problem ⇒ the number of items to be sorted –E.g. multiplying two matrices together ⇒ the total number of elements in the two matrices Machine model assumed –Instructions are executed one after another, with no concurrent operations ⇒ not parallel computers

73 Time-space Trade-off This is a multi-objective optimization problem. Ideally, a solution to the given problem requires minimum space and also takes minimum time to execute. In practice it is usually not possible to achieve both objectives at once, so one is traded against the other. There are several approaches to solving the same problem.

74 Example Algorithm arrayMax(A, n): Input: An array A storing n integers. Output: The maximum element in A. currentMax ← A[0] for i ← 1 to n-1 do if currentMax < A[i] then currentMax ← A[i] return currentMax How many operations?

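75 Example of Time Complexity Analysis and Big-O Pseudo-code for finding the maximum of x[0..n-1]: double M=x[0]; for i=1 to n-1 do if (x[i] > M) M=x[i]; endif endfor return M; Cost annotations: the initial assignment M=x[0] costs a; each comparison x[i] > M costs b; each assignment M=x[i] costs a; the loop therefore costs at most (n-1)*(b+a).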

76 CS 10376 Complexity of the algorithm T(n) = a+(n-1)(b+a) Where “a” is the time of one assignment, and “b” is the time of one comparison Both “a” and “b” are constants that depend on the hardware
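One way to see where the formula T(n) = a+(n-1)(b+a) comes from is to instrument the maximum-finding loop and count assignments and comparisons. The sketch below assumes n ≥ 1; the `OpCount` struct and the function name `array_max_counted` are hypothetical.

```c
#include <stddef.h>

/* Tally of the two operation kinds that the formula charges for. */
typedef struct {
    long assignments;  /* each costs "a" */
    long comparisons;  /* each costs "b" */
} OpCount;

/* Returns the maximum of x[0..n-1] and records the operation counts.
   Precondition (assumed): n >= 1. */
double array_max_counted(const double *x, size_t n, OpCount *ops) {
    double m = x[0];
    ops->assignments = 1;  /* M = x[0] */
    ops->comparisons = 0;
    for (size_t i = 1; i < n; i++) {
        ops->comparisons++;        /* x[i] > M */
        if (x[i] > m) {
            m = x[i];
            ops->assignments++;    /* M = x[i] */
        }
    }
    return m;
}
```

On increasing input every comparison succeeds, giving n assignments and n-1 comparisons, which matches a+(n-1)(b+a). On decreasing input only the first assignment happens, so the formula is a worst-case bound.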

77 Example: Selection Problem Given a list of N numbers, determine the kth largest, where k ≤ N. Algorithm 1: (1) Read N numbers into an array (2) Sort the array in decreasing order by some simple algorithm (3) Return the element in position k

78 Example: Selection Problem… Algorithm 2: (1) Read the first k elements into an array and sort them in decreasing order (2) Each remaining element is read one by one If smaller than the kth element, then it is ignored Otherwise, it is placed in its correct spot in the array, bumping one element out of the array. (3) The element in the kth position is returned as the answer.
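Algorithm 2 can be sketched in C as below. As a simplification, the first k elements are placed by the same insertion step as the later ones rather than by a separate sort. The function name `kth_largest`, its 1-based k, and its return convention are assumptions for this example.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stores the kth largest of data[0..n-1] in *out (k is 1-based).
   Returns 0 on success, -1 if k is out of range or allocation fails. */
int kth_largest(const int *data, size_t n, size_t k, int *out) {
    if (k == 0 || k > n) return -1;
    int *top = malloc(k * sizeof *top);  /* the k largest so far, decreasing */
    if (top == NULL) return -1;
    size_t filled = 0;
    for (size_t i = 0; i < n; i++) {
        int v = data[i];
        /* Once the array is full, anything not larger than the kth is ignored. */
        if (filled == k && v <= top[k - 1]) continue;
        /* Start at the free slot, or at the last slot (bumping its element out). */
        size_t j = (filled < k) ? filled : k - 1;
        while (j > 0 && top[j - 1] < v) {
            top[j] = top[j - 1];  /* shift smaller elements one place right */
            j--;
        }
        top[j] = v;
        if (filled < k) filled++;
    }
    *out = top[k - 1];
    free(top);
    return 0;
}
```

Each of the N elements may shift up to k entries, so this sketch does on the order of N·k work in the worst case.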

79 Example: Selection Problem… Which algorithm is better when –N =100 and k = 100? –N =100 and k = 1? What happens when N = 1,000,000 and k = 500,000? There exist better algorithms

80 Worst- / average- / best-case Worst-case running time of an algorithm –The longest running time for any input of size n –An upper bound on the running time for any input ⇒ a guarantee that the algorithm will never take longer –Example: sort a set of numbers in increasing order when the data is in decreasing order –The worst case can occur fairly often, e.g. in searching a database for a particular piece of information Best-case running time –Sort a set of numbers in increasing order when the data is already in increasing order Average-case running time –May be difficult to define what “average” means

81 Running time Suppose the program includes an if-then statement that may execute or not ⇒ variable running time. Typically algorithms are measured by their worst case.

82 Running-time of algorithms Bounds are for the algorithms, rather than programs –programs are just implementations of an algorithm, and almost always the details of the program do not affect the bounds Bounds are for algorithms, rather than problems –A problem can be solved with several algorithms, some are more efficient than others

83 Experimental Approach Write a program that implements the algorithm Run the program with data sets of varying size. Determine the actual running time using a system call to measure time (e.g. system (date) ); Problems?

84 Experimental Approach It is necessary to implement and test the algorithm in order to determine its running time. Experiments can be done only on a limited set of inputs, and may not be indicative of the running time for other inputs. The same hardware and software should be used in order to compare two algorithms. – condition very hard to achieve!
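A minimal sketch of the experimental approach in C, using the standard `clock()` function to time a maximum-finding scan over n values. The function name `time_array_max` and the way the test data is filled are assumptions for this example.

```c
#include <stdlib.h>
#include <time.h>

/* Returns the CPU time in seconds spent scanning n doubles for their
   maximum, or -1.0 if n is 0 or memory cannot be allocated. */
double time_array_max(size_t n) {
    if (n == 0) return -1.0;
    double *x = malloc(n * sizeof *x);
    if (x == NULL) return -1.0;
    for (size_t i = 0; i < n; i++) {
        x[i] = (double)(i % 1000);  /* arbitrary test data */
    }

    clock_t start = clock();
    volatile double m = x[0];  /* volatile keeps the loop from being optimized away */
    for (size_t i = 1; i < n; i++) {
        if (x[i] > m) m = x[i];
    }
    clock_t end = clock();

    free(x);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling it with n = 1,000,000, 2,000,000, and so on gives timings that can be compared across sizes. The figures depend on the machine, the compiler, and the system load, which is exactly the limitation listed above.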

85 Use a Theoretical Approach Based on a high-level description of the algorithms, rather than language-dependent implementations. Makes possible an evaluation of the algorithms that is independent of the hardware and software environments ⇒ generality

86 Algorithm Description How to describe algorithms independent of a programming language Pseudo-code = a description of an algorithm that is –more structured than usual prose but –less formal than a programming language (or diagrams) Example: find the maximum element of an array. Algorithm arrayMax(A, n): Input: An array A storing n integers. Output: The maximum element in A. currentMax ← A[0] for i ← 1 to n-1 do if currentMax < A[i] then currentMax ← A[i] return currentMax

87 Pseudo Code Expressions: use standard mathematical symbols –use ← for assignment (? in C/C++) –use = for the equality relationship (? in C/C++) Method declarations: –Algorithm name(param1, param2) Programming constructs: –decision structures: if ... then ... [else ...] –while-loops: while ... do –repeat-loops: repeat ... until ... –for-loop: for ... do –array indexing: A[i] Methods –calls: object method(args) –returns: return value Use comments Instructions have to be basic enough and feasible!

88 Low Level Algorithm Analysis Based on primitive operations (low-level computations independent from the programming language) E.g.: –Make an addition = 1 operation –Calling a method or returning from a method = 1 operation –Index in an array = 1 operation –Comparison = 1 operation etc. Method: Inspect the pseudo-code and count the number of primitive operations executed by the algorithm

89 Why Does Growth Rate Matter? (running time by input size, at one operation per microsecond)
Complexity   n = 10        n = 20        n = 30
n            0.00001 sec   0.00002 sec   0.00003 sec
n^2          0.0001 sec    0.0004 sec    0.0009 sec
n^3          0.001 sec     0.008 sec     0.027 sec
n^5          0.1 sec       3.2 sec       24.3 sec
2^n          0.001 sec     1.0 sec       17.9 min
3^n          0.059 sec     58 min        6.5 years

90 Why Does Growth Rate Matter? (running time by input size, at one operation per microsecond; cent = centuries)
Complexity   n = 40        n = 50           n = 60
n            0.00004 sec   0.00005 sec      0.00006 sec
n^2          0.0016 sec    0.0025 sec       0.0036 sec
n^3          0.064 sec     0.125 sec        0.216 sec
n^5          1.7 min       5.2 min          13.0 min
2^n          12.7 days     35.7 years       366 cent
3^n          3855 cent     2 x 10^8 cent    1.3 x 10^13 cent

91 Subroutine 1 uses ? basic operation Subroutine 2 uses ? basic operations Subroutine ? is more efficient. This measure is good for all large input sizes In fact, we will not worry about the exact values, but will look at ``broad classes’ of values, or the growth rates Let there be n inputs. If an algorithm needs n basic operations and another needs 2n basic operations, we will consider them to be in the same efficiency category. However, we distinguish between exp(n), n, log(n)

92 Growth Rate The idea is to establish a relative order among functions for large n. $\exists\, c, n_0 > 0$ such that $f(N) \le c\,g(N)$ when $N \ge n_0$: f(N) grows no faster than g(N) for “large” N

93 Typical Growth Rates

94 Growth rates … Doubling the input size –$f(N) = c \Rightarrow f(2N) = f(N) = c$ –$f(N) = \log N \Rightarrow f(2N) = f(N) + \log 2$ –$f(N) = N \Rightarrow f(2N) = 2 f(N)$ –$f(N) = N^2 \Rightarrow f(2N) = 4 f(N)$ –$f(N) = N^3 \Rightarrow f(2N) = 8 f(N)$ –$f(N) = 2^N \Rightarrow f(2N) = f(N)^2$ Advantages of algorithm analysis –To eliminate bad algorithms early –Pinpoints the bottlenecks, which are worth coding carefully
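The doubling rules for the polynomial cases can be checked empirically by counting loop iterations. The sketch below counts the innermost-loop iterations of a single, a double, and a triple nested loop; the function names are hypothetical, and the counts stand in for running time.

```c
/* Linear: one loop over n items. */
unsigned long long count_linear(unsigned n) {
    unsigned long long c = 0;
    for (unsigned i = 0; i < n; i++) c++;
    return c;
}

/* Quadratic: two nested loops, n * n iterations. */
unsigned long long count_quadratic(unsigned n) {
    unsigned long long c = 0;
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++) c++;
    return c;
}

/* Cubic: three nested loops, n * n * n iterations. */
unsigned long long count_cubic(unsigned n) {
    unsigned long long c = 0;
    for (unsigned i = 0; i < n; i++)
        for (unsigned j = 0; j < n; j++)
            for (unsigned k = 0; k < n; k++) c++;
    return c;
}
```

Doubling n from 100 to 200 doubles the linear count and quadruples the quadratic one, and doubling from 20 to 40 multiplies the cubic count by 8, in line with the list above.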

95 Notations Asymptotically less than or equal to: O (Big-Oh) Asymptotically greater than or equal to: Ω (Big-Omega) Asymptotically equal to: Θ (Big-Theta) Asymptotically strictly less: o (Little-Oh)

96 CS 10396 Big-O Notation Let n be a non-negative integer representing the size of the input to an algorithm Let f(n) and g(n) be two positive functions, representing the number of basic calculations (operations, instructions) that an algorithm takes (or the number of memory words an algorithm needs).

97 Asymptotic notation: Big-Oh $f(N) = O(g(N))$: there are positive constants c and $n_0$ such that $f(N) \le c\,g(N)$ when $N \ge n_0$. The growth rate of f(N) is less than or equal to the growth rate of g(N); g(N) is an upper bound on f(N)

98 Big-O Notation (contd.) $f(n) = O(g(n))$ iff there exist a positive constant C and a non-negative integer $n_0$ such that $f(n) \le C\,g(n)$ for all $n \ge n_0$. g(n) is said to be an upper bound of f(n).

99 Big-O Notation (Examples) $f(n) = 5n+2 = O(n)$ // g(n) = n –$f(n) \le 6n$ for $n \ge 3$ (C=6, $n_0$=3) $f(n) = n/2 - 3 = O(n)$ –$f(n) \le 0.5n$ for $n \ge 0$ (C=0.5, $n_0$=0) $n^2 - n = O(n^2)$ // $g(n) = n^2$ –$n^2 - n \le n^2$ for $n \ge 0$ (C=1, $n_0$=0) $n(n+1)/2 = O(n^2)$ –$n(n+1)/2 \le n^2$ for $n \ge 0$ (C=1, $n_0$=0)

100 Big-Oh: example Let $f(N) = 2N^2$. Then –$f(N) = O(N^4)$ –$f(N) = O(N^3)$ –$f(N) = O(N^2)$ (best answer, asymptotically tight)

101 Big Oh: more examples $N^2/2 - 3N = O(N^2)$; $1 + 4N = O(N)$; $7N^2 + 10N + 3 = O(N^2) = O(N^3)$; $\log_{10} N = \log_2 N / \log_2 10 = O(\log_2 N) = O(\log N)$; $\sin N = O(1)$; $10 = O(1)$, $10^{10} = O(1)$; $\log N + N = O(N)$; $N = O(2^N)$, but $2^N$ is not $O(N)$; $2^{10N}$ is not $O(2^N)$

102 CS 103102 Big-O Notation (In Practice) When computing the complexity, –f(n) is the actual time formula –g(n) is the simplified version of f Since f(n) stands often for time, we use T(n) instead of f(n) In practice, the simplification of T(n) occurs while it is being computed by the designer

103 Simplification Methods If T(n) is the sum of a constant number of terms, drop all the terms except for the most dominant (biggest) term; drop any multiplicative factor of that term. What remains is the simplified g(n). $a_m n^m + a_{m-1} n^{m-1} + \dots + a_1 n + a_0 = O(n^m)$. $n^2 - n + \log n = O(n^2)$

104 Big-O Notation (Common Complexities) $T(n) = O(1)$ // constant time $T(n) = O(\log n)$ // logarithmic $T(n) = O(n)$ // linear $T(n) = O(n^2)$ // quadratic $T(n) = O(n^3)$ // cubic $T(n) = O(n^c)$, $c \ge 1$ // polynomial $T(n) = O(\log^c n)$, $c \ge 1$ // polylogarithmic $T(n) = O(n \log n)$

105 Common Formulas $1 + 2 + 3 + \dots + n = n(n+1)/2 = O(n^2)$. $1^2 + 2^2 + 3^2 + \dots + n^2 = n(n+1)(2n+1)/6 = O(n^3)$. $1 + x + x^2 + x^3 + \dots + x^n = (x^{n+1} - 1)/(x - 1) = O(x^n)$ for $x > 1$.

106 CS 103106 Example of Time Complexity Analysis and Big-O Pseudo-code of finding a maximum of x[n]: double M=x[0]; for i=1 to n-1 do if (x[i] > M) M=x[i]; endif endfor return M;

107 CS 103107 Complexity of the algorithm T(n) = a+(n-1)(b+a) = O(n) Where “a” is the time of one assignment, and “b” is the time of one comparison Both “a” and “b” are constants that depend on the hardware Observe that the big O spares us from –Relatively unimportant arithmetic details –Hardware dependency
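The step from the exact formula to O(n) can be made explicit with the definition of Big-O. The short derivation below treats a and b as the positive constants of the slide; the particular witnesses C = a + b and n_0 = 1 are one possible choice, not the only one.

```latex
\begin{align*}
T(n) &= a + (n-1)(b+a) \\
     &= (a+b)\,n - b \\
     &\le (a+b)\,n \qquad \text{for all } n \ge 1 .
\end{align*}
% Hence T(n) <= C * n with C = a + b and n_0 = 1, so T(n) = O(n).
```

The hardware-dependent constants a and b are absorbed into C, which is why the Big-O result no longer mentions them.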

108 Why is the big Oh a Big Deal? Suppose I find two algorithms, one of which does twice as many operations in solving the same problem. I could get the same job done as fast with the slower algorithm if I buy a machine which is twice as fast. But if my algorithm is faster by a big Oh factor - No matter how much faster you make the machine running the slow algorithm the fast-algorithm, slow machine combination will eventually beat the slow algorithm, fast machine combination.

109 Properties of the Big-Oh Notation (I). Constant factors may be ignored: for all k > 0, k*f is O(f), e.g. a*n^2 and b*n^2 are both O(n^2). Higher powers of n grow faster than lower powers: n^r is O(n^s) if 0 < r < s. The growth rate of a sum of terms is the growth rate of its fastest-growing term: if f is O(g), then f + g is O(g), e.g. a*n^3 + b*n^2 is O(n^3).

110 Properties of the Big-Oh Notation (II). The growth rate of a polynomial is given by the growth rate of its leading term: if f is a polynomial of degree d, then f is O(n^d). If f grows faster than g, which grows faster than h, then f grows faster than h. The product of upper bounds of functions gives an upper bound for the product of the functions: if f is O(g) and h is O(r), then f*h is O(g*r), e.g. if f is O(n^2) and g is O(log n), then f*g is O(n^2 log n).

111 Properties of the Big-Oh Notation (III). Exponential functions grow faster than powers: n^k is O(b^n) for all b > 1, k > 0, e.g. n^4 is O(2^n) and n^4 is O(exp(n)). Logarithms grow more slowly than powers: log_b n is O(n^k) for all b > 1, k > 0, e.g. log_2 n is O(n^0.5). All logarithms grow at the same rate: log_b n is Θ(log_d n) for all b, d > 1.

112 Properties of the Big-Oh Notation (IV). The sum of the first n r-th powers grows as the (r+1)-th power: 1 + 2 + 3 + ... + N = N(N+1)/2 (arithmetic series); 1 + 2^2 + 3^2 + ... + N^2 = N(N+1)(2N+1)/6.

113 Some rules. When considering the growth rate of a function using Big-Oh, ignore the lower-order terms and the coefficient of the highest-order term. There is no need to specify the base of a logarithm: changing the base from one constant to another changes the value of the logarithm by only a constant factor. If T_1(N) = O(f(N)) and T_2(N) = O(g(N)), then T_1(N) + T_2(N) = max(O(f(N)), O(g(N))) and T_1(N) * T_2(N) = O(f(N) * g(N)).

114 Big-Omega. There exist c, n_0 > 0 such that f(N) ≥ c g(N) when N ≥ n_0. f(N) grows no slower than g(N) for "large" N.

115 Big-Omega. f(N) = Ω(g(N)): there are positive constants c and n_0 such that f(N) ≥ c g(N) when N ≥ n_0. The growth rate of f(N) is greater than or equal to the growth rate of g(N).

116 Big-Omega: examples. Let f(N) = 2N^2. Then f(N) = Ω(N) and f(N) = Ω(N^2) (best answer).
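
The two Big-Omega claims for f(N) = 2N^2 can be verified directly from the definition. The constants c and n_0 chosen below are just one valid choice among many.

```latex
\begin{align*}
2N^2 &\ge 1 \cdot N   && \text{for all } N \ge 1,
  \text{ so } f(N) = \Omega(N) \text{ with } c = 1,\ n_0 = 1, \\
2N^2 &\ge 2 \cdot N^2 && \text{for all } N \ge 1,
  \text{ so } f(N) = \Omega(N^2) \text{ with } c = 2,\ n_0 = 1.
\end{align*}
```

Ω(N^2) is the best answer because no larger power works: for any c > 0, the inequality 2N^2 ≥ c N^3 fails as soon as N > 2/c, so f(N) is not Ω(N^3).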

117 f(N) = Θ(g(N)): the growth rate of f(N) is the same as the growth rate of g(N).

118 Big-Theta. f(N) = Θ(g(N)) iff f(N) = O(g(N)) and f(N) = Ω(g(N)). The growth rate of f(N) equals the growth rate of g(N). Example: let f(N) = N^2 and g(N) = 2N^2. We write f(N) = O(g(N)) and f(N) = Ω(g(N)), thus f(N) = Θ(g(N)).

119 Some rules. If T(N) is a polynomial of degree k, then T(N) = Θ(N^k). For logarithmic functions, T(log_m N) = Θ(log N).

120 Little-oh. f(N) = o(g(N)) if f(N) = O(g(N)) and f(N) ≠ Θ(g(N)). The growth rate of f(N) is less than the growth rate of g(N).

121 Using L'Hôpital's rule. L'Hôpital's rule: if lim_{N→∞} f(N) = ∞ and lim_{N→∞} g(N) = ∞, then lim_{N→∞} f(N)/g(N) = lim_{N→∞} f'(N)/g'(N). To determine relative growth rates, compute lim_{N→∞} f(N)/g(N): if it is 0, f(N) = o(g(N)); if it is a constant ≠ 0, f(N) = Θ(g(N)); if it is ∞, g(N) = o(f(N)); if the limit oscillates, there is no relation.

122 Example. Functions: sqrt(n), n, 2n, ln n, exp(n), n + sqrt(n), n + n^2. lim_{n→∞} sqrt(n)/n = 0, so sqrt(n) is o(n). lim_{n→∞} n/sqrt(n) = ∞, so again sqrt(n) is o(n), i.e. n grows faster than sqrt(n). lim_{n→∞} n/2n = 1/2, so n is Θ(2n) and Ω(2n). lim_{n→∞} 2n/n = 2, so 2n is Θ(n) and Ω(n).

123 Example. Calculate [formula and four-line code listing not preserved]. Lines 1 and 4 count for one unit each. Line 3: executed N times, four units each time, 4N in total. Line 2: 1 for the initialization, N+1 for all the tests, N for all the increments, 2N + 2 in total. Total cost: 1 + (2N + 2) + 4N + 1 = 6N + 4 = O(N).
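
The code that this slide analyses is missing from the transcript. The sketch below is an assumed reconstruction: a loop summing i^3 for i = 1..N, which is consistent with the stated costs (line 3 costs four units: two multiplications, one addition, one assignment). The function name and the choice of i^3 are assumptions, not confirmed by the slide.

```c
/* Assumed reconstruction: returns 1^3 + 2^3 + ... + n^3.
   The comments give the unit costs quoted on the slide. */
long sum_of_cubes(int n) {
    long partial_sum;

    partial_sum = 0;                     /* line 1: 1 unit */
    for (int i = 1; i <= n; i++)         /* line 2: 1 + (N+1) + N = 2N + 2 units */
        partial_sum += (long)i * i * i;  /* line 3: 4 units, executed N times */
    return partial_sum;                  /* line 4: 1 unit */
}
```

Adding the annotated costs gives 1 + (2N + 2) + 4N + 1 = 6N + 4, so the routine runs in O(N) time whatever the exact unit costs are.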

124 General Rules. For loops: the running time is at most the running time of the statements inside the for-loop (including tests) times the number of iterations. Nested for loops: the running time of the statement multiplied by the product of the sizes of all the for-loops, e.g. O(N^2) for two nested loops of N iterations each.

125 General rules (cont'd). Consecutive statements: these just add, e.g. O(N) + O(N^2) = O(N^2). If/Else (if (condition) S1 else S2): never more than the running time of the test plus the larger of the running times of S1 and S2.
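
The loop rules can be observed by counting how often the innermost statement runs. This is a minimal sketch; the function names are illustrative and the counted statement stands in for any O(1) body.

```c
/* Nested for loops: the O(1) body runs n * n times, so the total is O(N^2). */
unsigned long nested_count(unsigned long n) {
    unsigned long count = 0;
    for (unsigned long i = 0; i < n; i++)
        for (unsigned long j = 0; j < n; j++)
            count++;
    return count;
}

/* Consecutive statements: an O(N) loop followed by an O(N^2) nested loop.
   The counts add to n + n * n, which is still O(N^2). */
unsigned long consecutive_count(unsigned long n) {
    unsigned long count = 0;
    for (unsigned long i = 0; i < n; i++)
        count++;
    for (unsigned long i = 0; i < n; i++)
        for (unsigned long j = 0; j < n; j++)
            count++;
    return count;
}
```

For n = 10 the nested loops execute the body 100 times and the consecutive version 110 times. The extra linear term becomes negligible as n grows, which is why O(N) + O(N^2) simplifies to O(N^2).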

126 Another Example: Maximum Subsequence Sum Problem. Given (possibly negative) integers A_1, A_2, ..., A_n, find the maximum value of A_i + A_{i+1} + ... + A_j over all 1 ≤ i ≤ j ≤ n. For convenience, the maximum subsequence sum is 0 if all the integers are negative. E.g. for the input -2, 11, -4, 13, -5, -2 the answer is 20 (A_2 through A_4).

127 Algorithm 1: Simple. Exhaustively tries all possibilities (brute force): O(N^3).
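
The slide's code listing is not in the transcript. A brute-force version consistent with the description might look like the following sketch, which tries every pair of start and end positions and sums each candidate from scratch. The function name is illustrative.

```c
#include <stddef.h>

/* Brute-force maximum subsequence sum over a[0..n-1].
   Three nested loops give O(N^3) time.
   Returns 0 when every element is negative (the empty subsequence). */
long max_subsequence_sum_cubic(const int *a, size_t n) {
    long max_sum = 0;
    for (size_t i = 0; i < n; i++) {          /* start position */
        for (size_t j = i; j < n; j++) {      /* end position */
            long this_sum = 0;
            for (size_t k = i; k <= j; k++)   /* sum a[i..j] from scratch */
                this_sum += a[k];
            if (this_sum > max_sum)
                max_sum = this_sum;
        }
    }
    return max_sum;
}
```

For the input -2, 11, -4, 13, -5, -2 this returns 20, the sum of the second through fourth elements.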

128 Algorithm 2: Divide-and-conquer. Split the problem into two roughly equal subproblems, which are then solved recursively; then patch together the two solutions of the subproblems to arrive at a solution for the whole problem. The maximum subsequence sum can be: (1) entirely in the left half of the input; (2) entirely in the right half of the input; (3) crossing the middle, so that it lies in both halves.

129 Algorithm 2 (cont'd). The first two cases can be solved recursively. For the last case: find the largest sum in the first half that includes the last element in the first half; find the largest sum in the second half that includes the first element in the second half; add these two sums together.

130 Example: 8 numbers in a sequence: 4, -3, 5, -2, -1, 2, 6, -2. Max subsequence sum for the first half = 6 ("4, -3, 5"); for the second half = 8 ("2, 6"). Max subsequence sum for the first half ending at its last element is 4 ("4, -3, 5, -2"). Max subsequence sum for the second half starting at its first element is 7 ("-1, 2, 6"). Max subsequence sum spanning the middle is 4 + 7 = 11. Since 11 is larger than 6 and 8, the max subsequence spans the middle: "4, -3, 5, -2, -1, 2, 6". Slides courtesy of Prof. Saswati Sarkar.

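131 Algorithm 2 [code listing not preserved]. Annotated costs for an input of size m: base case O(1); recursive call on each half T(m/2); computing the largest sum crossing the middle O(m); combining the three results O(1).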
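
Since the listing itself is lost, the following is a sketch of a divide-and-conquer routine that matches the three cases and the annotated costs. All names are illustrative, and the handling of negative border sums (treated as 0) follows the convention that the answer is 0 when all integers are negative.

```c
static long max3(long a, long b, long c) {
    long m = a > b ? a : b;
    return m > c ? m : c;
}

/* Maximum subsequence sum of a[left..right], both bounds inclusive. */
static long max_sum_rec(const int *a, int left, int right) {
    if (left == right)                                   /* base case: O(1) */
        return a[left] > 0 ? a[left] : 0;

    int center = left + (right - left) / 2;
    long max_left = max_sum_rec(a, left, center);        /* T(m/2) */
    long max_right = max_sum_rec(a, center + 1, right);  /* T(m/2) */

    /* O(m): largest sum ending at the last element of the left half. */
    long left_border = 0, sum = 0;
    for (int i = center; i >= left; i--) {
        sum += a[i];
        if (sum > left_border)
            left_border = sum;
    }

    /* O(m): largest sum starting at the first element of the right half. */
    long right_border = 0;
    sum = 0;
    for (int i = center + 1; i <= right; i++) {
        sum += a[i];
        if (sum > right_border)
            right_border = sum;
    }

    /* O(1): best of left half, right half, and the sum crossing the middle. */
    return max3(max_left, max_right, left_border + right_border);
}

/* Entry point for an array of n elements; an empty array gives 0. */
long max_subsequence_sum_dc(const int *a, int n) {
    return n > 0 ? max_sum_rec(a, 0, n - 1) : 0;
}
```

On the sequence 4, -3, 5, -2, -1, 2, 6, -2 the two halves yield 6 and 8, the border sums are 4 and 7, and the routine returns 11.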

132 Algorithm 2 (cont'd). Recurrence equation: T(1) = 1, T(N) = 2T(N/2) + N. Here 2T(N/2) accounts for the two subproblems, each of size N/2, and N accounts for "patching" the two solutions together to find the solution to the whole problem.

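133 Algorithm 2 (cont'd). Solving the recurrence: T(N) = 2T(N/2) + N = 4T(N/4) + 2N = 8T(N/8) + 3N = ... = 2^k T(N/2^k) + kN. With k = log N (i.e. 2^k = N) and taking T(1) = 1, we have T(N) = N T(1) + N log N = N log N + N. Thus, the running time is O(N log N), which is faster than Algorithm 1 for large data sets.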
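
The solution of the recurrence can be checked numerically. The sketch below assumes T(1) = 1 and T(N) = 2T(N/2) + N with N a power of two, and compares the recursive value with N log N + N (logarithm base 2). The function names are illustrative.

```c
/* Evaluates T(1) = 1, T(N) = 2 T(N/2) + N directly.
   Intended for N that is a power of two. */
unsigned long recurrence_T(unsigned long n) {
    return n == 1 ? 1 : 2 * recurrence_T(n / 2) + n;
}

/* Base-2 logarithm of a power of two, by repeated halving. */
static unsigned log2_exact(unsigned long n) {
    unsigned k = 0;
    while (n > 1) {
        n /= 2;
        k++;
    }
    return k;
}

/* Closed form N log N + N. */
unsigned long closed_form_T(unsigned long n) {
    return n * log2_exact(n) + n;
}
```

For N = 8 both functions give 32, and for N = 1024 both give 11264, in line with T(N) = N log N + N.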

