Bioinformatics Programming 1 EE, NCKU Tien-Hao Chang (Darby Chang)

1 Bioinformatics Programming 1 EE, NCKU Tien-Hao Chang (Darby Chang)

2 Data Abstraction 2

3 Data type
–A data type is a collection of objects and a set of operations that act on those objects
–For example, the data type int consists of the objects {0, +1, -1, +2, -2, …, INT_MAX, INT_MIN} and the operations +, -, *, /, and %
The data types of C
–basic data types: char, int, float, and double
–group data types: array and struct
–pointer data type
–user-defined types
Abstract data type
–An abstract data type (ADT) is a data type that is organized in such a way that the specification of the objects and the operations on the objects is separated from the representation of the objects and the implementation of the operations
–We know what it does, but not necessarily how it will do it

4 4

5 The array as an ADT 5

6 Any Questions? 6

7 Stack 7

8 The Stack ADT A stack is an ordered list in which insertions and deletions are made at one end called the top If we add the elements A, B, C, D, and E to the stack, in that order, then E is the first element we delete from the stack A stack is also known as a Last-In-First-Out (LIFO) list 8

9 9

10 10 Implementation with an array
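The slide's own code is not preserved in this transcript, so the following is only a minimal sketch of an array-based stack. The names `element`, `MAX_STACK_SIZE`, `is_empty`, `is_full`, `push` and `pop`, the capacity of 100, and the choice to treat overflow and underflow as assertion failures are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STACK_SIZE 100          /* assumed capacity */

typedef struct {
    int key;                        /* payload; the element type is an assumption */
} element;

static element stack[MAX_STACK_SIZE];
static int top = -1;                /* index of the top element; -1 means empty */

bool is_empty(void) { return top < 0; }
bool is_full(void)  { return top >= MAX_STACK_SIZE - 1; }

/* Add an item at the top of the stack. */
void push(element item)
{
    assert(!is_full());             /* sketch: overflow is treated as a bug */
    stack[++top] = item;
}

/* Remove and return the item at the top of the stack. */
element pop(void)
{
    assert(!is_empty());            /* sketch: underflow is treated as a bug */
    return stack[top--];
}
```

Pushing A, B, C, D, E in that order and then calling `pop()` returns E first, which is the LIFO behavior described for the Stack ADT.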

11 11

12 Why do we need such a data structure?

13 Stack Evaluation of Expressions
The representation and evaluation of expressions is of great interest to computer scientists
–(rear+1==front) || (rear==MAX_QUEUE_SIZE-1) (3.1)
–x=a/b-c+d*e-a*c (3.2)
If we examine these expressions, we notice that they contain
–operators: ==, +, -, ||, &&, !
–operands: a, b, c, e
–parentheses: ( )
Understanding the meaning of expressions
–assume a=4, b=c=2, d=e=3 in statement (3.2)
–interpretation 1: ((4/2)-2)+(3*3)-(4*2) = 0+9-8 = 1
–interpretation 2: (4/(2-2+3))*(3-4)*2 = (4/3)*(-1)*2 = -2.666…
The challenge is to efficiently generate the machine instructions corresponding to a given expression while respecting the precedence and associativity rules

14 Evaluation of Expressions Postfix Expressions
The standard way of writing expressions is known as infix notation
–a binary operator sits in between its two operands
Infix notation is not the one used by compilers to evaluate expressions
–Actually, the Java virtual machine is a stack machine
Instead, compilers typically use a parenthesis-free notation referred to as postfix notation

15 Evaluation of Expressions Evaluate Postfix Expressions
Evaluating postfix expressions is much simpler than evaluating infix expressions
–no parentheses
–no precedence
To evaluate an expression we make a single left-to-right scan of it
We can evaluate an expression easily by using a stack

16 16 Evaluating 62/3-42*+
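The single left-to-right scan can be sketched as follows. This is an illustrative version, not the slide's code: the function name `eval_postfix`, the restriction to single-digit operands and the operators + - * / %, and the use of assertions for malformed input are assumptions.

```c
#include <assert.h>
#include <ctype.h>

#define MAX_STACK_SIZE 100          /* assumed capacity */

/* Evaluate a postfix expression made of single-digit operands and the
 * binary operators + - * / %. Spaces are skipped. */
int eval_postfix(const char *expr)
{
    int stack[MAX_STACK_SIZE];
    int top = -1;

    for (const char *p = expr; *p != '\0'; p++) {
        if (*p == ' ')
            continue;
        if (isdigit((unsigned char)*p)) {
            assert(top < MAX_STACK_SIZE - 1);
            stack[++top] = *p - '0';        /* operand: push its value */
            continue;
        }
        assert(top >= 1);                   /* an operator needs two operands */
        int op2 = stack[top--];             /* right operand is on top */
        int op1 = stack[top--];
        switch (*p) {
        case '+': stack[++top] = op1 + op2; break;
        case '-': stack[++top] = op1 - op2; break;
        case '*': stack[++top] = op1 * op2; break;
        case '/': stack[++top] = op1 / op2; break;
        case '%': stack[++top] = op1 % op2; break;
        default:  assert(!"unknown operator");
        }
    }
    assert(top == 0);                       /* exactly one value must remain */
    return stack[top];
}
```

For the expression on this slide, `eval_postfix("62/3-42*+")` proceeds as 6/2 = 3, 3-3 = 0, 4*2 = 8, 0+8 = 8 and returns 8.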

17 Evaluation of Expressions Data Representation We now consider the representation of both the stack and the expression 17

18 18 get_token()

19 19

20 Any Questions? 20

21 A further question: Can you write a program to evaluate expressions? If not, what’s missing?

22 Evaluation of Expressions Infix to Postfix
We can describe an algorithm for producing a postfix expression from an infix one as follows
–fully parenthesize the expression: a / b - c + d * e - a * c becomes ((((a / b) - c) + (d * e)) - (a * c))
–all operators replace their corresponding right parentheses
–delete all parentheses, which gives a b / c - d e * + a c * -
The order of operands is the same in infix and postfix

23 23 icp isp

24 Evaluation of Expressions From Infix to Postfix
Assumptions
–operators: (, ), +, -, *, /, %
–operands: single-digit integer or variable of one character
Operands are taken out immediately
Operators are taken out of the stack as long as their in-stack precedence (isp) is higher than or equal to the incoming precedence (icp) of the new operator
–if (isp >= icp) pop
‘(’ has low isp and high icp
–isp and icp are tabulated per operator: ( ) + - * / % eos
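The isp/icp rule can be sketched as below. The numeric precedence values did not survive in this transcript, so the ones used here (0/20 for the left parenthesis, 19 for the right parenthesis, 12 for + and -, 13 for *, / and %, 0 for eos) are an assumption following a common textbook convention; only their relative order matters. The names `to_postfix`, `get_token` and `symbol_of` are likewise illustrative.

```c
#include <assert.h>
#include <string.h>

#define MAX_STACK_SIZE 100          /* assumed capacity */

typedef enum { LPAREN, RPAREN, PLUS, MINUS, TIMES, DIVIDE, MOD, EOS, OPERAND } precedence;

/* Assumed precedence tables, indexed by the enum above. */
static const int isp[] = {  0, 19, 12, 12, 13, 13, 13, 0 };
static const int icp[] = { 20, 19, 12, 12, 13, 13, 13, 0 };
static const char symbol_of[] = "()+-*/%";

/* Classify one input character. */
static precedence get_token(char c)
{
    if (c == '\0')
        return EOS;
    const char *q = strchr(symbol_of, c);
    return q != NULL ? (precedence)(q - symbol_of) : OPERAND;
}

/* Convert an infix string (one-character operands, no spaces, balanced
 * parentheses) to postfix. out needs room for strlen(infix) + 1 chars. */
void to_postfix(const char *infix, char *out)
{
    precedence stack[MAX_STACK_SIZE];
    int top = 0;
    size_t n = 0;

    stack[0] = EOS;                         /* sentinel with the lowest isp */
    for (const char *p = infix; *p != '\0'; p++) {
        precedence tok = get_token(*p);
        if (tok == OPERAND) {
            out[n++] = *p;                  /* operands go straight to the output */
        } else if (tok == RPAREN) {
            while (stack[top] != LPAREN && stack[top] != EOS)
                out[n++] = symbol_of[stack[top--]];
            if (stack[top] == LPAREN)
                top--;                      /* discard the matching '(' */
        } else {
            while (isp[stack[top]] >= icp[tok])
                out[n++] = symbol_of[stack[top--]];
            assert(top < MAX_STACK_SIZE - 1);
            stack[++top] = tok;
        }
    }
    while (stack[top] != EOS)               /* flush the remaining operators */
        out[n++] = symbol_of[stack[top--]];
    out[n] = '\0';
}
```

With these tables, `to_postfix("a/b-c+d*e-a*c", buf)` leaves `ab/c-de*+ac*-` in `buf`, which matches the result of the parenthesize-and-move method. Giving '(' a low isp and a high icp means it is always pushed and is never popped by an ordinary operator.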

25 25

26 Such a two-phase strategy (a. infix to postfix, then b. evaluate postfix) is used in practice

27 Precedence hierarchy and associativity for C

28 Any questions about stacks?

29 Queue 29

30 The Queue ADT
A queue is an ordered list in which all insertions take place at one end, called the rear, and all deletions take place at the opposite end, called the front
If we insert the elements A, B, C, D, E, in that order, then A is the first element we delete from the queue
A queue is also known as a First-In-First-Out (FIFO) list

31 31

32 Implementation with a 1D array and two variables

33 Answer: There might be available space when IsFullQ is true (movement is required)

34 Queue Regard Array as Circular
We can obtain a more efficient representation if we regard the array queue[MAX_QUEUE_SIZE] as circular
–front: one position counterclockwise from the first element
–rear: current end
When the queue is full, exactly one array position is left unused

35 35

36 36 addq() and deleteq() are slightly more complicated
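A minimal sketch of the circular version, assuming the convention stated on the previous slide (front is one position before the first element, rear is the position of the last one). The `element` type, the capacity of 8 and the use of assertions for full and empty conditions are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_QUEUE_SIZE 8            /* assumed capacity; one slot stays unused */

typedef struct {
    int key;                        /* payload; the element type is an assumption */
} element;

static element queue[MAX_QUEUE_SIZE];
static int front = 0;               /* one position before the first element */
static int rear = 0;                /* position of the last element */

bool queue_empty(void) { return front == rear; }
bool queue_full(void)  { return (rear + 1) % MAX_QUEUE_SIZE == front; }

/* Insert an item at the rear, wrapping around the end of the array. */
void addq(element item)
{
    assert(!queue_full());
    rear = (rear + 1) % MAX_QUEUE_SIZE;
    queue[rear] = item;
}

/* Remove and return the item at the front, wrapping around as well. */
element deleteq(void)
{
    assert(!queue_empty());
    front = (front + 1) % MAX_QUEUE_SIZE;
    return queue[front];
}
```

The extra complication is the modulo arithmetic. Because `front == rear` is reserved for the empty queue, the queue is reported full once it holds `MAX_QUEUE_SIZE - 1` items, and no element ever has to be shifted.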

37 Queues are very common in everyday life

38 A Maze Problem
The most obvious choice is a 2D array
–0s are the open paths and 1s are the barriers
Notice that not every position has eight neighbors
To avoid checking for these border conditions we can surround the maze with a border of ones
–an m × p maze requires an (m+2) × (p+2) array
–the maze itself runs from [1][1] to [m][p]


40 40 Possible moves from maze[row][col]

41 A Maze Problem Implementation of Move typedef struct { short int vert; short int horiz; } offsets; offsets move[8]; // array of moves for each direction If we are at maze[row][col] and we wish to find the position of the next move, maze[next_row][next_col] –next_row = row + move[dir].vert; next_col = col + move[dir].horiz; 41

42 A Maze Problem Maze Traversal Algorithm Maintain a second two-dimensional array, mark, to record the maze positions already checked Use stack to keep path history –typedef struct { short int row; short int col; short int dir; } element; element stack[MAX_STACK_SIZE]; 42
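The traversal can be sketched by combining the `offsets` table, the `mark` array and the stack of `element` records. This is an illustrative reconstruction, not the slide's `path()`: the clockwise ordering of the eight moves starting from north, the limits `MAX_ROW` and `MAX_COL`, the fixed entrance at [1][1] and the 0/1 return value are assumptions.

```c
#include <assert.h>
#include <string.h>

#define MAX_ROW 10                  /* assumed limits on the maze size */
#define MAX_COL 10
#define MAX_STACK_SIZE (MAX_ROW * MAX_COL)

typedef struct { short int vert; short int horiz; } offsets;
typedef struct { short int row; short int col; short int dir; } element;

/* Eight moves, clockwise from north (the ordering is an assumption). */
static const offsets move[8] = {
    {-1, 0}, {-1, 1}, {0, 1}, {1, 1}, {1, 0}, {1, -1}, {0, -1}, {-1, -1}
};

/* Search a maze bordered by 1s for a path from [1][1] to
 * [exit_row][exit_col]. Returns 1 if a path exists, 0 otherwise. */
int path(int maze[MAX_ROW + 2][MAX_COL + 2], int exit_row, int exit_col)
{
    static int mark[MAX_ROW + 2][MAX_COL + 2];
    element stack[MAX_STACK_SIZE];
    int top = -1;

    assert(maze[1][1] == 0 && maze[exit_row][exit_col] == 0);
    if (exit_row == 1 && exit_col == 1)
        return 1;
    memset(mark, 0, sizeof mark);
    mark[1][1] = 1;
    stack[++top] = (element){1, 1, 0};

    while (top >= 0) {
        element pos = stack[top--];         /* back up to the last position */
        int row = pos.row, col = pos.col, dir = pos.dir;
        while (dir < 8) {
            int next_row = row + move[dir].vert;
            int next_col = col + move[dir].horiz;
            if (next_row == exit_row && next_col == exit_col)
                return 1;
            if (maze[next_row][next_col] == 0 && !mark[next_row][next_col]) {
                mark[next_row][next_col] = 1;
                /* remember where to resume if the new position dead-ends */
                stack[++top] = (element){(short)row, (short)col, (short)(dir + 1)};
                row = next_row;
                col = next_col;
                dir = 0;
            } else {
                dir++;
            }
        }
    }
    return 0;
}
```

Each position is marked at most once, so the stack never holds more than `MAX_ROW * MAX_COL` entries, and the border of 1s keeps every `next_row`/`next_col` inside the array without explicit bounds checks.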

43 43

44 44

45 Any Questions? 45

46 A further question: Can we use a queue to solve the maze problem? If yes, what are the differences?

47 A Maze Problem Analysis of path()
The worst-case computing time of path() is O(mp), where m and p are the number of rows and columns of the maze, respectively
The choice of add() and delete() decides the search behavior

48 List 48

49 List Ordered List
Consider the following alphabetized list of three-letter English words
–bat, cat, sat, vat
If we store this list in an array
–to add the word mat to this list, move sat and vat one position to the right before we insert mat
–to remove the word cat from the list, move sat and vat one position to the left
Problems of a sequential representation (ordered list)
–arbitrary insertion and deletion from arrays can be very time-consuming
–wasted storage

50 List Linked Representation An elegant solution of ordered list Items may be placed anywhere in memory Store the address, or location, of the next element for accessing elements in the correct order Associated with each element is a node which contains both a data component and a pointer to the next item 50

51 List Pointers in C
The two most important operators used with the pointer type
–& the address operator
–* the dereferencing (or indirection) operator
Example
–int i, *pi; declares i as an integer variable and pi as a pointer to an integer
–pi = &i; &i returns the address of i, which is assigned as the value of pi
–to assign a value to i we can use either i = 10; or *pi = 10;

52 List Dynamically Allocated Storage
When programming, you may not know how much space you will need, nor do you wish to allocate some very large area that may never be required
C provides a heap for allocating storage at run time
You may call a function, malloc, and request the amount of memory you need
When you no longer need an area of memory, you may free it by calling another function, free, which returns the area of memory to the system

53 Dynamically Allocated Storage Example
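The example on this slide is not preserved in the transcript. As a stand-in, here is a small sketch of requesting memory with `malloc`, using it through a pointer, and handing it back with `free`. The helper names `new_int` and `heap_demo` are hypothetical, and treating an allocation failure as an assertion failure is a simplification.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate one int on the heap and initialize it. The caller owns the
 * returned memory and must release it with free(). */
int *new_int(int value)
{
    int *pi = malloc(sizeof *pi);   /* request exactly the space we need */
    assert(pi != NULL);             /* sketch: allocation failure is fatal */
    *pi = value;                    /* store through the pointer */
    return pi;
}

/* Allocate an int and a float, use them, and return them to the system. */
int heap_demo(void)
{
    int *pi = new_int(1024);
    float *pf = malloc(sizeof *pf);
    assert(pf != NULL);
    *pf = 3.14f;

    int result = *pi + (int)*pf;    /* 1024 + 3 */
    free(pi);                       /* hand both areas back */
    free(pf);
    return result;
}
```

After `free(pi)` the pointer must not be dereferenced again; the storage may be reused by a later `malloc`.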

54 List Singly Linked Lists
Linked lists are drawn as an ordered sequence of nodes with links represented as arrows
–the name of the pointer to the first node in the list is the name of the list (the list of Figure 4.1 is called ptr)
–notice that we do not explicitly put in the values of pointers, but simply draw arrows to indicate that they are there

55 List Insertion
To insert the word mat between cat and sat, we must
–get a node that is currently unused; let its address be paddr
–set the data field of this node to mat
–set paddr’s link field to point to the address found in the link field of the node containing cat
–set the link field of the node containing cat to point to paddr

56 List Deletion Delete mat from the list We only need to find the element that immediately precedes mat, which is cat, and set its link field to point to mat’s link (Figure 4.3) We have not moved any data, and although the link field of mat still points to sat, mat is no longer in the list 56
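The insertion and deletion steps can be sketched on a singly linked list of three-letter words. The type and function names (`list_node`, `list_pointer`, `new_node`, `insert_after`, `delete_after`) are assumptions for illustration. Unlike the slide's description, this sketch also frees the removed node rather than leaving it dangling.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct list_node *list_pointer;
struct list_node {
    char data[4];                   /* a three-letter word plus '\0' */
    list_pointer link;              /* address of the next node, or NULL */
};

/* Get an unused node, fill in its data field and its link field. */
list_pointer new_node(const char *word, list_pointer link)
{
    list_pointer paddr = malloc(sizeof *paddr);
    assert(paddr != NULL);          /* sketch: allocation failure is fatal */
    strncpy(paddr->data, word, sizeof paddr->data - 1);
    paddr->data[sizeof paddr->data - 1] = '\0';
    paddr->link = link;
    return paddr;
}

/* Insert word immediately after prev: the new node takes over prev's
 * old link, then prev is made to point at the new node. */
void insert_after(list_pointer prev, const char *word)
{
    assert(prev != NULL);
    prev->link = new_node(word, prev->link);
}

/* Delete the node immediately after prev by bypassing it. */
void delete_after(list_pointer prev)
{
    assert(prev != NULL);
    list_pointer victim = prev->link;
    if (victim == NULL)
        return;                     /* nothing to delete */
    prev->link = victim->link;      /* no data is moved, only one link changes */
    free(victim);
}
```

Starting from bat, cat, sat, vat, calling `insert_after` on the cat node with "mat" gives bat, cat, mat, sat, vat, and `delete_after` on the same node restores the original list. Neither operation shifts any other element.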

57 List Implementation We need the following capabilities to make linked representations possible Defining a node’s structure, that is, the fields it contains –self-referential structures Create new nodes when we need them –malloc() –new in C++ Remove nodes that we no longer need –free() –delete in C++ 57

58 List Invert For a list of length ≧ 1 nodes, the while loop is executed length times and so the computing time is linear or O(length) Two extra pointers are required 58
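The inversion routine itself is not preserved in this transcript. A common way to invert a chain in place with two extra pointers is sketched below; the names `invert`, `lead`, `middle` and `trail`, the `int` data field and the `cons` helper are assumptions.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct list_node *list_pointer;
struct list_node {
    int data;
    list_pointer link;
};

/* Helper for building lists: put a new node in front of link. */
list_pointer cons(int data, list_pointer link)
{
    list_pointer node = malloc(sizeof *node);
    assert(node != NULL);           /* sketch: allocation failure is fatal */
    node->data = data;
    node->link = link;
    return node;
}

/* Reverse the list in place and return the new first node.
 * middle and trail are the two extra pointers. */
list_pointer invert(list_pointer lead)
{
    list_pointer middle = NULL, trail;

    while (lead != NULL) {          /* executed once per node: O(length) */
        trail = middle;
        middle = lead;
        lead = lead->link;
        middle->link = trail;       /* point the current node backwards */
    }
    return middle;
}
```

For a list 1, 2, 3 the loop body runs three times and the returned list is 3, 2, 1.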

59 List More about Lists Circularly linked lists –the link field of the last node points to the first node in the list Maintain an available List –the space of freed nodes can be reused later Doubly linked lists 59

60 Any Questions? 60

61 How about using a linked list to implement stacks and queues instead of an array? Which one is better? Give me some advantages and disadvantages.

62 List Stacks and Queues When several stacks and queues coexisted, there was no efficient way to represent them sequentially The solution presented above to the n-stack, m-queue problem is both computationally and conceptually simple We no longer need to shift stacks or queues to make space Computation can proceed as long as there is memory available 62
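A minimal sketch of the linked representation for several coexisting stacks, assuming one `top` pointer per stack. The names `stack_node`, `stack_pointer`, `push` and `pop` and the `int` item type are illustrative, not taken from the slides.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct stack_node *stack_pointer;
struct stack_node {
    int item;
    stack_pointer link;             /* node below this one, or NULL */
};

/* Push item onto the stack whose top pointer is *top. */
void push(stack_pointer *top, int item)
{
    stack_pointer node = malloc(sizeof *node);
    assert(node != NULL);           /* the only limit is available memory */
    node->item = item;
    node->link = *top;
    *top = node;
}

/* Pop and return the top item of the stack whose top pointer is *top. */
int pop(stack_pointer *top)
{
    assert(*top != NULL);           /* sketch: underflow is treated as a bug */
    stack_pointer node = *top;
    int item = node->item;
    *top = node->link;
    free(node);
    return item;
}
```

With an array such as `stack_pointer tops[N]`, each entry is an independent stack. They grow and shrink without ever shifting one another, which is the point made on this slide.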

63 Longest Common Subsequence
In: two strings
Out: length of the longest common subsequence
Requirement
–dynamic programming
–time/space analyses
–using C would be the best
Bonus
–output a longest common subsequence
–output all longest common subsequences

64 Dynamic Programming
Like divide-and-conquer, it performs iterative calculations
The main difference is that the divided sub-problems overlap (in other words, they are dependent)
[Figure: problem P(n) is divided into sub-problems P(m_1), P(m_2), …, P(m_k), whose solutions S_1, S_2, …, S_k are combined into S]

65 Dynamic Programming Matrix Multiplication
Given a sequence of matrices A_1, A_2, …, A_n, where the size of A_i is p_{i-1} × p_i, find the best order for minimum scalar multiplications
For example
–A_1 × A_2 × A_3 × A_4, with dimensions p_i:
–5 possibilities
(A_1(A_2(A_3 A_4))) costs =
(A_1((A_2 A_3)A_4)) costs = 4055
((A_1 A_2)(A_3 A_4)) costs =
((A_1(A_2 A_3))A_4) costs = 2856
(((A_1 A_2)A_3)A_4) costs =
n+1 matrices result in C(2n,n)/(n+1) = Ω(4^n / n^(3/2)) orders

66 Matrix Multiplication Observation of Sub-problems
Let T be an order for A_i…A_j, T_1 an order for A_i…A_k, and T_2 an order for A_{k+1}…A_j
–if T is an optimal solution for A_i…A_j, then T_1 and T_2 are the optimal solutions for A_i…A_k and A_{k+1}…A_j, respectively
Let m[i,j] be the minimum number of scalar multiplications needed to compute the product A_i…A_j, for 1 ≤ i ≤ j ≤ n
If the optimal solution splits the product A_i…A_j = (A_i…A_k)(A_{k+1}…A_j), for some k, i ≤ k < j, then m[i,j] = m[i,k] + m[k+1,j] + p_{i-1} p_k p_j
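A sketch of filling the table m[i,j] bottom-up, taking the minimum over every split point k of the cost of the two halves plus p_{i-1} p_k p_j for the final product. The function name `matrix_chain` and the limit `MAX_N` are assumptions. The dimensions of the slide's own example are not preserved; the sequence 13, 5, 89, 3, 34 used in the note below is an assumed example, chosen because it reproduces the two costs that do survive (4055 and 2856).

```c
#include <assert.h>
#include <limits.h>

#define MAX_N 32                    /* assumed upper bound on the number of matrices */

/* p[0..n] holds the dimensions: matrix A_i is p[i-1] x p[i].
 * Returns the minimum number of scalar multiplications for A_1...A_n. */
long matrix_chain(const int p[], int n)
{
    static long m[MAX_N + 1][MAX_N + 1];

    assert(n >= 1 && n <= MAX_N);
    for (int i = 1; i <= n; i++)
        m[i][i] = 0;                /* a single matrix needs no multiplication */

    for (int len = 2; len <= n; len++) {            /* chain length */
        for (int i = 1; i + len - 1 <= n; i++) {
            int j = i + len - 1;
            m[i][j] = LONG_MAX;
            for (int k = i; k < j; k++) {           /* split (A_i..A_k)(A_k+1..A_j) */
                long cost = m[i][k] + m[k + 1][j]
                          + (long)p[i - 1] * p[k] * p[j];
                if (cost < m[i][j])
                    m[i][j] = cost;
            }
        }
    }
    return m[1][n];
}
```

With the assumed dimensions `{13, 5, 89, 3, 34}` and n = 4, the function returns 2856, the cost of ((A_1(A_2 A_3))A_4). The three nested loops give O(n^3) time and the table takes O(n^2) space, compared with the exponential number of orders a brute-force search would try.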

67 Dynamic Programming Elements
Optimal sub-structure (a problem exhibits optimal sub-structure if an optimal solution to the problem contains within it optimal solutions to sub-problems)
Overlapping sub-problems
Memoization (usually by a table, i.e., a 2D array)
Procedure
–characterize the structure of an optimal solution
–derive a recursive formula for computing the values of optimal solutions, that is, the relation between the problem and its sub-problems

68 Dynamic Programming Longest Common Subsequence
Given two sequences X = <x_1, x_2, …, x_m> and Y = <y_1, y_2, …, y_n>, find a maximum-length common subsequence of X and Y
For example
–X is 'ABCBDAB' and Y is 'BDCABA'
–common subsequences: 'AB', 'ABA', 'BCB', 'BCAB', 'BCBA' …
–longest common subsequences: 'BCAB', 'BCBA', … (length = 4)
[Figure: X = A B C B D A B aligned against Y = B D C A B A]

69 Longest Common Subsequence The Recursive Formula
Let L[i,j] be the length of an LCS of the prefixes X_i = <x_1, …, x_i> and Y_j = <y_1, …, y_j>, for 1 ≤ i ≤ m and 1 ≤ j ≤ n
–L[i,j] = L[i-1,j-1] + 1 if x_i = y_j
–L[i,j] = max(L[i,j-1], L[i-1,j]) if x_i ≠ y_j
[Figure: the L table for X = ABCBDAB and Y = BDCABA; LCS: BCBA]
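The recursive formula can be turned into a table-filling routine as sketched below. The boundary values L[i,0] = L[0,j] = 0 (an empty prefix has an empty LCS) are not stated on the slide and are added here as an assumption, as are the function name `lcs_length` and the limit `MAX_LEN`.

```c
#include <assert.h>
#include <string.h>

#define MAX_LEN 100                 /* assumed maximum string length */

/* Return the length of a longest common subsequence of x and y. */
int lcs_length(const char *x, const char *y)
{
    static int L[MAX_LEN + 1][MAX_LEN + 1];
    size_t m = strlen(x), n = strlen(y);

    assert(m <= MAX_LEN && n <= MAX_LEN);
    for (size_t i = 0; i <= m; i++)
        L[i][0] = 0;                /* assumed base case: empty prefix of y */
    for (size_t j = 0; j <= n; j++)
        L[0][j] = 0;                /* assumed base case: empty prefix of x */

    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            if (x[i - 1] == y[j - 1])       /* x_i = y_j (strings are 0-based) */
                L[i][j] = L[i - 1][j - 1] + 1;
            else if (L[i][j - 1] >= L[i - 1][j])
                L[i][j] = L[i][j - 1];
            else
                L[i][j] = L[i - 1][j];
        }
    }
    return (int)L[m][n];
}
```

`lcs_length("ABCBDAB", "BDCABA")` returns 4, matching the example on the previous slide. Both the running time and the table size are O(mn); one way to recover an actual subsequence is to walk back through the table from L[m][n], following the branch that produced each entry.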
