# Bioinformatics Programming 1 EE, NCKU Tien-Hao Chang (Darby Chang)


In the last slide: More Unix features worth mentioning –job control –I/O redirection and piping –text processing (vi, grep, sed, awk, …) Programming vs. language

Programming 3

Before learning advanced data structures and the associated algorithms

struct: a brick for constructing advanced data structures in C

struct struct is similar to an array in that both aggregate a set of objects into a single object (‘object’ here is not the object-oriented notion) –array: aggregates objects of the same type –struct: aggregates objects of different types struct is an abbreviation of ‘structure’ Each entry in a struct declaration is usually called a ‘field’ or ‘member’

struct Declaration A struct declaration consists of a list of fields, each of which can have any type –struct mydata { // declare the structure of mydata char name[8]; char id[10]; int math; int eng; }; –this defines a type, referred to as struct mydata To create a new variable of this type –// define a variable ‘student’ of the type ‘struct mydata’ struct mydata student;

struct The Memory Space [Figure: memory layout of the variable student, with the fields name, id, math, and eng]

struct Test Memory Space #include <stdio.h> int main(void) { struct data { char name[10]; char sex[2]; int math; }; struct data student; /* sizeof yields a size_t, so print it with %zu */ printf("sizeof(student)=%zu\n", sizeof(student)); return 0; } Result (with a 4-byte int): sizeof(student)=16

struct Access Fields The dot (.) operator –struct_variable.field_name For example –student.math = 90; –student.eng = 20; –printf("%s's Math score is %d\n", student.name, student.math); A convenient shortcut to initialize the members of a struct is shown below (values are assigned to the fields in declaration order) –struct data student={"Mary Wang",74};

struct Array of Structures You may define an array of structures –struct student { // declare the structure of student char name[8]; char id[10]; int math; int eng; }; // define an array of 3 variables of the type ‘struct student’ struct student stu[3]; [Figure: stu[0], stu[1], and stu[2] laid out one after another, each holding name ([0]…[7]), id ([0]…[9]), math, and eng]

struct Pointer to Structure Pointers can be used to refer to a struct by its address –struct mydata { // declare the structure of mydata char name[8]; char id[10]; int math; int eng; } student; // define a mydata variable, student struct mydata * ptr; // define a pointer to mydata ptr = &student; // point ptr to the variable, student Access fields through struct pointers –the arrow (->) operator, which dereferences the pointer and selects a field –struct_pointer_variable->field_name –ptr->math = 90;

struct Nested Structures Since a struct declaration constructs a new type, struct types can be used for fields just like built-in types such as int, double, … –#include <stdio.h> int main(void) { struct date { // declare date int month; int day; }; struct student { // declare nested structure, student char name[10]; int math; struct date birthday; } s1={"David Li", 80, {2,10}}; // define a student variable, s1 printf("student name:%s\n",s1.name); printf("birthday:%d month, %d day\n", s1.birthday.month, s1.birthday.day); printf("math grade:%d\n",s1.math); return 0; }

struct Self-referential Structure A field is not allowed to have the same type as the declaration it belongs to But a field can be a pointer to the same type as the declaration it belongs to Such a struct, with pointer fields referencing the same struct type, is called a self-referential structure – struct PERSON { char name[8]; int age; struct PERSON * son; // self-referential pointer }; [Figure: a PERSON record with the fields name, age, and son]

Any Questions? 15

Why? Why is a field not allowed to have the same type as the declaration it belongs to, while a field can be a pointer to that same type? Hint: think from the perspective of memory

The closeness between C and the actual representation in memory is the reason both a) why C programs are so fast and b) why C is suitable for teaching

Languages Comparison Since the 1950s, computer scientists have devised thousands of programming languages. Many are obscure, perhaps created for a Ph.D. thesis and never heard of since. Compiling to machine code –some languages transform programs directly into machine code, the instructions that a CPU understands directly –this transformation process is called compilation –examples: assembly, C, and C++ Interpreted languages –other languages are interpreted, such as Basic, Perl, and JavaScript –or are a mixture of both, being compiled to an intermediate language, such as Java and C#

Languages Comparison Compile vs. Interpret An interpreted language is processed at runtime. Every line is read, analyzed, and executed. Having to reprocess a line every time in a loop is what makes interpreted languages slow. –because of this overhead, interpreted code can run roughly 5–10 times slower than compiled code –their advantage is that they need not be recompiled after changes, which is handy when you're learning to program. Because compiled programs almost always run faster than interpreted ones, languages such as C and C++ tend to be the most popular for writing games. Java and C# both compile to an intermediate language, which is then interpreted very efficiently. Because the Virtual Machine that interprets Java and the .NET framework that runs C# are heavily optimized, it's claimed that applications in those languages are as fast as, if not faster than, compiled C++.

Languages Comparison Level of Abstraction How close is a particular language to the hardware? Machine code is the lowest level, followed by assembly. C++ is higher than C because C++ offers greater abstraction. Java and C# are higher than C++ because they compile to an intermediate language called bytecode. When computers first became popular in the 1950s, programs were written in machine code. Programmers had to physically flip switches to enter values. This is such a tedious and slow way of creating an application that higher level computer languages had to be created.

Super coder! http://www.evula.org/dragoon/pics/supercoder.jpg 21

Assembler: Fast to run, slow to write –The readable version of machine code Mov A,\$45 –Because it is tied to a particular CPU, assembly is not very portable. –Languages like C have reduced the need for assembly except where memory is limited or time-critical code is needed. This is typically in kernel code or in a driver. Basic: For beginners –Basic is an acronym for Beginner's All-purpose Symbolic Instruction Code and was created to teach programming in the 1960s. –Microsoft have made the language their own with many different versions, including VBScript for websites and the very successful Visual Basic. –It is an interpreted language whose only advantage is being easy to learn. But now it is more like a syntax alternative to C because most programmers are lazy. Pascal: Conscientious programming –Pascal was devised as a teaching language a few years before C but had limited usage. –Only when Borland's Turbo Pascal (for DOS) and Delphi (for Windows) appeared did it become suitable for commercial development. –However, Borland was up against Microsoft and lost the battle.

C: System programming –C was devised in the early 1970s by Dennis Ritchie. It can be thought of as a general purpose tool: very useful and powerful, but it easily lets through bugs that can make systems insecure. –C has been described as portable assembly. –The syntax of many scripting languages is based on C. C++: A classy language –C++ (or C with Classes, as it was originally known) came about ten years after C and successfully introduced Object Oriented Programming to C, as well as features like exceptions and templates. –Learning all of C++ is a big task. It is by far the most complicated of the programming languages here, but once you have mastered it, you'll have no difficulty with any other language. C#: Microsoft's big bet –C# was created by Delphi's architect Anders Hejlsberg after he moved to Microsoft, and Delphi developers will feel at home with features such as Windows Forms. –C# syntax is very similar to Java, which is not surprising as Hejlsberg also worked on J++ after he moved to Microsoft. –Learn C# and you are well on the way to knowing Java. Both languages are semi-compiled: instead of compiling to machine code, they compile to bytecode and are then interpreted.

Perl: Websites and utilities –Very popular in the Linux world, Perl was one of the first web languages and remains very popular today. –For doing ‘quick and dirty’ programming on the web it remains unrivalled and drives many websites. –It has, though, been somewhat eclipsed by PHP as a web scripting language. PHP: Website coding –PHP was designed as a language for web servers and is very popular in conjunction with Linux, Apache, and MySQL (together with PHP, LAMP for short). –It is interpreted, but pre-compiled, so code executes reasonably quickly. –It can be run on desktop computers but is not as widely used for developing desktop applications. –Based on C syntax, it also includes objects and classes. JavaScript: Programs in your browser –JavaScript is nothing like Java; instead it's a scripting language based on C syntax, with the addition of objects, and is used mainly in browsers. –JavaScript is interpreted and a lot slower than compiled code but works well within a browser. –It was invented by Netscape and was in the doldrums for years. It is popular again because of AJAX (Asynchronous JavaScript and XML), which allows parts of web pages to update from the server without redrawing the entire page.

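| Position 2010 | Position 2009 | Delta in Position | Language | Ratings 2010 | Delta 2009 | Status |
|---|---|---|---|---|---|---|
| 1 | 1 | = | Java | 17.509% | -2.29% | A |
| 2 | 2 | = | C | 17.279% | +1.42% | A |
| 3 | 4 | ↑ | PHP | 9.908% | +0.42% | A |
| 4 | 3 | ↓ | C++ | 9.610% | -0.75% | A |
| 5 | 5 | = | (Visual) Basic | 6.574% | -1.71% | A |
| 6 | 7 | ↑ | C# | 4.264% | -0.06% | A |
| 7 | 6 | ↓ | Python | 4.230% | -0.95% | A |
| 8 | 9 | ↑ | Perl | 3.821% | +0.40% | A |
| 9 | 10 | ↑ | Delphi | 2.684% | -0.03% | A |
| 10 | 8 | ↓↓ | JavaScript | 2.651% | -0.96% | A |
| 11 | 11 | = | Ruby | 2.327% | -0.27% | A |
| 12 | 32 | ↑↑↑↑↑↑↑↑↑↑ | Objective-C | 1.970% | +1.79% | A |
| 13 | - | ↑↑↑↑↑↑↑↑↑↑ | Go | 0.921% | +0.92% | A |
| 14 | 15 | ↑ | SAS | 0.769% | -0.03% | A |
| 15 | 13 | ↓↓ | PL/SQL | 0.737% | -0.31% | A |
| 16 | 22 | ↑↑↑↑↑↑ | MATLAB | 0.661% | +0.20% | B |
| 17 | 17 | = | ABAP | 0.639% | +0.00% | B |
| 18 | 16 | ↓↓ | Pascal | 0.603% | -0.13% | B |
| 19 | 19 | = | ActionScript | 0.594% | +0.11% | B |
| 20 | 27 | ↑↑↑↑↑↑↑ | Fortran | 0.563% | +0.24% | B |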

26 http://www.simplyhired.com/a/jobtrends/graph/q-Perl%2C+Ruby%2C+Python%2C+Php%2C+Javascript%2C+Flex%2C+Groovy/t-line

Languages Comparison Summary Other noteworthy programming languages –Java, Python, Ruby, Go, … Popularity arises for many reasons –history (programmers are lazy), business, and functionality Lasting wars –Java vs. .NET (C will, in some form, live forever) –Perl vs. PHP vs. Ruby (web programming) –Perl vs. Python (scripting) –There might be one dominant system language and one dominant scripting language in the future, but the field will probably converge to a world of coexistence. Lower level » faster program » general purpose » powerful to do evil Higher level » more readable » faster to develop » more syntactic sugar » avoid careless mistakes » easy to debug

Any Questions? 28

Algorithm 29

Algorithm Specification –a finite set of instructions that accomplishes a particular task –criteria input: zero or more quantities that are externally supplied output: at least one quantity is produced definiteness: clear and unambiguous finiteness: terminate after a finite number of steps effectiveness: instruction is basic enough to be carried out Representation –a natural language, like English or Chinese –a graphic, like flowcharts –a computer language, like C 30

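Algorithm Selection Sort From those integers that are currently unsorted, find the smallest and place it next in the sorted list

| i | [0] | [1] | [2] | [3] | [4] |
|---|---|---|---|---|---|
| - | 30 | 10 | 50 | 40 | 20 |
| 0 | 10 | 30 | 50 | 40 | 20 |
| 1 | 10 | 20 | 50 | 40 | 30 |
| 2 | 10 | 20 | 30 | 40 | 50 |
| 3 | 10 | 20 | 30 | 40 | 50 |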


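Algorithm Binary Search Searching a sorted list

| [0] | [1] | [2] | [3] | [4] | [5] | [6] |
|---|---|---|---|---|---|---|
| 8 | 14 | 26 | 30 | 43 | 50 | 52 |

Searching for 43:

| left | right | middle | list[middle] : target |
|---|---|---|---|
| 0 | 6 | 3 | 30 < 43 |
| 4 | 6 | 5 | 50 > 43 |
| 4 | 4 | 4 | 43 == 43 (found) |

Searching for 18:

| left | right | middle | list[middle] : target |
|---|---|---|---|
| 0 | 6 | 3 | 30 > 18 |
| 0 | 2 | 1 | 14 < 18 |
| 2 | 2 | 2 | 26 > 18 |
| 2 | 1 | - | (not found) |

while (there are more integers to check) { middle = (left + right) / 2; if (target < list[middle]) right = middle - 1; else if (target == list[middle]) return middle; else left = middle + 1; }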

int binsearch( int list[], int target, int left, int right) { int middle; while (left <= right) { middle = (left + right) / 2; switch (COMPARE(list[middle], target)) { case -1: left = middle + 1; break; case 0: return middle; case 1: right = middle - 1; } } return -1; } » Program 1.6: Searching an ordered list

Algorithm Recursive Algorithms Beginning programmers view a function as something that is invoked (called) by another function –it executes its code and then returns control to the calling function This perspective ignores the fact that functions can call themselves (direct recursion) or call other functions that invoke the calling function again (indirect recursion) –extremely powerful –frequently allows us to express an otherwise complex process in very clear terms We should use a recursive algorithm when the problem itself is defined recursively

int binsearch( int list[], int target, int left, int right) { int middle; if (left <= right) { /* a single test replaces the loop: each call handles one range */ middle = (left + right) / 2; switch (COMPARE(list[middle], target)) { case -1: return binsearch(list,target,middle+1,right); case 0: return middle; case 1 : return binsearch(list,target,left,middle-1); } } return -1; } » Program 1.7: Recursive implementation of binary search

Any Questions? 37

Data Abstraction 38

Data Abstraction Data type –A data type is a collection of objects and a set of operations that act on those objects –For example, the data type int consists of the objects {0, +1, -1, +2, -2, …, INT_MAX, INT_MIN} and the operations +, -, *, /, and % The data types of C –basic data types: char, int, float, and double –group data types: array and struct –pointer data type –user-defined types Abstract data type –An abstract data type (ADT) is a data type that is organized in such a way that the specification of the objects and the operations on the objects is separated from the representation of the objects and the implementation of the operations. –We know what it does, but not necessarily how it does it.


The array as an ADT 41

To evaluate which algorithm is better

Algorithm Performance Analysis Criteria –Is it correct? –Is it readable? –…–… Performance analysis (machine independent) –space complexity: storage requirement –time complexity: computing time Performance measurement (machine dependent) 43

Performance Analysis Space Complexity $S(P) = C + S_P(I)$ Fixed space requirements ($C$) –independent of the inputs and outputs –instructions, constants, simple variables Variable space requirements ($S_P(I)$) –depend on the instance characteristic $I$ –number, size, and values of the inputs and outputs associated with $I$ –recursive stack space, including formal parameters, local variables, and return address

Any Questions? 45

Analyze Someone’s exercise 46

The recursion stack space needed is 6(n+1), since the depth of recursion is n+1. 47

Performance Analysis Time Complexity $T(P) = C + T_P(I)$ The time, $T(P)$, taken by a program, $P$, is the sum of its compile time $C$ and its run (or execution) time, $T_P(I)$ $T_P(I) = c_a\,ADD(I) + c_s\,SUB(I) + \dots$ –Program step: a syntactically or semantically meaningful program segment whose execution time is independent of the instance characteristics. –Introduce a new variable, count, into the program –Tabular method

Time Complexity Iterative Summation float sum(float list[], int n) { float tmp = 0; ++count; // for assignment int i; for (i = 0; i < n; ++i) { ++count; // for the for loop tmp += list[i]; ++count; // for assignment } ++count; // last execution of for ++count; // for return return tmp; } 2n+3 steps

Time Complexity Tabular Method

| Statement | s/e | Frequency | Total Steps |
|---|---|---|---|
| float sum(float list[], int n) | 0 | 0 | 0 |
| { | 0 | 0 | 0 |
| float tmp=0; | 1 | 1 | 1 |
| int i; | 0 | 0 | 0 |
| for (i=0; i… | | | |

Any Questions? 51

Asymptotic notation 52

Asymptotic Notation Basic Concepts There are two programs, one with complexity $c_1 n^2 + c_2 n$ and the other with complexity $c_3 n$ –for sufficiently large values of $n$, the program with complexity $c_3 n$ will be faster than the one with complexity $c_1 n^2 + c_2 n$ –for small values of $n$, either could be faster $c_1 = 1,\ c_2 = 2,\ c_3 = 100 \Rightarrow c_1 n^2 + c_2 n \le c_3 n$ for $n \le 98$ $c_1 = 1,\ c_2 = 2,\ c_3 = 1000 \Rightarrow c_1 n^2 + c_2 n \le c_3 n$ for $n \le 998$

Asymptotic Notation $O$, $\Omega$, $\Theta$ $O$ [big “oh”] –$f(n) = O(g(n))$ iff there exist positive constants $c$ and $n_0$ such that $f(n) \le c\,g(n)$ for all $n \ge n_0$ –upper bound, worst case $\Omega$ [big omega] –$f(n) = \Omega(g(n))$ (read as “f of n is big omega of g of n”) iff there exist positive constants $c$ and $n_0$ such that $f(n) \ge c\,g(n)$ for all $n \ge n_0$ –lower bound, best case $\Theta$ [big theta] –$f(n) = \Theta(g(n))$ iff there exist positive constants $c_1$, $c_2$, and $n_0$ such that $c_1 g(n) \le f(n) \le c_2 g(n)$ for all $n \ge n_0$ –upper and lower bound Notice the relationship between analyses and notations. For example, sometimes we analyze the big theta of the worst case of an algorithm.

Asymptotic Notation Theorems

- If $f(n) = a_m n^m + \dots + a_1 n + a_0$, then $f(n) = O(n^m)$
- If $f(n) = a_m n^m + \dots + a_1 n + a_0$ and $a_m > 0$, then $f(n) = \Omega(n^m)$
- If $f(n) = a_m n^m + \dots + a_1 n + a_0$ and $a_m > 0$, then $f(n) = \Theta(n^m)$

Examples

- $f(n) = 3n+2$
  - $3n+2 \le 4n$ for all $n \ge 2$, ∴ $3n+2 = O(n)$
  - $3n+2 \ge 3n$ for all $n \ge 1$, ∴ $3n+2 = \Omega(n)$
  - $3n \le 3n+2 \le 4n$ for all $n \ge 2$, ∴ $3n+2 = \Theta(n)$
- $f(n) = 10n^2+4n+2$
  - $10n^2+4n+2 \le 11n^2$ for all $n \ge 5$, ∴ $10n^2+4n+2 = O(n^2)$
  - $10n^2+4n+2 \ge n^2$ for all $n \ge 1$, ∴ $10n^2+4n+2 = \Omega(n^2)$
  - $n^2 \le 10n^2+4n+2 \le 11n^2$ for all $n \ge 5$, ∴ $10n^2+4n+2 = \Theta(n^2)$
- $10n^2+4n+2 = O(n^2)$, since $10n^2+4n+2 \le 11n^2$ for $n \ge 5$
- $6 \cdot 2^n + n^2 = O(2^n)$, since $6 \cdot 2^n + n^2 \le 7 \cdot 2^n$ for $n \ge 4$

Practical Complexity To get a feel for how the various functions grow with n, you are advised to study the following three figures 56


Performance Measurement Although performance analysis gives us a powerful tool for assessing an algorithm’s space and time complexity, at some point we also must consider how the algorithm executes on our machine 60

Any Questions? 61

Fibonacci In: n Out: the n-th Fibonacci number Requirement - a recursive version and an iterative version - report - time/space complexity - practical time - code size (less meaningful in C) - using C would be the best Bonus - an algorithm of O(n) time and O(1) space complexity - the best time complexity is O(1) - use Makefile to automate the report

Fibonacci A Reference Kenji Mikawa and Ichiro Semba (2005). "An O (1) time algorithm for generating Fibonacci strings." Electronics and Communications in Japan (Part II: Electronics) 88(9): 67-72. Provided by 陳偉銘 –“However, the majority in this course is male, so…” 63

Deadline: 2010/3/23 23:59 Zip your code, a step-by-step README explaining how to execute the code, and anything worthy of extra credit. Email it to darby@ee.ncku.edu.tw.

gcc Multiple Source Files If there are multiple source files –\$ gcc file1.c file2.c -o myprog Or –\$ gcc -c file1.c \$ gcc -c file2.c \$ gcc file1.o file2.o -o myprog The second approach compiles the source files separately. If only file1.c was modified –\$ gcc -c file1.c \$ gcc file1.o file2.o -o myprog Notice that file2.c does not need to be recompiled. –significant time savings when there are numerous source files This process, though somewhat complicated, is generally handled automatically by a makefile.

But how do you know which files should be re-compiled? http://faculty.northseattle.edu/tfurutani/che140/labbook_files/image005.jpg 67

Don’t invent the wheel http://www.morphcoaching.com/mypics/Wheel_invention.jpg 68

Makefile 69

Makefile A Makefile is the configuration file used by a standard program called “make” make is like a project manager in a graphical development environment, but includes many extra features It allows an entire project to be intelligently built with one command on the command line –make avoids re-building targets that are up-to-date, thus saving a lot of typing and compile time –Makefiles are largely similar to the Project and Workspace files you might be used to from Visual C++, JBuilder, Eclipse, etc.

Makefile Filenames When you type make, it looks for the default filenames in the current directory. For GNU make these are –GNUmakefile –makefile –Makefile If more than one of the above exists in the current directory, the first one in the above order is chosen It is possible to name the Makefile any way you want and then tell make which file to read by passing the file name with the -f option –\$ make -f

Makefile Dependencies Sometimes one file depends on another file –e.g. a C file depends on its header files If a header file changes, the C files that #include that header file should be recompiled to take into account the changes to the header [Figure: dependency graph. interface.h and main.c → main.o; interface.h and interface.c → interface.o; main.o and interface.o → final executable file (my_project)]

Makefile A Simple Makefile “Rule” hello: hello.c gcc hello.c -o hello (the command line must start with a tab) Save this text in a file named “Makefile” in the same directory as the source code To build the project, type “make” The result is an executable named hello If the hello file exists and its modification time is newer than that of hello.c, what should “make” do? –nothing

Makefile Generic Form of a Rule target 1 target 2..: prerequisite 1 prerequisite 2... command 1 command 2 Target is the output file Prerequisites are the files that are needed by target (and that can cause target to be recompiled if they change). Command (or action) is the actual command to turn the prerequisites into the target. Characters after “#” are regarded as comments Line oriented –If the dependencies or commands are too long and you would like to span them across several lines for clarity and convenience, escape the end of line by “\” at the end. –Make sure NOT to use tabs for such lines. 74

Makefile Target make performs the actions that correspond to specific targets A target can be a filename that you want to generate or a phony target, where the latter is especially useful for automating actions Suggested phony targets from GNU –all: default action (build/compile the executable) –install: install the previously built executable –clean: remove temporary files generated during the build process, usually the .o or .obj files The first target listed in the file is used if no target is explicitly specified

Makefile Multiple Targets MyProject: main.o interface.o gcc main.o interface.o -o MyProject main.o: main.c interface.h gcc -c main.c -o main.o interface.o: interface.c interface.h gcc -c interface.c -o interface.o Build MyProject –\$ make –\$ make MyProject –make will figure out the appropriate order from the prerequisites Build a non-default target –\$ make main.o

Makefile Command A list of actions needed to generate the rule’s target May be empty (just indicating dependencies) Every action is usually a typical shell command you would normally type to do the same thing You can hide the echo of a command with a preceding ‘@’ symbol Every command MUST be preceded by a tab! –This is how make distinguishes actions from variable assignments and targets. Do not indent actions with spaces! Each action line invokes a sub shell to execute the command –The sub shell ends after that line –Some changes (such as cd to another directory or setting shell variables) won’t carry over to the next line –Use the ‘;’ symbol to execute multiple commands in one line

Makefile Variables In a large Makefile, it is a good idea to use variables to make later changes easy For example, rather than typing ‘gcc’ in the command part of every rule, create a variable at the top of the Makefile –CC = gcc Commands can then be –\${CC} source_file.c -o executable_file Variable names are case sensitive Use only letters, digits, and ‘_’ Both \$(VAR) and \${VAR} are okay

Makefile Other Features Implicit rules –GNU make provides implicit rules for common practices, such as building the object file foo.o from foo.c. For example, the following rule is unnecessary foo.o: foo.c gcc -c -o foo.o foo.c Phony target –The target is always considered out-of-date and thus its actions are always performed –e.g. ‘.PHONY: clean’ Automatic variables (internal macros) –\$@: the filename of the target of the rule –\$…