1 Bioinformatics Tool Development Dong Xu Computer Science Department 109 Engineering Building West E-mail: xudong@missouri.edu 573-882-7064 http://digbio.missouri.edu

2 Components of development Identify a problem Algorithm Application Math. model Software engineering

3 Identify the problem What exactly is the problem? New ideas? Is the problem biologically important? Significance of the work? New problem or improvement? Improve accuracy or speed? Is the problem computationally solvable? Simulate a human quantum mechanically?

4 Mathematical Model What is the underlying math problem? Baseline information study Formulation Definition

5 Algorithm (1) Pick the right method Implementation Testing

6 Algorithm (2) Implementation
- Data structure/representation
- Language: C, C++, Perl, Java, Matlab?
- Unix/Linux or Windows?
- Modular programming (object-oriented)
- Style: should be user-oriented!

7 Algorithm (3) Debugging
- Tools: gdb, dbx, Visual C++
- Logic?
- Toy cases
- Print intermediates

8 Algorithm (4) Testing and code refinement
- Benchmark: select a good test set, jackknife…
- Internal test: application to real cases
- Beta test: send to friendly users for initial tests

9 Software Engineering (1) Suggestions:
- Easy to read (structured with comments)
- Avoid “spaghetti” code (goto)
- Easy to modify
- Portable to other machines
- Always think about computational complexity and clock cycles
- Use dynamic memory allocation
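As an illustration of the "use dynamic memory allocation" suggestion, here is a minimal sketch in C that sizes a matrix at run time instead of fixing its dimensions at compile time. The helper name matrix_alloc and the flat row-major layout are assumptions made for this example, not something taken from the slides.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate a rows x cols matrix of doubles as one
   contiguous block. Element (i, j) lives at m[i * cols + j].
   calloc zero-fills the bytes, which read as 0.0 on IEEE 754 platforms.
   Returns NULL if the size overflows or memory is unavailable. */
double *matrix_alloc(size_t rows, size_t cols)
{
    assert(rows > 0 && cols > 0);
    if (rows > SIZE_MAX / cols) {
        return NULL; /* rows * cols would overflow size_t */
    }
    return calloc(rows * cols, sizeof(double));
}
```

The caller releases the block with a single free(m); keeping the matrix contiguous also keeps row-wise traversal cache friendly.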

10 Software Engineering (2) Polynomial evaluation
y = a+b*x+c*x**2.0+d*x**3.0+e*x**4.0+f*x**5.0 (42.3 s)
y = a+b*x+c*x**2+d*x**3+e*x**4+f*x**5 (5.63 s)
y = a+b*x+c*x*x+d*x*x*x+e*x*x*x*x+f*x*x*x*x*x (3.15 s)
x2 = x*x; x4 = x2*x2; y = a+b*x+c*x2+d*x*x2+e*x4+f*x*x4 (2.83 s)
y = a+x*(b+x*(c+x*(d+x*(e+f*x)))) (1.83 s)
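The fastest form on this slide is the nested evaluation commonly known as Horner's rule. A minimal C sketch for a polynomial of any degree follows; the function name horner and the lowest-degree-first coefficient order are assumptions for illustration, and the timings quoted on the slide are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: evaluate
   coef[0] + coef[1]*x + ... + coef[n-1]*x^(n-1)
   with Horner's rule, using n-1 multiplications and n-1 additions. */
double horner(const double *coef, size_t n, double x)
{
    assert(coef != NULL && n > 0);
    double y = coef[n - 1];          /* start from the highest-degree term */
    for (size_t i = n - 1; i-- > 0; ) {
        y = y * x + coef[i];         /* fold in the next lower coefficient */
    }
    return y;
}
```

With coefficients {a, b, c, d, e, f} and n = 6 this computes the same value as a+x*(b+x*(c+x*(d+x*(e+f*x)))).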

11 Software Engineering (3) Precision:
- Big numbers
- Tiny numbers
- Iteration effects
- Machine dependent
score = 1 - [(1-P_1)(1-P_2)(1-P_3)(1-P_4)]
      = 1 - exp[log(1-P_1) + log(1-P_2) + log(1-P_3) + log(1-P_4)]

12 Software Engineering (4) Precision: 1 + 1/2 + 1/3 + 1/4 + … + 1/(M-1) + 1/M ≈ log(M) as M -> infinity
M       Forward sum   Backward sum   log(M)
10^6    14.3574       14.3927        13.8155
10^8    15.4037       18.8079        18.4207
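The forward and backward sums can be compared with a short C sketch. It assumes single-precision (float) accumulation, which the slide does not state explicitly; the function names are hypothetical. Adding the smallest terms first (backward) lets them accumulate before they meet a large running total, so fewer low-order bits are rounded away.

```c
#include <assert.h>

/* Sum 1/1 + 1/2 + ... + 1/m in float, largest terms first. */
float harmonic_forward(unsigned long m)
{
    assert(m >= 1);
    float sum = 0.0f;
    for (unsigned long k = 1; k <= m; k++) {
        sum += 1.0f / (float)k;
    }
    return sum;
}

/* Same sum, smallest terms first. */
float harmonic_backward(unsigned long m)
{
    assert(m >= 1);
    float sum = 0.0f;
    for (unsigned long k = m; k >= 1; k--) {
        sum += 1.0f / (float)k;
    }
    return sum;
}
```

The exact digits depend on the machine and on compiler settings such as extended-precision registers or fast-math flags.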

13 Software Engineering (5) Loop optimization (1): C program
for (i=0; i<1000; i++)          /* 78 msec */
  for (j=0; j<1000; j++)
    c[i][j] = c[i][j] + a[i][j] + b[i][j];
for (j=0; j<1000; j++)          /* 1860 msec */
  for (i=0; i<1000; i++)
    c[i][j] = c[i][j] + a[i][j] + b[i][j];
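C stores two-dimensional arrays in row-major order, so the version with j in the inner loop walks through memory contiguously while the other jumps a full row on every step. A minimal sketch of both orders, written as functions over flat n*n arrays so they can be timed side by side; the function names and the flat layout are assumptions for illustration, and actual timings will differ from those on the slide.

```c
#include <assert.h>
#include <stddef.h>

/* Inner loop over j: consecutive memory accesses (cache friendly). */
void add_row_order(double *c, const double *a, const double *b, size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            c[i * n + j] += a[i * n + j] + b[i * n + j];
        }
    }
}

/* Inner loop over i: each access is n elements away from the last. */
void add_column_order(double *c, const double *a, const double *b, size_t n)
{
    assert(c != NULL && a != NULL && b != NULL);
    for (size_t j = 0; j < n; j++) {
        for (size_t i = 0; i < n; i++) {
            c[i * n + j] += a[i * n + j] + b[i * n + j];
        }
    }
}
```

Both functions produce identical results; only the memory access pattern differs.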

14 Software Engineering (6) Loop optimization (2):
for (i=0; i<100000; i++)        /* two separate loops: 30 msec */
  x = x*a[i] + b[i];
for (i=0; i<100000; i++)
  y = y*a[i] + b[i];
for (i=0; i<100000; i++) {      /* one fused loop: 16 msec */
  x = x*a[i] + b[i];
  y = y*a[i] + b[i];
}
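Merging the two loops (loop fusion) halves the loop overhead and reads each a[i] and b[i] once instead of twice. A minimal sketch of both variants as C functions follows; the function names and the pointer-based interface are assumptions made so the two can be compared directly.

```c
#include <assert.h>
#include <stddef.h>

/* Two passes over the data, one per recurrence. */
void recur_separate(const double *a, const double *b, size_t n,
                    double *x, double *y)
{
    assert(a != NULL && b != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        *x = *x * a[i] + b[i];
    }
    for (size_t i = 0; i < n; i++) {
        *y = *y * a[i] + b[i];
    }
}

/* One pass: both recurrences share each load of a[i] and b[i]. */
void recur_fused(const double *a, const double *b, size_t n,
                 double *x, double *y)
{
    assert(a != NULL && b != NULL && x != NULL && y != NULL);
    for (size_t i = 0; i < n; i++) {
        *x = *x * a[i] + b[i];
        *y = *y * a[i] + b[i];
    }
}
```

An optimizing compiler may perform this fusion on its own, so the measured gap depends on the compiler and its flags.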

15 Software Engineering (7) Compiler optimization switches:
-O (often improves speed by 50%, depending on the machine)
-O2 (same as -O on some machines): simple inline optimization
-O3 (-O4 on some machines): more complex optimizations designed to pipeline code, but may alter semantics

16 Software Engineering (8) Friendly user interface Graphics, Web, options, automation Pipeline interface with other tools parallel computing multiple machine (server/client) network query

17 Applications Get feedback for adding new features Find good experimental collaborators From tools to papers Continuous bug reports

18 Summary Identify a problem: solvable, biologically important Mathematical model: formulation and definition Algorithm: rigorous method, fast implementation, and systematic testing Software Engineering: friendly user interface and integration of different tools Application: work with experimentalists

