Week 10.

Presentation on theme: "Week 10."— Presentation transcript:

1 Week 10

2 Homework 9 Use the D-segment algorithm to find copy number variants (CNVs).
Input: the number of read starts at each genomic position (1, 2, >=3). Use a Poisson model of read counts given copy number.

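3 Poisson distribution
Probability of observing k counts given a mean of λ counts: $P(k \mid \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}$
Probability of observing 3 or more counts: $P(k \ge 3 \mid \lambda) = 1 - \sum_{j=0}^{2} \frac{\lambda^{j} e^{-\lambda}}{j!}$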
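
The two Poisson probabilities named on this slide can be computed directly. A minimal sketch; the function names `poisson_pmf` and `poisson_at_least_3` are illustrative and not part of the course material:

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k counts under a Poisson model with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def poisson_at_least_3(lam: float) -> float:
    """Probability of 3 or more counts: one minus the mass at 0, 1, and 2."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(3))
```

Pooling all counts of 3 or more into one tail probability matches the homework input, which records read starts only as 1, 2, or >=3.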

4 Score Emission probability of r reads:
Score associated with being in CNV given r observed reads:
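
One common way to score an observation, assumed here rather than taken from the slide, is the log ratio of the emission probability under the CNV model to the emission probability under the background model. The sketch below uses Poisson emissions with counts of 3 or more pooled into one bin; the names `emission_prob`, `cnv_score`, `lam_bg`, and `lam_cnv`, and the choice of base-2 logarithms, are all assumptions for illustration:

```python
import math


def emission_prob(r: int, lam: float) -> float:
    """Poisson probability of r read starts; any r >= 3 means '3 or more'."""
    def pmf(k: int) -> float:
        return lam ** k * math.exp(-lam) / math.factorial(k)

    if r >= 3:
        return 1.0 - sum(pmf(k) for k in range(3))
    return pmf(r)


def cnv_score(r: int, lam_bg: float, lam_cnv: float) -> float:
    """Assumed score: log2 likelihood ratio of the CNV model to the background."""
    return math.log2(emission_prob(r, lam_cnv) / emission_prob(r, lam_bg))
```

With a higher mean under the CNV model than under the background, low counts receive negative scores and high counts receive positive scores, which is the shape of score sequence a segment-finding algorithm expects.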

5 D-segment algorithm
S = minimum segment score to report; D = dropoff threshold (D < 0)
cumul = max = 0; start = 1
for (i = 1..N) {
    cumul += score[i]
    if (cumul ≥ max) { max = cumul; end = i }
    if (cumul ≤ 0) or (cumul ≤ max + D) or (i == N) {
        if (max ≥ S) { output(start, end, max) }
        max = cumul = 0; start = end = i + 1
    }
}
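
The pseudocode on this slide translates almost line for line into Python. This is a sketch under two assumptions: the dropoff test is `cumul <= max + D` with a negative `D`, and positions are reported 1-based and inclusive. The function name `d_segments` is illustrative, and `best` stands in for `max` to avoid shadowing the Python builtin:

```python
from typing import List, Sequence, Tuple


def d_segments(scores: Sequence[float], S: float, D: float) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for each reported segment.

    S is the minimum score a segment needs to be reported.
    D is the dropoff threshold and is assumed to be negative.
    """
    segments = []
    cumul = best = 0.0
    start = end = 1
    n = len(scores)
    for i, s in enumerate(scores, start=1):
        cumul += s
        if cumul >= best:
            # New running maximum: the candidate segment now ends here.
            best, end = cumul, i
        if cumul <= 0 or cumul <= best + D or i == n:
            # The candidate segment is finished; report it if it scores well enough.
            if best >= S:
                segments.append((start, end, best))
            best = cumul = 0.0
            start = end = i + 1
    return segments
```

For example, `d_segments([1, 1, -3, 2, 2, 2, -1], S=3, D=-2)` drops the first two positions, because their total of 2 is below `S`, and reports the single segment covering positions 4 through 6.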

6 How to organize a computational biology project

7

8 Principles
Someone unfamiliar with your project should be able to understand what you did and why.
Everything you do, you will have to do over again.

9 How not to organize a project
source/ <big, complicated program> tests/

10 Files and directories

11 Carrying out a single experiment
A single driver script should carry out a full experiment.
The driver script should take no arguments.
Avoid editing intermediate files by hand.
Store all file and directory paths in the driver script.
Use relative paths.
Make the script restartable: if (<output file does not exist>) then <perform operation>
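
The restartable pattern can be sketched as a small helper that skips any step whose output already exists. The names `run_step`, `count_reads`, and `main`, and the path `results/read_counts.txt`, are hypothetical and serve only to illustrate the idea:

```python
from pathlib import Path
from typing import Callable


def run_step(output: Path, operation: Callable[[Path], None]) -> bool:
    """Perform operation only if output does not exist; return True if it ran."""
    if output.exists():
        return False
    output.parent.mkdir(parents=True, exist_ok=True)
    operation(output)
    return True


def count_reads(output: Path) -> None:
    """Hypothetical placeholder for one real step of an experiment."""
    output.write_text("placeholder result\n")


def main() -> None:
    # All paths are relative and live in the driver script itself.
    run_step(Path("results/read_counts.txt"), count_reads)
```

A driver script built this way would end by calling `main()` and take no arguments. Rerunning it after a failure repeats only the steps whose output files are still missing.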

12 Handling errors
Check for errors whenever possible.
When an error occurs, abort.
Create each output file using a temporary name, then rename the file when it is complete.
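
The advice to write under a temporary name and rename on completion might look like the following in Python. This is a sketch; `write_atomically` is an illustrative name, and the temporary file is created in the same directory as the target so that the final rename stays on one filesystem:

```python
import os
import tempfile


def write_atomically(path: str, text: str) -> None:
    """Write text to a temporary file, then rename it to path once complete."""
    directory = os.path.dirname(path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        # Rename only after the write has finished successfully.
        os.replace(tmp_path, path)
    except BaseException:
        # On any error, clean up and abort so no partial file carries the final name.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Because the final name appears only after a successful write, a check for an existing output file never mistakes a half-written file for a finished one.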

13 File and directory names
<id>_<date>_<brief description> Example: 05_ _logistic_regression

14 The information in a filename is contained in both the filename and its path
Bad: predict_gene_expression/predict_gene_expression_using_logistic_regression/predict_gene_expression_using_logistic_regression_test_using_alpha=1
Good: predict_gene_expression/logistic_regression/alpha=1

15 Source directories
Include only mature code with a defined specification.
Bad: predict_gene_expression(histone mods)
Okay: optimize_logistic_regression_using_gradient_descent(features, labels)
Don't be afraid to copy/paste code between experiment directories.

16 Version control
Check in every hour or so, so that you can roll back bad changes.
Check in all files that you have edited by hand, and only those files.

