1
Week 10
2
Homework 9
Use the D-segment algorithm to find CNVs (copy-number variants).
Input: the number of read starts at each genomic position (1, 2, ≥3). Use a Poisson model of read counts given copy number.
3
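Poisson distribution
Probability of observing k counts given a mean of λ counts: $P(k \mid \lambda) = \dfrac{\lambda^k e^{-\lambda}}{k!}$
Probability of observing 3 or more counts: $P(\ge 3 \mid \lambda) = 1 - P(0 \mid \lambda) - P(1 \mid \lambda) - P(2 \mid \lambda) = 1 - e^{-\lambda}\left(1 + \lambda + \dfrac{\lambda^2}{2}\right)$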
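The two Poisson probabilities can be computed directly with Python's standard `math` module. This is a minimal sketch; the function names `poisson_pmf` and `poisson_at_least_3` are illustrative, not part of the assignment.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(k | lam) = lam**k * exp(-lam) / k! for a Poisson with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def poisson_at_least_3(lam: float) -> float:
    """P(>= 3 | lam), computed as 1 - P(0) - P(1) - P(2)."""
    return 1.0 - sum(poisson_pmf(k, lam) for k in range(3))
```

Computing the tail as a complement avoids summing an infinite series; for very small λ the subtraction loses some relative precision, which is usually harmless for scoring.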
4
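Score
Emission probability of r reads, where λ is the expected number of read starts per position for a given copy number: $e_\lambda(r) = \dfrac{\lambda^r e^{-\lambda}}{r!}$ for $r < 3$, and $e_\lambda(\ge 3) = 1 - e^{-\lambda}\left(1 + \lambda + \dfrac{\lambda^2}{2}\right)$
Score associated with being in a CNV given r observed reads (a log-likelihood ratio): $s(r) = \log \dfrac{e_{\lambda_{\mathrm{CNV}}}(r)}{e_{\lambda_{\mathrm{normal}}}(r)}$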
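A sketch of the per-position score as a log-likelihood ratio of CNV versus normal copy number. It assumes (this is not stated on the slide) that the expected read-start rate scales linearly with copy number, so a duplication (3 copies against a normal of 2) has λ_CNV = 1.5 λ_normal. The names `emission_prob` and `cnv_score` are hypothetical.

```python
import math


def emission_prob(r: int, lam: float) -> float:
    """Poisson probability of r read starts at a position with mean lam.

    Counts of 3 or more are pooled into a single tail category.
    """
    def pmf(k: int) -> float:
        return lam ** k * math.exp(-lam) / math.factorial(k)

    if r >= 3:
        return 1.0 - sum(pmf(k) for k in range(3))
    return pmf(r)


def cnv_score(r: int, lam_normal: float, copy_number: int,
              normal_copy_number: int = 2) -> float:
    """Log-likelihood ratio of r reads: CNV state vs. normal state.

    Assumption: the expected rate scales linearly with copy number.
    """
    if copy_number <= 0:
        # Copy number 0 gives probability 0 to any reads, so log() would fail.
        raise ValueError("copy_number must be positive")
    lam_cnv = lam_normal * copy_number / normal_copy_number
    return math.log(emission_prob(r, lam_cnv) / emission_prob(r, lam_normal))
```

Positive scores favour the CNV state and negative scores favour the normal state, which is the sign convention a maximal-segment scan needs.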
5
D-segment algorithm
S = minimum score for a reported segment; D < 0 = drop-off threshold
cumul = max = 0; start = 1
for (i = 1..N) {
    cumul += score[i]
    if (cumul ≥ max): max = cumul; end = i
    if (cumul ≤ 0) or (cumul ≤ max + D) or (i == N) {
        if (max ≥ S): output(start, end, max)
        max = cumul = 0; start = end = i+1
    }
}
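A minimal Python sketch of the D-segment scan, assuming the usual drop-off rule: a candidate segment is closed when the running sum reaches zero, when it falls more than |D| below its running maximum (D is passed as a negative number), or at the end of the sequence. Coordinates are 1-based and inclusive, matching the `1..N` loop; the function name `d_segment` is illustrative.

```python
from typing import List, Sequence, Tuple


def d_segment(scores: Sequence[float], S: float, D: float) -> List[Tuple[int, int, float]]:
    """Return (start, end, score) for each segment scoring at least S."""
    if D >= 0:
        raise ValueError("D must be negative (it is a drop-off below the maximum)")
    segments = []
    cumul = max_score = 0.0
    start, end = 1, 0
    n = len(scores)
    for i, s in enumerate(scores, start=1):
        cumul += s
        if cumul >= max_score:
            # New best end point for the current candidate segment.
            max_score = cumul
            end = i
        if cumul <= 0 or cumul <= max_score + D or i == n:
            if max_score >= S:
                segments.append((start, end, max_score))
            # Restart the scan just after the current position.
            cumul = max_score = 0.0
            start = end = i + 1
    return segments
```

For example, `d_segment([-1, 2, 3, -1, 2, -5, -5, 4], S=4, D=-3)` reports positions 2..5 (score 6) and position 8 (score 4). Because D tolerates short dips, the single -1 at position 4 does not split the first segment.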
6
How to organize a computational biology project
8
Principles
Someone unfamiliar with your project should be able to understand what you did and why.
Everything you do, you will have to do over again.
9
How not to organize a project
source/    <big, complicated program>
tests/
10
Files and directories
11
Carrying out a single experiment
A single driver script should carry out a full experiment.
The driver script should take no arguments.
Avoid editing intermediate files by hand.
Store all file and directory paths in the driver script.
Use relative paths.
Make the script restartable: if (<output file does not exist>) then <perform operation>
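One way these rules might look in a Python driver script. This is a sketch only: the file names and the two placeholder steps (`make_scores`, `make_segments`) are hypothetical stand-ins for real analysis steps.

```python
from pathlib import Path

# All file and directory paths for the experiment, kept in one place as
# relative paths. These particular names are hypothetical placeholders.
SCORES = Path("results/scores.txt")
SEGMENTS = Path("results/segments.txt")


def step(output: Path, operation) -> bool:
    """Run operation(output) only if output does not exist yet.

    This makes the driver restartable: rerunning it skips finished steps.
    Returns True if the step ran, False if it was skipped.
    """
    if output.exists():
        print(f"skipping {output}: already exists")
        return False
    output.parent.mkdir(parents=True, exist_ok=True)
    operation(output)
    return True


def make_scores(out: Path) -> None:
    # Placeholder for a real computation (e.g. scoring read counts).
    out.write_text("0.5\n-0.2\n0.7\n")


def make_segments(out: Path) -> None:
    # Placeholder: reads the previous step's output, never a hand-edited file.
    total = sum(float(x) for x in SCORES.read_text().split())
    out.write_text(f"{total}\n")


def main() -> None:
    # The driver takes no arguments: every setting lives in this file.
    step(SCORES, make_scores)
    step(SEGMENTS, make_segments)


if __name__ == "__main__":
    main()
```

To rerun one step, delete its output file and run the driver again. The existence check is only safe if a step never leaves a partially written output behind after a crash.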
12
Handling errors
Check for errors whenever possible.
When an error occurs, abort.
Create each output file using a temporary name, then rename the file when it is complete.
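A sketch of the "temporary name, then rename" rule using the standard library. The helper name `write_atomically` is hypothetical. The temporary file is created in the same directory as the target because `os.replace` is only an atomic rename within one filesystem.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(path, text: str) -> None:
    """Write text to a temporary file, then rename it to path when complete.

    If anything fails, the temporary file is removed and the exception
    propagates, so the calling script aborts without leaving a partial file.
    """
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp_name, path)  # rename into place only after a full write
    except BaseException:
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise
```

Combined with a restartable driver, this guarantees that an output file which exists is also complete.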
13
File and directory names
<id>_<date>_<brief description>
Example: 05_ _logistic_regression
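A small helper that builds names in the `<id>_<date>_<brief description>` pattern. The two-digit zero-padded id and the ISO `YYYY-MM-DD` date format are assumptions, not part of the slide; ISO dates have the convenient property that alphabetical order is also chronological order. The function name is hypothetical.

```python
import datetime
import re


def experiment_dir_name(exp_id: int, date: datetime.date, description: str) -> str:
    """Return '<id>_<date>_<brief description>' with a filesystem-safe description."""
    # Lowercase and collapse anything that is not a letter or digit into '_'.
    slug = re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")
    return f"{exp_id:02d}_{date.isoformat()}_{slug}"
```

Typical use is `experiment_dir_name(5, datetime.date.today(), "logistic regression")` when creating a new experiment directory.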
14
The information in a filename is contained in both the filename and its path, so don't repeat in the filename what the path already says.
Bad: predict_gene_expression/predict_gene_expression_using_logistic_regression/predict_gene_expression_using_logistic_regression_test_using_alpha=1
Good: predict_gene_expression/logistic_regression/alpha=1
15
Source directories
Include only mature code with a defined specification.
Bad: predict_gene_expression(histone mods)
Okay: optimize_logistic_regression_using_gradient_descent(features, labels)
Don't be afraid to copy/paste code between experiment directories.
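What "mature code with a defined specification" might look like for the "Okay" example: a function whose inputs, outputs, and method are fully described by its signature and docstring. This is a pure-Python sketch; the learning-rate and iteration defaults, the added intercept column, and the return layout are illustrative choices, not taken from the slide.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def optimize_logistic_regression_using_gradient_descent(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 0.1,
    n_iterations: int = 1000,
) -> List[float]:
    """Fit logistic regression by batch gradient descent on the mean log loss.

    features: n examples x d features (an intercept column is added internally).
    labels: n values, each 0 or 1.
    Returns d + 1 weights; weights[0] is the intercept.
    """
    if not features or len(features) != len(labels):
        raise ValueError("features and labels must be non-empty and the same length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 or 1")
    X = [[1.0, *map(float, row)] for row in features]
    n, d = len(X), len(X[0])
    if any(len(x) != d for x in X):
        raise ValueError("all feature rows must have the same length")
    w = [0.0] * d
    for _ in range(n_iterations):
        grad = [0.0] * d
        for x, y in zip(X, labels):
            p = _sigmoid(sum(wj * xj for wj, xj in zip(w, x)))
            for j in range(d):
                grad[j] += (p - y) * x[j] / n  # gradient of the mean log loss
        w = [wj - learning_rate * gj for wj, gj in zip(w, grad)]
    return w
```

A function like this can be tested in isolation, which is what makes it a candidate for a shared source directory; a project-specific `predict_gene_expression(...)` cannot.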
16
Version control
Check in every hour or so, so you can roll back bad changes.
Check in all and only the files that you have edited by hand.