Logistic Regression: To classify gene pairs

1 Logistic Regression: To classify gene pairs

2 Introduction
- Logistic Regression Classifier
- Biopython Libraries
- Gene pairs classification into classes:
  - OP (if they belong to the same Operon)
  - NOP (otherwise)

3 Operon
- A genetic structure that contains one or more structural genes
- Associated with each Operon are promoter and operator sequences
- Classification aim: to identify whether the genes within a gene pair belong to the same Operon

4 Logistic Regression Model
- A set of input (predictor) variables:
  - Distance between the genes
  - Gene expression score
- Logit score: $S = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
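The logit score can be turned into a class decision with the logistic (sigmoid) function. The following is a minimal sketch, assuming x1 is the distance between the genes and x2 is the gene expression score (the order in which the slide lists them). The function names, the 0.5 threshold, and the coefficient values are hypothetical and chosen only for illustration; they are not the fitted values from this presentation.

```python
import math


def logit_score(distance, expression_score, beta0, beta1, beta2):
    """Logit score S = beta0 + beta1 * x1 + beta2 * x2."""
    return beta0 + beta1 * distance + beta2 * expression_score


def probability_op(score):
    """Logistic function: P(OP | x1, x2) = 1 / (1 + exp(-S))."""
    return 1.0 / (1.0 + math.exp(-score))


def classify_pair(distance, expression_score, betas, threshold=0.5):
    """Return "OP" if the estimated probability reaches the threshold, else "NOP"."""
    score = logit_score(distance, expression_score, *betas)
    return "OP" if probability_op(score) >= threshold else "NOP"


# Hypothetical coefficients (beta0, beta1, beta2), for illustration only.
EXAMPLE_BETAS = (2.0, -0.05, 0.01)

if __name__ == "__main__":
    # A close pair with a high expression score versus a distant pair.
    print(classify_pair(10, 50, EXAMPLE_BETAS))
    print(classify_pair(200, 0, EXAMPLE_BETAS))
```

A score of zero corresponds to a probability of exactly 0.5, so with the default threshold the sign of S alone decides the class.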

5 Training the model
- Focus on Bacillus subtilis Operons
- The training data was gathered from Operon DB
- Finding values for the beta coefficients:
  - Done through maximum likelihood estimation (MLE) of the class probabilities (class OP vs. class NOP given the data)
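Maximum likelihood estimation of the beta coefficients can be sketched as gradient ascent on the log-likelihood. This is an illustrative pure-Python version, not necessarily the optimizer used in the presentation. The toy data, the learning rate, and the iteration count are hypothetical, and the toy features are scaled to small values so that a fixed step size behaves well.

```python
import math


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def log_likelihood(betas, xs, ys):
    """Sum of log P(y | x) over all pairs; y is 1 for OP and 0 for NOP."""
    total = 0.0
    for (x1, x2), y in zip(xs, ys):
        p = sigmoid(betas[0] + betas[1] * x1 + betas[2] * x2)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_mle(xs, ys, learning_rate=0.1, iterations=5000):
    """Estimate (beta0, beta1, beta2) by gradient ascent on the mean log-likelihood."""
    betas = [0.0, 0.0, 0.0]
    n = len(xs)
    for _ in range(iterations):
        grad = [0.0, 0.0, 0.0]
        for (x1, x2), y in zip(xs, ys):
            err = y - sigmoid(betas[0] + betas[1] * x1 + betas[2] * x2)
            grad[0] += err
            grad[1] += err * x1
            grad[2] += err * x2
        betas = [b + learning_rate * g / n for b, g in zip(betas, grad)]
    return betas


# Hypothetical toy data: (scaled distance, scaled expression score), label 1 = OP.
TOY_XS = [(0.1, 0.9), (0.2, 0.8), (0.3, 0.7), (1.2, 0.8),
          (0.9, 0.4), (1.5, 0.2), (2.0, 0.1), (0.4, 0.3)]
TOY_YS = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    fitted = fit_mle(TOY_XS, TOY_YS)
    print(fitted, log_likelihood(fitted, TOY_XS, TOY_YS))
```

Real distances in base pairs would need rescaling, or a second-order method such as Newton-Raphson, for the iteration to converge in a reasonable number of steps.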

6 Model Accuracy
- Training using the entire dataset yields the fitted logit score $S = \beta_0 + \beta_1 x_1 + \beta_2 x_2$ (the numeric coefficient values are missing from this transcript)
- To calculate the accuracy of the model:
  - 10-fold cross validation
  - Leave-one-out cross validation
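Both validation schemes amount to repeatedly holding out part of the data, training on the rest, and scoring the held-out part. Leave-one-out is the special case where the number of folds equals the number of samples. The helpers below are a hypothetical sketch; `train_fn` and `predict_fn` stand in for whatever training and classification routines are used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def leave_one_out_indices(n_samples):
    """Leave-one-out is k-fold cross validation with k equal to n_samples."""
    return k_fold_indices(n_samples, n_samples)


def cross_validated_accuracy(xs, ys, k, train_fn, predict_fn):
    """Fraction of held-out samples classified correctly across all folds."""
    correct = 0
    for train, test in k_fold_indices(len(xs), k):
        model = train_fn([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(1 for i in test if predict_fn(model, xs[i]) == ys[i])
    return correct / len(xs)
```

The folds here are contiguous, so the data should be shuffled beforehand if it is ordered by class.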

7 Model Testing Results (10-fold cross validation)
- Average Type I error rate (false positive error rate): 19%
- Average specificity (probability that a pair of class NOP is classified correctly): 0.81
- Average Type II error rate (false negative error rate): 4%
- Average sensitivity (probability that a pair of class OP is classified correctly): 0.96
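The four quantities above come from the confusion counts, with OP treated as the positive class: specificity is one minus the Type I error rate, and sensitivity is one minus the Type II error rate. A small sketch with hypothetical function names and made-up labels:

```python
def confusion_counts(true_labels, predicted_labels, positive="OP"):
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if truth == positive and predicted == positive:
            tp += 1
        elif truth == positive:
            fn += 1
        elif predicted == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def error_rates(tp, fp, tn, fn):
    """Rates as fractions; assumes each class has at least one sample."""
    return {
        "sensitivity": tp / (tp + fn),         # OP pairs classified correctly
        "specificity": tn / (tn + fp),         # NOP pairs classified correctly
        "type_I_error_rate": fp / (fp + tn),   # false positive rate
        "type_II_error_rate": fn / (fn + tp),  # false negative rate
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Made-up labels, for illustration only.
    truth = ["OP"] * 4 + ["NOP"] * 4
    predicted = ["OP", "OP", "OP", "NOP", "NOP", "NOP", "NOP", "OP"]
    print(error_rates(*confusion_counts(truth, predicted)))
```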

8 Model Testing Results (Leave-one-out cross validation)
- Accuracy = 90%
- Sensitivity = 94% (false negative rate = 6%)
- Specificity = 82% (false positive rate = 18%)

9 Conclusions & Notes
- The classifier performed well (90% accuracy under leave-one-out cross validation)
- More than two variables may be needed to improve performance
- The classifier works on Bacillus subtilis genes only, due to the difference in gene length between organisms
- Operon DataBase (dataset source):
  - Uses 5 variables for classification (improves accuracy)
  - Aims to have all known operon information, and therefore has a large training set for multiple organisms
  - Allows web users to perform gene pair classification

10 Thank You
Questions?

11 References
1. Operon. From http://en.wikipedia.org/wiki/Operon
2. Garson G. Logistic regression.
3. Hoon M. The logistic regression model.
4. Okuda S. Operon DataBase.
5. Schneider J. Cross validation.

