A Quantitative Modeling of Protein-DNA Interaction for Improved Energy Based Motif Finding Algorithm. Junguk Hur, School of Informatics, April 25, 2005, L529.

1 A Quantitative Modeling of Protein-DNA Interaction for Improved Energy Based Motif Finding Algorithm
Junguk Hur, School of Informatics
April 25, 2005
L529 – Term Project

2 BACKGROUND
- Motif finding: an important challenge in computational biology.
- Current algorithms:
  - Many stochastic or combinatorial algorithms find motifs for a given set of sequences: MEME, Gibbs, CONSENSUS, etc.
  - They use no quantitative data.
- High-throughput, genome-wide quantitative data are available:
  - ChIP-on-Chip: Chromatin ImmunoPrecipitation on Microarray (in vivo)
  - PBM: Protein-Binding Microarray (in vitro)
- EBMF (Energy-Based Motif Finding) algorithm:
  - Ratio → Binding affinity → Energy

3 ChIP-on-Chip (Ren et al.)
- Array of intergenic sequences from the whole genome

4 Energy-Based Motif Finding (EBMF), Chin et al. 2004
- Let e_i be the average binding energy between the TF and sequence s_i; then e_i = -ln(K_e).
- K_e = [TF·s_i] / ([TF][s_i]); the color intensity ratio represents the value of K_e.
- Problem definition:
  - Solve A*X = B (A: matrix to be decomposed; B: total energy; X: new energy at each position, to be calculated).
  - Minimize the prediction error.
  - Iteratively improve the candidate matrix M.
  - M is a 4 x l energy matrix that represents the motif (l = motif length).
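The energy definitions on this slide can be illustrated with a minimal Python sketch. It is not the presenter's Perl implementation. It assumes that the measured intensity ratio is used directly as K_e, that the rows of M are ordered A, C, G, T, and that the energy of a site is the sum of the per-position entries of M (an additive model). The function names are hypothetical.

```python
import math

BASES = "ACGT"  # assumed row order of the 4 x l energy matrix


def binding_energy(ratio):
    """e_i = -ln(K_e), with the intensity ratio standing in for K_e."""
    if ratio <= 0:
        raise ValueError("intensity ratio must be positive")
    return -math.log(ratio)


def site_energy(M, site):
    """Energy of one site under a 4 x l matrix M.

    Assumes additivity: the site energy is the sum of one matrix entry
    per position (row chosen by the base, column by the position).
    """
    if len(site) != len(M[0]):
        raise ValueError("site length must equal motif length l")
    return sum(M[BASES.index(base)][j] for j, base in enumerate(site))


def best_site(M, sequence):
    """Return (energy, offset) of the lowest-energy window of length l."""
    l = len(M[0])
    return min(
        (site_energy(M, sequence[k:k + l]), k)
        for k in range(len(sequence) - l + 1)
    )
```

Because e_i = -ln(K_e), a larger ratio gives a lower energy, so the lowest-energy window is the predicted strongest binding site.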

5 Goals and Methods
- Ultimate goal: build a better model that represents the local and non-local correlations between nucleotides.
  - Based on the EBMF algorithm
  - Uses a quantitative measure of DNA-protein interaction
  - Potentially more accurate than Positional Weight Matrices (PWMs)
- Implement EBMF first.
  - Solving linear equations
    - Matrix solution: QR-decomposition / LR-decomposition
    - Least-squares method: Downhill Simplex Method
  - Programming language: Perl
  - Data set: yeast ChIP-on-Chip data (GAL4, GCN4, RAP1)
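As an illustration of the matrix-solution route, here is a minimal pure-Python sketch of a least-squares solve of A*X = B through QR decomposition (modified Gram-Schmidt followed by back substitution). It is written in Python rather than the Perl used in the project, the function name and tolerance are assumptions, and a production implementation would more likely call a numerical library.

```python
import math


def qr_least_squares(A, b, tol=1e-12):
    """Minimize ||A x - b|| using QR via modified Gram-Schmidt.

    A is a list of m rows with n columns (m >= n). Raises ValueError when
    a column is numerically dependent on the earlier ones, i.e. when A is
    rank-deficient and the triangular solve would divide by ~0.
    """
    m, n = len(A), len(A[0])
    Q = []                                  # orthonormal columns, each of length m
    R = [[0.0] * n for _ in range(n)]       # upper-triangular factor
    for j in range(n):
        v = [A[i][j] for i in range(m)]
        scale = max(1.0, math.sqrt(sum(x * x for x in v)))
        for k in range(j):
            # Project out the component along each earlier direction.
            R[k][j] = sum(Q[k][i] * v[i] for i in range(m))
            v = [v[i] - R[k][j] * Q[k][i] for i in range(m)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm < tol * scale:
            raise ValueError(f"column {j} is linearly dependent; A is rank-deficient")
        R[j][j] = norm
        Q.append([x / norm for x in v])
    # Solve R x = Q^T b by back substitution.
    y = [sum(Q[j][i] * b[i] for i in range(m)) for j in range(n)]
    x = [0.0] * n
    for j in reversed(range(n)):
        s = y[j] - sum(R[j][k] * x[k] for k in range(j + 1, n))
        x[j] = s / R[j][j]
    return x
```

The explicit rank check is a design choice of this sketch: it reports a dependent column instead of dividing by a near-zero diagonal entry of R.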

6 Results
- The implemented EBMF failed to find the motif for any of the TFs, even when the initial matrix was taken from the TRANSFAC PSSM.
  - QR/LR-decomposition: resulted in Infinity
    → Due to a singular-like matrix (singular up to machine precision)
  - Downhill Simplex Method: too slow, and the result still deviated from the TRANSFAC result
  - MATLAB: same result as QR
- Tried to modify the matrix:
  - Add a small non-zero number to each zero element
  - Limit to only one TFBS per promoter
- These changes worked for short random sets but still did not work for the yeast TFs.
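One possible source of a singular system can be shown with a small Python sketch. The slides do not say how A was built, so the sketch assumes each row of A is a one-hot encoding of a site, with four indicator columns per motif position. Under that assumption the four columns of every position sum to the same all-ones vector, so for l >= 2 the columns are linearly dependent and the rank is at most 3l + 1 rather than 4l. The sketch also shows ridge (Tikhonov) regularization, a swapped-in technique that is not the modification tried in the project; all function names are hypothetical.

```python
BASES = "ACGT"


def one_hot(site):
    """Encode a site as 4*l indicators (assumed layout: 4 columns per position)."""
    row = [0.0] * (4 * len(site))
    for j, base in enumerate(site):
        row[4 * j + BASES.index(base)] = 1.0
    return row


def rank(A, tol=1e-9):
    """Numerical rank by Gaussian elimination with partial pivoting."""
    M = [r[:] for r in A]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        if r == rows:
            break
        p = max(range(r, rows), key=lambda i: abs(M[i][c]))
        if abs(M[p][c]) < tol:
            continue  # no usable pivot in this column
        M[r], M[p] = M[p], M[r]
        for i in range(r + 1, rows):
            f = M[i][c] / M[r][c]
            for k in range(c, cols):
                M[i][k] -= f * M[r][k]
        r += 1
    return r


def ridge_solve(A, b, lam=1e-6):
    """Solve (A^T A + lam*I) x = A^T b; lam > 0 keeps the system non-singular."""
    n = len(A[0])
    G = [[sum(row[i] * row[j] for row in A) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [sum(row[i] * bi for row, bi in zip(A, b)) for i in range(n)]
    for c in range(n):  # forward elimination with partial pivoting
        p = max(range(c, n), key=lambda i: abs(G[i][c]))
        G[c], G[p] = G[p], G[c]
        rhs[c], rhs[p] = rhs[p], rhs[c]
        for i in range(c + 1, n):
            f = G[i][c] / G[c][c]
            for k in range(c, n):
                G[i][k] -= f * G[c][k]
            rhs[i] -= f * rhs[c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        x[i] = (rhs[i] - sum(G[i][k] * x[k] for k in range(i + 1, n))) / G[i][i]
    return x
```

With such an encoding the energies in X are only determined up to a constant shift between positions, so a finite solution requires either dropping one column per position or adding a penalty such as the ridge term above.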

7 Discussion
- Are the data singular? Is there any other trick to work around this?
- Try other data sets.
- Another direction for using quantitative protein-DNA binding data → possible correlation among TFs

Acknowledgement
- I deeply thank Dr. Haixu Tang.

