Genetic Threading By J.Yadgari and A.Amir Published: special issue on Bioinformatics in Journal of Constraints, June 2001 Alexandre Tchourbanov University.


1 Genetic Threading By J.Yadgari and A.Amir Published: special issue on Bioinformatics in Journal of Constraints, June 2001 Alexandre Tchourbanov University of Nebraska at Lincoln CSCE 421-821 December 4, 2001

2 Structure of the presentation Introduction to protein native structure. Methods of finding a native structure: physical and computational (common methods and principles; the protein threading method). Protein threading using a genetic approach.

3 Problem of protein structure prediction Proteins are key molecules in all life processes. The function of a protein is directly related to its three-dimensional structure. Knowing and understanding the structure of proteins will have a tremendous impact on the understanding of biological processes, medical discoveries, and biotechnological inventions.

4 Problem of protein structure prediction Given a sequence of amino acids, predict the unique 3D folding of the molecule that minimizes its free energy. [Diagram: primary structure (Lys, Gly, Leu, ...); computational methods of prediction and physical methods of prediction (steps 1 and 2); practical use of the 3D structural knowledge (step 3)]

5 Protein structure A protein is built up from a chain of amino acids linked by peptide bonds. There are 20 amino acids, which can be divided into several classes based on size and other chemical and physical properties. Depending on the type of its residue, an amino acid can be either hydrophilic (water loving) or hydrophobic (water hating).

6 General structure of an amino acid Each amino acid consists of: 1. A common main-chain part containing the heavy atoms N, C, O, and Cα, forming the amide plane 2. A chain residue of 0 to 10 additional atoms [Figure labels: common part; chain residue]

7 Peptide bond A peptide bond connects the carboxyl group of the first amino acid with the amino group of the second amino acid. Peptide bonds are planar and rigid.

8 Sequence of amino acids A sequence of amino acids connected by peptide bonds forms a protein. There is no flexibility for rotation around the peptide bond. There is more flexibility for the protein to rotate around the N-Cα bond (the φ angle) and around the Cα-C bond (the ψ angle). These angles are restricted to small regions in natural proteins.

9 Part of Protein (…|Phe|Asp|Ala|…)

10 Protein folding Using the freedom of rotations, the protein can fold into a specific and unique three dimensional structure (called conformation), forming a native structure

11 Physical methods of determining protein native structure The physical methods are X-ray crystallography and NMR (nuclear magnetic resonance). X-ray crystallography requires a significant amount of purified protein (10^14 molecules) to grow a crystal, and the protein must be able to crystallize. The NMR method is applicable to proteins of small and average size that do not crystallize. Both methods are expensive; they give consistent results on the same protein, which supports their correctness. The structure of many important proteins is still unknown.

12 Protein structure in X-ray crystallography X-ray diffraction pattern is recorded and processed using FFT to form electron density map Regions of map with the highest electron density reveal the location of atomic nuclei

13 Family of structures in NMR method Absorption of radio frequency energy is recorded as a 2D spectrum Possible 3D structures are constructed by computer according to NMR signal

14 Computational methods to find a protein structure The unique 3D arrangement of protein corresponds to lowest free energy conformation Most computational approaches for solving the protein folding problem look for the lowest free energy conformation Two principal methods are currently in use for computing the lowest energy conformation: 1.Molecular dynamics 2.Monte Carlo

15 Molecular dynamics Forces acting on each atom at a particular state of the system are calculated using an empirical force field. Atoms are allowed to move with the accelerations resulting from these forces, changing the conformation. Once the atoms have moved significantly, the acting forces are recalculated (every 10^-15 s). Even supercomputers can simulate only 10^-9 s of folding time, which is insufficient.

16 Monte Carlo method Used with a simplified model of the protein (one that does not consider the structure of every amino acid). The procedure makes a random move from the current conformation and evaluates the resulting energy change. If the new conformation is better, it replaces the old one, and the process repeats. The method is not powerful enough to find an optimal conformation even for simple cases.
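The move-and-accept loop described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_energy` is a hypothetical stand-in for a simplified-model energy, a conformation is reduced to a short list of numbers, and the acceptance rule is exactly the one on the slide (keep a move only if it lowers the energy). Metropolis-style variants that sometimes accept worse moves are not shown.

```python
import random

def toy_energy(conformation):
    """Hypothetical energy for illustration only: lower is better."""
    return sum((x - 1.0) ** 2 for x in conformation)

def monte_carlo(energy, start, steps=2000, step_size=0.1, seed=0):
    """Make random moves and keep a move only if it lowers the energy."""
    rng = random.Random(seed)
    current = list(start)
    current_energy = energy(current)
    for _ in range(steps):
        # Random move: perturb one coordinate of the current conformation.
        candidate = list(current)
        i = rng.randrange(len(candidate))
        candidate[i] += rng.uniform(-step_size, step_size)
        candidate_energy = energy(candidate)
        # Accept only improvements, as described on the slide.
        if candidate_energy < current_energy:
            current, current_energy = candidate, candidate_energy
    return current, current_energy

start = [0.0, 2.0, -1.0]
final, final_energy = monte_carlo(toy_energy, start)
```

Because only improving moves are kept, a loop like this stops making progress in the first local minimum it reaches, which is one way to read the slide's remark about the method's limited power.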

17 Protein threading Many proteins in nature are homologous: they have different primary structures but form the same conformation to carry out the same function in living matter, and they share the same evolutionary origin. Most proteins share the secondary structure motifs: 1. Helices 2. Extended strands forming sheets 3. Specific turns 4. Random coils

18 Protein threading Threading means mapping a given sequence to a given structure. To assign a structure to a sequence, one would thread the sequence through all known conformations, evaluate compatibility, and assign the most compatible structure to the sequence. When a structure completely different from any known one is discovered, it is entered into the database of structures.

19 Protein threading The structure is represented by the black trace. The sequence (at the top) is threaded through the structure, encoding an alignment (at the bottom). Zero means a structure deletion, values greater than one mean a sequence deletion, and one is a fit.

20 Protein threading The size of the search space for threading a sequence of length k into a structure of size n can be counted as a selection with repetition. The search space is huge and the problem appears to be NP-complete [Unger, R., Moult, J. (1993)].
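The formula itself did not survive in this transcript. As a hedged illustration, assume a threading is encoded as n non-negative integers (one per structure position) that sum to k, which is the encoding used later in the deck. The number of such encodings is then the standard combinations-with-repetition count C(k + n - 1, n - 1). The function names below are hypothetical, and the brute-force counter is included only to check the closed form on tiny inputs.

```python
from itertools import product
from math import comb

def threading_search_space(k, n):
    """Count length-n strings of non-negative integers that sum to k (n >= 1)."""
    return comb(k + n - 1, n - 1)

def count_by_enumeration(k, n):
    """Brute-force check of the same count; only practical for tiny k and n."""
    return sum(1 for terms in product(range(k + 1), repeat=n) if sum(terms) == k)

# A 14-residue sequence threaded into a 14-position structure.
small_case = threading_search_space(14, 14)
```

Even this toy case gives about twenty million encodings, and the count grows very quickly with realistic protein lengths.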

21 Protein threading To reduce the complexity of the search task, (m - 1) core and m non-core regions are introduced. Usually α-helices and β-sheets are core regions, connected by loops. The total number of amino acids in core regions is c. [Figure labels: m loops (non-core); m - 1 core regions]

22 Protein threading Although suffering from some inherent limitations (such as prediction of the right structure with completely wrong threading), method became a significant tool in protein structure prediction Any threading procedure must contain two major components: 1.An alignment algorithm to position a sequence on a structure 2.Score function to evaluate the “energy” of the sequence in given conformation

23 Protein threading possible implementations Protein threading could be implemented using: 1.Enumeration for small problems, 2.Dynamic programming to find core regions to “freeze”, 3.Monte Carlo variants with Gibbs sampling 4.Branch and bound search Genetic programming with constraints seems to be a decent alternative in comparison with other methods

24 Protein threading using genetic programming Genetic algorithms are parallel computational tools based on the principle of diversity and selection. Solutions are represented as strings, for example 11111100111311. The sum of all terms in the string must equal the number of amino acids in the sequence, and the length of the string must equal the length of the structure.
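The two validity conditions can be checked directly. The sketch below assumes one decimal digit per structure position (so no term exceeds 9) and reads each term as the number of sequence residues consumed at that position, following the alignment slide (0 is a structure deletion, 1 is a fit, more than 1 is a sequence deletion). The function names are hypothetical.

```python
def parse_encoding(text):
    """Turn a digit string such as '11111100111311' into a list of integers."""
    return [int(ch) for ch in text]

def is_valid(encoding, sequence_length, structure_length):
    """Check both conditions: string length and sum of terms."""
    return (
        len(encoding) == structure_length
        and sum(encoding) == sequence_length
        and all(term >= 0 for term in encoding)
    )

def residues_per_position(encoding):
    """For each structure position, list the 0-based sequence indices it consumes."""
    spans, next_residue = [], 0
    for term in encoding:
        spans.append(list(range(next_residue, next_residue + term)))
        next_residue += term
    return spans

example = parse_encoding("11111100111311")
spans = residues_per_position(example)
```

For the slide's example, positions 7 and 8 of the structure receive no residue, and position 12 receives three.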

25 Protein threading using genetic programming These strings are maintained as a population that undergoes an evolutionary process via genetic operators such as: – Replication (copying a string to the next generation) – Mutation (changing bits in the string) – Crossover (concatenating a prefix of one string with a suffix of another) The energy function is a good candidate for evaluating the fitness of an offspring.
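A minimal sketch of how the three operators might fit together in one loop is given below. Several parts are assumptions made for illustration and are not taken from the paper: the selection scheme (the better half of the population is replicated unchanged), the `repair` step that restores the required sum after crossover, and `toy_energy`, which is a hypothetical score and not a real threading energy.

```python
import random

def repair(child, total, rng):
    """Adjust random terms by 1 until the terms sum to `total` again."""
    child = list(child)
    while sum(child) != total:
        i = rng.randrange(len(child))
        if sum(child) < total:
            child[i] += 1
        elif child[i] > 0:
            child[i] -= 1
    return child

def mutate(parent, rng):
    """Move one unit from one position to another, keeping the sum unchanged."""
    child = list(parent)
    src = rng.choice([i for i, term in enumerate(child) if term > 0])
    dst = rng.randrange(len(child))
    child[src] -= 1
    child[dst] += 1
    return child

def crossover(a, b, total, rng):
    """Join a prefix of one parent with a suffix of the other, then repair the sum."""
    cut = rng.randrange(1, len(a))
    return repair(a[:cut] + b[cut:], total, rng)

def toy_energy(encoding):
    """Hypothetical score: penalise every position that is not a one-to-one fit."""
    return sum(abs(term - 1) for term in encoding)

def evolve(population, energy, total, generations, mutation_rate, rng):
    size = len(population)
    for _ in range(generations):
        population.sort(key=energy)
        survivors = population[: size // 2]  # replication of the better half
        children = []
        while len(survivors) + len(children) < size:
            a, b = rng.sample(survivors, 2)
            child = crossover(a, b, total, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            children.append(child)
        population = survivors + children
    return min(population, key=energy)

rng = random.Random(1)
total, length = 14, 14
population = [repair([0] * length, total, rng) for _ in range(30)]
initial_best = min(toy_energy(p) for p in population)
best = evolve(population, toy_energy, total, generations=50, mutation_rate=0.25, rng=rng)
```

Because the better half always survives, the best energy in the population can never get worse from one generation to the next in this sketch.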

26 Energy function Energy functions are subject to minimization. They are calculated by extracting from the structural database the frequencies of interactions between pairs of residues as a function of amino acid type and distance. The tendency of certain hydrophilic residues to be on the surface can be approximated by an energy term related to position.
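The slide does not give the formula that turns frequencies into energies. One common choice for such database-derived scores is a log-odds (inverse Boltzmann) form, energy = -ln(observed frequency / expected frequency), and the sketch below assumes that form purely for illustration. It also ignores the distance dependence and the surface term mentioned above, and the contact list is made-up toy data, not taken from any database.

```python
from collections import Counter
from math import log

def pair_energies(contacts):
    """Log-odds energies for residue-type pairs from a list of observed contacts.

    contacts: list of (type_a, type_b) tuples, one per observed contact.
    Only pairs that were actually observed receive an energy.
    """
    pair_counts = Counter(tuple(sorted(pair)) for pair in contacts)
    type_counts = Counter(t for pair in contacts for t in pair)
    total_pairs = sum(pair_counts.values())
    total_types = sum(type_counts.values())
    energies = {}
    for (a, b), count in pair_counts.items():
        observed = count / total_pairs
        freq_a = type_counts[a] / total_types
        freq_b = type_counts[b] / total_types
        # An unordered pair of different types can arise in two orders.
        expected = freq_a * freq_b * (1 if a == b else 2)
        energies[(a, b)] = -log(observed / expected)
    return energies

def threading_energy(contacts, energies):
    """Sum the pair energies over the contacts implied by one threading."""
    return sum(energies[tuple(sorted(pair))] for pair in contacts)

# Toy data: leucine-leucine contacts are over-represented.
toy_contacts = [("L", "L")] * 6 + [("L", "K")] * 2 + [("K", "K")] * 2
toy_energies = pair_energies(toy_contacts)
```

Pairs seen more often than chance would predict get a negative (favourable) energy, so minimizing the summed energy favours threadings whose contacts resemble those in known structures.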

27 Implementing mutation An example of a mutation is the transformation of 11111100111311 into 11111100211211, which is also a valid encoding. A validity check is needed every time a mutation is applied, with compensation for any problems. Reversing a substring is an especially interesting mutation, since it does not violate the validity of the solution.
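Both kinds of mutation can be sketched as follows. The example on this slide changes two terms at once (one goes up by 1, another goes down by 1), so the sketch models a regular mutation as moving one unit between two positions, which keeps the sum fixed without a separate compensation step. That reading, and the function names, are assumptions for illustration.

```python
import random

def move_unit(encoding, src, dst):
    """Decrease the term at `src` by 1 and increase the term at `dst` by 1."""
    if encoding[src] == 0:
        raise ValueError("cannot take a unit from a zero term")
    child = list(encoding)
    child[src] -= 1
    child[dst] += 1
    return child

def random_mutation(encoding, rng):
    """Pick a non-zero donor position and a different receiver position at random."""
    src = rng.choice([i for i, term in enumerate(encoding) if term > 0])
    dst = rng.choice([i for i in range(len(encoding)) if i != src])
    return move_unit(encoding, src, dst)

def reverse_mutation(encoding, rng):
    """Reverse a randomly chosen substring; length and sum cannot change."""
    child = list(encoding)
    i, j = sorted(rng.sample(range(len(child) + 1), 2))
    child[i:j] = reversed(child[i:j])
    return child

parent = [int(ch) for ch in "11111100111311"]
slide_example = move_unit(parent, src=11, dst=8)  # 0-based positions

rng = random.Random(3)
mutated = random_mutation(parent, rng)
reversed_child = reverse_mutation(parent, rng)
```

A reversal only reorders existing terms, which is why it needs no validity check at all.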

28 Implementing crossovers Parent 1: 11201120111111. Parent 2: 11111100111311. Offspring: 11201120111311 and 11111100111111.
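The strings on this slide are consistent with a one-point crossover after the eighth position, as the sketch below reproduces. Note that the two offspring sum to 16 and 12 while both parents sum to 14, so they need compensation before they are valid encodings. The `repair` rule used here (lower the largest term, or raise the smallest) is a hypothetical choice for illustration; the transcript does not say how the authors compensate.

```python
def one_point_crossover(parent1, parent2, cut):
    """Swap suffixes of two equal-length parents after position `cut`."""
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2

def repair(child, total):
    """Hypothetical compensation: adjust terms one unit at a time to reach `total`."""
    child = list(child)
    while sum(child) > total:
        child[child.index(max(child))] -= 1
    while sum(child) < total:
        child[child.index(min(child))] += 1
    return child

def digits(text):
    return [int(ch) for ch in text]

parent1 = digits("11201120111111")
parent2 = digits("11111100111311")
child1, child2 = one_point_crossover(parent1, parent2, cut=8)
fixed1 = repair(child1, total=14)
fixed2 = repair(child2, total=14)
```

After repair, both children again have 14 terms that sum to 14 and can be scored like any other member of the population.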

29 The following issues were addressed: the linear trade-off between population size and the number of generations; the optimal mutation rate; locality of the mutation operator; locality of the crossover operator; regular mutations versus reverse mutations; magnitude of the mutation operation; quality control of the crossover operation.

30 Results For the authors' examples, optimal performance is achieved with a population size of 300 solutions and a duration of 1000 generations. The optimal mutation rate is 0.25 to 0.3 of the population.

31 The minimal energy of threading runs

32 The average energy of the population during threading

33 Structural comparisons [Figure panels: structural alignment; most similar threading alignment; least similar threading alignment. The plots show the difference between sequence deletions and structure deletions.]

34 Maximal mutation magnitude [Figure panels: average score of 5 runs after 600 generations; average score of 5 runs after 2000 generations]

35 Summary The running time of a GA depends linearly on the number of solutions in the population (i.e. the population size) and linearly on the number of generations for which the process is repeated. The genetic algorithm method is a feasible and efficient approach to threading. It is especially encouraging that the threading alignments are quite similar, quantitatively, to the structural alignments.

36 Summary Changing the locality of the mutation and crossover operations does not produce a consistent change in the performance of the algorithm. Mutations of high magnitude are counterproductive, probably because changes between the template and the assigned structure do not tend to concentrate in a single position. Using crossover under strict quality control was shown not to be effective, since the genetic mechanism provides quality control itself.

37 Summary The success of the reverse mutation is quite surprising and should be further explored

38 Future work Threading algorithms should be tested on their ability to assign a conformation to a new and unknown sequence. The authors plan to implement the genetic algorithm in a complete threading package, with all the necessary components, and to test it in a realistic prediction setup.

