François Fages MPRI Bio-info 2009 MPRI C2-19 Protein structure prediction by constraint logic programming François Fages, Constraint Programming Group,

Slides:



Advertisements
Similar presentations
Proteins: Structure reflects function….. Fig. 5-UN1 Amino group Carboxyl group carbon.
Advertisements

Review.
Amino Acids PHC 211.  Characteristics and Structures of amino acids  Classification of Amino Acids  Essential and Nonessential Amino Acids  Levels.
A Ala Alanine Alanine is a small, hydrophobic
François Fages MPRI Bio-info 2007 Formal Biology of the Cell Protein structure prediction with constraint logic programming François Fages, Constraint.
Review of Basic Principles of Chemistry, Amino Acids and Proteins Brian Kuhlman: The material presented here is available on the.
Proteins Function and Structure.
Proteins include a diversity of structures, resulting in a wide range of functions Protein functions include structural support, storage, transport, enzymes,
Proteins. Copyright © 2005 Pearson Education, Inc. publishing as Benjamin Cummings Concept 5.4: Proteins have many structures, resulting in a wide range.
Metabolic fuels and Dietary components Lecture - 2 By Dr. Abdulrahman Al-Ajlan.
Proteins account for more than 50% of the dry mass of most cells
5’ C 3’ OH (free) 1’ C 5’ PO4 (free) DNA is a linear polymer of nucleotide subunits joined together by phosphodiester bonds - covalent bonds between.
The Organic Molecules of Living Organisms
Proteins Function and Structure. Proteins more than 50% of dry mass of most cells functions include – structural support – storage, transport – cellular.
©CMBI 2001 A Ala Alanine Alanine is a small, hydrophobic residue. Its side chain, R, is just a methyl group. Alanine likes to sit in an alpha helix,it.
©CMBI 2006 Amino Acids “ When you understand the amino acids, you understand everything ”
Protein Structure FDSC400. Protein Functions Biological?Food?
You Must Know How the sequence and subcomponents of proteins determine their properties. The cellular functions of proteins. (Brief – we will come back.
Proteins and Enzymes Nestor T. Hilvano, M.D., M.P.H. (Images Copyright Discover Biology, 5 th ed., Singh-Cundy and Cain, Textbook, 2012.)
Proteins account for more than 50% of the dry mass of most cells
1.What makes an enzyme specific to one type of reaction (in other words, what determines the function of a protein)? –SHAPE determines the function of.
Protein Synthesis. DNA RNA Proteins (Transcription) (Translation) DNA (genetic information stored in genes) RNA (working copies of genes) Proteins (functional.
Proteins account for more than 50% of the dry mass of most cells
Proteins. PROTEINS Amino acids contain an amino group, a carboxyl group, a carbon and a unique R group.
©CMBI 2006 Amino Acids “ When you understand the amino acids, you understand everything ”
Concept 5.4: Proteins have many structures, resulting in a wide range of functions Proteins account for more than 50% of the dry mass of most cells Protein.
How Proteins Are Made Mrs. Wolfe. DNA: instructions for making proteins Proteins are built by the cell according to your DNA What kinds of proteins are.
PROTEINS.
AMINO ACIDS.
Now playing: Frank Sinatra “My Way” A large part of modern biology is understanding large molecules like Proteins A large part of modern biology is understanding.
Amino Acids are the building units of proteins
Learning Targets “I Can...” -State how many nucleotides make up a codon. -Use a codon chart to find the corresponding amino acid.
Welcome Back! February 27, 2012 Sit in any seat for today. You will have assigned seats tomorrow Were you absent before the break? Plan on coming to tutorial.
NOTES: 2.3 part 2 Nucleic Acids & Proteins. So far, we’ve covered… the following MACROMOLECULES: ● CARBOHYDRATES… ● LIPIDS… Let’s review…
The Structure and Function of Macromolecules Chapter Proteins.
Macromolecules of Life Proteins and Nucleic Acids
Amino Acids ©CMBI 2001 “ When you understand the amino acids, you understand everything ”
PROTEINS. Proteins Proteins do the nitty-gritty jobs of every living cell. Proteins are made of long strings of individual building blocks known as amino.
Proteins.
Chapter 3 Proteins.
Amino Acids  Amino Acids are the building units of proteins. Proteins are polymers of amino acids linked together by what is called “ Peptide bond” (see.
Amino acids Common structure of 19 AAs H3N+H3N+ COO - R H C Proline.
The Structure and Function of Macromolecules Chpt. 5 The Structure and Function of Macromolecules.
GOVERNMENT ENGINEERING COLLEGE, BHARUCH Subject : Organic Chemistry and Unit Process.
Amino Acids. Amino acids are used in every cell of your body to build the proteins you need to survive. Amino Acids have a two-carbon bond: – One of the.
PROTEINS FOLDED POLYPEPTIDES © 2007 Paul Billiet ODWSODWS.
Could the first steps of chemical evolution have occurred on ancient Earth?  To find out, Stanley Miller combined methane (CH 4 ), ammonia (NH.
Prof. Dr. Margret Mansour
AP Biology Mrs. Ramon. The Molecules of Life Macromolecules LARGE molecules Four classes: 1. Carbohydrates 2. Lipids (Fats) 3. Proteins 4. Nucleic Acids.
Proteins Tertiary Protein Structure of Enzyme Lactasevideo Video 2.
Protein Folding Notes.
Proteins account for more than 50% of the dry mass of most cells
Chpt. 5 The Structure and Function of Macromolecules
Proteins.
Transport proteins Transport protein Cell membrane
Conformationally changed Stability
Proteins account for more than 50% of the dry mass of most cells
Chapter 3 Proteins.
Fig. 5-UN1  carbon Amino group Carboxyl group.
The Structure and Function of Macromolecules
Proteins account for more than 50% of the dry mass of most cells
Proteins Genetic information in DNA codes specifically for the production of proteins Cells have thousands of different proteins, each with a specific.
Conformationally changed Stability
The 20 amino acids.
Translation.
The 20 amino acids.
The Chemical Building Blocks of Life
Example of regression by RBF-ANN
Proteins Proteins have many structures, resulting in a wide range of functions Proteins do most of the work in cells and act as enzymes 2. Proteins are.
“When you understand the amino acids,
Presentation transcript:

François Fages MPRI Bio-info 2009 MPRI C2-19 Protein structure prediction by constraint logic programming François Fages, Constraint Programming Group, INRIA Rocquencourt

François Fages MPRI Bio-info 2009 Molecules in the Cell Small molecules: covalent bonds kcal/mol 70% water, 1% ions, 6% amino acids (20), nucleotides (5), fats, sugars, ATP, ADP, … Macromolecules: hydrogen bonds, ionic, hydrophobic, Waals 1-5 kcal/mol Stability and bindings determined by the number of weak bonds: 3D shape 20% proteins ( amino acids) RNA ( nucleotides AGCU) DNA ( nucleotides AGCT)

François Fages MPRI Bio-info 2009 Aminoacids 20 aminoacids: Alanine (A), Cysteine (C), Aspartic Acid (D), Glutamic Acid (E), Phenylalanine (F), Glycine (G), Histidine (H), Isoleucine (I), Lysine (K), Leucine (L), Methionine (M), Asparagine (N), Proline (P), Glutamine (Q), Arginine (R), Serine (S), Threonine (T), Valine (V), Tryptophan (W), Tyrosine (Y). Same backbone centered on C  linked with covalent C-N bonds to different side chains (residues) from 1 atom (for G) to 18 (for T)

François Fages MPRI Bio-info 2009 Examples: Glycine and Tryptophan C2H5NO2 → 9+1=10 atoms C11H12N2O2 → 9+18=27 atoms White = H, Blue = N, Red = O, Grey = C en.html

François Fages MPRI Bio-info 2009 Primary Structure of Proteins Protein = word of n amino acids (20 n possibilities) linked with C-N bonds n= for hormones, for globular proteins, 3000 for fibrous proteins, … Example: MPRI Methionine-Proline-Arginine-Isoleucine (unstable !)

François Fages MPRI Bio-info 2009 Secondary Structure of Proteins Protein = word of m forms among three forms (3 m possibilities) stabilized by hydrogen bonds H---O:  helices of 5-40 contiguous residues (with 3.6 res. per tour)  sheets of strands composed of 5-10 contiguous residues 3.random coils,…

François Fages MPRI Bio-info 2009 Tertiary Structure of Proteins Protein= 3D spatial conformation Native conformation in a determined environment (e.g. water) structures in Protein Data Bank Bonded interactions between atoms: - covalent bonds (unbreakable) - disulfide bonds (slow to form/break) e.g. between cysteine residues - hydrogen bonds H—O (fast to form/break) Non-bonded interactions between atoms: - electrostatic (long-range) - van der Waals (short range) Confirmation that minimizes the free energy (Anfinsen’s hypothesis)

François Fages MPRI Bio-info 2009 Importance of Protein Structure Prediction Understand protein folding, interaction capabilities, protein docking Domain prediction, function prediction Drug design and/or optimization More than 50% of the drugs target receptor proteins Enzymes design and/or optimization Inverse problem: protein synthesis of a given shape Can restrict the number of amino acids One of the most important problems of bioinformatics Methods are evaluated in the CASP competition (every two years) Protein data bank: repository of tertiary structures (weekly updates)

François Fages MPRI Bio-info 2009 Protein Structure Prediction and Folding Problems Input: the primary structure of a protein i.e. sequence of n amino acids (plus some information on its secondary structure) Output (PSP): the tertiary structure of the protein that minimizes the free energy. Output (PFP): the folding sequence to the tertiary structure of the protein (with secondary structures formed first) E.g. Hypotheses in the simple HP model: Each aminoacid is a sphere centered on its C α atom Each aminoacid is either hydrophobic H or polar P (hydrophilic) Energy(HH)= -1 Energy(HP,PH,PP)=0

François Fages MPRI Bio-info 2009 Protein Structure Prediction Problem 1)Spatial model: in known protein structures, the distance between two consecutive C  atoms is essentially fixed (3.8°A) ◦ Lattice (discrete) models. ◦ Continuous models. 2) Energy model: ◦ HP model: Energy(HH)= -1 Energy(HP,PH,PP)=0 ◦ 20x20 Matrix model: Energy(AB)

François Fages MPRI Bio-info 2009 HP Energy Model Hydrophobic (H) aminoacids: Cys (C), Ile (I), Leu (L), Phe (F), Met (M), Val (V), Trp (W), His (H), Tyr (Y), Ala (A) Polar (P) aminoacids: Lys (K), Glu (E), Arg (R), Ser (S), Gln (Q), Asp (D), Asn (N), Thr (T), Pro (P), Gly (G) The protein is in water: hydrophobic elements tend to occupy the center of the protein. H aminoacids tend to stay close each other (hydrophobic) P aminoacids tend to stay in the frontier (polar) Energy(HH)= -1 Energy(HP,PH,PP)=0 for pairs of aminoacids in contact

François Fages MPRI Bio-info 2009 Matrix Energy Model Same assumption: only pairs of aminoacids in contact contribute to the energy value. There is a potential matrix storing the contribution for each pair of aminoacids in contact. Values are either positive or negative. The global energy must be minimized. Energy(AB)=w AB for pairs on aminoacids in contact

François Fages MPRI Bio-info 2009 Admissible Foldings Let S=s1,…,sn be a sequence of amino acids Def. A folding in a crystal lattice (P,E) is a mapping w:N  P, w(i)=Pi

François Fages MPRI Bio-info 2009 Admissible Foldings Let S=s1,…,sn be a sequence of amino acids Def. A folding in a crystal lattice (P,E) is a mapping w:N  P, w(i)=Pi Def. An admissible folding is a folding such that Constant distance k for consecutive pairs: eucl(w(i),w(i+1))=k Non-overlapping: eucl(w(i),w(j))>0 for i≠j Occupancy of side chain: eucl(w(i),w(j))>k for |i-j|  2 Bend angles in [90°..150°]: k1  eucl(w(i),w(i+2))  k2 for i  {1,…,n-2}

François Fages MPRI Bio-info 2009 Crystal Lattice Models Def. A crystal lattice is a graph (P,E) where P is a set of points in Z 3 connected by undirected edges E.

François Fages MPRI Bio-info 2009 Crystal Lattice Models Def. A crystal lattice is a graph (P,E) where P is a set of points in Z 3 connected by undirected edges E. Def. Squared euclidean distance eucl(A,B)=(x B -x A ) 2 +(y B -y A ) 2 +(z B -z A ) 2

François Fages MPRI Bio-info 2009 Crystal Lattice Models Def. A crystal lattice is a graph (P,E) where P is a set of points in Z 3 connected by undirected edges E. Def. Squared euclidean distance eucl(A,B)=(x B -x A ) 2 +(y B -y A ) 2 +(z B -z A ) 2 Def. A cubic lattice is a crystal lattice (P,E) such that P={(x,y,z) | x,y,z  Z} E={(A,B) | A,B  P, eucl(A,B)=k=1} A cubic lattice is 6-connected.

François Fages MPRI Bio-info 2009 PSP in a Cubic Lattice Model Cubes of size 1, Vertices labeled by aminoacid names Find a conformation w:{1…n}  Z 3 such that: w(i) ≠ w(j) for i ≠ j ||w(i)-w(i+1)||=1 for the Euclidean norm Minimizes the energy E = Σ ||w(i)-w(j)||=1, i+1<j Energy(w(i),w(j)) Thm. The existence of an admissible conformation with energy less than a constant k is NP-complete. Proof. Reduce the bin packing problem to a PSP problem

François Fages MPRI Bio-info 2009 Face-Centered Cubic Lattice Def. A face-centered cubic lattice is a crystal lattice (P,E) such that P={(x,y,z) | x,y,z  Z, x+y+z is even} E={(A,B) | A,B  P, eucl(A,B)=  2} An FCC is 6-connected at distance 2

François Fages MPRI Bio-info 2009 Face-Centered Cubic Lattice Def. A face-centered cubic lattice is a crystal lattice (P,E) such that P={(x,y,z) | x,y,z  Z, x+y+z is even} E={(A,B) | A,B  P, eucl(A,B)=  2} An FCC is 6-connected at distance 2 ?-connected at distance  2 ?

François Fages MPRI Bio-info 2009 Face-Centered Cubic Lattice Def. A face-centered cubic lattice is a crystal lattice (P,E) such that P={(x,y,z) | x,y,z  Z, x+y+z is even} E={(A,B) | A,B  P, eucl(A,B)=  2} An FCC is 6-connected at distance 2 12-connected at distance  2 ? bend angles ?

François Fages MPRI Bio-info 2009 Face-Centered Cubic Lattice Def. A face-centered cubic lattice is a crystal lattice (P,E) such that P={(x,y,z) | x,y,z  Z, x+y+z is even} E={(A,B) | A,B  P, eucl(A,B)=  2} An FCC is 6-connected at distance 2 12-connected at distance  2 (60◦), 90◦, 120◦ and 180◦ bend angles 18 contact neighbours at distance  2

François Fages MPRI Bio-info 2009 Face-Centered Cubic Lattice Model Abstraction: Each aminoacid is a sphere centered on its C α atom Fixed distance of 3.8Ä between C α atoms Table Pot(x,y) of energy value per pairs of aminoacids x and y Face-centered Cubic Lattice model: Cubes of size 2 where points are vertices and central points of faces 12 connected neighbors at distance √2 (corresponding to 3.8Ä) 18 contact neighbors at distance  2 (corresponding to 5.4Ä<6.4Ä) Conformation w:{1…n}  Z 3 such that: w(i) ≠ w(j) for i ≠ j and ||w(i)-w(i+1)||= √2 Minimizes the energy E = Σ ||w(i)-w(j)||=2, i+1<j Pot(w(i),w(j))

François Fages MPRI Bio-info 2009 Principles of Constraint Logic Programming Constrain-and-generate versus generate-and-test. find(X):- find(X):- constrain(X), generate(X), generate(X). test(X). active use of constraints to prune the search tree in advance - symbolic/numerical solving of constraints - constraint propagation over finite domains Optimization by branch-and-bound procedure optim(X,C,R):- cost(X)<C, find(X), optim(Y,cost(X),R). optim(X,C,C).

François Fages MPRI Bio-info 2009 Placing N queens on an NxN Chessboard queens(N,L):- length(L,N), fd_domain(L,1,N), safe(L), fd_labeling(L,first_fail). safe([]). safe([X|L]):- noattack(L,X,1), safe(L). noattack([],_,_). noattack([Y|L],X,I):- X#\=Y, X#\=Y+I, X+I#\=Y, I1 is I+1, noattack(L,X,I1).

François Fages MPRI Bio-info 2009 Optimal Scheduling of a Project of 50 Tasks

François Fages MPRI Bio-info 2009 CLP Approaches to Protein Structure Predication HP cubic model [Backofen-Will 03] HP face-centered cubic model [Dal Palu-Dovier-Fogolari04] from seconds for n<50 to tenth of minutes s for n<100 fcc_pf( ID, Time, Compact):- protein(ID,Primary,Secondary), constrain(Primary,Secondary,Indexes, Tertiary,Energy,Matrix,Freq,Compact), writetime(video,'Constraint time: '),nl,!, solution_search(Time,Primary,Secondary,Indexes, Tertiary,Energy,Matrix,Freq), print_results(ID,Time,Primary,Secondary,Tertiary,Compact).

François Fages MPRI Bio-info 2009 Constraint Logic Program [Dal Palu-Dovier-Fogolari04] Domain bounds domain_bounds(N,[X,Y,Z|Rest]):- CUBESIZE is N * 2, domain([X,Y,Z],0,CUBESIZE), sum([X,Y,Z],#=,Sum), even(Sum), domain_bounds(N, Rest). domain_bounds(_,[]). Circuit constraint avoid_self_loops(Tertiary, N):- BASE is 2*N+1, positions_to_integers(Tertiary,ListIntegers,BASE), all_different(ListIntegers). Connectivty constraints next(X1,Y1,Z1,X2,Y2,Z2):- domain([Dx,Dy,Dz],0,1), Dx #= abs(X1-X2), Dy #= abs(Y1-Y2), Dz #= abs(Z1-Z2), sum([Dx,Dy,Dz], #=, 2). Non-connectivity constraints non_next(X1,Y1,Z1,X2,Y2,Z2):- Dx#=(X1-X2)*(X1-X2), Dy#=(Y1-Y2)*(Y1-Y2), Dz#=(Z1-Z2)*(Z1-Z2), sum([Dx,Dy,Dz],#>,2). Contact energy constraint

François Fages MPRI Bio-info 2009 Explored Search Tree for Protein 1KVG (n=12) Size Time 17 0, , ,8 36 4, , mn 69 2mn 78 1, mn mn