Structure-Sequence alignment. Query sequence: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE. “Structure is better preserved than sequence” (Me!). Non-redundant templates of structures:

How can we match a sequence and a structure? Query: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE. Sequence: similar sequences take this structure (but remember: sequence is less preserved than structure…). Aligned example: MVNGLILNGKTK------------------------AEKVFQYANDNGVDGEWTYTE. Solvation: which amino acids are buried? Trp (W): probably not here! Pair interaction: how well do amino acids get along? (Positive hates positive? Maybe not…?) More: secondary-structure prediction, secondary-structure constraints (β-strands forming β-sheets…), etc.

“An Efficient and Reliable Protein Fold Recognition Method for Genomic Sequences” David T. Jones (1999) “What a good presentation!” B. Raveh (2003)

GenTHREADER overview. Input: the query sequence MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE and the templates. For each template (in the Brookhaven PDB): construct a sequence profile; align it with the query sequence; calculate structural parameters (“to be continued…”); send the parameters to a well-trained NEURAL NETWORK (like PSIPRED…). OUTPUT: match confidence & alignment.

STAGE 1: Building a profile for each template
1. Start with the sequence of the template peptide: “MTPAVTTYKLVINGKTLKGETTTKAVDAETAEKAFKQYANDNGVDGVWTYDDATKTFTVTC”
2. Run BLASTP against the OWL non-redundant protein sequence databank, with that sequence as input.
3. Take all sequences with E-value < 0.01.
4. Align them using MULTAL, a multiple sequence alignment method.
5. Construct a sequence profile based on the BLOSUM 50 matrix.
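
To make the last step concrete, here is a minimal sketch of turning a multiple alignment into a profile. It is an illustration under assumptions, not GenTHREADER's actual code: the three aligned sequences are made up, the helper names are hypothetical, and `toy_subst` is a stand-in for the real BLOSUM 50 matrix.

```python
from collections import Counter

def column_frequencies(alignment):
    """Per-column residue frequencies, ignoring gap characters ('-')."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        counts = Counter(seq[col] for seq in alignment if seq[col] != "-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profile

def toy_subst(a, b):
    # Hypothetical stand-in for BLOSUM 50: +5 for identity, -2 otherwise.
    return 5 if a == b else -2

def profile_score(column, residue, subst):
    """Frequency-weighted substitution score of `residue` against one column."""
    return sum(freq * subst(residue, aa) for aa, freq in column.items())

# Made-up alignment of three homologues (the output of steps 2 to 4).
alignment = ["MTYKL", "MTFKL", "M-YKV"]
profile = column_frequencies(alignment)
print(profile[2])                                  # frequencies of Y and F
print(profile_score(profile[2], "Y", toy_subst))   # score of Y at column 3
```

Each profile column then scores a query residue by how well it substitutes for the residues seen at that position, which is what Stage 2 consumes.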

STAGE 2: Align the query sequence with a template profile. Query: MTYKLILNGKTKGETTTEAVDAATAEKVFQYANDNGVDGEWTYTE. Parameters recorded: SCORE = ? Length of alignment itself = ? Length of template profile = ? Length of query sequence = ?
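
A rough sketch of how the four Stage 2 parameters could be obtained. The slide does not say which alignment algorithm is used, so a plain global dynamic-programming alignment (Needleman-Wunsch with a linear gap penalty) is assumed here; the scoring function, the gap value and the tiny profile are all hypothetical.

```python
def column_score(residue, column):
    # Hypothetical scoring: +5 for identity, -2 otherwise, weighted by
    # the residue frequencies stored in the profile column.
    return sum(freq * (5 if residue == aa else -2) for aa, freq in column.items())

def align_to_profile(query, profile, score_fn, gap=-4):
    """Global alignment of a sequence against profile columns.

    Returns (score, pairs); pairs holds the matched
    (query_index, profile_index) positions.
    """
    n, m = len(query), len(profile)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score_fn(query[i - 1], profile[j - 1]),
                dp[i - 1][j] + gap,   # query residue left unaligned
                dp[i][j - 1] + gap,   # profile column left unaligned
            )
    # Trace back from the bottom-right corner to recover matched positions.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score_fn(query[i - 1], profile[j - 1]):
            pairs.append((i - 1, j - 1))
            i -= 1
            j -= 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs

# Toy profile with one fully conserved residue per column (made up).
profile = [{"M": 1.0}, {"T": 1.0}, {"Y": 1.0}, {"K": 1.0}, {"L": 1.0}]
query = "MTKL"
score, pairs = align_to_profile(query, profile, column_score)
params = {
    "score": score,
    "alignment_length": len(pairs),
    "profile_length": len(profile),
    "query_length": len(query),
}
print(params)
```

The four values in `params` correspond to the four question marks on the slide; they are what later gets passed on alongside the structural parameters.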

STAGE 3: calculate (some) structural parameters. In stage 2, the sequence was aligned to a profile of the structure. The aligned sequence is now imposed on the 3D structure of the template and used to calculate ENERGY POTENTIALS.

STAGE 3: structural parameters (cont.): E-Pair (pair interaction potential), an energy potential for the probability of the interactions observed in this structure. The distance and sequence separation between certain atoms of two different amino acids (in the figure: aa 39 and aa 157) are measured (Cβ-Cβ, Cβ-N, Cβ-O, etc.). Statistics from known structures were gathered and weighted. The observed interactions are compared with these statistics, and an energy potential is calculated. In essence: the smaller E-Pair, the better.
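
The slide gives no formula for E-Pair. As a hedged illustration, the sketch below assumes the usual inverse-Boltzmann form for statistical potentials (the same form the solvation slide uses): a pair that shows up in a distance bin more often than the average pair gets a negative, favourable energy. The counts, the function name and the RT value are assumptions.

```python
import math

RT = 0.582  # kcal/mol at roughly 293 K; the units are an assumption

def pair_energy(count_pair_bin, count_pair_total, count_all_bin, count_all_total):
    """Inverse-Boltzmann energy for one residue-pair type in one distance bin."""
    f_pair = count_pair_bin / count_pair_total  # share of this pair type in the bin
    f_ref = count_all_bin / count_all_total     # share of all pairs in the bin
    return -RT * math.log(f_pair / f_ref)

# Made-up counts: this pair type falls in the bin twice as often as average.
print(pair_energy(20, 100, 10, 100))  # negative: favourable
print(pair_energy(5, 100, 10, 100))   # positive: unfavourable
```

Summing such terms over all residue pairs placed on the template would give a single number where smaller is better, matching the slide's rule of thumb.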

STAGE 3: structural parameters (cont.): E-Solv (solvation potential). Degree of burial (DOB) for an amino acid: “the number of other Cβ atoms located within 10 Å of the residue’s Cβ atom”. In general, hydrophobic amino acids like to be buried, safely away from water; hydrophilic amino acids might like the outside world better. The DOB of each amino acid is calculated and compared with its statistical occurrence: $\Delta E_{\mathrm{solv}}(AA, r) = -RT \ln \left( f(AA, r) / f(r) \right)$.
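
A small sketch of both pieces of this slide: counting the degree of burial from Cβ coordinates, and evaluating the solvation formula. The coordinates and frequencies are made up, the RT value is an assumption, and reading r as the degree of burial is an interpretation of the slide's formula.

```python
import math

def degree_of_burial(cb_coords, index, radius=10.0):
    """Number of other C-beta atoms within `radius` angstroms of residue `index`."""
    center = cb_coords[index]
    return sum(
        1
        for k, other in enumerate(cb_coords)
        if k != index and math.dist(center, other) <= radius
    )

def solvation_energy(f_aa_r, f_r, rt=0.582):
    """Delta E_solv(AA, r) = -RT ln( f(AA, r) / f(r) )."""
    return -rt * math.log(f_aa_r / f_r)

# Made-up C-beta coordinates (angstroms) for four residues.
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 9.0, 0.0), (20.0, 0.0, 0.0)]
print(degree_of_burial(coords, 0))  # two neighbours within 10 angstroms
print(degree_of_burial(coords, 3))  # isolated residue

# Made-up frequencies: this amino acid is over-represented at burial level r.
print(solvation_energy(0.2, 0.1))   # negative: favourable placement
```

A hydrophobic residue placed at a highly buried template position would typically land on the favourable side of this formula, and an exposed one on the unfavourable side.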

STAGE 4: send it all to the (trained) Neural Network. The output is a score between 0 and 1, translated into a confidence level (Low, Medium, High & Certain).

See this page on the web

Who trains the neural network? Representatives were taken for the different fold types in CATH (“T-level”), and CAT numbers were used for comparing pairs. Of 9169 chain pairs, 383 shared a common domain fold (= should give a positive answer). The network was trained with these pairs.

Neural network – black box?

Confidence assignment: LOW, MEDIUM, HIGH, CERTAIN

GenTHREADER: what to do with it? Results on a ‘classic’ test set of 68 proteins. High true-positive rate: 73.5% correctly recognized, 48.5% with CERTAIN confidence. Extremely reliable: every “CERTAIN” prediction was correct. Fast, automatic method. For 22 of the 68 proteins, the alignment is over 50% accurate. Let’s go analyze Mycoplasma genitalium with it!
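
For readers who prefer counts to percentages, the figures on this slide can be converted back. The counts below are inferred from the rounded percentages, so treat them as a back-of-the-envelope check rather than numbers quoted from the paper.

```python
total = 68

recognized = round(0.735 * total)  # proteins correctly recognized
certain = round(0.485 * total)     # proteins recognized with CERTAIN confidence
good_alignment_pct = round(22 / total * 100, 1)  # share with >50% accurate alignment

print(recognized, certain, good_alignment_pct)
# Sanity check: the inferred counts reproduce the quoted percentages.
print(round(recognized / total * 100, 1), round(certain / total * 100, 1))
```

So roughly half of the test proteins get a CERTAIN hit, while only about a third get an alignment that is more than 50% accurate: recognizing the fold is easier than aligning to it well.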

Whole-genome analysis with GenTHREADER: Mycoplasma genitalium genome analysis, ONE DAY ONLY!

ORF MG276 of Mycoplasma genitalium: spotting a remote homologue. MG276 is an “adenine phosphoribosyltransferase” (but this information is not given to GenTHREADER). The 1HGX template is another phosphoribosyltransferase. It has only 10% sequence identity with our MG276! GenTHREADER found it as a CERTAIN match. E-Pair saved the situation! But how do we know it’s true?

Ligand binding site of 1HGX template Substrate

ORF MG276 of Mycoplasma genitalium: supporting evidence for 1HGX as a template
1. Substrate binding sites are preserved.
2. The secondary structure prediction of MG276 is similar.
3. We cheated all along…

ORF MG353 of Mycoplasma genitalium: an ORF with no known function. MG353: no homologues found in databases. 1HUE is a template of a “histone-like” protein, with very low sequence similarity to our MG353. GenTHREADER found it as a CERTAIN match. Striking similarity in the DNA-binding region despite the overall low sequence similarity.

GenTHREADER improvements (McGuffin & Jones, May 2003): PSI-BLAST, PSIPRED (secondary structures), some more… Some results:

AB-INITIO FOLDING - ROSETTA (Simons et al. 1997, 1999; Bystroff & Baker 1998; Bonneau et al. 2001). Prediction of a protein fold from scratch? Method I: physically simulate protein folding. Problem: CPU time; practical only for short peptides. Method II: check the probability of all possible conformations. Problem: infinite search space. Solution: use mother nature to decrease the search space. APKFFRGGNWKMNGKRSLG ELIHTLGDAKLSADTEVVCGI APSITEKVVFQETKAIADNKD WSKVEVHESRIYGGSVTNCK ELASQHDVDGFLVGGASLKP VDGFLHALAEGLGVDINAKH

Decreasing the search space using elements from short peptides: take fragments of short peptides (3 to 9 residues long) and join them together. Keep the secondary structures constant and “play” with the angles of loop residues. RESULT: 200,000 decoy structures.
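
A toy sketch of the fragment idea, to show why it shrinks the search space: instead of sampling every backbone angle freely, the chain is only ever rebuilt from a small library of fragment conformations. Everything here is an assumption for illustration (the torsion-angle representation, the Metropolis acceptance rule, the made-up library and the toy score); it is not Rosetta's actual protocol or energy function.

```python
import math
import random

def insert_fragment(torsions, start, fragment):
    """Copy of `torsions` with `fragment` written over positions start, start+1, ..."""
    new = list(torsions)
    new[start:start + len(fragment)] = fragment
    return new

def assemble(n_residues, library, score, steps=1000, temperature=1.0, seed=0):
    """Monte Carlo fragment insertion; returns (torsions, score) of one decoy."""
    rng = random.Random(seed)
    torsions = [(-150.0, 150.0)] * n_residues  # extended start chain (assumed)
    current = score(torsions)
    for _ in range(steps):
        start = rng.choice(sorted(library))
        fragment = rng.choice(library[start])
        trial = insert_fragment(torsions, start, fragment)
        trial_score = score(trial)
        delta = trial_score - current
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            torsions, current = trial, trial_score
    return torsions, current

def toy_score(torsions):
    # Hypothetical score: penalize phi angles far from a helix-like -60 degrees.
    return sum((phi + 60.0) ** 2 for phi, _psi in torsions)

n = 12
# Made-up library: two 3-residue fragments (helix-like, strand-like) per start.
library = {
    s: [[(-60.0, -45.0)] * 3, [(-120.0, 130.0)] * 3]
    for s in range(n - 2)
}
initial_score = toy_score([(-150.0, 150.0)] * n)
final, final_score = assemble(n, library, toy_score)
print(initial_score, final_score)
```

Running `assemble` with many different seeds would yield a set of alternative conformations, which is the role the 200,000 decoys play on the slide.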

In addition: I-Sites prediction. 13 local-structure 3D motifs with sequence profiles. Strong independence of motifs (fold-initiation sites?). Complements secondary structure.

Find the correct fold for a given sequence (back to threading…). P(structure), sequence independent: secondary-structure packing, strand hydrogen bonding, strand assembly in sheets, structure compactness, frequency of I-Sites 3D motifs, etc. P(sequence | structure): solvation, secondary structure vs. amino acid (proline in helix, etc.), pair interaction, I-Sites prediction for this sequence (3D motifs; did not contribute to performance), etc.
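
The slide lists the two terms without saying how they are combined. Assuming the intended reading is Bayes' rule, each decoy would be ranked by the product of the two terms, since the denominator is the same for every decoy of a given sequence:

```latex
P(\text{structure} \mid \text{sequence})
  = \frac{P(\text{sequence} \mid \text{structure}) \, P(\text{structure})}{P(\text{sequence})}
  \;\propto\; P(\text{sequence} \mid \text{structure}) \, P(\text{structure})
```

Under that reading, P(structure) rewards decoys that look like proteins in general, and P(sequence | structure) rewards decoys that suit this particular sequence.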

RESULTS in CASP4: Baker’s a winner… Native structures vs. predicted models.
