Identification of Helix-Turn-Helix (HTH) DNA-Binding Motifs


Identification of Helix-Turn-Helix (HTH) DNA-Binding Motifs
Changhui Yan
Department of Computer Science, Utah State University

HTH Motifs
Protein sequences with low sequence similarity can fold into a similar HTH structure.
Identifying HTH motifs from sequence is therefore extremely challenging.
Data: 7 families containing HTH motifs from the Pfam database.
Positive data set: 2,198 proteins.
Negative data set: 1,518 proteins.

Combination of Amino Acid Sequence and Predicted Secondary Structure
HMM_AA (amino acid sequence):
  LQQITHIANQL-GLE----KDVVRVWF
HMM_AA_SS (amino acid sequence and predicted secondary structure):
  LQQITHIANQL-GLE----KDVVRVWF
  HHHEEHEEEHMHE----HHEEMMEH
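One way to picture combining the two tracks is to pair each residue with its predicted secondary-structure state, so that a model sees a single composite symbol per position. The slide does not say how HMM_AA_SS actually encodes its input, so the following is only a minimal sketch under that assumption; the function name `combine_tracks` and the toy input strings are hypothetical and are not taken from the presentation.

```python
def combine_tracks(aa, ss, gap="-"):
    """Pair each residue with its predicted secondary-structure state.

    Assumption (not from the slides): the two tracks are already aligned
    position by position, and a gap in either track yields a gap token.
    """
    if len(aa) != len(ss):
        raise ValueError("sequence and structure tracks must have equal length")
    tokens = []
    for residue, state in zip(aa, ss):
        if residue == gap or state == gap:
            tokens.append(gap)
        else:
            tokens.append(residue + state)  # composite symbol, e.g. "LH"
    return tokens


# Toy, illustrative input only (not data from the presentation).
toy_aa = "LQQIT-GLE"
toy_ss = "HHHEE-HHE"
print(combine_tracks(toy_aa, toy_ss))
```

With composite symbols the alphabet size is the product of the two track alphabets, which is one reason a reduced amino acid alphabet can be attractive when both tracks are used.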

Reduced Alphabets
Schemes for reducing the amino acid alphabet, based on the BLOSUM50 matrix of Henikoff and Henikoff (1992) and derived by grouping and averaging the similarity matrix elements as described in the text (Murphy et al. 2000).
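As a rough illustration of what a reduced alphabet does to a sequence, the sketch below maps each residue to a representative letter of its group. The 10-letter grouping shown is the one commonly quoted for Murphy et al. (2000); it is stated here as an assumption and should be checked against the original paper. The names `MURPHY_10_GROUPS` and `reduce_sequence` are hypothetical helpers, not part of the presentation.

```python
# Assumed 10-letter grouping, as commonly quoted for Murphy et al. (2000).
# Verify against the original paper before relying on it.
MURPHY_10_GROUPS = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]

# Map every amino acid to the first letter of its group.
REDUCE = {aa: group[0] for group in MURPHY_10_GROUPS for aa in group}


def reduce_sequence(seq):
    """Rewrite a sequence in the reduced alphabet; gaps and unknown symbols pass through."""
    return "".join(REDUCE.get(ch, ch) for ch in seq.upper())


# The aligned fragment shown on the previous slide, gaps preserved.
print(reduce_sequence("LQQITHIANQL-GLE----KDVVRVWF"))
# prints LEELSHLAEEL-GLE----KELLKLFF
```

Residues that BLOSUM50 scores as similar collapse to one symbol, so sequences with low identity in the 20-letter alphabet can look more alike after reduction.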

Results
Table 1. Cross-Families Evaluations
Columns: True Positive 1, False Positive 2
HMM_AA 3
HMM_AA_SS (20 letters) 3 227
(Murphy_15) 3 474
(Murphy_10) 3 470
(Murphy_8) 3 431 5
Footnotes:
1 True positive: HTH motifs that are correctly identified as such.
2 False positive: Non-HTH motifs that are identified as HTH motifs.
3 The alphabet used to encode amino acid sequences.
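The true-positive and false-positive definitions in the table footnotes can be expressed as a simple count over labelled predictions. This is a minimal sketch of the definitions only, not the evaluation code behind Table 1; the function name `count_tp_fp` and the toy labels are hypothetical.

```python
def count_tp_fp(is_hth, predicted_hth):
    """Count true and false positives from parallel lists of booleans.

    True positive: an HTH motif that is predicted as HTH.
    False positive: a non-HTH region that is predicted as HTH.
    """
    if len(is_hth) != len(predicted_hth):
        raise ValueError("label and prediction lists must have equal length")
    tp = sum(1 for truth, pred in zip(is_hth, predicted_hth) if truth and pred)
    fp = sum(1 for truth, pred in zip(is_hth, predicted_hth) if not truth and pred)
    return tp, fp


# Toy labels for illustration only (not data from the presentation).
truth = [True, True, False, False, True]
pred = [True, False, True, False, True]
print(count_tp_fp(truth, pred))  # prints (2, 1)
```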

Results
Table 2. Comparison with a method based on profile-profile comparisons
Total HTH motifs   FFAS03 and HMM_AA_SS   FFAS03 only   HMM_AA_SS only
563                135                    24            71

Table 3. Putative HTH motifs in Ureaplasma parvum
Protein                    Location   Annotation from UniProt
sp|Q9PQE5|SCPB_UREPA       176-214    Participates in chromosomal partition during cell division
sp|Q9PQV6|RPOB_UREPA       540-587    DNA-directed RNA polymerase
sp|Q9PR27|SYY_UREPA        340-380    Tyrosyl-tRNA synthetase
sp|Q9PQC2|SYA_UREPA        217-265    Alanyl-tRNA synthetase
sp|Q9PQ74|DPO3A_UREPA      365-400    DNA polymerase III subunit alpha
sp|Q9PQX7|Y166_UREPA       507-553    Hypothetical protein