
1 What’s next? 3.3 (today): Protein function; 10.3: Protein secondary structure prediction; 17.3: Protein tertiary structure prediction; 24.3: Gene expression & gene networks; 31.3: RNA structure and function; 7.4: Advances in Bioinformatics

2 Predicting Protein Function

3 [Figure: DNA, RNA and protein]

4 Biochemical function (molecular function) What does it do? Is it a kinase? A ligase? Page 245

5 Function based on ligand-binding specificity What (or whom) does it bind? Page 245

6 Function based on biological process What is it good for? Amino acid metabolism? Page 245

7 Function based on cellular location Where is it active? In the nucleolus? In the cytoplasm? [Figure: DNA, RNA] Page 245

8 Function based on cellular location Where are the RNA and protein expressed? In the brain? In the testis? Where are they underexpressed? [Figure: DNA, RNA] Page 245

9 GO (Gene Ontology) http://www.geneontology.org/ The GO project aims to develop three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated molecular functions (F), biological processes (P) and cellular components (C). An ontology is a description of the concepts and relationships that can exist for an agent or a community of agents.

10 GO annotations of RIM11 (GO evidence codes in parentheses)
Molecular Function: glycogen synthase kinase 3 activity (ISS); protein serine/threonine kinase activity (IDA)
Biological Process: protein amino acid phosphorylation (IGI, ISS); proteolysis (IGI); response to stress (IGI, IMP); sporulation (sensu Fungi) (IMP)
Cellular Component: cytoplasm (IDA)
Extracted from SGD (Saccharomyces Genome Database)
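The evidence codes matter when reusing annotations: IDA (direct assay), IGI (genetic interaction) and IMP (mutant phenotype) are experimental, while ISS (sequence or structural similarity) is itself a computational inference. The sketch below stores the RIM11 annotations from this slide as plain Python data and keeps only terms backed by at least one experimental code. The data layout, the function name `terms_by_aspect` and the exact experimental-code set are illustrative assumptions, not an SGD or GO API.

```python
# Hypothetical in-memory representation of the RIM11 annotations shown on the slide.
# Each entry: (aspect, GO term name, set of evidence codes).
# Aspects follow the slide: F = molecular function, P = biological process, C = cellular component.
RIM11_ANNOTATIONS = [
    ("F", "glycogen synthase kinase 3 activity", {"ISS"}),
    ("F", "protein serine/threonine kinase activity", {"IDA"}),
    ("P", "protein amino acid phosphorylation", {"IGI", "ISS"}),
    ("P", "proteolysis", {"IGI"}),
    ("P", "response to stress", {"IGI", "IMP"}),
    ("P", "sporulation (sensu Fungi)", {"IMP"}),
    ("C", "cytoplasm", {"IDA"}),
]

# Core GO experimental evidence codes (assumed set; ISS is a computational code).
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


def terms_by_aspect(annotations, experimental_only=False):
    """Group term names by aspect; optionally drop terms with no experimental evidence."""
    grouped = {"F": [], "P": [], "C": []}
    for aspect, term, codes in annotations:
        if experimental_only and not (codes & EXPERIMENTAL_CODES):
            continue  # e.g. a term supported only by ISS
        grouped[aspect].append(term)
    return grouped


if __name__ == "__main__":
    for aspect, terms in terms_by_aspect(RIM11_ANNOTATIONS, experimental_only=True).items():
        print(aspect, terms)
```

With `experimental_only=True`, "glycogen synthase kinase 3 activity" drops out because its only support is ISS.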

11 Inferring protein function: the bioinformatics approach Based on homology; based on the existence of known protein domains (the protein signature)

12 Inferring protein function based on sequence homology

13 Homologous proteins Rule of thumb: Proteins are homologous if they are at least 25% identical (over an alignment length > 100). DNA sequences are homologous if they are at least 70% identical.
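The rule of thumb can be applied mechanically once two sequences are aligned. The sketch below assumes the alignment already exists (gaps written as "-") and uses one common convention for percent identity: identical columns divided by alignment length, with gap columns counted. Other tools use other denominators, so values can differ. The function names are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both sequences.

    Assumed convention: gap columns count toward the alignment length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("empty alignment")
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap)
    return 100.0 * identical / len(aligned_a)


def likely_homologous(aligned_a: str, aligned_b: str, molecule: str = "protein") -> bool:
    """Rule of thumb from the slide: protein >= 25% over > 100 columns; DNA >= 70%."""
    identity = percent_identity(aligned_a, aligned_b)
    if molecule == "protein":
        return len(aligned_a) > 100 and identity >= 25.0
    if molecule == "dna":
        return identity >= 70.0
    raise ValueError("molecule must be 'protein' or 'dna'")


if __name__ == "__main__":
    a = "A" * 120
    b = "A" * 30 + "G" * 90  # 30 of 120 columns identical = 25%
    print(percent_identity(a, b), likely_homologous(a, b))
```

Falling just below the threshold does not prove the sequences are unrelated; it only means identity alone is not enough evidence.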

14 Homologs - Proteins with a common evolutionary origin. Paralogs - Proteins encoded within a given species that arose from one or more gene duplication events. Orthologs - Proteins from different species that evolved by speciation. Examples: human hemoglobin vs. mouse hemoglobin (orthologs); human hemoglobin vs. human myoglobin (paralogs).

15 COGs: Clusters of Orthologous Groups of proteins > Each COG consists of individual orthologous proteins or orthologous sets of paralogs. > Orthologs typically have the same function, which allows functional information to be transferred from one member to an entire COG. Reference: Classification of conserved genes according to their homologous relationships (Koonin et al., NAR).
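A simple, widely used way to nominate ortholog pairs between two genomes is the reciprocal best hit (RBH): gene a in genome A and gene b in genome B are paired when each is the other's highest-scoring similarity hit. The published COG procedure is more elaborate (it groups consistent best hits across three or more genomes), so this is only a two-genome sketch of the underlying idea. The gene names and scores below are hypothetical stand-ins for BLAST bit scores.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject (ties go to the first one listed)."""
    return {q: max(subjects, key=subjects.get) for q, subjects in hits.items() if subjects}


def reciprocal_best_hits(hits_ab, hits_ba):
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    best_ab = best_hits(hits_ab)
    best_ba = best_hits(hits_ba)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores (e.g. bit scores), human -> mouse and mouse -> human.
HUMAN_VS_MOUSE = {
    "hs_HBA": {"mm_Hba": 280, "mm_Hbb": 90, "mm_Mb": 40},
    "hs_HBB": {"mm_Hbb": 290, "mm_Hba": 95},
    "hs_MB": {"mm_Mb": 300, "mm_Hba": 35},
    "hs_MB2": {"mm_Mb": 150},  # best hit is mm_Mb, but not reciprocated
}
MOUSE_VS_HUMAN = {
    "mm_Hba": {"hs_HBA": 285, "hs_HBB": 92},
    "mm_Hbb": {"hs_HBB": 288, "hs_HBA": 88},
    "mm_Mb": {"hs_MB": 305, "hs_MB2": 148},
}

if __name__ == "__main__":
    print(reciprocal_best_hits(HUMAN_VS_MOUSE, MOUSE_VS_HUMAN))
```

Only the three mutually-best pairs are reported; hs_MB2 is excluded because mm_Mb prefers hs_MB.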

16 Inferring protein function based on the protein signature

17 The Protein Signature Signature: the existence of a known protein domain or motif. Domain: a region of a protein that can adopt a 3D structure. Motif (or fingerprint): a short, conserved region of a protein, typically 10 to 20 contiguous amino acid residues. Examples: zinc finger domain, immunoglobulin domain.

18 DNA-binding domain: zinc finger

19 Protein Domains Domains can be considered the building blocks of proteins. Some domains are found in many proteins with different functions, while others are found only in proteins with a certain function.

20 Varieties of protein domains Page 228 A domain may extend along the whole length of a protein, occupy a subset of a protein sequence, or occur one or more times.

21 Example of a protein with two domains: methyl CpG binding protein 2 (MeCP2) [Figure: MeCP2 domain map showing the MBD and the TRD] The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD). MeCP2 is a transcriptional repressor.

22 Result of an MeCP2 blastp search: A methyl-binding domain shared by several proteins

23 Are proteins that share only a domain homologous?

24 PROSITE PROSITE is a database of protein domains that can be searched with either regular-expression-like patterns or sequence profiles. Example, Zinc_Finger_C2H2: C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H
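A PROSITE pattern maps almost one-to-one onto a regular expression: `x` is any residue, `x(2,4)` is 2 to 4 of them, `[...]` lists allowed residues, `{...}` lists forbidden ones, and `<`/`>` anchor the N- and C-terminus. The sketch below translates a pattern with Python's `re` module and scans a sequence; it covers only those basic elements (an assumption, not the full PROSITE grammar) and, unlike some scanning tools, reports non-overlapping matches only. The test sequence is the zif268 fragment shown on the DNA-binding slide later in this deck.

```python
import re

# One PROSITE element: residue, x, [allowed] or {forbidden}, optional (n) or (n,m) repeat.
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        anchor_start = element.startswith("<")
        anchor_end = element.endswith(">")
        m = _ELEMENT.fullmatch(element.strip("<>"))
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            token = "."
        elif residue.startswith("{"):
            token = "[^" + residue[1:-1] + "]"  # forbidden residues
        else:
            token = residue  # single residue or [allowed] class, same syntax in regex
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if anchor_start else "") + token + ("$" if anchor_end else ""))
    return "".join(parts)


def find_matches(pattern: str, sequence: str):
    """Return (start, end, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.end(), m.group()) for m in regex.finditer(sequence)]


ZINC_FINGER_C2H2 = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
ZIF268_FRAGMENT = "MERPYACPVESCDRRFSRSDELTRHIRIHT"

if __name__ == "__main__":
    print(prosite_to_regex(ZINC_FINGER_C2H2))
    print(find_matches(ZINC_FINGER_C2H2, ZIF268_FRAGMENT))
```

The zinc finger pattern becomes `C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H`, and it matches the zif268 fragment from the first cysteine to the second histidine.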

25 Pfam > A database that contains a large collection of multiple sequence alignments of protein domains, based on profile hidden Markov models (HMMs).

26 Profile HMM (Hidden Markov Model) [Figure: a profile HMM built from a multiple sequence alignment (MSA) covering columns 16-19, with match states M16-M19, insert states I16-I19 and delete states D16-D19; match-state emission probabilities: column 16, D 0.8 and S 0.2; column 17, P 0.4 and R 0.6; column 18, T 1.0; column 19, R 0.4 and S 0.6] An HMM is a probabilistic model of the MSA consisting of a number of interconnected states: match, delete and insert.
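The emission probabilities in a profile HMM's match states come from residue frequencies in the alignment columns. The sketch below estimates them from a toy MSA: columns that are mostly gaps are treated as insert columns (a common heuristic; the 0.5 gap threshold is an assumption), and an optional pseudocount spreads a little probability over all 20 standard amino acids. The toy alignment is hypothetical, and a full profile HMM (as built for Pfam) also estimates insert emissions and state-transition probabilities, which this sketch omits.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def match_state_emissions(msa, gap="-", max_gap_fraction=0.5, pseudocount=0.0):
    """Estimate match-state emission probabilities from an MSA.

    A column is a match state when its gap fraction is at most max_gap_fraction;
    other columns are treated as insert columns and skipped.
    Returns a list of (column_index, {residue: probability}).
    """
    n_seqs = len(msa)
    length = len(msa[0])
    if any(len(seq) != length for seq in msa):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for col in range(length):
        residues = [seq[col] for seq in msa if seq[col] != gap]
        if (n_seqs - len(residues)) / n_seqs > max_gap_fraction:
            continue  # mostly gaps: insert column, not a match state
        counts = Counter(residues)
        # With pseudocounts every standard amino acid gets some probability mass.
        symbols = AMINO_ACIDS if pseudocount > 0 else sorted(counts)
        total = len(residues) + pseudocount * len(symbols)
        profile.append((col, {aa: (counts[aa] + pseudocount) / total for aa in symbols}))
    return profile


# Hypothetical toy alignment; column 3 is mostly gaps and becomes an insert column.
TOY_MSA = ["DRTAR", "DRT-S", "S---S", "DPT-R"]

if __name__ == "__main__":
    for col, probs in match_state_emissions(TOY_MSA):
        print(col, probs)
```

On the toy alignment, the first match column gives D 0.75 and S 0.25, and the mostly-gap column is skipped.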

27 Pfam > A database that contains a large collection of multiple sequence alignments of protein domains, based on profile hidden Markov models (HMMs). > The Pfam database is based on two distinct classes of alignments: seed alignments, which are deemed to be accurate and are used to produce Pfam-A; and alignments derived by automatic clustering of Swiss-Prot, which are less reliable and give rise to Pfam-B.

28 Physical properties of proteins

29 DNA-binding domains have a relatively high frequency of basic (positively charged) amino acids GCN4: MKDPAALKRARNTEAARRSSRARKLQRM zif268: MERPYACPVESCDRRFSRSDELTRHIRIHT myoD: SKVNEAFETLKRCTSSNPNQRLPKVEILRNAIR
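The claim can be checked directly by counting basic residues in the three fragments on this slide. The sketch below counts lysine, arginine and histidine; including histidine is an assumption (it is only weakly basic at neutral pH), and a meaningful comparison would also need a control set of non-DNA-binding proteins, which is not shown here.

```python
# Basic residues: lysine (K), arginine (R) and histidine (H).
BASIC_RESIDUES = set("KRH")

# Fragments transcribed from the slide (spaces removed).
FRAGMENTS = {
    "GCN4": "MKDPAALKRARNTEAARRSSRARKLQRM",
    "zif268": "MERPYACPVESCDRRFSRSDELTRHIRIHT",
    "myoD": "SKVNEAFETLKRCTSSNPNQRLPKVEILRNAIR",
}


def basic_fraction(seq: str, basic=BASIC_RESIDUES) -> float:
    """Fraction of residues in seq that belong to the basic set."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(aa in basic for aa in seq) / len(seq)


if __name__ == "__main__":
    for name, seq in FRAGMENTS.items():
        print(f"{name}: {basic_fraction(seq):.2f}")
```

This prints about 0.36 for GCN4, 0.27 for zif268 and 0.21 for myoD.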

30 Transmembrane proteins have a unique hydrophobicity pattern
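One classic way to expose that pattern is a hydropathy plot: average a hydrophobicity scale over a sliding window and flag stretches that stay high. The sketch below uses the Kyte-Doolittle scale with the commonly used 19-residue window and 1.6 cutoff for candidate membrane-spanning segments; the window, cutoff and function names are assumptions, and modern transmembrane predictors are considerably more sophisticated. Because whole windows are reported, segment boundaries are fuzzy by design.

```python
# Kyte-Doolittle hydropathy values.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq: str, window: int = 19) -> list:
    """Mean hydropathy of each window; entry i covers residues i .. i + window - 1."""
    values = [KYTE_DOOLITTLE[aa] for aa in seq.upper()]
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def predict_tm_segments(seq: str, window: int = 19, threshold: float = 1.6) -> list:
    """Return (start, end_exclusive) residue ranges covered by windows scoring above threshold."""
    scores = hydropathy_profile(seq, window)
    segments, start = [], None
    for i, score in enumerate(scores):
        if score > threshold and start is None:
            start = i
        elif score <= threshold and start is not None:
            segments.append((start, i - 1 + window))  # last high window started at i - 1
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1 + window))
    return segments


if __name__ == "__main__":
    toy = "K" * 10 + "L" * 25 + "K" * 10  # hypothetical: a leucine stretch between lysines
    print(predict_tm_segments(toy))
```

For the toy sequence, the flagged region (5, 40) overhangs the true leucine stretch (10, 35), which illustrates the boundary fuzziness.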

31 Physical properties of proteins Many websites are available for the analysis of individual proteins, for example: ExPASy, UCSC Proteome Browser, ProtoNet (HUJI). The accuracy of the analysis programs varies. Predictions based on the primary amino acid sequence (such as molecular weight prediction) are likely to be more trustworthy. For many other properties (such as posttranslational modification of proteins by specific sugars), experimental evidence may be required rather than prediction algorithms. Page 236
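Molecular weight is trustworthy precisely because it follows from the sequence alone: sum the average mass of each residue and add one water for the free termini. The sketch below uses standard average residue masses in daltons; it ignores posttranslational modifications, disulfide bonds and isotopic detail, so results are approximate and may differ slightly from tools such as ExPASy's.

```python
# Average residue masses (Da), i.e. amino acid mass minus one water.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the N- and C-terminal H and OH


def average_molecular_weight(seq: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified polypeptide."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER


if __name__ == "__main__":
    print(round(average_molecular_weight("GG"), 2))  # diglycine
```

A single glycine comes out at about 75.07 Da, the familiar free amino acid value.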

32 Knowledge Based Approach Idea: Find the common properties of a protein family (or any group of proteins of interest) that are unique to the group and distinguish it from all other proteins. Generate a model for the group and use it to predict new members of the family that have similar properties.

33 Knowledge Based Approach Basic Steps 1. Building a Model: Generate a dataset of proteins with a common function (e.g., DNA-binding proteins). Generate a control dataset. For all the proteins in the data (DNA-binding and non-DNA-binding proteins), calculate the properties that are characteristic of the protein family of interest. Represent each protein in a set by a vector of calculated features and build a statistical model that separates the groups.
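Step 1 turns each protein into a fixed-length feature vector with a class label. The sketch below uses amino acid composition (the fraction of each of the 20 standard residues) as the feature vector and labels the family +1 and the control set -1. Composition is just one assumed choice of features; the slides leave the feature set open, and real pipelines often add properties such as charge, hydrophobicity or structure-derived descriptors. The sequences in the usage line are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_vector(seq: str) -> list:
    """20-dimensional amino acid composition (fractions over standard residues)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [c / total for c in counts]


def build_dataset(family: list, control: list):
    """Feature matrix X and labels y: +1 for family members, -1 for controls."""
    X = [composition_vector(s) for s in family] + [composition_vector(s) for s in control]
    y = [1] * len(family) + [-1] * len(control)
    return X, y


if __name__ == "__main__":
    X, y = build_dataset(["MKRKR", "KKARH"], ["LLVIA"])  # hypothetical sequences
    print(len(X), len(X[0]), y)
```

Any statistical model that accepts numeric vectors can then be trained on `X` and `y`.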

34 Support Vector Machine (SVM) Goal: find a hyperplane that maximally separates DNA-binding from non-DNA-binding proteins (two classes). [Figure: a kernel function maps proteins from the input space into a feature space, where each protein is a vector such as [x1, x2, x3, …] or [y1, y2, y3, …]; a new protein structure (?) is then classified as DNA-binding or non-DNA-binding.]
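To make the hyperplane idea concrete, the sketch below trains a linear SVM (the linear-kernel case) with Pegasos, a stochastic sub-gradient method for the soft-margin hinge-loss objective, and then classifies a new protein by the sign of its score, which is also step 2 of the approach. Features are z-scored, and a constant 1 is appended so the bias is learned as an ordinary weight. The two features (fraction of basic residues, fraction of hydrophobic residues) and all values are hypothetical. A nonlinear kernel, as in the slide's figure, would normally come from a library such as scikit-learn's `sklearn.svm.SVC(kernel="rbf")`.

```python
import random


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _standardize_params(X):
    """Per-feature mean and standard deviation (constant features get std 1)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    stds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 or 1.0 for j in range(d)]
    return means, stds


def _transform(x, means, stds):
    # z-score each feature, then append 1.0 so the bias is learned as a weight
    return [(v - m) / s for v, m, s in zip(x, means, stds)] + [1.0]


def train_linear_svm(X, y, lam=0.1, epochs=500, seed=0):
    """Pegasos: SGD on lam/2 * ||w||^2 + mean hinge loss, labels in {+1, -1}."""
    means, stds = _standardize_params(X)
    Z = [_transform(x, means, stds) for x in X]
    w = [0.0] * len(Z[0])
    rng = random.Random(seed)
    order = list(range(len(Z)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violates = y[i] * _dot(w, Z[i]) < 1  # inside the margin or misclassified
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization step
            if violates:
                w = [wj + eta * y[i] * zj for wj, zj in zip(w, Z[i])]  # hinge sub-gradient
    return w, means, stds


def predict(model, x):
    """+1 (e.g. DNA-binding) or -1 (non-DNA-binding) by the side of the hyperplane."""
    w, means, stds = model
    return 1 if _dot(w, _transform(x, means, stds)) >= 0 else -1


# Hypothetical feature vectors: [fraction basic, fraction hydrophobic].
X_TRAIN = [[0.36, 0.18], [0.30, 0.20], [0.32, 0.15], [0.28, 0.22],
           [0.08, 0.40], [0.10, 0.35], [0.06, 0.45], [0.12, 0.38]]
Y_TRAIN = [1, 1, 1, 1, -1, -1, -1, -1]

if __name__ == "__main__":
    model = train_linear_svm(X_TRAIN, Y_TRAIN)
    print(predict(model, [0.33, 0.19]), predict(model, [0.07, 0.42]))
```

A new vector with many basic and few hydrophobic residues falls on the DNA-binding side of the learned hyperplane.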

35 Basic Steps 2. Predicting the function of a new protein: Calculate the properties of a new protein and represent them as a vector. Predict whether the tested protein belongs to the family.

36 Databases and tools for protein families and domains
InterPro - Integrated Resource of Protein Domains and Functional Sites
PROSITE - A database of protein families and domains
BLOCKS - BLOCKS database
Pfam - Protein families database (HMM-derived)
PRINTS - Protein motif fingerprint database
ProDom - Protein domain database (automatically generated)
PROTOMAP - An automatic hierarchical classification of Swiss-Prot proteins
SBASE - SBASE domain database
SMART - Simple Modular Architecture Research Tool
TIGRFAMs - TIGR protein families database

