Genome Annotation Continued

Presentation on theme: "Genome Annotation Continued"— Presentation transcript:

1 Genome Annotation Continued
This week’s lab. Genome annotation - web based databases for assigning gene function.

2 Last week’s lab
E-value, Score, Blastx, Taxonomy

3 Lab: Sequence assembly and analysis
Assemble individual sequence reads.
Phred = 30 - good or bad?
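
To answer the question on this slide, it helps to recall that a Phred quality score Q is defined from the base-call error probability P as Q = -10 log10(P). The short sketch below (function names are illustrative, not taken from the lab materials) converts between the two and prints a few common thresholds.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Probability that a base call with Phred quality q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Phred quality for a base-call error probability p (0 < p <= 1)."""
    return -10 * math.log10(p)


# Print the error probability and accuracy for a few common quality values.
for q in (10, 20, 30, 40):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:g}, accuracy {100 * (1 - p):.2f}%")
```

A Phred score of 30 therefore corresponds to roughly one wrong base call in 1,000, which is usually treated as good quality.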

4 Linking Protein Sequence, Structure, and Function
CDD: conserved functional domains in proteins, each represented by a PSSM.
Diagram labels: Domains; PSI-BLAST, RPS-BLAST, CDART; 3D Domains. (NCBI Field Guide)

5 Position Specific Substitution Rates
Figure labels: weakly conserved serine; active site serine.

6 Position Specific Score Matrix (PSSM)
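Matrix columns (amino acids): A R N D C Q E G H I L K M F P S T W Y V
Matrix rows (position and residue): 206 D, 207 G, 208 V, 209 I, 210 S, 211 S, 212 C, 213 N, 214 G, 215 D, 216 S, 217 G, 218 G, 219 P, 220 L, 221 N, 222 C, 223 Q, 224 A (score values not shown).
Figure annotations: "Serine is scored differently in these two positions"; "Active site nucleophile".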
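
To make the idea of position-specific scoring concrete, here is a minimal sketch of how a PSSM could be computed from a multiple alignment as per-column log-odds scores. It is not the exact procedure used by PSI-BLAST or CDD: the toy alignment is hypothetical, the background frequency is assumed uniform (1/20), and a simple pseudocount of 1 replaces the more elaborate weighting real tools use.

```python
import math
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # same column order as on the slide


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores (in bits) per alignment column."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    n_seqs = len(alignment)
    denominator = n_seqs + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        counts = Counter(column)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / denominator
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


# Hypothetical toy alignment: column 0 is variable, column 2 is an invariant serine.
toy_alignment = ["SGSG", "AGSG", "TGSG", "SGSG", "NGSG", "DGSG"]
pssm = build_pssm(toy_alignment)

print("Serine score, variable column: ", round(pssm[0]["S"], 2))
print("Serine score, invariant column:", round(pssm[2]["S"], 2))
```

In this toy case the same residue, serine, receives a higher score in the invariant column than in the variable one, which is the point the slide makes about the weakly conserved and active site serines.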

7 Hidden Markov Models
A statistical model that can be applied to any system that can be represented as a series of discrete states. Applies to protein and nucleotide sequences. Can be thought of as much like the PSSMs that PSI-BLAST builds after several iterations. Used in gene finding and protein profile analysis.
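
As a rough illustration of how an HMM assigns hidden states to a sequence, the sketch below runs the Viterbi algorithm on a two-state nucleotide model. The states ("AT_rich", "GC_rich") and all probabilities are hypothetical values chosen for the example. Real gene finders and profile HMMs such as those behind Pfam and TIGRFAMs use far larger models trained on data.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path (computed in log space)."""
    # Each column maps state -> (best log-probability so far, previous state).
    first = observations[0]
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][first]), None)
        for s in states
    }]
    for symbol in observations[1:]:
        column = {}
        for s in states:
            prev, score = max(
                ((p, columns[-1][p][0] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            column[s] = (score + math.log(emit_p[s][symbol]), prev)
        columns.append(column)

    # Trace back from the best final state to the start.
    state = max(states, key=lambda s: columns[-1][s][0])
    path = [state]
    for column in reversed(columns[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


# Hypothetical two-state model: regions of differing base composition.
states = ("AT_rich", "GC_rich")
start_p = {"AT_rich": 0.5, "GC_rich": 0.5}
trans_p = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
emit_p = {
    "AT_rich": {"A": 0.4, "T": 0.4, "G": 0.1, "C": 0.1},
    "GC_rich": {"A": 0.1, "T": 0.1, "G": 0.4, "C": 0.4},
}

path = viterbi("ATATGCGCGC", states, start_p, trans_p, emit_p)
print(path)
```

Working in log space avoids numerical underflow when sequences get long, which matters for real protein and genome-scale inputs.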

8 Uses of HMMs in protein function analysis.
TIGRFAMs: strive to annotate the function of an entire protein.
Pfams: strive to annotate domains of proteins.

9 Homologs, orthologs, and paralogs.
Homologous genes are genes that share a common evolutionary ancestor.
Orthologs are genes found in different organisms that arose from a common ancestor through speciation.
Paralogs are genes found in the same organism that arose from a common ancestor through duplication. The duplication could have occurred in that species or earlier; paralogs have often diverged in function.

10 Orthologs may differ in function!

11 TIGRFAM
Curated so that proteins in a TIGRFAM should have the same function if they are equivalogs. Member proteins show identity over their entire length.
Equivalog family: all proteins that are conserved with respect to function since their last common ancestor.
Superfamily: all proteins with homology, which may have different biological functions.
Subfamily: an incomplete set of proteins with homology, which may have diverse biological functions.

12 PFAM
More likely to describe a protein domain than a protein family. Pfams do not overlap. Cross-listed on the TIGRFAM page. ~70% of proteins in Swiss-Prot have a Pfam match.

13 COGs: Clusters of Orthologous Groups
Based on pairwise comparison of orthologs from many bacterial genomes. Suggests function only (book example).
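
The pairwise comparison idea can be sketched with the reciprocal-best-hit heuristic, a common simplified way to propose ortholog pairs between two genomes. This is not the COG construction procedure itself, which links best hits across three or more genomes. The gene names and similarity scores below are hypothetical stand-ins for BLAST results.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene."""
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where a's best hit is b and b's best hit is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical similarity scores between genes of genome A and genome B.
a_vs_b = {("a1", "b1"): 250, ("a1", "b2"): 90, ("a2", "b2"): 180, ("a3", "b1"): 120}
b_vs_a = {("b1", "a1"): 245, ("b2", "a2"): 175, ("b2", "a1"): 88}

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

Here `a3` finds `b1` as its best hit, but `b1` prefers `a1`, so `a3` is not paired. A pair found this way is only a candidate ortholog and, as the slide notes, suggests function rather than proving it.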

14 Gene Ontology (GO)
“The goal of the Gene Ontology project is to produce a controlled vocabulary that can be applied to all organisms even as knowledge of gene and protein roles in cells is accumulating and changing.”
Three ontologies: biological process, molecular function, cellular component.
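
One way to picture a controlled vocabulary is as a small, validated record type. The sketch below is an illustrative data structure, not an official GO file format or API: the gene name `geneX` and the evidence assignments are hypothetical, and the term identifiers should be checked against the current ontology before reuse. The evidence codes listed (for example IDA for direct assay, IEA for electronic annotation) are standard GO codes.

```python
from dataclasses import dataclass

ASPECTS = ("biological_process", "molecular_function", "cellular_component")
# GO evidence codes that denote direct experimental support.
EXPERIMENTAL_CODES = {"EXP", "IDA", "IPI", "IMP", "IGI", "IEP"}


@dataclass(frozen=True)
class GOAnnotation:
    gene: str
    go_id: str
    term: str
    aspect: str
    evidence: str

    def __post_init__(self):
        # Reject anything outside the three GO ontologies.
        if self.aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {self.aspect}")

    @property
    def is_experimental(self) -> bool:
        return self.evidence in EXPERIMENTAL_CODES


# Hypothetical annotations for an imaginary secreted serine protease gene.
annotations = [
    GOAnnotation("geneX", "GO:0006508", "proteolysis", "biological_process", "IEA"),
    GOAnnotation("geneX", "GO:0008236", "serine-type peptidase activity",
                 "molecular_function", "IDA"),
    GOAnnotation("geneX", "GO:0005576", "extracellular region",
                 "cellular_component", "IEA"),
]

for a in annotations:
    support = "experimental" if a.is_experimental else "not experimentally verified"
    print(f"{a.gene}\t{a.go_id}\t{a.aspect}\t{a.term}\t{support}")
```

Keeping the evidence code next to each term makes it easy to separate annotations backed by experiments from those assigned automatically.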

15 Literature Curation
For example, the Saccharomyces Genome Database (SGD): manual curation of the literature for experimental evidence linking function to annotation.

16 Additional databases
SMART - Simple Modular Architecture Research Tool.
PROSITE - Protein motifs.
PRODOM - A database based on PSI-BLAST PSSMs.
InterPro - A database that brings together many of the above databases so that you can search them all at once.
Others.

17 CDD
Conserved Domain Database: links all of this information together. Consists of SMART, Pfam, and COGs (KOGs). Searchable directly, and automatically searched by BLAST. Linked to CDART, which allows the identification of proteins with a similar domain architecture.

18 Bottom line about databases
Databases are useful tools for assigning possible functions. Be careful with annotations: for example, proteins in the same COG can be orthologs that have evolved different functions. Many annotations are not backed up by experimental data. Some databases are automated and have not been checked for accuracy.

19 Annotation cannot be guaranteed without experimental evidence.
Functional genomics

