
1 Project No. 7 Structural Genomics of the RGS Protein Family: Development of a Public Web-based Informatics Database Resource Dahai Gai Samuel Kalet Hongbo Yang May 16 2001

2 Signal Transduction Nucleus DNA replication RNA synthesis New mRNA New Protein X

3 RGS in G protein signaling regulation Gα-GTP: active form; Gα-GDP-βγ: inactive form

4 Mechanism of RGS Regulation RGS: increases the turnover rate of the intrinsic GTPase activity of Gα subunits, resulting in an increased “off” rate and therefore a decreased signal

5 RGS proteins & RGS domain

6 RGS domain: 9 α-helices

7 RGS-Web (Interface) Genomic Data Interaction Data Structural Data Informatics Engine Expression Data Visualization Simulation Engine RGSdb RGS Web

8 24 human genome Predicted genes Potential RGS proteins RGS HMM model Known RGS protein sequences Genescan Prediction of to-be-discovered RGS proteins Subfamily HMM models Subfamily allocation HMM model RGS subfamilies protein sequences Function prediction and biological test ClustalW

9 Human genomic data

10 Gene finding: troubleshooting
- Trouble: Genescan cannot handle sequences larger than 5 Mb.
  Solving: Split each long sequence into multiple shorter sequences.
- Trouble: If a split falls inside a gene, Genescan will miss that gene.
  Solving: Make adjacent split sequences overlap by 10 kb.
- Trouble: Long runs of unknown sequence (Ns) slow Genescan down.
  Solving: Replace any run of Ns longer than 0.5 kb with 0.5 kb of Ns.
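The two preprocessing fixes on this slide can be sketched in a few lines of Python. This is only an illustration, not the project's actual script: the function names `collapse_n_runs` and `split_with_overlap` are hypothetical, and the defaults assume the 5M limit means 5,000,000 bases, with a 10 kb overlap and a 0.5 kb cap on runs of Ns.

```python
import re


def collapse_n_runs(seq, max_run=500):
    """Replace every run of Ns longer than max_run with exactly max_run Ns."""
    pattern = re.compile(r"N{%d,}" % (max_run + 1), re.IGNORECASE)
    return pattern.sub("N" * max_run, seq)


def split_with_overlap(seq, chunk_size=5_000_000, overlap=10_000):
    """Yield (start_offset, chunk) pairs; consecutive chunks share `overlap` bases.

    The defaults are assumptions taken from the slide, not verified tool limits.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    start = 0
    while True:
        yield start, seq[start:start + chunk_size]
        if start + chunk_size >= len(seq):
            break  # the last chunk reached the end of the sequence
        start += step


if __name__ == "__main__":
    toy = "AC" + "N" * 10 + "GT" * 5
    cleaned = collapse_n_runs(toy, max_run=3)
    for offset, chunk in split_with_overlap(cleaned, chunk_size=8, overlap=2):
        print(offset, chunk)
```

Collapsing Ns shifts coordinates, so the returned offsets refer to the cleaned sequence; mapping predicted genes back to the original chromosome would need the removed lengths to be recorded as well.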

11 Gene finding: results
Tool: Genescan
Machine: Hydra.capsl
Speed: 0.1~0.2 s/kb
CPU usage: 75%
Mem. usage: 7.5G
Total running time: 3 days

12 Covered by Gai’s talk 24 human genome Predicted genes Potential RGS proteins RGS HMM model Known RGS protein sequences Genescan Prediction of to-be-discovered RGS proteins Subfamily HMM models Subfamily allocation HMM model RGS subfamilies protein sequences Function prediction and biological test ClustalW

13 Multiple sequence alignment
- ClustalW http://www-igbmc.u-strasbg.fr/BioInfo/ClustalW/
- Clustal W is a general-purpose multiple alignment program for DNA or proteins.
- Multiple alignments are carried out in 3 stages:
  - All sequences are compared to each other (pairwise alignments).
  - A dendrogram (like a phylogenetic tree) is constructed, describing the approximate groupings of the sequences by similarity.
  - The final multiple alignment is carried out, using the dendrogram as a guide.
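The first two stages can be illustrated with a toy Python sketch. It is not ClustalW's algorithm: pairwise similarity is approximated with `difflib.SequenceMatcher` instead of a real pairwise alignment score, and the dendrogram is built with simple average-linkage clustering. The third stage (the progressive alignment itself) is not implemented, and all names and sequences here are made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_distances(seqs):
    """Stage 1 stand-in: distance = 1 - similarity ratio for every pair of names."""
    return {
        frozenset((a, b)): 1.0
        - SequenceMatcher(None, seqs[a], seqs[b], autojunk=False).ratio()
        for a, b in combinations(seqs, 2)
    }


def guide_tree(seqs):
    """Stage 2 stand-in: average-linkage clustering, returned as nested tuples."""
    dist = pairwise_distances(seqs)
    clusters = [(name, [name]) for name in seqs]  # (subtree, leaf names)

    def avg(m1, m2):
        total = sum(dist[frozenset((a, b))] for a in m1 for b in m2)
        return total / (len(m1) * len(m2))

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: avg(clusters[p[0]][1], clusters[p[1]][1]),
        )
        (t1, m1), (t2, m2) = clusters[i], clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((t1, t2), m1 + m2))
    return clusters[0][0]


if __name__ == "__main__":
    toy = {"s1": "ACDEFGHIKL", "s2": "ACDEFGHIKV", "s3": "WWYYPPQQRR"}
    print(guide_tree(toy))
```

In a progressive aligner, the most similar sequences (the innermost tuple) would be aligned first, and the remaining sequences or groups would be added in the order given by the tree.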

14 HMM training
- HMMer http://hmmer.wustl.edu/
- The HMM is trained on the RGS domain sequences found so far.
- Two sets of source data:
  - Set A: only those RGS domains that begin the protein sequence
  - Set B: all RGS domain sequences in proteins
  - Elements in set A have high similarity, while those in set B have low similarity.

15 HMMer usage
- hmmbuild
  - Input: aligned sequences
  - Output: the hidden Markov model
- hmmcalibrate
  - Calibrates the HMM to improve E-value sensitivity
- hmmsearch
  - Input: the built HMM and the target protein sequences
  - Output: the domains found, the position of each domain, its score and E-value
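As a rough sketch, the three steps map onto three command lines. The Python helper below only assembles them in run order and assumes HMMER 2.x, the generation that ships `hmmcalibrate` (HMMER 3 dropped that step). The helper name `hmmer2_pipeline` and all file names are hypothetical placeholders, not files from this project.

```python
import shlex


def hmmer2_pipeline(alignment, hmm_file, protein_fasta):
    """Return the three HMMER 2.x command lines as argv lists, in run order."""
    return [
        # Build the profile HMM from the multiple alignment.
        ["hmmbuild", hmm_file, alignment],
        # Calibrate the HMM in place so E-values are more reliable.
        ["hmmcalibrate", hmm_file],
        # Search the predicted protein sequences with the calibrated HMM.
        ["hmmsearch", hmm_file, protein_fasta],
    ]


if __name__ == "__main__":
    # File names are made up for illustration.
    for argv in hmmer2_pipeline("rgs_domains.aln", "rgs.hmm", "predicted_proteins.fa"):
        print(shlex.join(argv))
```

Each list can be passed to `subprocess.run(argv, check=True)` on a machine where HMMER 2.x is installed; the `hmmsearch` report is written to standard output.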

16 HMM search result

17 Q: Is there a correlation between length and the number of RGS? Q: Is there a metric for the density (affinity) of the RGS?

18 Summary of HMMer result

19 HMMer result summary in detail

20 Reasons for the miss
- The genome sequence is incomplete or contains errors.
- Genescan prediction is not 100% accurate.
- ... other possible reasons; these need further investigation.

21 Covered by Gai’s talk 24 human genome Predicted genes Potential RGS proteins RGS HMM model Known RGS protein sequences Genescan Prediction of to-be-discovered RGS proteins Subfamily HMM models Subfamily allocation HMM model RGS subfamilies protein sequences Function prediction and biological test ClustalW

22 Subfamily identification
- Build an HMM for each subfamily (A - F).
- Use each HMM to search the high-scoring to-be-discovered RGS candidates.
- Result:
  - chr1-4901, subfamily A
  - chr4-5038, subfamily A
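One simple way to turn per-subfamily search scores into an allocation is to pick the best-scoring subfamily HMM for each candidate. The sketch below assumes that rule; the slides do not state the exact criterion. The function name `assign_subfamily`, the `min_score` cutoff, and the scores in the example are all invented for illustration and are not results from the project.

```python
def assign_subfamily(scores, min_score=0.0):
    """Return (subfamily, score) for the best-scoring HMM, or None.

    `scores` maps a subfamily label (e.g. "A" to "F") to the hmmsearch
    score of one candidate against that subfamily's HMM. The cutoff
    `min_score` is a hypothetical parameter.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)
    if scores[best] < min_score:
        return None  # no subfamily HMM matched well enough
    return best, scores[best]


if __name__ == "__main__":
    # Made-up scores for one hypothetical candidate sequence.
    example = {"A": 112.4, "B": 35.0, "C": 18.2, "D": 40.1, "E": 9.7, "F": 22.3}
    print(assign_subfamily(example, min_score=50.0))
```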

23 Summary
- Integrated tools:
  - Genescan
  - ClustalW
  - HMMer
- Our framework works for finding genes, performing multiple sequence alignment, building HMMs, and searching for to-be-discovered RGS domains in protein sequences.


