I.U. School of Informatics Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana by Irfan.


1 I.U. School of Informatics Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana by Irfan Gunduz 04/25/04 Capstone Presentation

2 INTRODUCTION
Motifs: highly conserved regions across a subset of proteins that share the same function.
Motifs can be used to predict:
- A molecule's function
- A structural feature
- Family membership
Example:
>Seq A YNEDSKH
>Seq B YDDDSNH
>Seq C YDNDSNH
>Seq D YENDSKH
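As a rough illustration of what "conserved region" means for the four example sequences on this slide, the sketch below collapses each alignment column into a PROSITE-style pattern element. It assumes the sequences are already aligned and of equal length; the function name `column_pattern` is hypothetical and is not part of any tool named in this presentation.

```python
def column_pattern(seqs):
    """Build a PROSITE-style pattern from equal-length, pre-aligned sequences."""
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    parts = []
    for column in zip(*seqs):
        residues = sorted(set(column))
        # A fully conserved column is written as the residue itself;
        # a variable column lists every observed residue in brackets.
        if len(residues) == 1:
            parts.append(residues[0])
        else:
            parts.append("[" + "".join(residues) + "]")
    return "-".join(parts)


# The four example sequences from the slide.
seqs = ["YNEDSKH", "YDDDSNH", "YDNDSNH", "YENDSKH"]
pattern = column_pattern(seqs)
print(pattern)
```

Columns 1, 4, 5 and 7 are identical in all four sequences, which is the kind of signal a motif finder looks for.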

3 INTRODUCTION
Current motif-finding software: MEME, PROSITE, PRATT, etc.
Do they work with a large number of sequences?
Pattern discovery relies on statistical or combinatorial techniques that look for signals.
The signal becomes harder to separate from noise as the number of sequences increases.
What to do?

4 OBJECTIVE
- Develop a computational procedure to find functional motifs from a large number of sequences

5 COMPUTATIONAL PROCEDURE
Tools:
- BLAST (sequence alignment tool)
- BAG (sequence clustering package)
- CLUSTAL W (multiple sequence alignment)
- HMMERII (HMM-based software)
- BLOCK MAKER (block/motif finder)
- LAMA (block comparison tool)
- PERL

6 COMPUTATIONAL PROCEDURE 1 - Collecting and Clustering Sequences
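The transcript does not describe how BAG forms its clusters, so the following is only a minimal stand-in for the clustering step: it groups sequences into connected components of a pairwise-similarity graph using union-find. The sequence ids, scores, threshold and the function name `cluster_by_hits` are all made up for illustration; in practice the pairwise scores would come from an all-against-all comparison such as BLAST.

```python
def cluster_by_hits(ids, hits, min_score):
    """Group sequence ids into connected components of the similarity graph."""
    parent = {i: i for i in ids}

    def find(x):
        # Follow parent links to the component representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in hits:
        # Only sufficiently strong pairwise hits link two sequences.
        if score >= min_score:
            parent[find(a)] = find(b)

    groups = {}
    for i in ids:
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical ids and pairwise scores (not taken from the presentation).
ids = ["s1", "s2", "s3", "s4", "s5"]
hits = [("s1", "s2", 210.0), ("s2", "s3", 95.0), ("s4", "s5", 40.0)]
clusters = cluster_by_hits(ids, hits, min_score=50.0)
print(clusters)
```

Here s1, s2 and s3 end up in one cluster, while the weak s4/s5 hit falls below the threshold and leaves both as singletons.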

7 COMPUTATIONAL PROCEDURE 2 - ENRICHMENT
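The case-study slides say that sequences were added to each cluster after HMM iterations. One way to picture such an enrichment loop is sketched below, under loud assumptions: the `score` callback is a toy identity measure standing in for building and searching with a profile HMM, and the sequences, threshold and function names are hypothetical. It is not the presenter's implementation and does not call HMMER.

```python
def enrich(seed, database, score, threshold, max_rounds=10):
    """Repeatedly add database sequences that score well against the current members."""
    members = list(seed)
    for _ in range(max_rounds):
        model = members  # stand-in for rebuilding a profile HMM from the members
        new = [s for s in database
               if s not in members and score(model, s) >= threshold]
        if not new:
            break  # converged: no further sequences pass the threshold
        members.extend(new)
    return members


def best_identity(model, seq):
    """Toy score: highest fraction of identical positions to any member.

    Assumes all sequences have the same length; a real run would use an HMM score.
    """
    return max(sum(a == b for a, b in zip(m, seq)) / len(seq) for m in model)


# Toy sequences, made up for illustration.
seed = ["YDVFLSFRGV"]
database = ["YDVFLSFRGE", "YDVFMSFKGE", "AAAAAAAAAA"]
enriched = enrich(seed, database, best_identity, threshold=0.8)
print(enriched)
```

The second database sequence is too distant from the seed to be picked up in the first round, but it passes once the first hit has joined the cluster, which is why the search is iterated.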

8 COMPUTATIONAL PROCEDURE
3 - REFINEMENT
4 - MOTIF FINDING

9 A Case Study with Disease Resistance Genes in Arabidopsis thaliana

10 Why Disease Resistance Genes?

11 Background, Disease Resistance Genes
Domain    Probable Function
TIR
CC
KIN
LRR       Recognition of specificity
NB        ATP and GTP binding

12 Case Study, Arabidopsis thaliana
116 sequences annotated as disease resistance protein or disease resistance protein-like were extracted from the Arabidopsis thaliana genome.
These were clustered into 32 groups.
After refinement, four clusters were formed for further analysis:
Cluster      # of Sequences
Cluster 1    96
Cluster 2    45
Cluster 3    641
Cluster 4    11
20 to 640 sequences were added to each cluster after HMM iterations.

13 Case Study, Arabidopsis thaliana: PFAM Search
Cluster      Domains
Cluster 1    NB-ARC, TIR, Kin, LRR
Cluster 2    NB-ARC, Kin, LRR
Cluster 3    Ser/Thr Kin
Cluster 4    Kin

14 Case Study, Arabidopsis thaliana: Results, Block Maker
Panels: Cluster1, Cluster2
Block excerpt:
15218608 YDVFLSFRGVDTRQTIVSHL
15218618 YDVFLSFRGEDTRKNIVSHL
15220795 YDVFLSFRGEDTRKTIVSHL
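To make the block on this slide easier to read, the sketch below counts residues per column for the three listed sequences, then reports a majority consensus and the positions that vary. This is a generic column-count profile written for illustration, not Block Maker's own scoring; the variable names are hypothetical and positions are 0-based.

```python
from collections import Counter

# The three block rows shown on the slide, keyed by their identifiers.
block = {
    "15218608": "YDVFLSFRGVDTRQTIVSHL",
    "15218618": "YDVFLSFRGEDTRKNIVSHL",
    "15220795": "YDVFLSFRGEDTRKTIVSHL",
}

# One Counter per alignment column: residue -> number of sequences carrying it.
columns = [Counter(col) for col in zip(*block.values())]

# Majority residue per column, and the 0-based columns with more than one residue.
consensus = "".join(c.most_common(1)[0][0] for c in columns)
variable = [i for i, c in enumerate(columns) if len(c) > 1]

print(consensus)
print(variable)
```

Only three of the twenty columns differ among these rows, so the block is highly conserved.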

15 Case Study, Arabidopsis thaliana: Results, LAMA and BAG
[Figure: clusters at the whole gene level and clusters at the block level; labels: Cluster1, Cluster2, Cluster3]

16 Case Study, Arabidopsis thaliana
[Figure: clusters at the whole gene level and clusters at the block level; labels: Cluster1, Cluster2, Cluster3. Block labels: TIR-I, TIR-II, Kin1a, Kin2, NBS-A, NBS-B, NBS-C, GLPL, LRR. Gene labels: RPP8, RPM1, RPS4, RPP1, RPP5]

17 Case Study, Arabidopsis thaliana: Number of Disease Resistance Gene Candidates on Each Chromosome
Columns: CHR-I, CHR-II, CHR-III, CHR-IV, CHR-V
Cluster 1: 1626 16 35
Cluster 2: 2006 4 9

18 Case Study, Arabidopsis thaliana: New Disease Resistance Gene Candidates
Cluster 1: GI 15236505, GI 15242136, GI 15233862
Cluster 2: GI 15221277, GI 15221280, GI 15217940, GI 15221744

19 Case Study, Arabidopsis thaliana
To test the effectiveness of the computational procedure:
- 792 unique sequences were merged and submitted to MEME and PRATT to detect functional motifs.
Time: more than 9000 minutes on a Pentium IV 1.7 GHz machine running Linux.
Result: no known disease resistance gene motifs were detected.

20 Case Study, Arabidopsis thaliana
CONCLUSIONS:
- A sensible combination of tools provides an excellent mechanism for motif detection
- Clustering helps to improve the performance of other well-known tools

21 ACKNOWLEDGEMENT
"Motif Discovery from Large Number of Sequences: A Case Study with Disease Resistance Genes in Arabidopsis thaliana" by Irfan Gunduz, Sihui Zhao, Mehmet Dalkilic and Sun Kim will be presented at The 2003 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences.

22 Case Study, Arabidopsis thaliana

23 Disease Resistance Mechanism

24 COMPUTATIONAL PROCEDURE
- Refinement
[Figure: diagram with nodes labeled A, B, C, D]

