PREETI MISRA Advisor: Dr. HAIXU TANG SCHOOL OF INFORMATICS - INDIANA UNIVERSITY Computational method to analyze tandem repeats in eukaryote genomes.

2 Overview
- Background
- Tandem repeats
- Methodology
- Results
- Conclusions
- References
Capstone Presentation 05/18/2007

3 Background
April 20, 2006
- Tandem repeat: an array of consecutive repeats, e.g. GATCCGATCCGATCCGATCCGATCC
  - Repeating pattern (consensus): GATCC, pattern length = 5
  - Total repeat length = 25
- 3 main types of tandem repeats:
  - Microsatellites: 1-5 bp repeating pattern
  - Minisatellites: 6-50 bp repeating pattern
  - Large tandem repeats: repeating pattern longer than 50 bp
Outline: Background | Tandem repeats | Tandem gene duplication | Methodology | Tandem Repeat Finder | Dating tandem repeats | Jukes-Cantor model | Results | Analysis | Conclusion
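The definitions on this slide can be made concrete with a small sketch: for a perfect (error-free) tandem repeat, the repeating pattern is the shortest prefix whose copies rebuild the whole array, and the class follows from the pattern length using the size ranges above. The function names are hypothetical, and real repeats contain mismatches and indels, which is why a dedicated tool such as TRF is needed.

```python
def minimal_period(seq: str) -> int:
    """Length of the shortest pattern whose copies tile `seq` exactly (perfect repeats only)."""
    n = len(seq)
    for p in range(1, n + 1):
        # p must divide the length, and repeating the first p bases must rebuild seq
        if n % p == 0 and seq == seq[:p] * (n // p):
            return p
    return n  # only reached for an empty sequence


def classify_by_period(period: int) -> str:
    """Size classes as defined on the slide: 1-5, 6-50 and >50 bp."""
    if period <= 5:
        return "microsatellite"
    if period <= 50:
        return "minisatellite"
    return "large tandem repeat"


seq = "GATCCGATCCGATCCGATCCGATCC"
p = minimal_period(seq)
print(p, len(seq), len(seq) // p, classify_by_period(p))  # 5 25 5 microsatellite
```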

4 Significance
- Tandem repeats can be used to determine whether 2 DNA samples come from the same person
- Uses:
  - Forensics
  - Paternity testing
Image downloaded from www.egensburg.de/Fakultaeten/Medizin/Klinische_Chemie/lehre/vorlesung/Manuskript_Profiling%20.pdf

5 Mechanism of tandem duplication
- Unequal recombination is the major known mechanism for the formation of large tandem repeats
Image downloaded from http://hc.ims.u okyo.ac.jp/JSBi/journal/GIW02/GIW02F010/GIW02F010.html
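A toy model of unequal recombination: represent each homolog's tandem array as a list of repeat units and let the crossover happen at different unit positions on the two chromosomes. This is an illustrative assumption (it ignores flanking DNA and sequence differences between units), not a method used in this study; the function name is hypothetical.

```python
def unequal_crossover(array_a, array_b, i, j):
    """Toy unequal crossing over between two homologous tandem arrays.

    The break falls after unit i on chromosome A and after unit j on
    chromosome B. When i != j the arrays were misaligned, so one product
    gains units and the reciprocal product loses the same number.
    """
    product_1 = array_a[:i] + array_b[j:]
    product_2 = array_b[:j] + array_a[i:]
    return product_1, product_2


chrom_a = ["GATCC"] * 4  # four-copy array on one homolog
chrom_b = ["GATCC"] * 4  # four-copy array on the other homolog
longer, shorter = unequal_crossover(chrom_a, chrom_b, i=3, j=1)
print(len(longer), len(shorter))  # 6 2
```

The total number of units (8) is conserved; repeated rounds of such misaligned exchanges can expand an array copy by copy.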

6 Tandem gene duplication
- Benefits: new functions can arise; tandem duplication is responsible for the evolution of gene clusters
- Example: zinc finger genes in mammalian genomes
Figure: two homologous genes, the second gene being the duplicate (image: http://www.steve.gb.com/images/molecules/nucleotides/zinc_finger_(side).jpg)

7 Purpose
- Large tandem repeats are common in eukaryotes: they cover 1.684% of the human genome and 1.525% of the chimpanzee genome
- Goal: date large tandem duplications and find the relationship between various characteristics of long tandem repeats and their evolutionary age
- 8 genomes were analyzed: 3 primates, 2 rodents, dog, chicken and puffer fish

8 Methodology
- Identification: Tandem Repeat Finder (TRF) to identify large tandem repeats
- Distance computation: Jukes-Cantor distance model to find the distance between two repeats
- Transformation: transform the computed distance into evolutionary time

9 Tandem Repeat Finder
- Tandem repeat detection tools: STRING, Mreps and TRF
- TRAP: T. J. P. Sobreira, A. Durham and A. Gruber
- TRF can be downloaded at http://tandem.bu.edu/trf/trf.html
- TRF output includes:
  - Starting and ending positions of each tandem repeat
  - Number of repetitions
  - A%, C%, G%, T%: percentage of each base in the tandem repeat
  - Length of the consensus word (only the first 10 bases)

10 Tandem Repeat Finder
TRF outline: the program has 2 main components, detection and analysis
- Detection: finds candidate tandem repeats
- Analysis: produces an alignment for each candidate and statistics about the alignment
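To illustrate the detection idea, the sketch below counts exact k-mer matches by the distance between consecutive occurrences: in a tandem repeat with period d, many k-mers recur exactly d bases later. This is a simplified stand-in, not TRF's actual algorithm (TRF combines k-tuple matching with statistical criteria and then aligns each candidate); the function name and the `k`, `max_period` and `min_matches` values are hypothetical.

```python
from collections import Counter


def candidate_periods(seq: str, k: int = 4, max_period: int = 50, min_matches: int = 5):
    """Return (distance, count) pairs for distances at which exact k-mer matches recur often."""
    last_seen = {}               # k-mer -> most recent start position
    distance_counts = Counter()  # distance between repeated k-mers -> number of matches
    for pos in range(len(seq) - k + 1):
        kmer = seq[pos:pos + k]
        if kmer in last_seen:
            d = pos - last_seen[kmer]
            if d <= max_period:
                distance_counts[d] += 1
        last_seen[kmer] = pos
    return [(d, c) for d, c in distance_counts.most_common() if c >= min_matches]


print(candidate_periods("GATCCGATCCGATCCGATCCGATCC"))  # [(5, 17)]
```

Each reported distance is only a candidate period; an analysis step like TRF's would then align the region against a consensus to confirm it.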

11 Tandem Repeat Finder
- Large tandem repeats were extracted from the TRF results
- Example TRF result:
  1 5 100 0 50 20 40 20 20 1.92 GATCC GATCCGATCCGATCCGATCCGATCC
  - 1: indices
  - 5: consensus (period) size
  - 100: percent matches
  - 0: percent indels
  - 50: score
  - 20, 40, 20, 20: % of A, C, G and T
  - 1.92: entropy
  - GATCC: period or consensus
  - GATCCGATCCGATCCGATCCGATCC: repeat
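The composition and entropy fields can be recomputed from the repeat sequence itself, a useful sanity check when post-processing TRF output. The sketch parses the abbreviated line exactly as shown on this slide (one index field, then period size, percent matches, percent indels, score, A/C/G/T percentages, entropy, consensus and repeat). Full TRF .dat lines contain additional fields (for example, separate start and end positions and the copy number), so this field layout and the function names are assumptions tied to this example.

```python
import math


def parse_slide_record(line: str) -> dict:
    """Parse the abbreviated record layout used on this slide (hypothetical layout)."""
    f = line.split()
    return {
        "index": int(f[0]),
        "period_size": int(f[1]),
        "percent_matches": int(f[2]),
        "percent_indels": int(f[3]),
        "score": int(f[4]),
        "composition": dict(zip("ACGT", map(int, f[5:9]))),
        "entropy": float(f[9]),
        "consensus": f[10],
        "repeat": f[11],
    }


def composition_and_entropy(seq: str):
    """Rounded percent of each base and Shannon entropy in bits (range 0 to 2)."""
    n = len(seq)
    freqs = {b: seq.count(b) / n for b in "ACGT"}
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    percents = {b: round(100 * p) for b, p in freqs.items()}
    return percents, entropy


rec = parse_slide_record("1 5 100 0 50 20 40 20 20 1.92 GATCC GATCCGATCCGATCCGATCCGATCC")
percents, h = composition_and_entropy(rec["repeat"])
print(percents, round(h, 2), rec["entropy"])  # {'A': 20, 'C': 40, 'G': 20, 'T': 20} 1.92 1.92
```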

12 DNA Sequence Evolution Model For Dating
Figure: example DNA sequences evolving over time, shown at 3, 2 and 1 million years ago and today

13 Dating tandem duplications
- Computing divergence of tandem repeating units
- Repeat identity: each repeat unit is compared with the other units, and the maximum similarity (identity) is taken
- Example: consensus GATCC, repeat split into units GATCC|GATCC|GATCC|GATCC|GATCC
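The repeat-identity step can be sketched by cutting the repeat into consecutive period-length units and comparing every pair. For brevity this sketch scores ungapped (position-by-position) identity, an assumption that only holds when units have equal length and no indels; the study itself computed similarities with the stretcher alignment program. The function names and the mutated example units are hypothetical.

```python
from itertools import combinations


def split_units(repeat: str, period: int) -> list[str]:
    """Cut a repeat into consecutive units of `period` bases (a trailing partial unit is dropped)."""
    return [repeat[i:i + period] for i in range(0, len(repeat) - period + 1, period)]


def max_unit_identity(repeat: str, period: int) -> float:
    """Highest ungapped identity over all pairs of units (0.0 if fewer than two units)."""
    units = split_units(repeat, period)
    best = 0.0
    for u, v in combinations(units, 2):
        same = sum(a == b for a, b in zip(u, v))
        best = max(best, same / period)
    return best


print(max_unit_identity("GATCC" * 5, 5))                  # 1.0
print(max_unit_identity("GATCC" + "GTTCC" + "GATGA", 5))  # 0.8
```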

14 Jukes-Cantor model
- Computes the distance between 2 repeats
- All bases occur with equal probability, i.e. p = 0.25 for each of A, C, G and T
- All possible base substitutions are equally likely: A ↔ C, A ↔ G, A ↔ T, C ↔ G, C ↔ T, G ↔ T
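Under these assumptions the model can be written as a rate matrix in which every substitution occurs at the same rate α (a symbol introduced here for illustration). The standard consequence, used on the next slide, links the expected number of substitutions per site D to the observed proportion of differing sites p = m/n:

```latex
Q = \begin{pmatrix}
-3\alpha & \alpha & \alpha & \alpha \\
\alpha & -3\alpha & \alpha & \alpha \\
\alpha & \alpha & -3\alpha & \alpha \\
\alpha & \alpha & \alpha & -3\alpha
\end{pmatrix}
\qquad (\text{rows and columns in the order } A, C, G, T)

p = \frac{3}{4}\left(1 - e^{-4D/3}\right)
\quad\Longrightarrow\quad
D = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\,p\right),
\qquad p = \frac{m}{n}
```

Because p approaches 3/4 as D grows, the inverse is only defined for p < 3/4.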

15 Jukes-Cantor model
$$D = -\frac{3}{4}\ln\left(1 - \frac{4}{3}\cdot\frac{m}{n}\right)$$
where m = number of mismatched sites, n = length of the sequence, and D = distance between two repeats (estimated substitutions per site)
- Example: if mismatches are observed at 25% of the sites (m/n = 0.25), the Jukes-Cantor model gives D = -3/4 ln(2/3) ≈ 0.304
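A direct transcription of the formula that reproduces the slide's 25% example; the function name is illustrative. The guard reflects the fact that the logarithm is undefined once m/n reaches 3/4, the saturation point beyond which the model cannot estimate a distance.

```python
import math


def jukes_cantor_distance(mismatches: int, length: int) -> float:
    """D = -3/4 * ln(1 - 4/3 * m/n), the Jukes-Cantor distance."""
    p = mismatches / length
    if p >= 0.75:
        raise ValueError("proportion of differing sites must be below 0.75")
    return -0.75 * math.log(1 - (4 / 3) * p)


print(round(jukes_cantor_distance(25, 100), 3))  # 0.304
```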

16 Estimating the evolutionary time
- Transform the computed distance (D) between two repeats into evolutionary time
- The neutral mutation rate in mammals is nearly $1.25 \times 10^{-9}$ per site per year
- $T = D / (1.25 \times 10^{-9})$ years ago
- Example: D = 0.1 gives $T = 0.1 / (1.25 \times 10^{-9}) = 8 \times 10^{7}$ years, i.e. 80 million years ago
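The conversion is a single division; the sketch below follows the slide's formula T = D / r as written, with r = 1.25 × 10^-9 substitutions per site per year. The constant and function names are illustrative. Some duplicate-dating studies divide by 2r instead, reasoning that both copies accumulate substitutions after the duplication, which would halve every estimate.

```python
NEUTRAL_RATE = 1.25e-9  # substitutions per site per year, the value used on this slide


def divergence_time_years(distance: float, rate: float = NEUTRAL_RATE) -> float:
    """T = D / rate, following the slide's formula."""
    return distance / rate


print(round(divergence_time_years(0.1) / 1e6, 1))  # 80.0 (million years)
```

Applied to the Jukes-Cantor example on the previous slide (D ≈ 0.304), the same formula gives roughly 243 million years.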

17 Material and Method
Material
- The genome files were downloaded from the UCSC site: http://hgdownload.cse.ucsc.edu/downloads.html
- The Tandem Repeat Finder and stretcher software were downloaded
Procedure
1. Extraction of large tandem repeats with Tandem Repeat Finder
2. Calculation of similarities between tandem repeats using stretcher
3. Computation of the distance using the Jukes-Cantor model
4. Transformation of the distance into evolutionary time
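The similarity step relied on stretcher, an EMBOSS program for global alignment. As a rough stand-in, the sketch below computes identity from a plain Needleman-Wunsch global alignment. The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions; stretcher itself uses a linear-space (Myers-Miller) variant with its own default scoring, so its identities may differ slightly. One minus the identity gives the mismatch proportion m/n used by the Jukes-Cantor step.

```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> float:
    """Fraction of identical columns in one optimal global (Needleman-Wunsch) alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path, counting identical columns and total columns
    i, j, same, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                same += a[i - 1] == b[j - 1]
                i, j = i - 1, j - 1
                columns += 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return same / columns if columns else 1.0


identity = global_identity("GATCC", "GATTC")
print(identity, round(1 - identity, 2))  # 0.8 0.2
```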

18 Tree of life
Figure: tree of life with divergence times marked at 500, 225, 92, 75, 25, 12-24 and 6 million years ago

19 Recap: period & repeat
- Repeat: ATTCGATTCGATTCGGGATTCGACATTCG
- Period (consensus): ATTCG

20 Results
Genome | Longest repeat (chr, length bp) | Highest total # of repeats (chr, count) | Longest period (chr, length bp) | Total repeat | Total genome size | Highest % of repeat (chr, %) | Total coverage (% of repeat in genome)
Human | Y, 217961 | 1, 6586 | X, 2000 | 51 MB | 3.1 GB | ? | 1.684
Chimpanzee | 8, 62142 | 1, 5943 | 8, 1985 | 48 MB | 2.97 GB | 19, 4.9463 | 1.525
Macaque | 13, 119145 | 19, 5399 | 19, 1969 | 44 MB | 2.87 GB | ? | 1.538
Rat | 18, 63266 | 1, 5156 | 1, 1989 | 20 MB | 2.75 GB | 12, 2.034 | 0.424
Mouse | 7, 203136 | 5, 4214 | 10, 1983 | 19 MB | 2.61 GB | X, 1.242 | 0.263
Dog | X, 37449 | 1, 1613 | 18, 1852 | 6 MB | 2.40 GB | ? | 0.748
Chicken | 1, 16680 | 1, 958 | 13, 1988 | 4 MB | 1.1 GB | 16, 7.779 | 0.735
Puffer fish | 2, 13045 | 2, 139 | 2, 1569 | 0.53 MB | 385 MB | ? | 0.245
Values of the "Highest % of repeat" column whose row could not be recovered from the transcript: 19, 8.408; 1, 0.4902; 10, 0.51; 19, 6.06
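Coverage figures such as the last column are typically computed as the number of genome bases covered by at least one repeat divided by genome size. Because TRF can report overlapping repeats (for example, the same region at different period sizes), the sketch merges overlapping intervals before summing so that no base is counted twice. The 1-based inclusive coordinates, the interval list and the genome size are made-up assumptions, not the study's data, and this is not necessarily how the table above was produced.

```python
def covered_bases(intervals):
    """Total bases covered by 1-based, inclusive (start, end) intervals, counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # disjoint from the current block: close it and open a new one
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # overlapping or adjacent: extend the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


def percent_coverage(intervals, genome_size):
    return 100 * covered_bases(intervals) / genome_size


toy_repeats = [(101, 400), (300, 700), (1001, 1100)]  # hypothetical TRF hits
print(covered_bases(toy_repeats), percent_coverage(toy_repeats, 10_000))  # 700 7.0
```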

21 Results

22 Total number of repeats

23 Total number of period or consensus

24 Results of repeat length

25 % Repeat results
Figure panels: Fish, Human

26 Dating tandem repeats
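One way to look for a dating peak, such as the 80-120 million-year burst described in the conclusions, is to bin the estimated ages and find the most populated bin. The 20-million-year bin width and the ages below are hypothetical illustration values, not results from this study.

```python
from collections import Counter

BIN_WIDTH_MYR = 20  # hypothetical bin width in million years


def age_histogram(ages_myr, bin_width=BIN_WIDTH_MYR):
    """Count ages (million years) per bin, keyed by each bin's lower edge."""
    return Counter(int(age // bin_width) * bin_width for age in ages_myr)


ages = [12, 85, 91, 97, 99, 104, 118, 150, 230]  # hypothetical ages (Myr)
hist = age_histogram(ages)
peak_start, count = hist.most_common(1)[0]
print(f"peak bin: {peak_start}-{peak_start + BIN_WIDTH_MYR} Myr with {count} repeats")
```

Comparing such histograms across genomes is one way to tell a burst (a sharp peak) from a steady increase.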

27 Tree of life
Figure: tree of life with divergence times marked at 500, 225, 92, 75, 25, 12-24 and 6 million years ago (same figure as slide 18)

28 Conclusions
- Primates (human, chimpanzee and macaque) have the highest number of long tandem repeat duplications
- The dating peak is prominent in human, chimpanzee and macaque, especially between 80 and 120 million years ago
- The tandem repeat results follow a pattern similar to the divergence shown in the tree of life
- Dog, rat and mouse show a steady increase in the number of tandem duplications, but the burst between 80 and 120 million years ago is negligible
- Human has the highest number of duplications among all studied genomes

29 Acknowledgements
- Advisor: Dr. Haixu Tang
- School of Informatics
- Members of the Computational Omics Lab
- Parents, Rajen & Rajeev
- Prasanta

30 References
- Jaitly D, Kearney P, Lin G, Ma B. Methods for reconstructing the history of tandem repeats and their application to the human genome.
- Rivals E. A survey on algorithmic aspects of tandem repeats evolution.
- Bertrand D, Gascuel O. Topological rearrangements and local search method for tandem duplication trees.
- Zhang L, Ma B, Wang L, Xu Y. Greedy method for inferring tandem duplication history.
- Elemento O, Gascuel O. A fast and accurate distance algorithm to reconstruct tandem duplication trees.
- Benson G. Tandem repeats finder: a program to analyze DNA sequences.
