Repeats in the Genome Lecture 11/2. Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites.

1 Repeats in the Genome Lecture 11/2

2 Repeats in the genome Interspersed repeats Tandem repeats –Microsatellites –Minisatellites –Satellites


4 Large repeats: Transposons Transposable elements (TEs) –Sequences that get moved/copied into different loci in the genome P elements in Drosophila: genes piggybacked on transposons and inserted into the genome, in the lab –transgenic fruitflies

5 Transposons

6 Transposons

7 Transposons

8 Retrotransposons: 2 examples SINEs : Short Interspersed repeats – bp; up to 1M copies; –Non-autonomous –Example : Alu repeats –13 % of human genome LINEs : Long Interspersed repeats –Up to 7 Kbp long; ,000 copies –Autonomous –Examples: LINE1, LINE2, LINE3 –21 % of human genome

9 Functions of interspersed repeats May cause disruptions, disease –Colorectal cancer Role in evolution of new genes Function of SINEs and LINEs not fully known –Selfish DNA ? Parasitic elements akin to viruses

10 RepeatMasker Program to detect and mask interspersed repeats in a sequence Also finds low complexity sequences and masks them Can work with a library of known repeats

11 Tandem Repeats Satellites –In centromeres and telomeres –Repeating pattern 1bp s bp long Mini- and micro-satellites –simple, small sequence repeats

12 Microsatellite
541 gagccactag tgcttcattc tctcgctcct actagaatga acccaagatt gcccaggccc
601 aggtgtgtgt gtgtgtgtgt gtgtgtgtgt gtgtgtgtgt gtatagcaga gatggtttcc
661 taaagtaggc agtcagtcaa cagtaagaac ttggtgccgg aggtttgggg tcctggccct
721 gccactggtt ggagagctga tccgcaagct gcaagacctc tctatgcttt ggttctctaa
781 ccgatcaaat aagcataagg tcttccaacc actagcattt ctgtcataaa atgagcactg
841 tcctatttcc aagctgtggg gtcttgagga gatcatttca ctggccggac cccatttcac
A microsatellite in a dog (Canis familiaris) gene
1-5 bp repeating pattern
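A minimal sketch of how exact microsatellite runs such as the one on this slide could be located. It assumes a lowercase or uppercase a/c/g/t string and only finds perfect repeats; the function name `find_microsatellites` and its parameters `max_unit` and `min_copies` are illustrative, not taken from the lecture or from any repeat-finding tool.

```python
import re


def find_microsatellites(seq, max_unit=5, min_copies=4):
    """Return (start, unit, copies) for exact tandem runs of a 1..max_unit bp unit."""
    # The lazy group tries the shortest unit first; the backreference then
    # requires at least (min_copies - 1) further identical copies right after it.
    pattern = re.compile(r"([acgt]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    hits = []
    for m in pattern.finditer(seq.lower()):
        unit = m.group(1)
        copies = len(m.group(0)) // len(unit)
        hits.append((m.start(), unit, copies))
    return hits


# Toy input (an assumption for illustration): five copies of "gt" with flanks.
example = "cca" + "gt" * 5 + "cat"
print(find_microsatellites(example))  # [(3, 'gt', 5)]
```

Real microsatellites often carry point mutations, so an exact regular expression like this will split or miss imperfect runs; the approximate methods on the later slides address that.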

13 Microsatellites Copy numbers variable across individuals Associated with human diseases –Fragile X syndrome, Huntington's disease, Myotonic dystrophy Can be used for genetic fingerprinting & paternity tests, due to high variability

14 Minisatellites
1 tgattggtct ctctgccacc gggagatttc cttatttgga ggtgatggag gatttcagga
61 tttgggggat tttaggatta taggattacg ggattttagg gttctaggat tttaggatta
121 tggtatttta ggatttactt gattttggga ttttaggatt gagggatttt agggtttcag
181 gatttcggga tttcaggatt ttaagttttc ttgattttat gattttaaga ttttaggatt
241 tacttgattt tgggatttta ggattacggg attttagggt ttcaggattt cgggatttca
301 ggattttaag ttttcttgat tttatgattt taagatttta ggatttactt gattttggga
361 ttttaggatt acgggatttt agggtgctca ctatttatag aactttcatg gtttaacata
421 ctgaatataa atgctctgct gctctcgctg atgtcattgt tctcataata cgttcctttg
Consensus AGGATTTT
6-20 bp repeating pattern

15 Minisatellites Highly polymorphic across individuals –Used for DNA fingerprinting Regulation of gene expression

16 Recognizing repeat sequences Dot plots Self-similarity
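As a rough illustration of the dot-plot idea, the sketch below compares a sequence against itself using short exact words. A tandem repeat of period p shows up as dots concentrated on the diagonal at offset p from the main diagonal. The function names and the `window` parameter are assumptions made for this example.

```python
from collections import Counter


def self_dot_plot(seq, window=3):
    """Return the set of (i, j), i < j, where seq[i:i+window] == seq[j:j+window]."""
    index = {}
    for i in range(len(seq) - window + 1):
        index.setdefault(seq[i:i + window], []).append(i)
    dots = set()
    for positions in index.values():
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                dots.add((positions[a], positions[b]))
    return dots


def diagonal_counts(seq, window=3):
    """Count dots per diagonal offset j - i; a peak suggests a repeat period."""
    return Counter(j - i for i, j in self_dot_plot(seq, window))


def render(seq, window=3):
    """Draw the symmetric self-comparison as text, '*' for a matching word."""
    dots = self_dot_plot(seq, window)
    n = len(seq) - window + 1
    rows = []
    for i in range(n):
        rows.append("".join(
            "*" if i == j or (min(i, j), max(i, j)) in dots else "."
            for j in range(n)))
    return "\n".join(rows)


toy = "ACGACGACGTT"  # toy sequence, not from the slides
print(render(toy))
print(diagonal_counts(toy).most_common(1))  # [(3, 4)]
```

The pairwise comparison is quadratic in the number of occurrences of each word, which is fine for short sequences but not for whole genomes.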

17 Tandem repeat detection Have to account for approximate tandem repeats –Repeating unit may not be exactly same (mutations) –May not be exactly in tandem (indels)

18 TRF (Benson) Assume > 80% sequence identity on average Assume < 10% rate of indels Basic idea T A T A C G T C G A G A C T T A T C C A C G G A G A T A T T T A


20 Statistical criteria The candidate tandem repeat converted into a Bernoulli (head/tail) sequence Assess significance of this sequence, assuming a probabilistic model
CCACAACC-CGTCAGGCAAGT
CTGCACCATCGTCTGGGAAGT
HTTHHTHTTHHHHTHHTHHHH
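The conversion on this slide can be sketched as follows: each aligned column becomes H if the two copies agree and T if they differ or one of them has a gap. The helper names are illustrative, and treating a gap column as a tail is inferred from the slide's example rather than stated in it.

```python
def to_bernoulli(row_a, row_b):
    """Map two aligned repeat copies to an H/T string, column by column."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return "".join(
        "H" if a == b and a != "-" else "T"
        for a, b in zip(row_a, row_b))


def heads_in_long_runs(flips, k):
    """Count heads that lie in runs of at least k consecutive heads."""
    total = 0
    run = 0
    for c in flips + "T":  # trailing sentinel closes the final run
        if c == "H":
            run += 1
        else:
            if run >= k:
                total += run
            run = 0
    return total


flips = to_bernoulli("CCACAACC-CGTCAGGCAAGT", "CTGCACCATCGTCTGGGAAGT")
print(flips)                         # HTTHHTHTTHHHHTHHTHHHH
print(flips.count("H"))              # 14
print(heads_in_long_runs(flips, 4))  # 8
```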

21 Statistical criteria Sequence of length 100, with p_H = 0.75: at least 95% of the time, the total number of heads is >= 68; at least 95% of the time, the total number of heads in runs of length 5 or more is >= 26. We are counting only head-runs of length k or more. This tells us what would be a significant number of heads
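The first figure on this slide can be checked with an exact binomial tail, and the run-restricted figure can be approximated by simulation. This is a sketch under the slide's stated model (independent flips with head probability p_H); the function names, the trial count and the seed are assumptions, and the simulated value is only an estimate.

```python
import random
from math import comb


def heads_threshold(n, p_heads, confidence=0.95):
    """Largest t such that P(number of heads >= t) >= confidence (exact)."""
    tail = 0.0
    for t in range(n, -1, -1):
        # After this addition, tail equals P(X >= t) for X ~ Binomial(n, p_heads).
        tail += comb(n, t) * p_heads ** t * (1 - p_heads) ** (n - t)
        if tail >= confidence:
            return t
    return 0


def run_heads_threshold(n, p_heads, k, confidence=0.95, trials=20000, seed=0):
    """Monte Carlo estimate of the same threshold, counting only heads in runs >= k."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = run = 0
        for _ in range(n):
            if rng.random() < p_heads:
                run += 1
            else:
                if run >= k:
                    total += run
                run = 0
        if run >= k:  # close a run that reaches the end of the sequence
            total += run
        totals.append(total)
    totals.sort()
    # Roughly the (1 - confidence) quantile of the simulated totals.
    return totals[int((1 - confidence) * trials)]


print(heads_threshold(100, 0.75))         # 68
print(run_heads_threshold(100, 0.75, 5))  # simulated estimate for runs of length >= 5
```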

22 Statistical criteria Due to indels, a repeating pattern of size d may induce exact-matching k-tuples separated by d, d±1, d±2, etc. Consider all such pairs, up to d ± d_max. d_max is calculated using an assumption about p_I (the indel frequency) and a random-walk model
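To illustrate why nearby distances have to be pooled, the sketch below records the distance from each k-tuple to its previous identical occurrence. It is a simplified illustration of the matching-k-tuple idea, not TRF itself; the function name and the toy sequences are assumptions.

```python
from collections import Counter


def ktuple_distances(seq, k):
    """Histogram of distances between each k-tuple and its previous occurrence."""
    last_seen = {}
    hist = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            hist[i - last_seen[word]] += 1
        last_seen[word] = i
    return hist


exact = "ACGTT" * 4                   # perfect repeat, period d = 5
with_del = "ACGTTACGTTACGTACGTT"      # same repeat with one T deleted in the third copy

print(ktuple_distances(exact, 3))     # Counter({5: 13})
print(ktuple_distances(with_del, 3))  # distances 5 and 4 both appear
```

In the second sequence the single deletion shifts later matches from distance 5 to distance 4, which is the effect the slide describes for d and its neighbours.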

23 Statistical criteria Other criteria to –distinguish tandem repeats from non-tandem direct repeats matching k-tuples biased on one side –pick tuple sizes

24 Mreps (another program) Different algorithm to detect repeats Maximal run of k-mismatch tandem repeats, with period p: –A maximal string such that any substring of length 2p is a tandem repeat with at most k mismatches –All such maximal runs can be computed in time O(nk log(k)), where n is length of sequence
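The definition of a run on this slide can be made concrete with a naive sliding-window check for one fixed period. This is not the mreps algorithm and does not achieve its running time; it only spells out the definition, and the function name and the half-open (start, end) convention are assumptions.

```python
def k_mismatch_runs(seq, period, k):
    """Return maximal (start, end) half-open intervals in which every window of
    length 2*period has at most k mismatches between its two halves."""
    n = len(seq)
    p = period
    if n < 2 * p:
        return []
    # mism[i] is True when position i disagrees with the position one period later.
    mism = [seq[i] != seq[i + p] for i in range(n - p)]
    runs = []
    window = sum(mism[:p])  # mismatches for the window starting at 0
    run_start = None
    last = n - 2 * p        # last valid window start
    for x in range(last + 1):
        if x > 0:
            # Slide the window by one: add the new column, drop the old one.
            window += mism[x + p - 1] - mism[x - 1]
        if window <= k:
            if run_start is None:
                run_start = x
        elif run_start is not None:
            runs.append((run_start, x - 1 + 2 * p))
            run_start = None
    if run_start is not None:
        runs.append((run_start, last + 2 * p))
    return runs


toy = "ACGACGATGACG"  # period 3 with one substituted base (toy input)
print(k_mismatch_runs(toy, 3, 0))  # [(0, 7)]
print(k_mismatch_runs(toy, 3, 1))  # [(0, 12)]
```

Allowing one mismatch extends the run across the substituted base, which is the behaviour the k-mismatch definition is meant to capture.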

25 Mreps: Statistical criteria Two reasons for insignificance –Short length Reject runs of length < p+9 –Too many mismatches Create random DNA sequences, and infer quality filter based on this

26 Gene Duplications If a region containing a gene is duplicated, a new copy of gene is created: paralogs Eases up the selective pressure on one of the copies –free exploration of sequence space Cases of entire genomes being duplicated –yeast, wheat

27 Pseudogenes Upon gene duplication, one of the two copies may gather a deleterious mutation –Example: premature stop codon Once the gene dies in this fashion, no more selective pressure on it. Such a dead copy of a gene is a pseudogene

28 Pseudogenes Any sequence that appears to code for a gene product, but does not do so Origins of pseudogenes –Gene duplication –Change of environment, gene no longer needed –portion of mRNA transcript reverse-transcribed and inserted into genome Create problems for genome study –Mis-annotated as genes

29 Pseudogenes Pseudogenes mutate at neutral rate, free of any selective pressures Can be used for evolutionary analysis Example: –In Drosophila, insertions:deletions in the ratio of 1:8, based on study of pseudogenes

30 Tandem Repeats and Binding Sites Regulatory modules have 20-40% coverage by tandem repeats –Based on a study on Drosophila –Very significant statistically, if assuming low-order Markov background Relation between tandem repeats and binding sites ?

31 Tandem Repeats and Binding Sites Possibility: Tandem repeats help in creating duplicates of binding sites Multiple copies of binding site –helps exploring new binding sites –helps fine-tune binding affinity Faster evolution ?

32 Implications for regulatory sequence analysis Regulatory sequence modeled as a mixture of motif and non-motif background Background typically a Markov chain of fixed order –Given last k bases, S[i..i+k-1], next base determined by a fixed probability distribution
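A fixed-order Markov background of the kind described on this slide can be sketched as below: the model stores, for each context of k bases, a probability distribution over the next base. The function names, the pseudocount and the uniform fallback for unseen contexts are assumptions made for this example, and the input is assumed to contain only A, C, G and T.

```python
from math import log


def train_markov(seq, order, alphabet="ACGT", pseudocount=1.0):
    """Estimate P(next base | previous `order` bases) from a training sequence."""
    counts = {}
    for i in range(len(seq) - order):
        context = seq[i:i + order]
        nxt = seq[i + order]
        counts.setdefault(context, dict.fromkeys(alphabet, 0))[nxt] += 1
    model = {}
    for context, row in counts.items():
        total = sum(row.values()) + pseudocount * len(alphabet)
        model[context] = {b: (row[b] + pseudocount) / total for b in alphabet}
    return model


def log_likelihood(seq, model, order, alphabet="ACGT"):
    """Log-probability of seq under the model, skipping the first `order` bases."""
    uniform = 1.0 / len(alphabet)  # fallback for contexts never seen in training
    ll = 0.0
    for i in range(len(seq) - order):
        row = model.get(seq[i:i + order])
        ll += log(row[seq[i + order]] if row else uniform)
    return ll


model = train_markov("ACACACAC", order=1)
print(model["A"]["C"])  # 0.625
print(log_likelihood("ACAC", model, 1) > log_likelihood("AAAA", model, 1))  # True
```

Because each base depends only on a fixed-length context, a model like this has no way to say "copy the previous j bases again", which is the limitation the following slides raise.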

33 Tandem Repeats in Model Tandem repeats violate Markov assumption: previous k bases S[i..i+k-1] may provide a probability distribution on next base, OR we may have a tandem repeat of previous j <= k bases Similarly, a binding site or a part of a binding site may also be tandem repeated

34 Tandem Repeats in Model Need to modify the probabilistic model to include tandem repeats Research topic
