Published by Christopher Starkey. Modified over 2 years ago.

1
Improved Models and Algorithms for Universal DNA Tag Systems continued … a.k.a. what did we do?

2
Nucleation Model
When do two tags form a match?
1. sum of score of matches ≥ c ? (not stable complex!)
2. score of heaviest match ≥ c ? (as in [BKSY])
3. score of heaviest match with e errors ≥ c ! (we propose)
[Figure: the tag pair AAGCTGCA / ACCCTGTA, shown once per case]

3
Score of a single match (recap)
May be computed via either of:
2-4 Rule
  - easy approximation: A-T = 2, G-C = 4
  - sum gives melting temperature
Nearest Neighbor Rule
  - sum energies due to contiguous A-T & C-G pairs
  - A-T different from T-A different from A-G etc.
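The 2-4 rule above can be sketched in a few lines. This is a minimal illustration, not code from the presentation: the function name `two_four_score` is hypothetical, and the result is only the rough melting-temperature approximation (in °C) that the rule provides. The nearest-neighbor rule is not sketched because it needs an energy table that the slides do not give.

```python
def two_four_score(seq: str) -> int:
    """Score a duplex by the 2-4 rule: A/T bases count 2, G/C bases count 4.

    The sum is the rule's rough estimate of melting temperature in deg C.
    """
    score = 0
    for base in seq.upper():
        if base in "AT":
            score += 2
        elif base in "GC":
            score += 4
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return score


# AAGCTGCA has 4 A/T bases and 4 G/C bases: 4*2 + 4*4
print(two_four_score("AAGCTGCA"))  # 24
```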

4
Is this a realistic model? It's an improvement.
Pair 1: CGTAGCACGAA / AACTCGTATCA, (6,0) match, Tm = 3.2°C
Pair 2: ACAGCAATGGA / GATCGGTACTA, (9,1) match, Tm = 13.8°C
- [BKSY] would predict: pair 1 > pair 2
- We predict: pair 1 < pair 2
- mfold predicts: Tm = 3.2°C for pair 1, Tm = 13.8°C for pair 2

5
Definitions
Two strings s1 and s2 have a (c,e)-match if they have substrings t1 and t2 such that:
1. w(t1) = w(t2) ≥ c
2. t1 and t2 differ in ≤ e places
A tag system is an (h,c,e)-code if:
1. every tag has weight at least h
2. no two tags have a (c,e)-match
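The two definitions translate almost directly into a brute-force check. The sketch below is an illustration under stated assumptions, not the authors' algorithm: the weights A/T = 1 and C/G = 2 are taken from the "Size of a sphere" slide (weight = a + 2b), "differ in ≤ e places" is read as Hamming distance between equal-length substrings, and the names `has_ce_match` and `is_hce_code` are hypothetical. It compares strings as written and does not model complementarity.

```python
# Assumed symbol weights (A/T = 1, C/G = 2), as implied by weight = a + 2b.
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s: str) -> int:
    return sum(WEIGHT[ch] for ch in s)


def has_ce_match(s1: str, s2: str, c: int, e: int) -> bool:
    """True if some equal-length substrings t1 of s1 and t2 of s2 satisfy
    w(t1) = w(t2) >= c and differ in at most e positions."""
    for length in range(1, min(len(s1), len(s2)) + 1):
        for i in range(len(s1) - length + 1):
            t1 = s1[i:i + length]
            w1 = weight(t1)
            if w1 < c:
                continue
            for j in range(len(s2) - length + 1):
                t2 = s2[j:j + length]
                if weight(t2) != w1:
                    continue
                mismatches = sum(x != y for x, y in zip(t1, t2))
                if mismatches <= e:
                    return True
    return False


def is_hce_code(tags: list[str], h: int, c: int, e: int) -> bool:
    """True if every tag has weight >= h and no two tags have a (c,e)-match."""
    if any(weight(t) < h for t in tags):
        return False
    return not any(
        has_ce_match(tags[i], tags[j], c, e)
        for i in range(len(tags))
        for j in range(i + 1, len(tags))
    )


print(has_ce_match("CGCA", "GGCA", c=6, e=0))  # False
print(has_ce_match("CGCA", "GGCA", c=6, e=1))  # True
```

The check is cubic in tag length, which is fine for illustrating the definition but not for searching large codes.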

6
Design of (h,c,e)-code with large size
Outline of Upper Bound on size
  - How? Via upper bound on number of c-tokens (the substrings t that have weight ≈ c)
  - Choosing one c-token in a tag knocks out a sphere of nearby c-tokens from further use in any other tag.
  - Similar to sphere packing bound in coding theory.
Algorithms for generating optimal codes
  - Modify alphabetic tree-search algorithm of [MPT]

7
c-tokens (recap)
  - strings with weight ≥ c
  - no proper suffix of weight ≥ c
  - have weight either c or c+1
  - length ranges from c/2 (all C/G) to c (all A/T)
  - can't use tailweight method of [BKSY]
nucleation complexes = two c-tokens differing in at most e symbols
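The c-token conditions can be checked by brute force for small c. The sketch below assumes the weights A/T = 1 and C/G = 2 from the "Size of a sphere" slide; the names `is_c_token` and `c_tokens` are hypothetical. Because all weights are positive, the heaviest proper suffix is the string with its first symbol removed, so only that one suffix needs testing.

```python
from itertools import product

# Assumed symbol weights (A/T = 1, C/G = 2).
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s: str) -> int:
    return sum(WEIGHT[ch] for ch in s)


def is_c_token(s: str, c: int) -> bool:
    """Weight >= c, and no proper suffix of weight >= c.

    Weights are positive, so s[1:] is the heaviest proper suffix.
    """
    return weight(s) >= c and weight(s[1:]) < c


def c_tokens(c: int) -> list[str]:
    """Enumerate all c-tokens by exhaustive search (small c only)."""
    tokens = []
    for length in range(1, c + 1):  # no c-token is longer than c symbols
        for letters in product("ACGT", repeat=length):
            s = "".join(letters)
            if is_c_token(s, c):
                tokens.append(s)
    return tokens


tokens = c_tokens(6)
print(is_c_token("CGCA", 6))                    # True: weight 7, suffix GCA has weight 5
print(sorted({weight(t) for t in tokens}))      # [6, 7]
print(min(map(len, tokens)), max(map(len, tokens)))  # 3 6
```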

8
A sphere around CGCA
CGCA is a 6-token of weight 7, length 4
How many 4-length codewords at distance 1?
  TGCA GGCA AGCA
  CACA CCCA CTCA
  CGGA CGTA CGAA
  CGCC CGCT CGCG
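The twelve strings on this slide are the single-substitution neighbors of CGCA. A short sketch, again assuming weights A/T = 1 and C/G = 2 and using the hypothetical helper `neighbors_at_distance_1`, lists them and picks out the ones that keep the weight of CGCA:

```python
# Assumed symbol weights (A/T = 1, C/G = 2).
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def weight(s: str) -> int:
    return sum(WEIGHT[ch] for ch in s)


def neighbors_at_distance_1(s: str) -> list[str]:
    """All strings obtained from s by substituting exactly one symbol."""
    result = []
    for i, old in enumerate(s):
        for new in "ACGT":
            if new != old:
                result.append(s[:i] + new + s[i + 1:])
    return result


nbrs = neighbors_at_distance_1("CGCA")
same_weight = [t for t in nbrs if weight(t) == weight("CGCA")]
print(len(nbrs))     # 12
print(same_weight)   # ['GGCA', 'CCCA', 'CGGA', 'CGCT']
```

Only the swaps C ↔ G and A ↔ T leave the weight unchanged, which is why four of the twelve neighbors survive the filter.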

9
How many such spheres pack the whole space?
Now look at spheres around codewords of optimum code (the red spheres); they must be disjoint!
  Σ vol(s), over all red spheres s ≤ total number of c-tokens
  size of code × vol(sphere) ≤ total number of c-tokens

10
Size of a sphere
Suppose string s has a A/T and b C/G symbols
  weight = a + 2b, length = a + b
Introduce e errors into s to get t
  weight of t same as weight of s, so e1 = e2
  for errors of type 1, pick positions in (a choose e1) ways, with 2^e1 options to change to

REPLACE                       WEIGHT   NUMBER
A → G, A → C, T → G, T → C    +1       e1
G → A, C → A, G → T, C → T    -1       e2
A → T, T → A, C → G, G → C    0        e3
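The counting argument on this slide can be turned into a closed form and checked by enumeration. The formula below is a reconstruction from the error table, not necessarily the expression shown on the original slide: with k = e1 = e2 weight-changing pairs and m = e3 weight-neutral swaps, it counts same-length, same-weight strings within Hamming distance e of s. It ignores the suffix condition on c-tokens, and the function names are hypothetical.

```python
from itertools import product
from math import comb

# Assumed symbol weights (A/T = 1, C/G = 2).
WEIGHT = {"A": 1, "T": 1, "C": 2, "G": 2}


def sphere_size(a: int, b: int, e: int) -> int:
    """Count strings within Hamming distance e of a string with a A/T and
    b C/G symbols that have the same length and the same weight."""
    total = 0
    for k in range(e // 2 + 1):            # k = e1 = e2
        if k > a or k > b:
            break
        for m in range(e - 2 * k + 1):     # m = e3, with 2k + m <= e
            type1 = comb(a, k) * 2 ** k    # A/T -> C/G: positions, 2 options each
            type2 = comb(b, k) * 2 ** k    # C/G -> A/T: positions, 2 options each
            type3 = comb(a + b - 2 * k, m) # A<->T or C<->G on untouched positions
            total += type1 * type2 * type3
    return total


def sphere_size_brute(s: str, e: int) -> int:
    """Same count by exhaustive enumeration, as a cross-check."""
    w = sum(WEIGHT[ch] for ch in s)
    count = 0
    for letters in product("ACGT", repeat=len(s)):
        dist = sum(x != y for x, y in zip(s, letters))
        if dist <= e and sum(WEIGHT[ch] for ch in letters) == w:
            count += 1
    return count


# CGCA has a = 1 (one A/T symbol) and b = 3 (three C/G symbols).
print(sphere_size(1, 3, 1), sphere_size_brute("CGCA", 1))  # 5 5
print(sphere_size(1, 3, 2), sphere_size_brute("CGCA", 2))  # 23 23
```

Both counts include the center string s itself.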

11
One tag of weight h uses (h-c+1) tokens
So size of code ≤ [formula lost in extraction]
Size of sphere = [formula lost in extraction]
Substitute a = 2l - c and b = c - l
l varies from c/2 to c, c-tokens of weight c or c+1
Number of strings of length l = [formula lost in extraction]
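The substitution a = 2l - c, b = c - l follows from solving a + 2b = c and a + b = l, and it gives a way to count c-tokens length by length. The sketch below is a reconstruction under the assumed weights A/T = 1 and C/G = 2, not the slide's own formula. It uses two observations that are not stated in the slides: every string of weight exactly c is a c-token, and a c-token of weight c+1 must start with C or G and be followed by a string of weight c-1. The function names are hypothetical.

```python
from math import comb


def strings_of_weight(length: int, w: int) -> int:
    """Number of ACGT strings of the given length and weight.

    With b = w - length C/G symbols and a = 2*length - w A/T symbols:
    choose the C/G positions, then one of 2 letters at every position.
    """
    b = w - length
    a = length - b
    if a < 0 or b < 0:
        return 0
    return comb(length, b) * 2 ** length


def count_c_tokens(c: int) -> int:
    """Total number of c-tokens, summed over lengths l."""
    total = 0
    for l in range(1, c + 1):
        total += strings_of_weight(l, c)              # weight exactly c
        total += 2 * strings_of_weight(l - 1, c - 1)  # weight c+1: C/G, then weight c-1
    return total


print(count_c_tokens(2))  # 10
print(count_c_tokens(6))  # 568
```

For c = 2 the ten tokens are C, G, AA, AT, TA, TT, CA, CT, GA and GT, which matches direct enumeration.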

12
Can tighten the bound further
  - our sphere knocked out only c-tokens of the same length
  - we should also remove similar c-tokens of other lengths
  - reduce bound by factor e ?
In comparison to [BKSY] bound
  - h = 30, c = 12, e = 0: 13840 ≥ #tags ≥ 12000
  - h = 30, c = 12, e = 2: #tags ≤ 1268
  - if nucleation does occur with errors then we can't assume so many tags

13
Plot of upper bound vs. c, e (h = 50)
[Plot: upper bound on number of codewords, against e (number of errors) and c (weight of nucleation complex)]

14
Open Problems & Remarks
  - design, analyze efficient algorithms for model
  - can we use random de Bruijn sequences to generate codewords?
  - analyze using mixing techniques on Markov chain of [KMUW]?
  - exciting new question for coding theory: alphabets with weighted Hamming distances!
