2. Talk overview
Overall project scenario
PriFi motivation
PriFi algorithm description
Web version
Demo

3. Overall project aim
Development of general molecular markers for legume genetics, and of primers for those markers.

4. CATS – comparative anchor tagged sequences
Alignment of ESTs from multiple legume species.
Align to the genomic region to locate the intron.
Identification of evolutionarily conserved regions.
Design of primers for PCR amplification of the intron in the mapping parents (in the hope of finding a polymorphism).

5. Legume taxonomy
About 18,000 species, so general legume markers would be very useful!
[Figure: legume taxonomy tree; surviving labels: Genome, Arachis.]
Figure sources: Choi, Hong-Kyu et al. (2004) Proc. Natl. Acad. Sci. USA 101, 15289-15294, copyright ©2004 by the National Academy of Sciences; Doyle & Luckow 2003.

6. Looking for conserved regions
[Figure: fragments of the Lotus genome (exon, intron, exon, intron) shown together with a Glycine EST and a Medicago EST; Phaseolus is marked with a question mark.]

7. Primer design
[Figure: alignment with a good marker region highlighted; introns are replaced by X'es to help Clustal.]
Usual method: visual inspection of the alignment, then "manual" design of primers.
Idea: automate primer design through a computer program.
Primer consensus sequences:
Fw: TGCYTCAAAGGAGGAAATTTCAARAG
Rv: CTGTCAAYACCAGTATTTGCCCKKG
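The consensus primers above contain IUPAC ambiguity codes (Y, R, K), each standing for more than one base. The following is a minimal sketch, not taken from PriFi, of how such positions could be counted and how many concrete sequences a consensus primer represents. The function names `ambiguity_positions` and `degeneracy` are hypothetical; the code table is the standard IUPAC nucleotide alphabet.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguity_positions(primer):
    """Count positions whose code stands for more than one base."""
    return sum(1 for base in primer if len(IUPAC[base]) > 1)


def degeneracy(primer):
    """Number of concrete sequences represented by the consensus primer."""
    total = 1
    for base in primer:
        total *= len(IUPAC[base])
    return total


fw = "TGCYTCAAAGGAGGAAATTTCAARAG"
rv = "CTGTCAAYACCAGTATTTGCCCKKG"
for name, primer in (("Fw", fw), ("Rv", rv)):
    print(name, len(primer), ambiguity_positions(primer), degeneracy(primer))
```

The forward primer has two ambiguity positions (Y and R) and the reverse primer three (Y, K, K).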

8. Primer finder program goals
Given an alignment, the program should find and rank primer pairs which:
–are placed in conserved regions,
–span an intron,
–have similar Tm,
–fulfill numerous criteria regarding AT content, primer length, ambiguity positions, product length, ...
I.e. formalize the intuition and experience of skilled lab researchers.

9. Lab practice
Work method: go through numerous examples with lab people while they explain what they do and why.
The "why" turned out to be difficult:
Hard rules are hard to formulate
–"So Tm must always be above 55°."
–"Yes. Unless..."
Rules are often contradictory
–"But then the primer violates the AT content rule??"
–"Oh, well, then the rule should be rephrased to..."
Scoring primer pairs
–"Why is this primer pair better than this one?"
–"It just is!"

10. Primer finder program PriFi
Works with an alignment (or a Fasta file, which it aligns itself using Clustal).
1. Identifies conserved regions and locates introns
2. Identifies individual primer candidates
–Checks most criteria
3. Considers pairs of primer candidates
–Checks remaining criteria
4. Ranks all pairs
5. Suggests four pairs and explains their scores
–Lets the user make an informed choice (discussions showed primer design is not an exact science!).

11. Check all possibilities?
To the algorithm, a primer pair is simply four indices (fw start, fw end, rv start, rv end).
For an alignment of length 1000, there are about 1,000,000,000,000 (1000^4 = 10^12) ways to pick four indices.
Checking all possible four-tuples against all criteria is too slow.
The algorithm applies three filters to reduce the workload.
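As a quick check of the order of magnitude, the count can be reproduced directly. This sketch assumes the slide's figure counts any choice of four indices; the second number, which requires the four indices to be strictly increasing, is an added comparison and not something the talk states.

```python
import math

n = 1000  # alignment length from the slide

# Any four indices, chosen independently: 1000^4 = 10^12.
any_four = n ** 4

# Assumption for comparison: four distinct indices in increasing order.
increasing = math.comb(n, 4)

print(any_four, increasing)
```

Even the smaller count is in the tens of billions, so exhaustive checking stays impractical either way.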

12. First filter
Operates on the complete alignment.
We only want primers in conserved regions: disregard less conserved regions.
Delimit primer regions by masking out other columns.
–Mask single-nucleotide columns.
–Mask intron columns.
–Mask "safety zone" around introns (to ensure unique identification of PCR product).
–Mask certain mismatch columns.

16. First filter
1. Mask single-nucleotide columns.
2. Mask intron columns.
3. Mask "safety zone" around introns (to ensure unique identification of PCR product).
4. Mask certain mismatch columns.
–Using two primer criteria: minimum length (l) and maximum number of ambiguities (a).
–For each mismatch column: check if a window of length l can be placed around it with at most a ambiguities. If not: mask the column.
–For l = 18, a = 4: for each mismatch column, find a window of length 18 containing at most 4 mismatch columns; otherwise mask.
5. Keep regions of length at least l with no masked columns.

  **!******!**!*******!******!!!*****!*!*******!*******!***!!**!*!!*!**!*
CAGCATGCTGACGAAGCCTTGGACCGCCAXXXCAGGAATCAACCGTAGTGGAATCCAGCTAAGGCACACGGAT--
----ATGCTGACGATGCCTTGGGCCGCCA---CAGGACTGAACCGTAATGGAATCTAGCTAAGGCTTACGGAT--
--GCGTGCTGATGAAGCCTTGGACCGCCA---CAGGAATCAACCGTAGTGGAATCCAGCCGAGCCACATGGCTAC
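Steps 4 and 5 can be sketched as follows. This is an illustrative reading of the slide, not PriFi's actual code: the function names are hypothetical, columns are 0-based, and the sketch assumes that a window only needs to fit inside the alignment and that only mismatch columns count as ambiguities.

```python
def mask_mismatch_columns(mismatch, l, a):
    """Return a list of booleans: True where a mismatch column gets masked.

    mismatch[i] is True if column i is a mismatch column.
    A mismatch column survives if some window of length l containing it
    holds at most a mismatch columns.
    """
    n = len(mismatch)
    # prefix[i] = number of mismatch columns among the first i columns
    prefix = [0]
    for m in mismatch:
        prefix.append(prefix[-1] + (1 if m else 0))

    masked = [False] * n
    for col in range(n):
        if not mismatch[col]:
            continue
        # Windows [s, s + l) that contain col and fit in the alignment.
        first = max(0, col - l + 1)
        last = min(col, n - l)
        ok = any(prefix[s + l] - prefix[s] <= a for s in range(first, last + 1))
        masked[col] = not ok
    return masked


def primer_regions(masked, l):
    """Return half-open (start, end) runs of unmasked columns of length >= l."""
    regions = []
    start = None
    for i, m in enumerate(list(masked) + [True]):  # sentinel closes the last run
        if not m and start is None:
            start = i
        elif m and start is not None:
            if i - start >= l:
                regions.append((start, i))
            start = None
    return regions


# Toy example with '*' for conserved and '!' for mismatch columns.
marks = "**!**!!!**"
mismatch = [c == "!" for c in marks]
masked = mask_mismatch_columns(mismatch, l=4, a=1)
print(primer_regions(masked, l=4))
```

In the toy example the isolated mismatch survives, while the run of three adjacent mismatches is masked because no window of length 4 around any of them has at most one mismatch.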

17. Workload reduction
Alignment length 43, min primer length 18, max primer length 35: 9 primer candidates of length 35, 10 of length 34, 11 of length 33, etc.; a total of 315.
If the middle column is masked we get two primer regions of length 21: one primer candidate of length 21, two of length 20, etc., 10 in each region. A total of 20.
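The counts on this slide follow from the fact that a region of length n holds n - k + 1 primer candidates of length k. A small sketch (the helper name `count_candidates` is hypothetical) reproduces both totals:

```python
def count_candidates(region_length, min_len, max_len):
    """Number of candidate primers of length min_len..max_len in one region."""
    longest = min(max_len, region_length)
    return sum(region_length - k + 1 for k in range(min_len, longest + 1))


unmasked_total = count_candidates(43, 18, 35)    # whole alignment
masked_total = 2 * count_candidates(21, 18, 35)  # two regions of length 21
print(unmasked_total, masked_total)
```

Masking a single column in the middle therefore cuts the number of candidates from 315 to 20 in this example.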

18. Second filter
Operates on single primer candidates.
Can't check all criteria: primers don't have an orientation yet.
Checks and scores primers according to relevant criteria like:
–end in ambiguities?
–Tm,
–have too many ambiguities?
Prunes the set of remaining primers:
–From any group of essentially identical, greatly overlapping primers, keep only the superior "representatives".
–I.e. if two primers A and B overlap by more than 10 nt, and A scores better than B in all criteria, keep only A. Otherwise keep both.
–This step is a major algorithm speed-up!
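The pruning rule can be sketched as a dominance check. This is an illustration under assumptions, not PriFi's implementation: the `Candidate` type, the half-open coordinates, and the convention that a higher score is better in every criterion are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Candidate(NamedTuple):
    start: int                 # first column of the primer
    end: int                   # one past the last column (half-open)
    scores: Tuple[float, ...]  # one score per criterion, higher is better


def overlap(a: Candidate, b: Candidate) -> int:
    """Number of columns shared by the two candidates."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def dominates(a: Candidate, b: Candidate, min_overlap: int = 10) -> bool:
    """True if a overlaps b by more than min_overlap and beats it everywhere."""
    better_everywhere = all(x > y for x, y in zip(a.scores, b.scores))
    return overlap(a, b) > min_overlap and better_everywhere


def prune(candidates: List[Candidate]) -> List[Candidate]:
    """Drop every candidate that is dominated by some other candidate."""
    return [
        b for b in candidates
        if not any(dominates(a, b) for a in candidates if a is not b)
    ]


A = Candidate(0, 20, (5.0, 5.0))
B = Candidate(2, 22, (4.0, 4.0))    # overlaps A by 18, worse in all criteria
C = Candidate(30, 50, (1.0, 1.0))   # no overlap with the others
D = Candidate(5, 25, (6.0, 1.0))    # overlaps A, but better in one criterion
print(prune([A, B, C, D]))
```

Here only B is dropped: C does not overlap anything, and D is kept because it is not worse than A in every criterion.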

19. Example
–End in ambiguities?
–Tm
–Have too many ambiguities?
–Pruning

!!**!*!!***!**!*******!************************!***************!*******!**!
CAGCATGTTGACGAAGCCTTGGACCGCCAGCCCAGGAATCAACCGTAGTGGAATCCAGCTAAGGCACACGGATAG
ATGCATACTGACGATGCCTTGGGCCGCCAGCCCAGGAATCAACCGTAATGGAATCCAGCTAAGGCACACGGATAG
CAGCGTGCTGATGAAGCCTTGGACCGCCAGCCCAGGAATCAACCGTAGTGGAATCCAGCTAAGCCACACGGCTAC
[Figure: primer candidates drawn as bars beneath the alignment.]
(13 → 1)

20. Third filter
Operates on primer pairs.
Considers all combinations of two primer candidates (a low number of candidates is essential!).
Checks remaining criteria, such as:
–AT content and degeneracy in 3'-tail,
–distance to closest intron (to ensure identification of PCR product),
–PCR product length,
–similar Tm's.
Discards invalid pairs, scores and ranks the rest.
Suggests four pairs.
–Best scoring, not-too-overlapping pairs. Ensures some variation.
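The final selection of "best scoring, not-too-overlapping pairs" could be done greedily, as in the sketch below. The talk does not define "too overlapping", so the rule used here (both the forward and the reverse primers overlap an already chosen pair by more than `max_overlap` columns) is an assumption, as are the function names and the tuple layout.

```python
def span_overlap(a, b):
    """Columns shared by two half-open (start, end) spans."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def suggest_pairs(scored_pairs, how_many=4, max_overlap=10):
    """Pick the best-scoring pairs that are not near-duplicates of each other.

    scored_pairs: list of (score, fw_span, rv_span) for valid pairs only.
    """
    chosen = []
    for cand in sorted(scored_pairs, key=lambda p: p[0], reverse=True):
        too_close = any(
            span_overlap(cand[1], c[1]) > max_overlap
            and span_overlap(cand[2], c[2]) > max_overlap
            for c in chosen
        )
        if not too_close:
            chosen.append(cand)
        if len(chosen) == how_many:
            break
    return chosen


pairs = [
    (100, (0, 20), (200, 220)),
    (95, (1, 21), (201, 221)),    # near-duplicate of the best pair
    (90, (0, 20), (300, 320)),    # same fw region, different rv region
    (80, (50, 70), (400, 420)),
    (70, (100, 120), (500, 520)),
    (60, (150, 170), (600, 620)),
]
suggested = suggest_pairs(pairs)
print([score for score, _, _ in suggested])
```

Skipping near-duplicates is what gives the user some variation among the four suggestions instead of four shifted copies of the same pair.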

21. Report
Fw 5'-ATCCGATTTCGAGAAATGCAAACCCTGGTTGATCC
Rv 5'-CCCTTCACAGTGGTGATACACTTTCGCTTGTTACG
Tm = 66.4 / 66.9
Primer lengths: 35 / 35
Avg. #sequences in primer alignments: 3.0 / 2.0
Estimated product length: 1785
Primer/intron distances: 36 / 88
A/T's among last 8 bp of 3'-end: 4 / 5
Ambiguities: 0 / 0

93.2: High-Tm bonus
6.0: Fw primer length
6.0: Rv primer length
24.7: bonus for #sequences in primer alignments
3.0: Fw has G/C terminal in 3'-end
3.0: Rv has G/C terminal in 3'-end
60.0: Good product length
-5.0: Rv in unconserved region or based mostly on 2 seqs
-11.3: Primer/intron distance(s) outside 70-150 bp
-3.0: Too high AT content in 3'-ends
Score: 176
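The reported score is consistent with simply adding the listed bonuses and penalties. The sketch below does that sum; the observation that the displayed value matches the total with its fractional part dropped is an inference from this one report, not a documented PriFi rule.

```python
# Contributions copied from the report above: (points, explanation).
contributions = [
    (93.2, "High-Tm bonus"),
    (6.0, "Fw primer length"),
    (6.0, "Rv primer length"),
    (24.7, "bonus for #sequences in primer alignments"),
    (3.0, "Fw has G/C terminal in 3'-end"),
    (3.0, "Rv has G/C terminal in 3'-end"),
    (60.0, "Good product length"),
    (-5.0, "Rv in unconserved region or based mostly on 2 seqs"),
    (-11.3, "Primer/intron distance(s) outside 70-150 bp"),
    (-3.0, "Too high AT content in 3'-ends"),
]

total = sum(points for points, _ in contributions)
print(f"{total:.1f}")
```

The contributions add up to 176.6, against a reported score of 176.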

22. PriFi on the web

24. Configuration
Critical melting temperature: if both primer melting temperatures are below this value, penalize the pair.
Minimum melting temperature with ambiguity positions: if a primer melting temperature is below this value, the primer can have no ambiguity positions.
Optimal PCR product length interval: [Figure: PCR product length points plotted against product length, with breakpoints p1, p2, p3, p4 separating the zones Penalty, Ok, Optimal, Ok, Penalty.]
Critical ambiguity position distance from 3'-end: penalize ambiguity positions closer than this distance in nucleotides to the 3'-end.
Introns in sequences: if set to 'no', primer pairs do not have to span an intron (and introns are not marked by X'es).
Somewhat heuristic parameters and rules...

25. Status
Genomic data from Medicago and Lotus; ESTs from Medicago, Lotus, Glycine, Arachis, Phaseolus.
PriFi found primer pairs for 203 alignments.
36 primer pairs tested:
–24 correct products in Phaseolus.
–19 in Arachis.
–Rest not polymorphic or yielded no product.

26. User statistics

28. Thanks for your attention.
Email: jakobf@birc.au.dk
Website: http://cgi-daimi.au.dk/cgi-chili/PriFi/main
People involved in developing PriFi: Leif Schauser (BiRC), Lene H. Madsen, Niels Sandal (Dept. of Mol. Biology).
Grant holder: Jens Stougaard.

