Bioinformatics Research Center. Talk overview: 1. DNA and genes 2. Project idea 3. PriFi – finding primers based on a multiple alignment 4. GeMprospector.

1 Bioinformatics Research Center

2 Talk overview
1. DNA and genes
2. Project idea
3. PriFi – finding primers based on a multiple alignment
4. GeMprospector
Collaborators: Leif Schauser, ex-BiRC, and Jens Stougaard’s group at Mol. Biol.

3 Heredity
Twins are living proof that offspring inherit certain features from their parents.
http://www.stantontwins.com/
Don and Dan Stanton star in Terminator 2 and Good Morning Vietnam, among others.

4 DNA
All living cells store hereditary information as double-stranded deoxyribonucleic acid (DNA).
Strands are long complementary sequences of nucleotides: adenine, cytosine, guanine, thymine.
Base pairs:
AGCTTAGCCGA
|||||||||||
TCGAATCGGCT
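The base pairing on this slide can be checked with a few lines of Python. This is only an illustrative sketch; the function names are not from the talk.

```python
# Watson-Crick pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def complement(strand: str) -> str:
    """Base-by-base complement, written in the same left-to-right order as the slide."""
    return strand.upper().translate(COMPLEMENT)

def reverse_complement(strand: str) -> str:
    """The opposite strand read in its own 5'->3' direction."""
    return complement(strand)[::-1]

top = "AGCTTAGCCGA"
print(top)
print("|" * len(top))
print(complement(top))  # prints TCGAATCGGCT, the lower strand on the slide
```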

5 Gene
DNA segment which codes for (is translated into) a protein *
http://stemcells.nih.gov/StaticResources/info/scireport/images/figurea6.jpg

6 Gene structure *
Promoters, untranslated regions, introns, exons
Figure © Wellcome Trust
Terminology:
EST: sequenced mRNA (no introns)
Alleles: different variants of the same gene

7 Genome
Collection of all DNA of an organism
Each cell has a copy of the complete genome
The human genome is long: if each base pair were scaled to 1 mm width, the human genome would be 3200 km long, with a gene every 300 m, each gene 30 m long, and an exon length of only 1 m.
Figure © 2002 by Bruce Alberts, Alexander Johnson, Julian Lewis, Martin Raff, Keith Roberts, and Peter Walter.

8 Genome tightly packed
Real scale: cell nucleus 2-3 μm, genome 2 m
Comparable to packing 12 km of fishing line into a die

9 Plant breeding
Goal: combine several good traits in the same line (many fruits, resistance to disease, etc.)
[Cross plants with good traits and hope the offspring has them all]

10 Plant breeding
Takes time and effort
– Trait possibly only expressed under certain conditions
– Difficult to test for trait
– Late onset, etc.
Alternative: genetic markers. Saves time and space:
– Do DNA testing
– Only need baby plants

11 Genetic marker
Specific but variable piece of DNA known to reside in one and only one location in the genome
Marker: CGACTAGCAATGCTACA(G/C)AGGATCCCCGCGAC
The (unknown) gene for the desired trait must be close to the marker
If one marker allele is linked to a desired trait, the marker may serve as an indicator
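The `(G/C)` notation in the marker sequence denotes a position where two alleles differ. A minimal sketch of how that notation could be expanded into explicit allele sequences follows; the helper name `expand_alleles` is hypothetical and only the marker string comes from the slide.

```python
import re
from itertools import product

def expand_alleles(marker: str) -> list[str]:
    """Expand '(X/Y)' variable positions into one sequence per allele."""
    # re.split with a capture group alternates fixed text and the choice text.
    parts = re.split(r"\(([^)]*)\)", marker)
    options = [p.split("/") if i % 2 else [p] for i, p in enumerate(parts)]
    return ["".join(combo) for combo in product(*options)]

alleles = expand_alleles("CGACTAGCAATGCTACA(G/C)AGGATCCCCGCGAC")
for allele in alleles:
    print(allele)
```

The two printed sequences are identical except at the single variable position, which is what makes the marker both specific and variable.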

12 PCR
Good thing about markers: easy to test for.
1. Get DNA sample
2. Add primers chemically fabricated to bind specifically to the marker DNA
3. Do Polymerase Chain Reaction (PCR): produces the "missing part" between primers
4. Sequence product

13 Overall project aim
Development of..
General genetic markers for legumes
– many markers → greater chance of trait association
– general: each marker should be shared by all legumes
PCR primers for the markers
– One set per marker, should work for all legumes

14 Idea
Align sequences from multiple legumes
Look for conserved regions flanking introns
(Recall goal: specific but variable DNA, universal across legumes)
We’ll get to that

15 Idea
Alignment of legume ESTs
Align to genomic region (introns, specificity)
Find evolutionarily conserved regions
Design (hopefully legume-universal) primers for PCR amplification of the intron in mapping parents (hope to find polymorphism).

16 Example
Aligning ESTs with genomic sequence
Very probable structure: exon, intron, exon, intron
Primers in conserved regions will amplify the intron in all legumes (?)
Good potential marker region!
[Figure: alignment of Lotus genome, Glycine EST, Medicago EST and Peanut sequences. Sequence fragments from the figure: AGCATCGATCAGGACGGGAATACCCCACATGGAGGAGGAGGACCTAACAATAAGAGACCTAAACTCTCTCTAG TACCCCACAT AGCATGGGAA TACCCCACAT TACCCCACAT]
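Finding conserved regions in an alignment can be sketched as a scan for runs of identical columns. The example assumes a gapless toy alignment (the three sequences are invented, not the ones in the figure) and demands 100% identity; the actual conservation rule used in the project is not stated on the slides.

```python
def conserved_runs(alignment: list[str], min_len: int) -> list[tuple[int, int]]:
    """Return (start, end) half-open intervals of columns identical in every row."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    runs, start = [], None
    for i in range(length + 1):  # one extra step closes a run ending at the last column
        same = (i < length
                and alignment[0][i] != "-"
                and len({row[i] for row in alignment}) == 1)
        if same and start is None:
            start = i
        elif not same and start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs

toy = [
    "TACCCCACATAAGGTTGGAGGACC",
    "TACCCCACATCTGGATGGAGGACC",
    "TACCCCACATGAGGTAGGAGGACC",
]
print(conserved_runs(toy, min_len=8))
```

Two conserved runs separated by a variable stretch are exactly the layout the slide is after: primers sit in the runs, and the variable part in between is amplified.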

17 Legume taxonomy
18,000 species → general legume markers would be very useful!
Figure labels: Genomic data, Arachis
Figure source: Choi, Hong-Kyu et al. (2004) Proc. Natl. Acad. Sci. USA 101, 15289-15294; Doyle & Luckow (2003). © 2004 the National Academy of Sciences

18 Regarding marker specificity
We don't have a complete legume genome
If an incomplete genome is used..
– Marker candidate sequence may also occur in the unknown part of the genome and hence also in other legumes
– Trait gene and marker have to be close to get strong allele–trait association
Use the complete Arabidopsis thaliana genome instead
Figure labels: Genomic regions that haven't been sequenced, ?, ACGCATCGATTCGCGAACTG, Trait gene

19 Arabidopsis and legumes
If Arabidopsis has 2 copies of some gene, its legume ortholog probably exists in only 1 copy
Legume EST has 1 or 2 hits in Arabidopsis.. → probably unique in legumes → potential marker candidate
Figure labels: Arabidopsis, Legumes, whole genome duplication
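The selection rule on this slide amounts to a filter on hit counts. A minimal sketch, assuming the number of Arabidopsis hits per legume EST has already been computed by some similarity search; the EST names and the dictionary layout are hypothetical.

```python
def marker_candidates(hit_counts: dict[str, int], max_hits: int = 2) -> list[str]:
    """Keep ESTs with at least one and at most `max_hits` Arabidopsis hits."""
    return sorted(name for name, hits in hit_counts.items() if 1 <= hits <= max_hits)

# Hypothetical hit counts for four ESTs
hits = {"est_a": 1, "est_b": 2, "est_c": 0, "est_d": 5}
print(marker_candidates(hits))
```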

20 Large-scale: Legume pipeline
Variation, Specificity, Universality

21 Primer design
Good marker region (introns replaced by X'es)
Usual method: visual inspection of alignment → "manual" design of primers.
Idea: automate primer design through a computer program
Primer consensus sequences:
Fw: TGCYTCAAAGGAGGAAATTTCAARAG
Rv: CTGTCAAYACCAGTATTTGCCCKKG
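The letters Y, R and K in the consensus primers are standard IUPAC ambiguity codes (Y = C or T, R = A or G, K = G or T). The sketch below counts and enumerates the concrete sequences each consensus primer stands for; the function names are illustrative and not taken from PriFi.

```python
from itertools import product

# Standard IUPAC nucleotide codes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degeneracy(primer: str) -> int:
    """Number of distinct sequences a consensus primer represents."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count

def expand(primer: str) -> list[str]:
    """All concrete sequences matching the consensus primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]

fw = "TGCYTCAAAGGAGGAAATTTCAARAG"
rv = "CTGTCAAYACCAGTATTTGCCCKKG"
print(degeneracy(fw), degeneracy(rv))  # prints: 4 8
```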

22 Primer program goals
Given an alignment, find and rank primer pairs which:
– are placed in conserved regions
– span an intron
– meet numerous criteria (AT content, primer and product length, ambiguity positions, similar Tm, ..)
I.e. formalize the intuition and experience of skilled lab researchers
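A few of these criteria can be written down as simple checks. The sketch below uses the Wallace rule of thumb, Tm = 2(A+T) + 4(G+C), as a stand-in melting temperature estimate; the slides do not say which formula PriFi uses, and every threshold here is an assumed placeholder apart from the 55° figure quoted in the talk.

```python
def at_content(primer: str) -> float:
    """Fraction of A and T bases in the primer."""
    return sum(base in "AT" for base in primer) / len(primer)

def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in degrees C (an assumption, not PriFi's formula)."""
    at = sum(base in "AT" for base in primer)
    gc = sum(base in "GC" for base in primer)
    return 2 * at + 4 * gc

def primer_ok(primer: str, min_len: int = 18, max_len: int = 30,
              min_tm: int = 55, max_at: float = 0.7) -> bool:
    """Single-primer checks; all default thresholds are illustrative."""
    return (min_len <= len(primer) <= max_len
            and wallace_tm(primer) >= min_tm
            and at_content(primer) <= max_at)

def pair_ok(fw: str, rv: str, max_tm_diff: int = 5) -> bool:
    """Pair-level check: both primers pass and have similar Tm."""
    return (primer_ok(fw) and primer_ok(rv)
            and abs(wallace_tm(fw) - wallace_tm(rv)) <= max_tm_diff)

print(wallace_tm("AGCATCGATCAGGACGGGAA"))  # prints 62
```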

23 Work method
Go through numerous examples with lab people while they explain what they do and why. The "why" turned out to be difficult:
Hard rules hard to formulate
– "So Tm must always be above 55 °C."
– "Yes. Unless.."
Rules often contradictory
– "But then the primer violates the AT content rule??"
– "Oh, well, then the rule should be rephrased to.."
Scoring primer pairs
– "Why is this primer pair better than this one?"
– "It just is!"

24 Primer finder program PriFi
Works on an alignment
1. Identifies conserved regions and locates introns
2. Identifies individual primer candidates
– Checks most criteria
3. Considers pairs of primer candidates
– Checks remaining criteria
4. Ranks all pairs
5. Suggests four pairs and explains their scores
– Lets the user make an informed choice (discussions showed primer design is not an exact science!).
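Steps 3 to 5 can be sketched as pairing candidates across an intron and keeping the best few. The `Candidate` type, the additive score and the example numbers are all hypothetical; the slides do not describe how PriFi actually scores a pair.

```python
from typing import NamedTuple

class Candidate(NamedTuple):
    start: int    # first alignment column covered (inclusive)
    end: int      # one past the last column covered
    score: float  # hypothetical per-primer quality, higher is better

def rank_pairs(forward: list[Candidate], reverse: list[Candidate],
               intron: tuple[int, int], top: int = 4):
    """Pair candidates that flank the intron and return the `top` best-scoring pairs."""
    intron_start, intron_end = intron
    pairs = []
    for fw in forward:
        for rv in reverse:
            # The pair spans the intron if fw lies before it and rv after it.
            if fw.end <= intron_start and rv.start >= intron_end:
                pairs.append((fw.score + rv.score, fw, rv))
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return pairs[:top]

forward = [Candidate(0, 20, 1.0), Candidate(5, 25, 2.0)]
reverse = [Candidate(60, 80, 1.5), Candidate(30, 50, 3.0)]
best = rank_pairs(forward, reverse, intron=(28, 55))
for total, fw, rv in best:
    print(total, fw, rv)
```

Returning several ranked pairs rather than a single winner mirrors the slide's point that the user should make the final, informed choice.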

25 Check all possibilities?
To the algorithm, a primer pair is simply four positions (fw start, fw end, rv start, rv end).
For an alignment of length 1000, there are about 1,000,000,000,000 ways to pick four positions.
Checking all possible four-tuples for all criteria is too slow.
The algorithm applies three filters to reduce the workload.
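The size of the search space is easy to verify. The sketch below reproduces the slide's figure as 1000^4, shows the smaller count when the four positions must be strictly increasing, and illustrates how much a single length restriction shrinks the set of individual primer candidates. The 18 to 30 length window is an assumed example and is not claimed to be one of PriFi's three filters.

```python
import math

def four_position_choices(n: int) -> int:
    """Four positions chosen freely among n columns (the slide's estimate)."""
    return n ** 4

def increasing_four_tuples(n: int) -> int:
    """Four distinct positions in increasing order (fw start < fw end < rv start < rv end)."""
    return math.comb(n, 4)

def length_bounded_windows(n: int, min_len: int, max_len: int) -> int:
    """Number of (start, end) windows whose length lies in [min_len, max_len]."""
    return sum(n - length + 1 for length in range(min_len, max_len + 1) if length <= n)

n = 1000
print(four_position_choices(n))           # prints 1000000000000
print(increasing_four_tuples(n))          # prints 41417124750
print(length_bounded_windows(n, 18, 30))  # prints 12701
```

Filtering single primers first and only then forming pairs keeps the work far below the brute-force four-tuple count.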

26 PriFi on the web


28 Useful!
40 uses per day from around the globe (source: Google Analytics)

29 Results
459 marker candidates/primer sets identified
36 primer pairs tested:
– 24 correct products in Phaseolus (bean)
– 19 in Arachis (peanut)
– Rest not polymorphic or yielded no product.

30 GeMprospector

31 Our pipeline

32 Online pipeline
Extend our alignment database with new DNA from the user
Find new markers?

33 .. what was the middle thing??
Find words that occur once in the genome, but with different spellings among different peanut plants:
Peanut genome: 100 phone books (2 billion nucleotides)
Peanut plant A: “serendipity”
Peanut plant B: “serendipiti”

34 .. what was the middle thing??
Find words that occur once in the genome, but with different spellings among different peanut plants:
Peanut genome: 100 phone books (2 billion nucleotides)
Peanut plant A: “serendipity”
Peanut plant B: “serendipiti”
Bean plant A: “serendipity”
Bean plant B: “serandipity”
Overlay text: legume’s
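The phone-book analogy maps onto two small operations: finding "words" (k-mers) that occur exactly once in a genome, and locating spelling differences between two plants. A minimal sketch with a made-up toy genome; the function names are illustrative only.

```python
from collections import Counter

def unique_kmers(genome: str, k: int) -> set[str]:
    """All length-k words that occur exactly once in the genome."""
    counts = Counter(genome[i:i + k] for i in range(len(genome) - k + 1))
    return {kmer for kmer, count in counts.items() if count == 1}

def spelling_differences(word_a: str, word_b: str) -> list[int]:
    """Positions at which two equally long words differ."""
    return [i for i, (a, b) in enumerate(zip(word_a, word_b)) if a != b]

print(sorted(unique_kmers("ACGTACGTTT", 4)))
print(spelling_differences("serendipity", "serendipiti"))  # prints [10]
print(spelling_differences("serendipity", "serandipity"))  # prints [3]
```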

35 Email: jakobf@birc.au.dk
http://cgi-daimi.au.dk/cgi-chili/PriFi/main
http://cgi-www.daimi.au.dk/cgi-chili/GeMprospector/main
Jakob Fredslund, Lene H. Madsen, Birgit K. Hougaard, Anna Marie Nielsen, David Bertioli, Niels Sandal, Jens Stougaard, Leif Schauser. A general pipeline for the development of anchor markers for comparative genomics in plants. BMC Genomics 2006, 7:207.
Jakob Fredslund, Lene H. Madsen, Birgit K. Hougaard, Niels Sandal, Jens Stougaard, David Bertioli, Leif Schauser. GeMprospector - Online Design of Cross-Species Genetic Marker Candidates in Legumes and Grasses. Nucleic Acids Research 2006, 34 (Web Server issue): W670-W675.
Jakob Fredslund, Leif Schauser, Lene Heegard Madsen, Niels Sandal, Jens Stougaard. PriFi - Using a Multiple Alignment of Related Sequences to Find Primers for Amplification of Homologs. Nucleic Acids Research 2005, 33 (Web Server issue): W516-W520; doi:10.1093/nar/gki425.
Leif Schauser, Jakob Fredslund, Lene Heegard Madsen, Niels Sandal, Jens Stougaard. A computational pipeline towards the development of comparative anchor tagged sequence (CATS) markers. In Proceedings of the International Grassland Congress (IGC), 2005. In Humphrey MO (Ed), Wageningen Academic Publishers, pp. 73-81.
People involved in developing PriFi: Leif Schauser (BiRC), Lene H. Madsen, Niels Sandal (Dept. of Mol. Biology). Grant holder: Jens Stougaard.

