
Thursday, 5 June 2008 Problems in sequence analysis Identification by sequence similarity Genes Determining Plant-Cyanobacterial Symbioses and Consideration.




1 Thursday, 5 June 2008 Problems in sequence analysis Identification by sequence similarity Genes Determining Plant-Cyanobacterial Symbioses and Consideration of Blast This demonstration is best viewed as a slide show, enabling you to simulate a session and make changes in cursor position more obvious. To do this, click Slide Show on the top tool bar, then View show. Click anywhere to go on to the next slide

2 [Figure panels: 10 mM nitrate | 0.1 mM nitrate] Gland development is stimulated by N-limitation What's special about the gland? Gland suppressed by presence of fixed N Plant starved for N makes gland to house cyanobacteria What genes are specifically expressed in glands?

3 Construction of a cDNA library from Gunnera gland. mRNA ends with polyA tails. Use modified polyT to direct synthesis of DNA copy of mRNA. Reverse Transcriptase (RT) adds CCC to end. Add 2nd adapter, using GGG to attach to CCC. Extend cDNA.

4 Construction of a cDNA library from Gunnera gland (Same protocol, but with real sequences) 5'-NNNNNNNNNN... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3' 3'-TTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5' Use modified polyT adapter to direct synthesis of DNA copy of mRNA

5 Construction of a cDNA library from Gunnera gland 5'-NNNNNNNNNN... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3' 3'-TTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5' Use modified polyT adapter to direct synthesis of DNA copy of mRNA The adapter can bind to many positions in polyA tail, resulting in variation in number of T's in cDNA sequence.

6 Construction of a cDNA library from Gunnera gland 5'-NNNNNNNNNN... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3' 3'-TTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5' Use modified polyT adapter to direct synthesis of DNA copy of mRNA The adapter can bind to many positions in polyA tail, resulting in variation in number of T's in cDNA sequence.

7 Construction of a cDNA library from Gunnera gland 5'-NNNNNNNNNN... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3' TTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5' 3'-CCCNNNNNNNNNN... NNNNNNNNNN Reverse Transcriptase (RT) extends the adapter to the end of the mRNA and adds CCC to the 3' end.

8 3'-CCCNNNNNNNNNN... Construction of a cDNA library from Gunnera gland 5'-NNNNNNNNNN... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3' TTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5' 3'-CCCNNNNNNNNNN... NNNNNNNNNN 5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG A second adapter is added which (with the help of antibodies to) uses three G's to bind to the three C's.

9 CCCNNNNNNNNNN... Construction of a cDNA library from Gunnera gland 5'-NNNNNNNNNN... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3' TTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5' 3'-CCCNNNNNNNNNN... NNNNNNNNNN 5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG The cDNA sequence is extended to the left, using the second adapter as a template. TTCGTCACCATAGTTGCGTCTCACCGGTAATGCCGG
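The extension shown on this slide can be checked by hand or with a few lines of code. The following is a minimal Python sketch, not part of the original demonstration; the helper name `complement` is an assumption made for illustration. It pairs each base of the second adapter (minus the final GGG, which binds the CCC added by RT) with its complement.

```python
# Base-pairing rules for DNA, expressed as a translation table.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Return the base-by-base complement of seq, without reversing it."""
    return seq.translate(COMPLEMENT)


# Second adapter as written on the slide, 5' to 3'.
second_adapter = "AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG"

# Drop the final GGG: it pairs with the CCC that RT already added to the cDNA.
# The result is written 3' to 5', aligned under the adapter as on the slide.
extension = complement(second_adapter[:-3])
print(extension)
```

The printed string is the sequence drawn beneath the second adapter on the slide.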

10 CCCNNNNNNNNNN... Construction of a cDNA library from Gunnera gland 5'-NNNNNNNNNN... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3' TTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5' 3'-CCCNNNNNNNNNN... NNNNNNNNNN 5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGNNNNNNNNNN... The cDNA sequence is extended to the left, using the second adapter as a template… …and then the second cDNA strand is synthesized left-to-right, using the first cDNA strand as the template. TTCGTCACCATAGTTGCGTCTCACCGGTAATGCCGG

11 CCCNNNNNNNNNN... Construction of a cDNA library from Gunnera gland 5'-NNNNNNNNNN... NNNNNNNNNNAAAAAAAAAAAAAAAAA...-3' TTTTTCTTTTTTCATGGCTGACGCTGAGACGCAACTATGGTGACGAA-5' 3'-CCCNNNNNNNNNN... NNNNNNNNNN 5'-AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGGNNNNNNNNNN... TTCGTCACCATAGTTGCGTCTCACCGGTAATGCCGG Hundreds to thousands of nucleotides To give some perspective, the adapters are about 50 nucleotides, while the mRNA itself can be as large as a couple of thousand nucleotides.

12 Construction of a cDNA library from Gunnera gland Of course there are thousands of different mRNA's in a cell, leading to thousands of cDNA's in the library, all in multiple copies.

13 Sequencing of cDNA library Limitations: - Only from ends - Only ~400 nt It would be nice to be able to sequence the cDNA's from end to end, but that's not presently possible. Sequencing has its limitations.

14 Sequencing of cDNA library Limitations: - Only from ends - Only ~400 nt Solution: - Break the cDNA The solution is to break up the cDNA so that there are multiple, overlapping ends from which to sequence. In this way, the full length of the cDNA can be sequenced.

15 Sequencing of cDNA library (1000's of cDNA's) The broken fragments are read from either end (at random). If there are enough reads, it is possible to use overlaps to reassemble the original sequence. Unfortunately, the adapters are also sequenced, and these complicate the assembly process, as they're interpreted as overlapping sequences, leading to misassembly. They need to be removed.
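The idea of reassembling reads by their overlaps, and the way a shared adapter can fake an overlap, can be sketched in Python. This is a toy illustration under stated assumptions, not the assembler actually used: the function names `overlap_length` and `merge`, the `min_overlap` threshold, and the example reads are all hypothetical, and real assemblers tolerate sequencing errors, which this exact-match version does not.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that exactly equals a prefix of `right`."""
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Join two reads across their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# Two hypothetical reads from the same cDNA, sharing the six bases GACGCT.
merged = merge("ATGGCTGACGCT", "GACGCTGAGACG")
print(merged)

# Two hypothetical reads from DIFFERENT cDNAs that both carry adapter sequence
# (AAGCAGTGG is the start of the second adapter shown earlier).
false_overlap = overlap_length("CCCTTTAAGCAGTGG", "AAGCAGTGGGGGAAA")
print(false_overlap)
```

The second case is the misassembly risk described on the slide: the only thing the two reads share is adapter sequence, yet they look like neighbors.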

16 Sequencing of cDNA library (1000's of cDNA's) Given the number of sequences, the removal process obviously must be automated, but automated processes, while fast, are often stupid. We need to check to make sure they worked.
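One way to check whether automated adapter removal worked is to search each sequence for pieces of the adapter on either strand. The sketch below is a Python illustration under assumptions: the function names, the window size `k = 12`, and the "clean" example sequence are hypothetical, and only the second adapter from the earlier slides is tested. An exact-match scan like this will miss adapters that contain sequencing errors.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Second adapter from the library construction slides, 5' to 3'.
ADAPTER = "AAGCAGTGGTATCAACGCAGAGTGGCCATTACGGCCGGG"


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def has_adapter(seq: str, adapter: str = ADAPTER, k: int = 12) -> bool:
    """True if any k-base piece of the adapter, on either strand, occurs in seq."""
    for strand in (adapter, reverse_complement(adapter)):
        for i in range(len(strand) - k + 1):
            if strand[i:i + k] in seq:
                return True
    return False


clean = "ATGACCGTTCAGGATCTGAAACCGTTAG"                    # hypothetical insert
left_contaminated = ADAPTER[-15:] + clean                   # adapter tail left on
right_contaminated = clean + reverse_complement(ADAPTER)[:14]  # opposite strand

print(has_adapter(clean), has_adapter(left_contaminated), has_adapter(right_contaminated))
```

Any sequence flagged this way is a candidate for manual inspection, which is the kind of spot check the slide recommends.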

17 Identifying elements of cDNA library The assembly process should, in theory, also remove duplicate sequences.

18 Identifying elements of cDNA library The assembly process should, in theory, also remove duplicate sequences. In practice, partial duplicates may remain, and it is necessary to keep an eye out for them.
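A simple form of partial duplicate is a contig that is wholly contained in a longer one, possibly on the opposite strand. The following Python sketch looks for that case only. The function name `find_contained` and the example contigs are hypothetical, matching is exact, and the all-against-all comparison is fine for a toy set but slow for a full library. Two identical contigs of equal length would be reported in both directions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_contained(contigs: dict[str, str]) -> list[tuple[str, str]]:
    """Return (shorter, longer) name pairs where the shorter contig occurs
    inside the longer one on either strand."""
    pairs = []
    for short_name, short_seq in contigs.items():
        for long_name, long_seq in contigs.items():
            if short_name == long_name or len(short_seq) > len(long_seq):
                continue
            if short_seq in long_seq or reverse_complement(short_seq) in long_seq:
                pairs.append((short_name, long_name))
    return pairs


# Hypothetical contigs: c2 lies inside c1, c3 lies inside c1 on the other strand.
contigs = {
    "c1": "ATGACCGTTCAGGATCTGAAACCG",
    "c2": "GTTCAGGATC",
    "c3": "TTTCAGATCC",
    "c4": "CCCCCCCCCCAAAA",
}
duplicates = find_contained(contigs)
print(duplicates)
```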

19 Identifying elements of cDNA library Predict function directly from sequence How to go from cDNA sequence to predicted function for the sequences? You might think that since we can readily predict a protein sequence from a DNA sequence, it should be possible to predict function as well.

20 Identifying elements of cDNA library Predict function directly from sequence Predict function from sequence similarity Nope. At present that's impossible. The best we can do is to compare sequences with sequences from other organisms where there is experimental evidence as to function.

21 Identifying elements of cDNA library Predict function directly from sequence Predict function from sequence similarity Blast is a tool to do just that, comparing a given sequence against a database of known sequences. It is important to understand the mind of Blast. But that is a subject for another time.

22 Genes Determining Plant-Cyanobacterial Symbioses and Consideration of Blast We've identified many things that need to be done: 1. Determine if primers have been removed from sequences. 2. Determine if the library contains duplicates. 3. Identify protein sequences similar to those encoded by cDNAs. 4. (plus one extra) Find where in the cDNAs genes begin and end.
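For the fourth task, a first approximation is to look for open reading frames: stretches that begin with ATG and run in frame to a stop codon. The Python sketch below is an illustration only, not the method used in the demonstration. The function name `find_orfs`, the `min_codons` threshold, and the example sequence are hypothetical, only the forward strand is scanned, and a partial cDNA may lack the true start or stop altogether.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) positions of ATG...stop open reading frames on the
    forward strand. Positions are 0-based and `end` is exclusive, so
    seq[start:end] includes both the ATG and the stop codon."""
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # Count codons before the stop, including the ATG.
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


# Hypothetical sequence with one short ORF starting at position 2.
example = "CCATGGCTGACGCTTAAGG"
orfs = find_orfs(example)
print(orfs, [example[s:e] for s, e in orfs])
```

Running the same function on the reverse complement of a sequence would cover the other strand.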

23 Genes Determining Plant-Cyanobacterial Symbioses and Consideration of Blast These questions are ordinarily answered by high-powered computer types. But you can answer them yourself. First you need to read in the data. Go into StaphyloBIKE through the BioBIKE portal (Gunnera isn't a member of the Staphylococcus, of course, but I put the cDNA sequences in that instance of BioBIKE) RUN-FILE "contig-resources.bike" SHARED (this makes the cDNA sequences available to you as a variable called gunnera-contigs and also provides you with a possibly useful tool, READ-NAMED, to extract specific sequences)

24 Genes Determining Plant-Cyanobacterial Symbioses and Consideration of Blast Possibly useful functions: SEQUENCE-SIMILAR-TO Accesses BLAST, using as targets either internal data (e.g. gunnera-contigs ) or external data (e.g. *GENBANK* ). Also used to look for nearly identical sequences, using the MISMATCHES option. READING-FRAMES-OF Translates the sequence in all six possible reading frames.
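To make "all six possible reading frames" concrete, here is a minimal Python sketch of six-frame translation using the standard genetic code. It is not BioBIKE's READING-FRAMES-OF and makes no claim about how that function works or formats its output. The names `translate` and `six_frames`, the frame labels, and the example sequence are assumptions for illustration.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq; '*' marks a stop,
    'X' marks a codon containing a base other than A, C, G, or T."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq: str) -> dict[str, str]:
    """Translate seq in three forward (+1..+3) and three reverse (-1..-3) frames."""
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frames("ATGGCTGACGCTGAG")  # hypothetical 15-base sequence
for name, protein in sorted(frames.items()):
    print(name, protein)
```

A frame that translates for a long stretch without a stop codon is the usual first hint about which strand and frame encode the protein.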

