Xt ESTs 32,000 unique transcript set –16,000 clusters –16,000 singletons Clusters –9,000 (55%) have a blastx hit –4,000 might be full-length –2,000 ~98%


1 Xt ESTs
32,000 unique transcript set
- 16,000 clusters
- 16,000 singletons
Clusters
- 9,000 (55%) have a blastx hit
- 4,000 might be full-length
- 2,000 ~98% probability of being FL
Singletons
- 5,500 (35%) have a blastx hit
- 1,500 might be full-length
- 200-500 'probably' FL

2 What are we looking for?
FL perfect
- good enough to spend £500 on a morpholino
FL probable
- likely enough for a gain-of-function experiment
Gene transcript
- good enough to put on an array
For FL, distinguish between
- knowing it's full-length and
- being sure of which ATG is the start

3 Looking for full-length transcripts
Perfect full-length
- Open reading frame
  - defined by clear prior stop codon
  - Clear ATG 3' of STOP codon
  - Reasonable run of stop-free sequence before another stop signal or end of ESTs
  - Consensus sequence agrees with ESTs
- Blastx data
  - Blastx hits indicating coding sequence
  - Start of matching proteins exactly aligned with predicted start methionine
  - No other protein alignments
[Figure: consensus sequence, with blastx hits (PROTEIN Hs 1e-187, Mm 1e-190, Dr 1e-201, Xl 1e-202) aligned above the overlapping ESTs]
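The open-reading-frame criteria above (an in-frame stop codon, then a clear ATG 3' of it) can be sketched as a simple scan. This is a minimal illustration, not the method used to produce the counts on slide 1; the function name `find_defined_starts` and the 0-based coordinates are assumptions made for the example.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_defined_starts(seq):
    """Return (stop_pos, atg_pos) pairs, 0-based, where atg_pos is the first
    in-frame ATG downstream of an in-frame stop codon at stop_pos.
    Only the three forward frames are scanned (hypothetical helper)."""
    seq = seq.upper()
    results = []
    for frame in range(3):
        last_stop = None  # most recent in-frame stop with no ATG reported yet
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOPS:
                last_stop = i
            elif codon == "ATG" and last_stop is not None:
                results.append((last_stop, i))
                last_stop = None  # report only the first ATG after each stop
    return results

# First 40 bases of the schematic consensus shown on the slide.
consensus_prefix = "CTTCTTCTAGAGTCAGAGCGTCATGAGCTTCTTCTATTAG"
print(find_defined_starts(consensus_prefix))
```

On this prefix the scan pairs the TAG at position 7 with the ATG at position 22. The slide sequence is only a schematic, so the frame it opens is very short; the "reasonable run of stop-free sequence" and the EST agreement checks would still have to be applied separately.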

4 Blast aligned with ATG
Less perfect, but possibly sufficient, indications of full-length
1. Blast hits line up with ATG
- Perfect
  [Figure: PROTEIN Hs 1e-187, Mm 1e-190, Dr 1e-201, Xl 1e-202 aligned above the overlapping ESTs]
- Weak hits, maybe several agree
  [Figure: PROTEIN Ce 8.2e-9 aligned above the overlapping ESTs]
- Strong hits but not clear agreement, predicted proteins confuse
  [Figure: PROTEIN Hs 1e-187, PROTEIN Mm 1e-190, PREDICTED Dr 1e-201, PROTEIN Xl 1e-202 (shorter alignment) above the overlapping ESTs]
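The three situations on this slide can be sketched as a small classifier. Everything here is an assumption made for illustration: the `Hit` record and its field names, the rule that a hit "lines up" when its alignment begins at residue 1 of the protein and at the ATG on the consensus, and the 1e-50 cut-off between strong and weak hits.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    species: str
    evalue: float
    query_start: int    # 0-based position on the consensus where the alignment begins
    subject_start: int  # 1-based residue of the matched protein where it begins

def supports_atg(hit, atg_pos):
    """A hit supports the ATG if the protein's first residue aligns to it."""
    return hit.subject_start == 1 and hit.query_start == atg_pos

def classify_atg_support(hits, atg_pos, strong_evalue=1e-50):
    """Label the blastx evidence for a candidate start codon (hypothetical rules)."""
    if not hits:
        return "no hits"
    strong = [h for h in hits if h.evalue <= strong_evalue]
    if strong:
        if all(supports_atg(h, atg_pos) for h in strong):
            return "perfect"
        return "strong hits disagree"
    if any(supports_atg(h, atg_pos) for h in hits):
        return "weak support"
    return "no support"

# E-values are taken from the slide; the coordinates are invented.
hits = [Hit("Hs", 1e-187, 22, 1), Hit("Mm", 1e-190, 22, 1),
        Hit("Dr", 1e-201, 22, 1), Hit("Xl", 1e-202, 22, 1)]
print(classify_atg_support(hits, atg_pos=22))
```

A fuller version would probably treat PREDICTED and FRAGMENT entries separately, since the slide notes that predicted proteins confuse the picture.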

5 Protein alignments start within ORF
2. Proteins aligned within well-defined ORF
[Figure: blastx hits (PROTEIN Hs 1e-10, PROTEIN Dr 1e-19, FRAGMENT Dm 1e-19, PREDICTED Mm 1e-50, PROTEIN Dr 1e-87) aligned above the overlapping ESTs]

6 Protein alignments overlap ORF
3. Some part of the protein alignments overlaps a well-defined ORF
- Weak hits, indication of domain homology: quite likely to be FL
  [Figure: PROTEIN Hs 1e-4, PROTEIN Dr 1e-5, FRAGMENT Dm 1e-6, PREDICTED Mm 1e-8, PROTEIN Dr 1e-8 aligned above the overlapping ESTs]
- Strong hits, probably real homolog: ORF may be an artefact of sequencing error, or in UTR
  [Figure: PROTEIN Hs 1e-81, PROTEIN Dr 1e-98, PROTEIN Xl 1e-107 aligned above the overlapping ESTs]
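The difference between an alignment that lies within a well-defined ORF and one that only overlaps part of it comes down to an interval comparison. The sketch below assumes half-open (start, end) nucleotide coordinates on the consensus; the function name `relate` and the coordinates in the example are hypothetical.

```python
def relate(orf, aln):
    """Compare a protein alignment with an ORF.
    Both arguments are (start, end) half-open intervals on the consensus.
    Returns "within", "overlaps" or "disjoint"."""
    orf_start, orf_end = orf
    aln_start, aln_end = aln
    if aln_end <= orf_start or aln_start >= orf_end:
        return "disjoint"
    if orf_start <= aln_start and aln_end <= orf_end:
        return "within"
    return "overlaps"

print(relate((22, 400), (40, 380)))   # alignment inside the ORF
print(relate((22, 120), (90, 400)))   # alignment runs past the end of the ORF
```

When a strong hit only overlaps the ORF, the slide's reading is that the ORF itself is suspect (sequencing error or UTR), so the label is best treated as a prompt for inspection and not as a verdict.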

7 Protein alignment has upstream STOP
4. There are protein alignments and a well-defined STOP codon upstream
[Figure: blastx hits (PROTEIN Hs 1e-187, Mm 1e-190, Dr 1e-201, Xl 1e-202) aligned above two ESTs]
- Mostly applicable to small clusters where codons are not well agreed
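One way to apply this criterion is to walk upstream from the start of a protein alignment, codon by codon in the same frame, until a stop is found, and then take the first ATG between that stop and the alignment. This is a minimal sketch under that assumption; `start_from_upstream_stop` and the alignment coordinate used in the example are invented for illustration.

```python
STOPS = {"TAA", "TAG", "TGA"}

def start_from_upstream_stop(seq, align_start):
    """Given the 0-based position where a protein alignment begins on the
    sequence, return (stop_pos, atg_pos): the nearest in-frame stop upstream
    and the first in-frame ATG between that stop and the alignment.
    Either value is None when it cannot be found."""
    seq = seq.upper()
    stop_pos = None
    for i in range(align_start - 3, -1, -3):
        if seq[i:i + 3] in STOPS:
            stop_pos = i
            break
    if stop_pos is None:
        return (None, None)
    for i in range(stop_pos + 3, align_start + 1, 3):
        if seq[i:i + 3] == "ATG":
            return (stop_pos, i)
    return (stop_pos, None)

# First 35 bases of the first EST shown on the slide; the alignment start (29) is invented.
est_prefix = "GCTTCTTCTAGAGTCAGAGCGTCATGAGCTTCTTC"
print(start_from_upstream_stop(est_prefix, 29))
```

With only one or two ESTs in a cluster, a single base-calling error can create or remove the stop, which fits the slide's note that this criterion applies mostly to small clusters.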

8 Long open reading frame…
5. There is a long open reading frame, but maybe no blastx hits
[Figure: open reading frame of more than 500 (?) spanning the overlapping ESTs]
- May just be in UTR
  - plenty of long ORFs observed in obvious UTR
- May not even be RNA…
  - what about blastn data?
  - ESTscan would also be useful
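A long stop-free stretch can be found by scanning the three forward frames and tracking the longest run between stop codons. The sketch below assumes the slide's "500 (?)" means nucleotides, which the slide does not state; the function names and the `min_len` default are likewise assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def longest_open_frame(seq):
    """Return (frame, start, length_nt) of the longest stop-free stretch
    among the three forward reading frames (0-based start)."""
    seq = seq.upper()
    best = (0, 0, 0)
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if i - run_start > best[2]:
                    best = (frame, run_start, i - run_start)
                run_start = i + 3
        # Close the final run at the last complete codon of this frame.
        end = frame + ((len(seq) - frame) // 3) * 3
        if end - run_start > best[2]:
            best = (frame, run_start, end - run_start)
    return best

def has_long_open_frame(seq, min_len=500):
    """True if any forward frame is stop-free for at least min_len nucleotides."""
    return longest_open_frame(seq)[2] >= min_len

print(longest_open_frame("GCT" * 200))
print(has_long_open_frame("GCT" * 100))
```

The scan does not require a start codon and says nothing about whether the stretch is coding. For `"TAG" + "GCT" * 10 + "TAA"` it reports a run in frame 1, outside the frame bounded by the two stops, which is a small instance of the slide's caveat that long open frames also turn up in obvious UTR.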

