How can we find genes? Search for them. Look them up.


1 How can we find genes? Search for them. Look them up.

2 How do I get from this… >mouse_ear_cress_1080 GAAATAATCAATGGAATATGTAGAGGTCTCCTGTACCTTCACAGAGATTCTAGGCTGAGAGCAGTGCATATAGATATCTTT CGTACTCATCTGCTTTTTCTGGTCTCCATCACAAAAGCCAACTAGGTAATCATATCAATCTCTCTTTACCGTTTACTCGAC CTTTTCCAATCAGGTGCT TCTGGTGTGTCTACTACTATCAGTTTTAGGTCTTTGTATACCTGATCTTATCTGCTACTG AGGCTTGTAAAAGTGATTAAAACTGTGACATTTACTCTAAGAGAAGTAACCTGTTTGATGCATTTCCCTAATATACCGGTG TGGAAAAGTGTAGGTATCTGTACTCAGCTGAAATGGTGGACGATTTTGAAGAAGATGAACTCTCATTGACTGAAAGCGGGT TGAAGAGTGAAGATGGCGTTATTATCGAGATGAATGTCTCCTGGATGCTTTTATTATCATGTTTGGGAATTTACCAAGGGA GAGGTATCAGAATCTATCTTAGAAGGTTACATTTAGCTCAAGCTTGCATCAACATCTTTACTTAGAGCTCTACGGGTTTTA GTGTGTTTGAAGTTTCTTAACTCCTAGTATAATTAGAATCTTCTGCAGCAGACTTTAGAGTTTTGGGATGTAGAGCTAACC AGAGTCGGTTTGTTTAAACTAGAATCTTTTTATGTAGCAGACTTGTTCAGTACCTGAATACCAGTTTTAAATTACCGTCAG ATGTTGATCTTGTTGGTAATAATGGAGAAACGGAAGAATAATTAGACGAAACAAACTCTTTAAGAACGTATCTTTCAGTTT TCCATCACAAATTTTCTTACAAGCTACAAAAATCGAACTATATATAACTGAACCGAATTTAAACCGGAGGGAGGGTTTGAC TTTGGTCAATCACATTTCCAATGATACCGTCGTTTGGTTTGGGGAAGCCTCGTCGTACAAATACGACGTCGTTTAAGGAAA GCCCTCCTTAACCCCAGTTATAAGCTCAAAGTTGTACTTGACCTTTTTAAAGAAGCACGAAACGAAAAACCCTAAAATTCC CAAGCAGAGAAAGAGAGACAGAGCAAGTACAGATTTCAACTAGCTCAAGATGATCATCCCTGTTCGTTGCTTTACTTGTGG AAAGGTTGATATTTTCCCCTTCGCTTTGGTCTTATTTAGGGTTTTACTCCGTCTTTATAGGGTTTTAGTTACTCCAAATTT GGCTAAGAAGAGATCTTTACTCTCTGTATTTGACACGAATGTTTTTAATCGGTTGGATACATGTTGGGTCGATTAGAGAAA TAAAGTATTGAGCTTTACTAAGCTTTCACCTTGTGATTGGTTTAGGTGATTGGAAACAAATGGGATCAGTATCTTGATCTT CTCCAGCTCGACTACACTGAAGGGTAAGCTTACAATGATTCTCACTTCTTGCTGCTCTAATCATCATACTTTGTGTCAAAA AGAGAGTAATTGCTTTGCGTTTTAGAGAAATTAGCCCAGATTTCGTATTGGGTCTGTGAAGTTTCATATTAGCTAACACAC TTCTCTAATTGATAACAGAAGCTATAAAATAGATTTGCTGATGAAGGAGTTAGCTTTTTATAATCTTCTGTGTTTGTGTTT TACTGTCTGTGTCATTGGAAGAGACTATGTCCTGCCTATATAATCTCTATGTGCCTATCTAGATTTTCTATACAATTGATA TTTGATAGAAGTAGAAAGTAAGACTTAAGGTCTTTTGATTAGACTTGTGCCCATCTACATGATTCTTATTGGACTAATCAT TCTTTGTGTGAAAATAGAATACTTTGTCTGAACATGAGAGAATGGTTCATAATACGTGTGAAGTATGGGATTAGTTCAACA 
ATTTCGCTATTGGAGAAGCAAACCAAGGGTTAATCGTTTATAGGGTTAAGCTAATGCTCTGCTCTTTATATGTTATTGGAA CAGACTATTGTTGTGCCTATCTTGTTTAGTTGTAGATTCTATCTCGACTGTTATAAGTATGACTGAAGGCTTGATGACTTA TGATTCTCTTTACACCTGTAGAAGGATTTAAGCTTGGTGTCTAGATATTCAATCTGTGTTGGTTTTGTCTTTCTTTTGGCT CTTAGTGTTGTTCAATCTCCTCAATAGGTATGAAGTTACAATATCCTTATTATTTTGCAGGGACGCACTTGATGCACTCCA GCTAGTCAGATACTGCTGCAGGCGTATGCTAATGACCTTGCATCAACATCTTTACTTAGAGCTCTACGGGTTTTAGTGTGT

3 …to this?

4 Meaning?

5 Mathematical Tools (Code; statistics)

6 Comparative Tools (Database searches)

7 What do we know about genes?
Expressed (Transcribed)
– Transcriptional start & termination sites (TXSS, TXTS)
– Transcription artefacts (cDNA & ESTs)
Regulated
– Promoters (TATAAA)
– Transcription factor binding sites
– CpG (cytosine methylation)
Meaningful (Translated)
– 3n base pairs
– Codon usage
– Translational start & stop/termination codons (TLSS, TLTS)
– Translation artefacts (proteins)
Spliced
– Splice sites (GT-AG)
Derived (Homology: Paralogy/Orthology)
– Search for known genes, proteins (BLAST)
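Several of the features listed on this slide are short sequence signals (the TATAAA promoter box, the ATG start codon, the GT and AG splice-site dinucleotides). As a minimal sketch of how such signals can be located, the helper below reports every position where a literal motif occurs. The function name `find_motif` is an illustrative assumption, not part of any tool named in the slides, and a hit is only a candidate: most occurrences of these short strings are not functional sites.

```python
import re


def find_motif(seq, motif):
    """Return 0-based start positions of every motif hit, including overlaps.

    Hypothetical helper for illustration only: it matches the literal
    string and knows nothing about biological context.
    """
    # A zero-width lookahead lets overlapping occurrences be reported too.
    pattern = re.compile("(?=" + re.escape(motif.upper()) + ")")
    return [m.start() for m in pattern.finditer(seq.upper())]


if __name__ == "__main__":
    fragment = "GGTATAAAGGTATAAA"
    print(find_motif(fragment, "TATAAA"))  # candidate promoter boxes
    print(find_motif(fragment, "GT"))      # candidate splice donor sites
```

Because the motifs are so short, a scan like this is useful mainly as one line of evidence to combine with the others on the slide.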

8 How might this knowledge help to find genes?
Predict genes
– Look for potential starts and stops.
– Connect them into open reading frames (ORFs).
– Filter for “correct” length & codon usage.
Search databases
– Known genes: UniGene
– Known proteins: UniProt
Use transcript evidence
– cDNA
– ESTs
– proteins
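The codon-usage filter mentioned above starts from a simple tally of how often each triplet appears in a candidate ORF. The sketch below shows only that tally; the name `codon_usage` is an assumption made for illustration, and comparing the result against a species-specific reference table (which the slides do not provide) is left out.

```python
from collections import Counter


def codon_usage(orf):
    """Return the relative frequency of each codon in an in-frame ORF.

    Hypothetical helper: the ORF is assumed to start in frame, and any
    trailing one or two bases that do not form a full triplet are ignored.
    """
    orf = orf.upper()
    usable = len(orf) - len(orf) % 3
    codons = [orf[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


if __name__ == "__main__":
    for codon, freq in sorted(codon_usage("ATGAAAAAATAA").items()):
        print(codon, round(freq, 2))
```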

9 Operating computationally
Go to beginning of sequence → start SCAN
If ATG → register putative TLSS; then
– Move in 3-steps & count steps (= COUNTS)
– If 3-step = (TAA or TAG or TGA) → register putative TLTS
– If TLTS registered → evaluate COUNTS (= triplets)
If COUNTS < minimum → discard; then go to just behind the ATG above and start SCAN
If COUNTS > maximum → discard; then go to just behind the ATG above and start SCAN
If minimum < COUNTS < maximum → record as GENE with TLSS, TLTS; then go to just behind the ATG above and start SCAN
Arrive at end of sequence → stop SCAN
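The scan described on this slide can be sketched in a few lines of Python. This is one possible reading of the steps, not the presenters' code: the function name `scan_orfs`, the default thresholds, and the choice to scan a single strand only are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def scan_orfs(seq, min_codons=2, max_codons=1000):
    """Scan one strand for ATG...stop stretches, following the slide's steps.

    Returns (start, end, counts) tuples, where start is the 0-based index
    of the ATG (putative TLSS), end is the index just past the stop codon
    (putative TLTS), and counts is the number of 3-steps taken to reach it.
    The thresholds are assumed values, not taken from the slides.
    """
    seq = seq.upper()
    genes = []
    pos = 0
    while True:
        start = seq.find("ATG", pos)       # register putative TLSS
        if start == -1:
            break                          # end of sequence: stop SCAN
        counts = 0
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:    # register putative TLTS
                if min_codons < counts < max_codons:
                    genes.append((start, i + 3, counts))
                break
            counts += 1
        pos = start + 3                    # resume just behind this ATG
    return genes


if __name__ == "__main__":
    print(scan_orfs("CCATGAAATTTTAGGG"))
```

Resuming just behind each ATG, rather than behind the stop codon, means nested start codons in the same frame are reported as separate candidates. A fuller version would also scan the reverse complement, which the slide does not mention.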

10 Annotation workflow: Find gene families; Mathematical evidence; Analyze large data sets; Browse in context; Construct gene models; Biological evidence; Browse results; Get/Generate sequence

11 Annotation Cheat Sheet
A. DNA Subway
– Open existing project or generate new (Red square)
– Run RepeatMasker
– Generate evidence (Predictions, BLAST searches)
– Synthesize evidence into gene models (Apollo)
– Browse results locally and in context (Phytozome)
– Conduct functional analysis (link from Browser)
– Prospect for gene family (Yellow Line from Browser)
B. Apollo
– Select region that holds biological gene evidence
– Optimize work space and zoom to region (View tab)
– Expand all tiers (Tiers tab)
– Drag evidence item(s) onto workspace (mouse)
– Edit to match biol. evidence (right-click item for tools)
– Record what was done in Annotation Info Editor
– Assess necessity to build alternative model(s)
– Upload model(s) to DNA Subway (File tab)

12 Predictors (mathematical evidence)
Utilize predominantly mathematical (statistical) methods.
Search for patterns:
– Some score starts, stops, splice sites (GenScan).
– Some score nucleotides (Augustus, FGenesH).
Few incorporate EST data and/or known genes/proteins.
Require optimization for each new species (training).
Accuracy:
– False positives (scoring non-genes as genes): 5%-50%.
– False negatives (missed genes): 5%-40%.
– Weak at, or incapable of, determining first and last exons and UTRs.
Specific for gene models (spliced genes, non-spliced genes).
Specialty predictors (tRNA Scan, RepeatMasker).
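To make the two accuracy figures concrete, the sketch below computes them for a toy set of predictions against a toy reference annotation. The slide does not say which denominators its percentages use, so this assumes false positives are counted as a share of all predictions and false negatives as a share of all reference genes. The function name, the exact-coordinate matching rule, and the coordinates themselves are invented for illustration.

```python
def prediction_error_rates(predicted, reference):
    """Return (false_positive_rate, false_negative_rate) for gene calls.

    Hypothetical helper. Genes are compared as exact (start, end) pairs;
    real evaluations usually allow partial overlap.
    """
    predicted, reference = set(predicted), set(reference)
    false_pos = predicted - reference   # called a gene, not in reference
    false_neg = reference - predicted   # in reference, but missed
    fp_rate = len(false_pos) / len(predicted) if predicted else 0.0
    fn_rate = len(false_neg) / len(reference) if reference else 0.0
    return fp_rate, fn_rate


if __name__ == "__main__":
    # Invented coordinates, for illustration only.
    predicted = [(100, 500), (900, 1500), (2000, 2600), (3000, 3300)]
    reference = [(100, 500), (900, 1500), (4000, 4800)]
    fp, fn = prediction_error_rates(predicted, reference)
    print(f"false positives: {fp:.0%}, false negatives: {fn:.0%}")
```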

13 Search tools (biological evidence)
Search sequence (molecules; tangible) databases:
– Known genes
– Known proteins
– cDNAs & ESTs
Utilize alignment methods (BLAST, BLAT).
Reliability:
– Good in determining gene locations and general gene structures.
– Weak in exactly determining exon/intron borders.
– Unlikely to correctly determine TXSS and TXTS.
– Should be used with cDNA/EST from same species as genome.

14 Sequence & course material repository
http://gfx.dnalc.org/files/evidence
Don’t open items, save them to your computer!!
Annotation (sequences & evidence)
Manuals (DNA Subway, Apollo, JalView)
Presentations (.ppt files)
Prospecting (sequences)
Readings (Bioinformatics tools, splicing, etc.)
Worksheets (Word docs, handouts, etc.)
BCR-ABL (temporary; not course-related)

