WSSP Chapter 7 BLASTN: DNA vs DNA searches

WSSP Chapter 7 BLASTN: DNA vs DNA searches
atttaccgtg ttggattgaa attatcttgc atgagccagc tgatgagtat gatacagttt
tccgtattaa taacgaacgg ccggaaatag gatcccgatc atgattgctt caatattttc
acttcaatga ttggttctaa gcattcgaat gcgtacccgt ttgattaata tttccatttc
tgtcccagtt tttaattttc atttcttttg gttaaaaaat tcccagtctc ttgaatgctt
ttctaaaatc tttaattcaa ttatttatta gaatcttctg ttttgagaac tttgtaatgt
aattaaataa tttgatgaaa tgattatgaa tgcgaataaa ttattaattt accgtgctga
ttggattgaa attatcttgc atgagccagc tgatgagtat gatacagttt tccgtattaa
taacgaacgg ccggaaatag gatcccgatc atgattgctt caatattttc acttcaatga
ttggttctaa gcattcgaat gcgtacccgt ttgattaata tttccatttc tgtcccagtt
tttaattttc atttcttttg gttaaaaaat tcccagtctc ttgaatgctt ttctaaaatc
tttaattcaa ttatttatta gaatcttctg ttttgagaac tttgtaatgt aattaaataa
tttgatgaaa tgattatgaa tgcgaataaa ttattaattt accgtgttgg attgaaggta
attatcttgc atgagccagc tgatgagtat gatacagttt

© 2014 WSSP

DSAP: BLASTn page (p. 7-1)

NCBI BLAST home page (p. 7-1)

NCBI BLASTN search page (p. 7-2) © 2013 WSSP

Copy the sequence from DSAP or from the waveform program (p. 7-2)

Choose a database: nr/nt (nucleotide collection) or est (expressed sequence tags) (p. 7-3)

Search options: use the defaults (p. 7-4)

BLASTN progress report: the search may take a few minutes (p. 7-5)

Format options: use the defaults (p. 7-5)
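The web steps above (paste the query, pick nr/nt or est, keep the default search and format options) also have a command-line counterpart in NCBI's BLAST+ package. The sketch below only builds the argument list for the `blastn` program and does not run it. It assumes BLAST+ is installed and that the remote database names `nt` (for nr/nt) and `est` are available to your installation; the query file name `ex1_14.fa` and the helper name `blastn_command` are hypothetical. BLAST+ `blastn` defaults to the megablast task, so classic BLASTN is requested explicitly with `-task blastn`.

```python
def blastn_command(query_fasta: str, database: str = "nt", evalue: float = 10.0) -> list[str]:
    """Build (but do not run) a BLAST+ blastn command line.

    Assumptions: NCBI BLAST+ is installed, and `database` names a database the
    remote NCBI service accepts ("nt" for nr/nt; "est" assumed for ESTs).
    """
    return [
        "blastn",
        "-task", "blastn",      # classic BLASTN; without this, BLAST+ uses megablast
        "-query", query_fasta,
        "-db", database,
        "-remote",              # search on NCBI servers instead of a local database
        "-evalue", str(evalue),
        "-outfmt", "6",         # tabular output, one hit per line
    ]


if __name__ == "__main__":
    # Hypothetical query file; pass the list to subprocess.run(cmd, check=True) to execute.
    cmd = blastn_command("ex1_14.fa", database="est")
    print(" ".join(cmd))
```

Switching between the nr/nt and est searches in this chapter then amounts to changing the `database` argument.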

BLASTN search of EX1.14 against the nr/nt database (p. 7-6)

Graphic report of EX2.09 (p. 7-7)

BLASTN list of matches for EX1.14 (p. 7-7)

BLASTN results for EX2.09 (p. 7-9)

Clicker Question: Which match is the most meaningful? A) B) C) D) E) None

Clicker Question: Which part of the gene appears to be the most conserved? A) Bp B) Bp C) Bp D) All E) None

Clicker Question: The entire insert of a clone was sequenced and a BLASTN search was performed. Are these matches likely to be significant? A) Yes B) No C) Cannot tell from the data

Question: Which of the following E values indicates the best match? A) 1e-10 B) 5e-91 C) 5.3 D) 0.0 E) Cannot tell from this data
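Smaller E values indicate better matches, and BLAST reports an E value of 0.0 when the value is too small to print, so 0.0 outranks every other choice in the list above. A minimal sketch that ranks the answer choices numerically (the helper name `rank_by_evalue` is illustrative):

```python
def rank_by_evalue(evalues: list[str]) -> list[str]:
    """Sort E-value strings from best (smallest) to worst (largest).

    float() parses BLAST-style values such as "5e-91"; a reported "0.0"
    sorts first because it stands for a value too small to display.
    """
    return sorted(evalues, key=float)


choices = ["1e-10", "5e-91", "5.3", "0.0"]
print(rank_by_evalue(choices))  # ['0.0', '5e-91', '1e-10', '5.3']
```

Sorting on the parsed number rather than on the text matters: as strings, "1e-10" would sort before "5e-91" even though it is about 10^81 times larger.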

Best match to EX1.14 (p. 7-9) [Figure: annotated alignment labeling our sequence, the database sequence, the length of the sequence, a mismatch and a match]

Perfect, but short, matches are not usually meaningful.
>gi| |emb|AL |CNS07EFY Human chromosome 14 DNA sequence BAC R-736L22 of library RPCI-11 from chromosome 14 of Homo sapiens (Human), complete sequence
Score = 40.1 bits (20), Expect = 4.6
Identities = 20/20 (100%)
Query: 189  ttttctgaatattcataata  208
             ||||||||||||||||||||
Sbjct:       ttttctgaatattcataata
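The Expect value and the bit score in an alignment header are linked. Under the Karlin-Altschul statistics that BLAST uses, E ≈ m·n·2^(−S′), where S′ is the bit score and m·n is the effective search space (effective query length times effective database length). The sketch below inverts that relation for the 20/20 match above (40.1 bits, E = 4.6) to estimate the implied search space, then asks how many bits the same search would need to reach E = 0.001. This is the textbook approximation; BLAST's own effective-length corrections are not reproduced, and the function names are illustrative.

```python
import math


def evalue_from_bits(bits: float, search_space: float) -> float:
    """Karlin-Altschul approximation: E = m*n * 2**(-S'), with S' the bit score."""
    return search_space * 2.0 ** (-bits)


def search_space_from(bits: float, evalue: float) -> float:
    """Invert the same relation to recover the effective search space m*n."""
    return evalue * 2.0 ** bits


def bits_needed(target_evalue: float, search_space: float) -> float:
    """Bit score at which E falls to target_evalue in the same search space."""
    return math.log2(search_space / target_evalue)


# Numbers from the 20/20 match to human chromosome 14: 40.1 bits, Expect = 4.6.
space = search_space_from(40.1, 4.6)
print(f"implied search space: {space:.2e}")                          # 5.42e+12
print(f"bits needed for E = 0.001: {bits_needed(1e-3, space):.1f}")  # 52.3
```

Each extra bit halves E, so on these numbers a perfect 20-base match sits roughly 12 bits short of a convincing hit: in a search space this large, a short exact match is expected to turn up by chance a few times.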

Examine the best alignments: are they significant? (p. 7-9)

Mismatches can be caused by:
i) bad sequence on our part
ii) bad sequence on their part
iii) real differences between the sequences of the two organisms

      C   R   E   L   L   I   L   D   A
Query TGT CGT GAA CTC CTA ATT CTC GAC GCC
      ||| ||| ||| ||  ||  ||  ||  ||  ||
Sbjct TGT CGT GAA CTT CTG ATC CTT GAT GCA
      C   R   E   L   L   I   L   D   A

Query: 383  AGCGTTGCCGTTCGTCAGCTTGATGTTAAGCTGGGCAGCGCGCTCGACGATTCCTTTGCG  324
             |||||| |||||||||||||||||||| | ||| || ||||||||||||||||| |||||
Sbjct: 6152 AGCGTTTCCGTTCGTCAGCTTGATGTTCAACTGAGCGGCGCGCTCGACGATTCCCTTGCG  6211

Wobble position: the same amino acid, but a different codon, because the genetic code is degenerate.
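The nine-codon alignment on this slide can be checked mechanically: translate both sequences in the frame shown and ask, for each mismatch, which codon position it falls in and whether the amino acid changes. A minimal sketch using the standard genetic code, assuming the two sequences are ungapped and already in frame (true for this example, not for BLAST output in general); the helper names are illustrative.

```python
from itertools import product

# Standard genetic code: product("TCAG", repeat=3) yields TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = dict(zip(
    ("".join(codon) for codon in product("TCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def translate(dna: str) -> str:
    """Translate in frame 1, dropping any incomplete trailing codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def classify_mismatches(query: str, subject: str) -> list[tuple[int, int, bool]]:
    """Return (1-based position, codon position 1-3, synonymous?) for each mismatch."""
    results = []
    for i, (q, s) in enumerate(zip(query, subject)):
        if q != s:
            start = i - i % 3  # first base of the codon containing position i
            synonymous = translate(query[start:start + 3]) == translate(subject[start:start + 3])
            results.append((i + 1, i % 3 + 1, synonymous))
    return results


query = "TGTCGTGAACTCCTAATTCTCGACGCC"
subject = "TGTCGTGAACTTCTGATCCTTGATGCA"
print(translate(query), translate(subject))  # CRELLILDA CRELLILDA
for pos, codon_pos, synonymous in classify_mismatches(query, subject):
    print(pos, codon_pos, synonymous)  # each of the six mismatches: codon position 3, synonymous
```

All six mismatches land on third codon positions and leave the protein unchanged, which is exactly the wobble pattern the slide describes.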

      C  R   R  T  P  D  P  *
Query TGTCGT-CGAACTCCTGATCCTTGA
      |||||| ||||||||||||||||||
Sbjct TGTCGTCCGAACTCCTGATCCTTGA
      C  R  E  L  L  I  L  D
Small gaps alter the reading frame of the protein.
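Because the code is read in triplets, one missing or extra base changes every codon downstream of it. The sketch below translates the gap-free query row from this slide, then deletes a single base from the in-frame query of the wobble example to show the same effect. The position of that deletion is a hypothetical choice, and the table is the standard genetic code.

```python
from itertools import product

# Standard genetic code: product("TCAG", repeat=3) yields TTT, TTC, TTA, TTG, TCT, ...
CODON_TABLE = dict(zip(
    ("".join(codon) for codon in product("TCAG", repeat=3)),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))


def translate(dna: str) -> str:
    """Translate in frame 1; '*' marks a stop codon, trailing partial codons are dropped."""
    dna = dna.upper().replace("-", "")  # alignment gaps carry no bases
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(translate("TGTCGT-CGAACTCCTGATCCTTGA"))  # CRRTPDP*  (query row on this slide)

in_frame = "TGTCGTGAACTCCTAATTCTCGACGCC"  # query from the wobble example
shifted = in_frame[:6] + in_frame[7:]    # hypothetical deletion of base 7
print(translate(in_frame))  # CRELLILDA
print(translate(shifted))   # CRNS*FST  (frame shifted, premature stop)
```

Compare this with the mismatch example: six substitutions left the protein intact, while one deleted base rewrites everything after it and introduces a stop codon.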

Query: 179  TTCGAGCTACCAGATGATC-GATTGGAACAT-T-C--TGTCATTG-AC-CTTC-AGGTAA  230
            |||||||  || | | ||  ||||  || || | |  | | |||  |  |||| |||| |
Sbjct: 4684 TTCGAGCG-CC-GTTAATATGATTACAATATCTACAATATTATTATATGCTTCCAGGTGA  4741

Query: 231  TCAACCATGACCGTGTCAACCGAAACGACGTTATCGGCCGTGCACTATTGAACATGGAGG  290
            |||| ||||||||||| ||||| || || || || |||||||| ||  | || ||||| |
Sbjct: 4742 TCAATCATGACCGTGTTAACCGTAATGATGTAATTGGCCGTGCCCTTCTTAATATGGAAG  4801

An example of a match with and without gaps.
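The numbers BLAST prints in an alignment header can be recomputed from the rows themselves: identities are columns where both rows carry the same base, gap columns hold a "-" in either row, and each row's end coordinate is its start plus the number of real (non-gap) bases minus one. A minimal sketch for plus/plus alignments, applied to the two blocks above; the function name `summarize` is illustrative.

```python
def summarize(query_row: str, sbjct_row: str, q_start: int, s_start: int) -> dict:
    """Recompute identities, gaps and end coordinates for one plus/plus alignment block."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query_row, sbjct_row))
    return {
        "length": len(pairs),
        "identities": sum(q == s and q != "-" for q, s in pairs),
        "gaps": sum(q == "-" or s == "-" for q, s in pairs),
        "q_end": q_start + len(query_row.replace("-", "")) - 1,
        "s_end": s_start + len(sbjct_row.replace("-", "")) - 1,
    }


gapped = summarize(
    "TTCGAGCTACCAGATGATC-GATTGGAACAT-T-C--TGTCATTG-AC-CTTC-AGGTAA",
    "TTCGAGCG-CC-GTTAATATGATTACAATATCTACAATATTATTATATGCTTCCAGGTGA",
    179, 4684,
)
ungapped = summarize(
    "TCAACCATGACCGTGTCAACCGAAACGACGTTATCGGCCGTGCACTATTGAACATGGAGG",
    "TCAATCATGACCGTGTTAACCGTAATGATGTAATTGGCCGTGCCCTTCTTAATATGGAAG",
    231, 4742,
)
print(gapped)    # 38 identities and 10 gap columns out of 60; ends at 230 and 4741
print(ungapped)  # 47 identities and 0 gap columns out of 60; ends at 290 and 4801
```

The recomputed end coordinates match the ones printed on the slide, which is a quick way to confirm that gaps have been read correctly.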

>gi| |dbj|AK | Triticum aestivum cDNA, clone: SET5_E05, cultivar: Chinese Spring
Length=650
Score = 219 bits (242), Expect = 2e-53
Identities = 211/271 (77%), Gaps = 0/271 (0%)

Query 10   GATGTTGGAAGGGAGGGCGAGAGTAGAAGACACCGACATGCCGAGGAAGATGCAGGCGGA  69
           |||| ||||||||| |||||  || || |||||||||||||||   |||||||||  | |
Sbjct 78   GATGCTGGAAGGGAAGGCGACGGTGGAGGACACCGACATGCCGGCCAAGATGCAGCTGCA  137

Query 70   GGCCATGAACGCCGCCTCTCACGCGCTCGATCTGTTCGACGTCGCGGACTGCAAGAGCCT  129
           |||||     || || ||    |||||||| |  |||||||||   ||||||  |||| |
Sbjct 138  GGCCACCTCGGCGGCGTCCAGGGCGCTCGAACGCTTCGACGTCCTCGACTGCCGGAGCAT  197

Query 130  CGCCGCGCATATCAAGAAGGAATTTGATAAGATCTACGGTCCGGGATGGCAGTGCGTCGT  189
           ||| ||||| ||||||||||| || || | |||| |||| ||||| ||||||||||| ||
Sbjct 198  CGCGGCGCACATCAAGAAGGAGTTCGACACGATCCACGGCCCGGGGTGGCAGTGCGTGGT  257

Query 190  CGGCTCCAGCTTCGGCTGTTTCTTCACTCACAAGAAAGGCAGCTTCATCTACTTCCGCCT  249
            |||| |||||||||||| | |||||| ||||  || || |||||||| ||||||   ||
Sbjct 258  GGGCTGCAGCTTCGGCTGCTACTTCACGCACAGCAAGGGGAGCTTCATATACTTCAAGCT  317

Query 250  GGAGACGCTCCACTTCCTCATCTTCAAAGGC  280
            ||| ||||||  |||||| |||||||||||
Sbjct 318  CGAGTCGCTCCGGTTCCTCGTCTTCAAAGGC  348

Alignment of the third best match to EX1.14

Alignment near the end of the EX1.13 sequence
>gi| |ref|NG_ | Homo sapiens glypican 4 (GPC4), RefSeqGene on chromosome X
Length=
Score = 71.6 bits (78), Expect = 6e-09
Identities = 42/44 (95%), Gaps = 0/44 (0%)
Query 665  CTAGCTTTTCTTAACaaaaaaaaaaaaaaaaaaaaaaaaaaaaa  708
           || ||||||||||| |||||||||||||||||||||||||||||
Sbjct      CTTGCTTTTCTTAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
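Most of this 44-base match is the run of A's at the 3' end of the query (shown in lowercase) lined up against an A-rich stretch of the subject; only the first 15 query bases carry other sequence. One common precaution is to trim a 3' poly(A) tail before searching. A minimal sketch, where the helper name `trim_poly_a` and the minimum run length of 10 are arbitrary assumptions rather than BLAST settings:

```python
import re


def trim_poly_a(seq: str, min_run: int = 10) -> tuple[str, int]:
    """Strip a 3' poly(A) run (either case) of at least min_run bases.

    Returns the trimmed sequence and the number of bases removed (0 if none).
    """
    tail = re.search(r"[Aa]+$", seq)
    if tail and len(tail.group()) >= min_run:
        return seq[:tail.start()], len(tail.group())
    return seq, 0


query = "CTAGCTTTTCTTAAC" + "a" * 29    # query 665-708 from the alignment above
subject = "CTTGCTTTTCTTA" + "A" * 31   # subject row of the same alignment
print(trim_poly_a(query))    # ('CTAGCTTTTCTTAAC', 29)
print(trim_poly_a(subject))  # ('CTTGCTTTTCTT', 32)
```

After trimming, only about a dozen aligned bases would remain, far too few to support a biologically meaningful match on their own.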

Question: Is this match biologically significant? A) Yes B) No C) Cannot tell from the data


Clicker Question: Is this match likely to be in a protein-coding region? A) Yes B) No C) Cannot tell from the data
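One quick way to approach this question is to look at where the mismatches fall. In a protein-coding region, many differences between related genes are synonymous third-position (wobble) changes, so mismatches tend to pile up every third base; in non-coding sequence they have no reason to favor one phase. The sketch below tallies mismatches by position modulo 3 for row 3 of the Triticum aestivum alignment (query 130-189). It is a rough heuristic for ungapped rows, not a statistical test, and the helper name is illustrative.

```python
from collections import Counter


def mismatch_phases(query: str, subject: str) -> Counter:
    """Count mismatches in an ungapped alignment row by (0-based position) % 3."""
    return Counter(i % 3 for i, (q, s) in enumerate(zip(query, subject)) if q != s)


query = "CGCCGCGCATATCAAGAAGGAATTTGATAAGATCTACGGTCCGGGATGGCAGTGCGTCGT"    # query 130-189
subject = "CGCGGCGCACATCAAGAAGGAGTTCGACACGATCCACGGCCCGGGGTGGCAGTGCGTGGT"  # sbjct 198-257
counts = mismatch_phases(query, subject)
print(sorted(counts.items()))  # [(0, 8), (1, 1), (2, 1)]
```

Eight of the ten mismatches share phase 0, that is, query positions 130, 133, 136 and so on. Those are third codon positions in the frame that contains the ATG at query 11-13, consistent with a coding region.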

Clicker Question: What is the likely explanation for the gap? A) Sequence error in the cDNA B) Error in making the cDNA C) Start of an intron D) Cannot tell from the data E) A, B or C

Clicker Question: Is this match likely to be in a protein-coding region? A) Yes B) No C) Cannot tell from the data

Fill in the table listing the best matches from three different organisms. List Landoltia if there is a match.

Use the clone report to obtain more information about the gene.

Is this a significant match? a) Yes b) No

3) Perform a BLASTn search of the est database: change the database.

BLASTn report of the EX1.14 search of the est database

>gi| |gb|GD | CCHY28888.g1 CCHY Panicum virgatum callus (N) Panicum virgatum cDNA clone CCHY ', mRNA sequence.
Length=624
Score = 246 bits (272), Expect = 1e-61
Identities = 226/286 (79%), Gaps = 0/286 (0%)
Strand=Plus/Minus

Query 3    GAGAGAAGATGTTGGAAGGGAGGGCGAGAGTAGAAGACACCGACATGCCGAGGAAGATGC  62
           |||| |  ||| ||||||||| |||||  || || ||||| |||||||||  ||||||||
Sbjct 527  GAGACACCATGCTGGAAGGGAAGGCGATGGTGGAGGACACGGACATGCCGGCGAAGATGC  468

Query 63   AGGCGGAGGCCATGAACGCCGCCTCTCACGCGCTCGATCTGTTCGACGTCGCGGACTGCA  122
           ||||| |||| |||   || || ||    || ||||| |  |||||||||   ||||||
Sbjct 467  AGGCGCAGGCGATGGCGGCGGCGTCCAGGGCCCTCGACCGCTTCGACGTCCTCGACTGCC  408

Query 123  AGAGCCTCGCCGCGCATATCAAGAAGGAATTTGATAAGATCTACGGTCCGGGATGGCAGT  182
            |||| |||| ||||| ||||||||||| ||||| | |||| |||| || || ||||| |
Sbjct 407  GGAGCATCGCGGCGCACATCAAGAAGGAGTTTGACACGATCCACGGCCCCGGGTGGCAAT  348

Query 183  GCGTCGTCGGCTCCAGCTTCGGCTGTTTCTTCACTCACAAGAAAGGCAGCTTCATCTACT  242
           |||| || ||||||||||||||||| | |||||| ||||  || || |||||||||||||
Sbjct 347  GCGTGGTGGGCTCCAGCTTCGGCTGCTACTTCACGCACAGCAAGGGGAGCTTCATCTACT  288

Query 243  TCCGCCTGGAGACGCTCCACTTCCTCATCTTCAAAGGCGCGGCCGC  288
           |||| || ||| |||||   ||||||||||||||||| ||||| ||
Sbjct 287  TCCGGCTCGAGTCGCTCAGGTTCCTCATCTTCAAAGGGGCGGCAGC  242

Alignment of the best match to EX1.13 from the est search
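Strand=Plus/Minus means the query matched the reverse complement of the database entry. BLAST shows the Sbjct rows in the query's orientation, which is why their coordinates count down (527 to 468 in the first row). A minimal sketch of both ideas, assuming ungapped rows (true for this hit, which has 0 gaps); the helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string (IUPAC ambiguity codes are left unchanged)."""
    return seq.translate(COMPLEMENT)[::-1]


def subject_end(s_start: int, aligned_bases: int, strand: str) -> int:
    """Last subject coordinate of an ungapped row; Minus-strand rows count down."""
    step = aligned_bases - 1
    return s_start - step if strand == "Minus" else s_start + step


print(subject_end(527, 60, "Minus"))   # 468, first row of the Panicum virgatum hit
print(subject_end(287, 46, "Minus"))   # 242, last row of the same hit
print(subject_end(78, 60, "Plus"))     # 137, first row of the Triticum aestivum hit
print(reverse_complement("GAGACACC"))  # GGTGTCTC, database positions 520-527 as stored
```

The same cDNA can therefore appear on either strand of different database entries; the biological match is the same, only the reporting orientation differs.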

Fill in the DSAP table for the BLASTn search of the est database.

Query 61   CAAGGTCTAAGTACTGAAAAGGAAAGTCTACTAATTACAAAGAAGTTATTGTTTGTACCT  120
           |||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||
Sbjct      CAAGGTCTAAGTACTGAAAAGGAAAGTCCACTAATTACAAAGAAGTTATTGTTTGTACCT

Query 121  TTTGTATCAGGGTTTATTAAATTTCAATCTTTATTGCTGAATCCCGAAACAAGGTGATCT  180
           |||||||||||||||||||||||| |||||| ||||||||||||||||||||||||||||
Sbjct      TTTGTATCAGGGTTTATTAAATTTTAATCTTCATTGCTGAATCCCGAAACAAGGTGATCT

Open question: Why are there differences between the two sequences?

Q5. BLASTn analysis: Is your cDNA similar to genes in other organisms?

Q6. BLASTn analysis: Is your cDNA similar to genes in different kingdoms? That is, are there any matches to organisms from the eubacteria, archaebacteria, protist, fungi, or animal kingdoms, or are they all matches to other plants?

Is the sequence found in many other organisms?