

SSAHA: Sequence Search and Alignment by Hashing Algorithm
Fei Lian, Iyanna Atwell, Owen Astrachan
Computer Science Department, Duke University, Durham, North Carolina

Introduction

SSAHA, or Sequence Search and Alignment by Hashing Algorithm, is used mainly for fast sequence assembly, SNP detection, and the ordering and orientation of contigs. It is significant in computer science and bioinformatics because it lets scientists search a hash table built from a DNA database and identify possible matches to a query. A hash table acts as a lookup table that finds the values stored for a key through a hash function. SSAHA gets its speed by keeping the hash table of the database in memory and using a fast indexing method to locate queries. It is most effective at finding matches between closely related DNA sequences. Thus, SSAHA has great potential in the bioinformatics field.

How SSAHA Works

• SSAHA takes the database sequences and stores them in a hash table.
• When a query is presented, both the query and the database are broken into k-tuples (words of k bases) and hashed.
• A hash table (Table 1) is constructed for all sequences in the database, recording where each k-tuple occurs.
• The positions looked up for the query's k-tuples are then arranged in a second table (Table 2), from which matches are determined by sorting and scanning the hits.
• In Table 2, H lists the hits and M is the sorted list from which possible matches to the query are read.
• SSAHA returns which sequence the query is found in and where the match starts.
• In short, SSAHA builds two tables. The first records the positions of each k-tuple in the database. A second step then builds the other table, which is searched for matches and compiles the results.

SSAHA Main Points

1. SSAHA is mainly used for single-nucleotide polymorphism detection and large-scale sequence assembly.
2. The SSAHA algorithm works mainly by concatenating exact matches (of base pairs) between k-tuples.
3. By hashing the database, search time becomes independent of database size.
4. The time SSAHA takes to hash the database is never more than twice the time a BLAST (Basic Local Alignment Search Tool) search takes to process all of the database data.
5. Because a hash algorithm is used, memory can be allocated to each list of positions L relative to its size.
6. SSAHA takes advantage of multiple repetitions in human DNA to assign unrecognizable bases to a single k-tuple consisting entirely of A's (the default).

Related Articles

1. William R. Pearson and David J. Lipman. Improved Tools for Biological Sequence Comparison. PNAS, April 15, 1988, vol. 85, no. 8, 2444-2448.
2. S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. http://nar.oxfordjournals.org/cgi/content/abstract/25/17/3389?ijkey=ba0bbba2e301195849a11dc719e221c69af7f2ee&keytype2=tf_ipsecsha
3. Arthur L. Delcher et al. Fast Algorithms for Large-Scale Genome Alignment and Comparison. Nucleic Acids Research, 2002, vol. 30, no. 11, 2478-2483.
4. Stephen F. Altschul et al. Basic Local Alignment Search Tool. Journal of Molecular Biology, 1990, vol. 215, 403-410.

Contact Information

Fei Lian: Fei.lian@duke.edu
Iyanna Atwell: Iyanna.atwell@duke.edu

Acknowledgements

Acknowledgements go out to the creators of SSAHA, to classmates who helped with the editing of this poster, and to "ola," whose inspiration is unforgettable.

References

1. http://www.sanger.ac.uk/Software/analysis/SSAHA/
2. http://www.genome.org/cgi/content/full/11/10/1725
3. http://www1.ncifcrf.gov/app/htdocs/appdb/index.php?info=ssaha
4. www.duke.edu/~ia14

Table 1. A 2-tuple Hash Table for S1, S2, and S3

W    E(w)  Positions
AA   0     (2, 19)
AC   1     (1, 9) (2, 5) (2, 11)
AG   2     (1, 15) (2, 35)
AT   3     (2, 13) (3, 3)
CA   4     (2, 3) (2, 9) (2, 21) (2, 27) (2, 33) (3, 21) (3, 23)
CC   5     (1, 21) (2, 31) (3, 5) (3, 7)
CG   6     (1, 5)
CT   7     (1, 23) (2, 39) (2, 43) (3, 13) (3, 15) (3, 17)
GA   8     (1, 3) (1, 17) (2, 15) (2, 25)
GC   9
GG   10    (1, 25) (1, 31) (2, 17) (2, 29) (3, 1)
GT   11    (1, 1) (1, 27) (1, 29) (2, 1) (2, 37) (3, 19)
TA   12    (3, 25)
TC   13    (1, 7) (1, 11) (1, 19) (2, 23) (2, 41) (3, 11)
TG   14    (1, 13) (2, 7) (3, 9)
TT   15

A 2-tuple table for three sequences:
S1 = GTGACGTCACTCTGAGGATCCCCTGGGTGTGG
S2 = GTCAACTGCAACATGAGGAACATCGACAGGCCCAAGGTCTTCCT
S3 = GGATCCCCTGTCCTCTCTGTCACATA

Table 2. List of Matches for the Query Sequence

t   w_t(Q)  Positions  H
0   TG      (1, 13)    (1, 13, 13)
            (2, 7)     (2, 7, 7)
            (3, 9)     (3, 9, 9)
1   GC
2   CA      (2, 3)     (2, 1, 3)
            (2, 9)     (2, 7, 9)
            (2, 21)    (2, 19, 21)
            (2, 27)    (2, 25, 27)
            (2, 33)    (2, 31, 33)
            (3, 21)    (3, 19, 21)
            (3, 23)    (3, 21, 23)
3   AA      (2, 19)    (2, 16, 19)
4   AC      (1, 9)     (1, 5, 9)
            (2, 5)     (2, 1, 5)
            (2, 11)    (2, 7, 11)
5   CA      (2, 3)     (2, -2, 3)
            (2, 9)     (2, 4, 9)
            (2, 21)    (2, 16, 21)
            (2, 27)    (2, 22, 27)
            (2, 33)    (2, 28, 33)
            (3, 21)    (3, 16, 21)
            (3, 23)    (3, 18, 23)
6   AT      (2, 13)    (2, 7, 13)
            (3, 3)     (3, -3, 3)

M (all hits from H, sorted):
(1, 5, 9)     (1, 13, 13)   (2, -2, 3)    (2, 1, 3)     (2, 1, 5)     (2, 4, 9)
(2, 7, 7)*    (2, 7, 9)*    (2, 7, 11)*   (2, 7, 13)*
(2, 16, 19)   (2, 16, 21)   (2, 19, 21)   (2, 22, 27)   (2, 25, 27)   (2, 28, 33)
(2, 31, 33)   (3, -3, 3)    (3, 9, 9)     (3, 16, 21)   (3, 18, 23)   (3, 19, 21)
(3, 21, 23)

• Table 2 is compiled by looking up each 2-tuple of the query in the hash table of the database (Table 1). It is then searched to find where the query sequence occurs.
• t is the offset of a 2-tuple within the query, and w_t(Q) is the 2-tuple of the query that starts at that offset.
• Positions gives, for each occurrence of the 2-tuple in the database, the sequence it is found in and its offset within that sequence.
• H is the list of hits between the database and the query sequence. For the positions (i_1, j_1), (i_2, j_2), ..., (i_r, j_r) of w_t(Q), it is given by H_1 = (i_1, j_1 - t, j_1), H_2 = (i_2, j_2 - t, j_2), ..., H_r = (i_r, j_r - t, j_r).
• M is the master list of hits, sorted by the three values of each hit in order. The three values are referred to as the index, shift, and offset.
• The entries marked * (bold on the original poster) trace the exact match between the query sequence Q = TGCAACAT and the database (three sequences).
• From this information, the query sequence is found in S2, begins at position 7 of S2, and continues for eight bases.
• This illustrates how SSAHA works with 2-tuples to store a hashed sequence, then searches through it to return where the query is located.
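As an illustration of how Table 1 can be derived from S1, S2, and S3, here is a minimal Python sketch. It assumes the database is cut into non-overlapping 2-tuples with 1-based offsets and that E(w) is the tuple read as a base-4 number with A=0, C=1, G=2, T=3, which is what the table's values suggest. The names `encode` and `build_hash_table` are hypothetical and are not taken from the SSAHA software.

```python
from collections import defaultdict

# Base-4 digit for each nucleotide (assumed from the E(w) column of Table 1).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode(word):
    """Return E(w): the k-tuple read as a base-4 number."""
    value = 0
    for base in word:
        value = value * 4 + BASE_CODE[base]
    return value


def build_hash_table(sequences, k=2):
    """Map E(w) to a list of (sequence index, offset) pairs, both 1-based.

    The database is cut into non-overlapping k-tuples, so offsets
    advance in steps of k (1, 3, 5, ... for k = 2).
    """
    table = defaultdict(list)
    for index, seq in enumerate(sequences, start=1):
        for start in range(0, len(seq) - k + 1, k):
            table[encode(seq[start:start + k])].append((index, start + 1))
    return table


S1 = "GTGACGTCACTCTGAGGATCCCCTGGGTGTGG"
S2 = "GTCAACTGCAACATGAGGAACATCGACAGGCCCAAGGTCTTCCT"
S3 = "GGATCCCCTGTCCTCTCTGTCACATA"

table = build_hash_table([S1, S2, S3])
print(encode("TG"))          # 14
print(table[encode("TG")])   # [(1, 13), (2, 7), (3, 9)]
```

The printed row agrees with the TG entry of Table 1. Tuples that never occur in the database, such as GC and TT, simply have no entry.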

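The search step summarised in Table 2 can be sketched in the same way. This is a simplified illustration, not the SSAHA implementation. It assumes exact matching only and reports the longest run of hits that share the same index and shift. For brevity the hash table is keyed by the 2-tuple itself rather than by E(w). The function names `find_hits` and `best_run` are hypothetical.

```python
from collections import defaultdict
from itertools import groupby


def build_hash_table(sequences, k):
    """Index non-overlapping k-tuples as word -> [(index, offset)], 1-based."""
    table = defaultdict(list)
    for index, seq in enumerate(sequences, start=1):
        for start in range(0, len(seq) - k + 1, k):
            table[seq[start:start + k]].append((index, start + 1))
    return table


def find_hits(table, query, k):
    """Return M: every hit (index, shift, offset), sorted."""
    hits = []
    for t in range(len(query) - k + 1):  # overlapping k-tuples of the query
        for index, offset in table.get(query[t:t + k], []):
            hits.append((index, offset - t, offset))  # shift = offset - t
    return sorted(hits)


def best_run(master):
    """Return the longest run of hits with the same index and shift."""
    runs = [list(group) for _, group in groupby(master, key=lambda h: h[:2])]
    return max(runs, key=len)


DATABASE = [
    "GTGACGTCACTCTGAGGATCCCCTGGGTGTGG",
    "GTCAACTGCAACATGAGGAACATCGACAGGCCCAAGGTCTTCCT",
    "GGATCCCCTGTCCTCTCTGTCACATA",
]
K = 2
QUERY = "TGCAACAT"

master = find_hits(build_hash_table(DATABASE, K), QUERY, K)
run = best_run(master)
index, shift = run[0][0], run[0][1]
length = run[-1][2] - run[0][2] + K  # span of the run in bases

print(run)                   # [(2, 7, 7), (2, 7, 9), (2, 7, 11), (2, 7, 13)]
print(index, shift, length)  # 2 7 8
```

The run of four hits with index 2 and shift 7 corresponds to the query being found in S2, starting at position 7 and covering eight bases.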
