What we use to find similar sequences:


1 What we use to find similar sequences:
BLAST
BLAST (PSI-BLAST) is optimized for finding homologs, i.e. sequences that share a common evolutionary origin.

2

3 Measuring similarity in proteins: log-odds similarity table
[Table: log-odds substitution scores; rows and columns are A R N D C Q E G H I L K M F P S T W Y V B Z X *]
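For reference, the entries of a log-odds table are conventionally defined as shown here. The symbols are standard textbook notation and are not taken from the slide: q_{ab} is the observed frequency of residues a and b aligned in related sequences, p_a and p_b are their background frequencies, and \lambda is a scaling constant.

```latex
S(a,b) = \frac{1}{\lambda}\,\log\frac{q_{ab}}{p_a\,p_b}
```

A positive score means the pair is seen together more often than expected by chance; a negative score means less often.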

4 Measuring similarity in nucleic acids
[Three 4x4 scoring matrices over A C G T: identity matrix, transition/transversion matrix, BLAST DNA matrix]
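A minimal sketch of how nucleotide scoring matrices of this kind can be built. The numeric scores did not survive in the transcript, so every value below (1, 0, -1, -5) is an illustrative assumption, as are the function names.

```python
BASES = "ACGT"
PURINES = {"A", "G"}  # C and T are pyrimidines


def identity_matrix(match=1, mismatch=0):
    """Score `match` for identical bases and `mismatch` otherwise."""
    return {(a, b): match if a == b else mismatch for a in BASES for b in BASES}


def transition_transversion_matrix(match=1, transition=-1, transversion=-5):
    """Penalize transitions (A<->G, C<->T) less than transversions.

    The default scores are assumed values for illustration only.
    """
    matrix = {}
    for a in BASES:
        for b in BASES:
            if a == b:
                matrix[a, b] = match
            elif (a in PURINES) == (b in PURINES):
                matrix[a, b] = transition    # purine<->purine or pyrimidine<->pyrimidine
            else:
                matrix[a, b] = transversion  # purine<->pyrimidine
    return matrix


def ungapped_score(seq1, seq2, matrix):
    """Sum the per-position scores of two equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must have equal length")
    return sum(matrix[a, b] for a, b in zip(seq1, seq2))


print(ungapped_score("ACGT", "ACAT", identity_matrix()))
print(ungapped_score("ACGT", "ACAT", transition_transversion_matrix()))
```

A BLAST-style DNA matrix is the same match/mismatch scheme with a negative mismatch score, so under these assumptions it could be produced by calling `identity_matrix` with a positive `match` and a negative `mismatch`.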

5 2. Needs have changed
- Genome assembly
- Locating sequences in the genome
- Comparing mRNA with the genome sequence
These problems need neither such a complex similarity matrix nor an affine gap-penalty model.

6 3. Locating SNPs
SNPs <-> human genome
50-500 bp <-> bp
Result: a chromosome and a location for each SNP (SNP1, SNP2, ..., SNP10, ...)

7 4. Which tools to use
- BLAST: very slow
- MEGABLAST: slow
- SSAHA: ?
- BLAT: ?
- UNIMARKER (UM): ?

8 5. SSAHA: Sequence Search and Alignment by Hashing Algorithm
Builds a table (index) of all the "words" in the genome and records where each one occurs in the genome. Typical word length: 10 nt.
[Figure labels: step 1, step 2, step 10]
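The word index described on this slide can be sketched as a hash table from each word to its genome positions. This is an illustration and not SSAHA's actual code: the function name `build_index` and the `step` parameter are assumptions. With `step` equal to the word length the genome is cut into non-overlapping words; with `step=1` every overlapping word is stored.

```python
from collections import defaultdict

VALID_BASES = set("ACGT")


def build_index(genome, k=10, step=None):
    """Map every k-letter word sampled from `genome` to its start positions."""
    if step is None:
        step = k  # non-overlapping words by default (an assumption of this sketch)
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, step):
        word = genome[pos:pos + k]
        if set(word) <= VALID_BASES:  # skip words containing N or other symbols
            index[word].append(pos)
    return index


index = build_index("ACGTACGTACGT", k=4, step=1)
for word, positions in sorted(index.items()):
    print(word, positions)
```

A larger step shrinks the index roughly in proportion, at the cost of needing more query words to guarantee a hit.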

9 Bioinformatics applications: SSAHA
TTTTTTAAAAGAGAAAAAATTCTGACGGGGGCATAACTGGAGAATAAAGTGATAAAATACTGCTGAAACAAAAAGTCATCTG
Query: GGGGGCATAACTGGAGAAGGAGAA
Total number of words: 3*10^9, 4 bytes each = 12 GB
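A simplified sketch of a hash-based lookup, using the sequence and query from this slide. Each query word found in the index votes for an offset (genome position minus query position), and the offset with the most votes is the likely location. The word length of 6, the function names and the voting shortcut are assumptions chosen to suit this short example; they are not SSAHA's real parameters or code.

```python
from collections import Counter, defaultdict

GENOME = ("TTTTTTAAAAGAGAAAAAATTCTGACGGGGGCATAACTGGAGAATAAAGTGATAAAATAC"
          "TGCTGAAACAAAAAGTCATCTG")
QUERY = "GGGGGCATAACTGGAGAAGGAGAA"


def build_index(genome, k, step):
    """Map each k-letter word taken every `step` bases to its positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, step):
        index[genome[pos:pos + k]].append(pos)
    return index


def best_offset(index, query, k):
    """Return (offset, votes) for the most supported placement, or None."""
    votes = Counter()
    for qpos in range(len(query) - k + 1):       # every overlapping query word
        for gpos in index.get(query[qpos:qpos + k], ()):
            votes[gpos - qpos] += 1              # hits on one diagonal share an offset
    return votes.most_common(1)[0] if votes else None


index = build_index(GENOME, k=6, step=6)
print(best_offset(index, QUERY, k=6))
```

The query matches the genome only over its first 18 bases, so the winning offset is supported by the words that fall entirely inside that matching stretch.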

10 Bioinformatics applications: UM
TTTTTTAAAAGAGAAAAAATTCTGACGGGGGCATAACTGGAGAATAAAGTGATAAAATACTGCTGAAACAAAAAGTCATCTG
Building the index: word length 15; number of combinations 4^15 ≈ 10^9 (4 bytes); number of unique variants = ; each stored as 15-mer ID + location: x (4 + 4) bytes = 1.3 GB

11

12

13

14 Using this binary representation, we can process the DNA sequence using some of the bit operations used in computer science. For example, a left-shift operation adding 1 or 0, depending on the new nucleotide read in, will give the next N-mer DNA (Fig. 5A). Other operations to facilitate rapid searches of, say, complementary sequences should also be possible, although this was not explored in the present study.
The N-mers were then placed in a binary tree (Fig. 5B) as tree nodes, along with their chromosome ID, contig ID, sequence position on the contig, and their occurrence count and links to left child and right child, respectively, for subsequently encountered N-mers with the same row value but a larger or smaller column value (Fig. 5B). By traversing every tree node after the genome scanning was completed, all UMs of length N along with their genomic location were identified; they are the nodes with the occurrence count equal to one.
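The shift-based update and the occurrence counting can be sketched as follows. Two things here are assumptions and differ from the quoted text: bases are packed at two bits each (A=0, C=1, G=2, T=3), and a Python `Counter` stands in for the binary tree. The function names are hypothetical.

```python
from collections import Counter

CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding


def rolling_nmers(seq, n):
    """Yield (start, value) for every N-mer, updating the value by shifting."""
    mask = (1 << (2 * n)) - 1  # keeps only the lowest 2*n bits
    value = 0
    filled = 0                 # valid bases currently in the window
    for pos, base in enumerate(seq):
        code = CODE.get(base)
        if code is None:       # N or other symbol: restart the window
            value, filled = 0, 0
            continue
        value = ((value << 2) | code) & mask  # shift left, append the new base
        filled += 1
        if filled >= n:
            yield pos - n + 1, value


def unique_markers(seq, n):
    """Return {nmer_value: position} for N-mers that occur exactly once."""
    counts = Counter()
    first_pos = {}
    for pos, value in rolling_nmers(seq, n):
        counts[value] += 1
        first_pos.setdefault(value, pos)
    return {v: first_pos[v] for v, c in counts.items() if c == 1}


print(list(rolling_nmers("ACGT", 2)))
print(unique_markers("AAAC", 2))
```

Each new base costs one shift, one OR and one mask, so the whole genome is scanned in a single pass without re-reading the previous N - 1 bases.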

15

16 Bioinformatics applications: GenomeTester
TTTTTTAAAAGAGAAAAAATTCTGACGGGGGCATAACTGGAGAATAAAGTGATAAAATACTGCTGAAACAAAAAGTCATCTG
Building the index: word length 16; number of combinations 4^16 ≈ 4*10^9; total number of words 3*10^9, 4 + 4 bytes each (16-mer ID + location) = 24 GB
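The index sizes quoted on the SSAHA and GenomeTester slides follow from simple arithmetic, reproduced here as a check. The helper name is hypothetical, and 1 GB is taken as 10^9 bytes, which is what the slides' round numbers imply.

```python
def index_size_gb(n_entries, bytes_per_entry):
    """Return the index size in GB, taking 1 GB as 10**9 bytes."""
    return n_entries * bytes_per_entry / 10**9


GENOME_WORDS = 3 * 10**9  # approximate number of word positions, from the slides

print(4**16)                               # number of possible 16-mers
print(index_size_gb(GENOME_WORDS, 4))      # location only
print(index_size_gb(GENOME_WORDS, 4 + 4))  # 16-mer ID plus location
```

Storing the word ID next to each location doubles the per-entry cost, which is why the 16-mer index comes out at twice the size of the location-only one.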

17 Figure 2. GenomeTester is significantly faster for the 'genome test' than any other program.
The 'genome test' here means finding the locations of all primers (16 nt from the 3' end) in the human genome and calculating the possible PCR products. Tests were performed on a PC-Linux based server with a Pentium III, 2 GB RAM and SCSI RAID 0 hard drives. BLAST and MEGABLAST were used without the DUST filter; word length was 12 for MEGABLAST and 10 for SSAHA.
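To make the 'genome test' concrete, here is a small sketch of the idea: locate the last 16 nt of each primer in a template and report the products they could amplify. It is not GenomeTester's implementation. Plain string search stands in for the 16-mer index, only one orientation is checked (forward primer on the given strand, reverse primer on the opposite strand), and all names and the `max_size` limit are assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_all(text, word):
    """Return every start position of `word` in `text`, overlaps included."""
    hits = []
    start = text.find(word)
    while start != -1:
        hits.append(start)
        start = text.find(word, start + 1)
    return hits


def pcr_products(template, fwd, rev, tail=16, max_size=1000):
    """Return (start, size) of products predicted from 3'-end primer matches."""
    fwd_tail = fwd[-tail:]                      # 3' end of the forward primer
    rev_site = reverse_complement(rev[-tail:])  # where the reverse 3' end binds
    products = []
    for f in find_all(template, fwd_tail):
        start = f + len(fwd_tail) - len(fwd)    # 5' end of the forward primer
        for r in find_all(template, rev_site):
            size = r + len(rev) - start         # up to the reverse primer's 5' end
            if r >= f + len(fwd_tail) and size <= max_size:
                products.append((start, size))
    return products


fwd = "ACGTTGCAAGCTTAGGCATC"  # made-up primers for illustration
rev = "GGATCCTTAGCGTACGATCA"
template = "AAAA" + fwd + "C" * 10 + reverse_complement(rev) + "TTTT"
print(pcr_products(template, fwd, rev))
```

Matching only the 3' end reflects the text above: a primer whose last 16 nt occur at several genomic locations can give several products, and this is what the test is meant to detect.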

