1 DNA Classifications with Self-Organizing Maps (SOMs)
Thanakorn Naenna, Mark J. Embrechts, Robert A. Bress
May 2003, IEEE International Workshop on Soft Computing in Industrial Applications

2 Presentation Outline
- Introduction to DNA Splice Junctions
- Data Collection
- Introduction to SOMs
- SOM for DNA Splice Junction Classification
- Results
- Conclusions

4 Human genome in a nutshell
- Human: 23 chromosome pairs
- Chromosomes → thousands of genes
- Gene → information: exons; comments: introns
- Splice junctions are like /* comment flags */ in C code
- Exons and introns → codons
- Codon → bases

5 DNA Splice Junctions
- DNA → billions of nucleotides (A, C, G, T)
- Genes → coding sequences (exons) that are often interrupted by non-coding nucleotides (introns)
- <0.1% of human DNA is made up of exons
- 99% of splice junctions share the same motif:
  - Exon to intron: GT
  - Intron to exon: AG
[Figure: example sequence ...GTGAAGGTTAA AGATGTAGAT GT ATTG... with the labels Splice Junction, Exon, and Intron]
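The GT/AG motif rule can be illustrated with a short scan. This is only a sketch: the function name `find_candidate_sites` is hypothetical and not part of the presentation. It lists every position where a GT or AG dinucleotide starts. Because these dinucleotides also occur by chance throughout a sequence, the motif alone yields many false candidates, which is why a classifier over the surrounding nucleotides is useful.

```python
def find_candidate_sites(seq):
    """Return (donors, acceptors): 0-based start indices of GT and AG dinucleotides.

    GT marks a candidate exon-to-intron junction, AG a candidate
    intron-to-exon junction. These are candidates only, not confirmed sites.
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    return donors, acceptors


# Toy sequence (made up for illustration)
donors, acceptors = find_candidate_sites("AAGTCCAGTT")
print(donors, acceptors)  # [2, 7] [1, 6]
```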

6 Data Collection: HTML Browser + Perl scripts
BioBrowser → Download HTML → ExtractLinks() → Download HTML data → ExtractData() → TranslateData()

8 DNA Splice Junction (Cont.)
- A complete gene is made up of different exons
- Splice junction identification aids in the discovery of new genes
- The dataset used for this study is made up of 1,424 sequences
- Data were created ab initio from GenBank
- Each sequence is 32 nucleotides long, with regions comprising -15 to +15 nucleotides from the splice junction
[Figure: example sequence ...T GTAAG G AG ACGA GTT ... with the labels Intron, Splice Junction, and Exon]
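The demo slide reports 160 features for these 32-nucleotide sequences, which works out to 5 values per nucleotide. The slides do not say how the features were built. One encoding consistent with that arithmetic is a one-hot code over five symbols, sketched below. The alphabet `ACGTN` (four bases plus an "unknown" symbol) and the function name `encode` are assumptions made for illustration, not the authors' documented method.

```python
ALPHABET = "ACGTN"  # assumption: four bases plus one symbol for unknown bases


def encode(seq):
    """One-hot encode a nucleotide string using 5 values per position."""
    features = []
    for base in seq.upper():
        # Any symbol outside A, C, G, T is mapped to the "unknown" slot.
        idx = ALPHABET.index(base) if base in ALPHABET else ALPHABET.index("N")
        features.extend(1.0 if k == idx else 0.0 for k in range(len(ALPHABET)))
    return features


window = "ACGT" * 8          # a made-up 32-nucleotide window
print(len(encode(window)))   # 32 positions x 5 values = 160
```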

9 Self-Organizing Maps (SOM) Network
- Unsupervised learning neural network
- Projects high-dimensional input data onto a two-dimensional output map
- Preserves the topology of the input data
- Visualizes structures and clusters of the data
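The properties listed above follow from the SOM update rule. For each input, the closest neuron (the best matching unit) and its grid neighbours are pulled toward that input. The following is a minimal standard-library sketch of this textbook procedure. The grid size, the learning-rate and neighbourhood schedules, and the names `train_som` and `best_matching_unit` are illustrative assumptions, not the settings used in the study.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the neuron whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(w, x))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows, cols, epochs=20, lr0=0.5, seed=0):
    """Train a rectangular SOM and return weights[row][col] as lists of floats."""
    rng = random.Random(seed)
    dim = len(data[0])
    sigma0 = max(rows, cols) / 2.0
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)   # neighbourhood radius shrinks
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Gaussian neighbourhood measured on the 2-D grid, not in input space
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(dim):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Toy data with two well-separated groups (made up for illustration)
toy = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
som = train_som(toy, rows=3, cols=3)
print(best_matching_unit(som, [0.0, 0.0]), best_matching_unit(som, [1.0, 1.0]))
```

Because neighbouring neurons are updated together, inputs that are close in the original space tend to land on nearby map cells, which is the topology preservation the slide refers to.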

10 Use of SOM for DNA Splice Junction Classification
[Diagram: DNA training set → SOM → U-Matrix Map and Classification Map (regions A, B, C); DNA test set → SOM Classification Model → Classification]
Classification:
- Class A: intron to exon
- Class B: exon to intron
- Class C: no transition
Neuron identification methods:
- Highest frequency class
- Closest neuron
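The slide names two neuron identification methods without defining them. The sketch below shows one plausible reading, stated as an assumption. Each neuron takes the class that maps to it most often in the training set ("highest frequency class"). A neuron that no training sample maps to copies the label of the nearest labeled neuron ("closest neuron"). All function names are hypothetical, and the map is flattened to a list of weight vectors for brevity.

```python
from collections import Counter, defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def bmu(neurons, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(neurons)), key=lambda i: sq_dist(neurons[i], x))


def label_neurons(neurons, samples, labels):
    """Assign one class label per neuron from labeled training samples."""
    hits = defaultdict(Counter)
    for x, y in zip(samples, labels):
        hits[bmu(neurons, x)][y] += 1
    # Highest frequency class: majority vote among samples mapped to the neuron.
    voted = {i: counts.most_common(1)[0][0] for i, counts in hits.items()}
    result = []
    for i, w in enumerate(neurons):
        if i in voted:
            result.append(voted[i])
        else:
            # Closest neuron: borrow the label of the nearest neuron that has one.
            j = min(voted, key=lambda k: sq_dist(w, neurons[k]))
            result.append(voted[j])
    return result


def classify(neurons, neuron_labels, x):
    """Predict the class of x as the label of its best matching neuron."""
    return neuron_labels[bmu(neurons, x)]


# Toy one-dimensional map and training data (made up for illustration)
neurons = [[0.0], [0.4], [1.0]]
neuron_labels = label_neurons(neurons, [[0.0], [0.1], [1.0]], ["A", "A", "B"])
print(neuron_labels, classify(neurons, neuron_labels, [0.9]))
```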

11 The U-matrix of the DNA Training Set
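A U-matrix shows, for each map cell, how far its weight vector lies from those of its grid neighbours. Large values mark boundaries between clusters, and small values mark the interior of a cluster. The sketch below computes a simplified version that averages over the four direct neighbours on a rectangular grid. The function name `u_matrix` and the 4-neighbour choice are assumptions. Full U-matrix displays often add extra cells between neurons, and the map topology used in the study is not stated.

```python
import math


def u_matrix(weights):
    """Average distance from each neuron to its 4-connected grid neighbours."""
    rows, cols = len(weights), len(weights[0])
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            dists = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                nr, nc = r + dr, c + dc
                if 0 <= nr < rows and 0 <= nc < cols:
                    dists.append(math.dist(weights[r][c], weights[nr][nc]))
            # A 1x1 map has no neighbours; report 0.0 in that case.
            u[r][c] = sum(dists) / len(dists) if dists else 0.0
    return u


# Toy 1x3 map: the jump between the last two neurons shows up as a larger value
print(u_matrix([[[0.0], [0.0], [1.0]]]))  # [[0.0, 0.5, 1.0]]
```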

12 SOM Results for DNA Splice Junction Data
[Figures: the U-matrix of the DNA training set with regions A, B, and C; the confusion matrix of the 424-sequence DNA test set]

13 Conclusions
- SOM is effective in DNA splice junction classification
- SOM is a powerful visualization tool for high-dimensional data

14 Demo with Analyze Code
- 800 training samples, 324 test samples (160 features each)
- 96% correct overall classification on test data
[Figure: confusion matrix]
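The overall classification rate and the confusion matrix are related in a simple way. The matrix counts test samples by true class (rows) and predicted class (columns), and the overall rate is the sum of the diagonal divided by the total. The sketch below uses made-up toy labels, not the study's results. The function names are hypothetical, and the class names A, B, and C follow the earlier classification slide.

```python
def confusion_matrix(true_labels, predicted, classes=("A", "B", "C")):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    m = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted):
        m[index[t]][index[p]] += 1
    return m


def overall_accuracy(m):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in m)
    correct = sum(m[i][i] for i in range(len(m)))
    return correct / total if total else 0.0


# Toy labels for illustration only
truth = list("AABBCC")
preds = list("AABCCC")
cm = confusion_matrix(truth, preds)
print(cm)                    # [[2, 0, 0], [0, 1, 1], [0, 0, 2]]
print(overall_accuracy(cm))  # 5 of 6 correct
```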

15 THE END
[Slide background: a long nucleotide sequence beginning GATCAATGAGGTGGACACCAGAGGCGGGGACTTGTAAATAACACTGGGCTGTAGGAGTGA...]
