Biological sequence analysis and information processing by artificial neural networks Søren Brunak Center for Biological Sequence Analysis Technical University.

1 Biological sequence analysis and information processing by artificial neural networks Søren Brunak Center for Biological Sequence Analysis Technical University of Denmark brunak@cbs.dtu.dk

2 Pairwise alignment
>carp Cyprinus carpio growth hormone, 210 aa vs. >chicken Gallus gallus growth hormone, 216 aa
Scoring matrix: BLOSUM50; gap penalties: -12/-2
40.6% identity; global alignment score: 487
carp     MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD
chicken  MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE
carp     YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN
chicken  TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G
carp     DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL
chicken  PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI
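
The identity figure on this slide is a count of matching columns relative to some length. A minimal sketch of that calculation follows, assuming the denominator is the number of alignment columns (gap columns included); alignment programs differ on this convention, so the sketch is not guaranteed to reproduce the slide's 40.6%. The function name `percent_identity` is hypothetical, and only the final block of the carp/chicken alignment is used as input.

```python
def percent_identity(row_a, row_b):
    """Percent of alignment columns in which both rows hold the same residue.

    Assumption: the denominator is the full number of alignment columns,
    including columns that contain a gap ('-').
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if not row_a:
        raise ValueError("alignment is empty")
    identical = sum(1 for a, b in zip(row_a, row_b) if a == b and a != "-")
    return 100.0 * identical / len(row_a)


# Final block of the carp/chicken growth hormone alignment shown on the slide.
carp = "DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL"
chicken = "PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI"

print(round(percent_identity(carp, chicken), 1))
```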

10 Biological neuron

12 Diversity of interactions in a network enables complex calculations
- Similar in biological and artificial systems
- Excitatory (+) and inhibitory (-) relations between compute units
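
As an illustration of how mixing excitatory and inhibitory connections enables a calculation that a single unit cannot do, here is a small sketch of a hand-wired network of threshold units that computes XOR. The weights and thresholds are chosen by hand for this example and are not taken from the slides; the function names are hypothetical.

```python
def step(z):
    """Threshold activation: the unit fires (1) only for positive net input."""
    return 1 if z > 0 else 0


def unit(inputs, weights, threshold):
    """One compute unit: weighted sum of inputs compared against a threshold."""
    return step(sum(w * x for w, x in zip(weights, inputs)) - threshold)


def xor_network(x1, x2):
    # Hidden unit that fires if at least one input is on (OR).
    h_or = unit((x1, x2), (1, 1), 0.5)
    # Hidden unit that fires only if both inputs are on (AND).
    h_and = unit((x1, x2), (1, 1), 1.5)
    # Output: excitatory (+1) link from OR, inhibitory (-1) link from AND.
    return unit((h_or, h_and), (1, -1), 0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

Without the inhibitory connection from the AND unit, the output unit would reduce to a plain OR.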

14 Transfer of biological principles to neural network algorithms
- Non-linear relation between input and output
- Massively parallel information processing
- Data-driven construction of algorithms
- Ability to generalize to new data items

17 Simplest non-trivial classification problem
- CNHSYYP, HIETRRA, NWQSADY, NQYSEPR, WHITRCA, DYHSANY, ...
- Two categories: positives and negatives
- Data described by two features, e.g. charge, side-chain volume, molecular weight, number of atoms, ...
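
To make the idea of describing each peptide by two features concrete, the sketch below computes a formal net charge and an approximate molecular weight for the example peptides. Several things here are assumptions rather than content from the slides: histidine is treated as neutral, the termini are ignored for charge, the masses are standard average residue masses rounded to two decimals, and the names `features`, `CHARGE` and `RESIDUE_MASS` are hypothetical. Which peptides are positives or negatives is not stated on the slide, so no labels are assigned.

```python
# Average residue masses in daltons, rounded to two decimals.
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.10, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
WATER = 18.02  # one water is added for the free N- and C-termini

# Assumed formal side-chain charges; histidine is treated as neutral.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def features(peptide):
    """Return (net charge, molecular weight) as a two-feature description."""
    charge = sum(CHARGE.get(aa, 0) for aa in peptide)
    weight = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return charge, round(weight, 2)


peptides = ["CNHSYYP", "HIETRRA", "NWQSADY", "NQYSEPR", "WHITRCA", "DYHSANY"]
for p in peptides:
    print(p, features(p))
```

Each peptide becomes a point in a two-dimensional feature space, which is the setting in which a separating line between positives and negatives can be drawn.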

18 Features of phosphorylation sites
- PKG: cGMP-dependent kinase
- PKC
- CaM-II: Ca++/calmodulin-dependent kinase
- cdc2: cyclin-dependent kinase 2
- CK-II: casein kinase 2

21 Homotypical cerebral cortex (from primate): 6 layers

26 DEMO

28 Training and error reduction (figure labels: negative, positive)
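
Training as error reduction can be sketched with a single sigmoid unit fitted by gradient descent on a squared error. This is an illustrative example only: the two-feature data points, the learning rate, the number of epochs and the names `train` and `predict` are all assumptions, and the slides do not state which training procedure was used in the demo.

```python
import math


def sigmoid(z):
    """Non-linear squashing function mapping net input to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)


def train(points, labels, epochs=200, lr=0.5):
    """Online gradient descent on the squared error of one sigmoid unit."""
    w, b = [0.0, 0.0], 0.0
    errors = []  # summed squared error per epoch
    for _ in range(epochs):
        total = 0.0
        for x, t in zip(points, labels):
            y = predict(w, b, x)
            total += 0.5 * (y - t) ** 2
            # Derivative of the error with respect to the net input.
            delta = (y - t) * y * (1.0 - y)
            w[0] -= lr * delta * x[0]
            w[1] -= lr * delta * x[1]
            b -= lr * delta
        errors.append(total)
    return w, b, errors


# Hypothetical toy data: two positives and two negatives in a 2-feature space.
POINTS = [(1.0, 1.0), (0.8, 0.9), (-1.0, -0.8), (-0.9, -1.1)]
LABELS = [1, 1, 0, 0]

w, b, errors = train(POINTS, LABELS)
print("error in first epoch:", errors[0])
print("error in last epoch:", errors[-1])
```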

29 Transfer of biological principles to neural network algorithms
- Non-linear relation between input and output
- Massively parallel information processing
- Data-driven construction of algorithms

30 Sparse encoding of amino acid sequence windows
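
A minimal sketch of sparse (one-hot) encoding for amino acid windows follows. The window length of 7, the alphabetical ordering of the 20 residues, and the use of an all-zero group for positions beyond the sequence ends are assumptions made for this example; the slide itself does not specify them. The names `encode_residue` and `encode_windows` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed ordering of the 20 residues


def encode_residue(aa):
    """20 units per position, exactly one set to 1 for a known residue.

    Any other symbol (here '-' for padding) becomes an all-zero group.
    """
    return [1 if aa == letter else 0 for letter in AMINO_ACIDS]


def encode_windows(sequence, window=7):
    """Yield (window string, flat bit list) centred on each residue.

    The window length is expected to be odd so that a centre exists.
    """
    half = window // 2
    padded = "-" * half + sequence + "-" * half
    for i in range(len(sequence)):
        chunk = padded[i:i + window]
        bits = [bit for aa in chunk for bit in encode_residue(aa)]
        yield chunk, bits


for chunk, bits in encode_windows("CNHSYYP"):
    print(chunk, len(bits), sum(bits))
```

With a window of 7 residues each input vector has 7 x 20 = 140 units, of which at most 7 are set.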

31 Sparse encoding of nucleotide sequence windows
- Nucleotides: 4-letter alphabet
- Normally no need for a fifth letter
ACGTAGGCAATCTCAGACGTTTATC
1000010000100001100000100010010010001000000101000001010010000010100001000010000100010001100000010100
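
The encoding on this slide can be reproduced with a four-unit code per nucleotide. The assignment A = 1000, C = 0100, G = 0010, T = 0001 is read off the slide's own example (its sequence starts with ACGT and its bit string with 1000 0100 0010 0001); the function name `encode_dna` is hypothetical.

```python
# Four units per nucleotide, exactly one of them set.
CODES = {"A": "1000", "C": "0100", "G": "0010", "T": "0001"}


def encode_dna(sequence):
    """Concatenate the 4-bit code of every nucleotide in the sequence."""
    return "".join(CODES[base] for base in sequence.upper())


print(encode_dna("ACGTAGGCAATCTCAGACGTTTATC"))
```

The 25-nucleotide example yields 100 bits, 25 of which are set.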

32 NetTalk
- Network learned to pronounce English text (mapped text to phonemes)
- Network input: moving window of 7 characters
- Network output: phoneme code for center character in input window
- Output fed to a phoneme-to-speech converter
- Each input character represented by a group of 29 units (localist representation)
- 203 total input units
- 80 hidden units
- 26 output units for phonemes
- Trained on 1024 words using a side-by-side English/phoneme source
- Intelligible speech after 10 training epochs; 95% accuracy on training corpus after 50 epochs
- Some hidden units developed meaningful responses (e.g., vowels vs. consonants)
- Generalization: 78% accuracy on continuation of training text
- Damaging network produced graceful degradation, with rapid recovery on retraining
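
The layer sizes quoted on this slide follow from simple arithmetic, sketched below together with the moving-window pairing of input and target. The connection count assumes a plain fully connected input-hidden-output layout with no bias terms and no direct input-to-output links, and padding the text ends with spaces is also an assumption; neither detail is given on the slide. The function names are hypothetical.

```python
WINDOW = 7           # characters per input window
UNITS_PER_CHAR = 29  # localist group size per character
HIDDEN = 80
OUTPUTS = 26

# 7 characters x 29 units each gives the 203 input units on the slide.
input_units = WINDOW * UNITS_PER_CHAR


def connection_count(n_in, n_hidden, n_out):
    """Weights in a fully connected two-layer network, biases not counted."""
    return n_in * n_hidden + n_hidden * n_out


def windows_with_targets(text, window=WINDOW, pad=" "):
    """Pair each window with its center character, whose phoneme is the target."""
    half = window // 2
    padded = pad * half + text + pad * half
    return [(padded[i:i + window], text[i]) for i in range(len(text))]


print(input_units)
print(connection_count(input_units, HIDDEN, OUTPUTS))
for win, center in windows_with_targets("network"):
    print(repr(win), center)
```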

