Biological sequence analysis and information processing by artificial neural networks.


1 Biological sequence analysis and information processing by artificial neural networks

2 Pairwise alignment
>carp Cyprinus carpio growth hormone, 210 aa
vs.
>chicken Gallus gallus growth hormone, 216 aa
Scoring matrix: BLOSUM50; gap penalties: -12/-2
40.6% identity; global alignment score: 487

carp    MA--RVLVLLSVVLVSLLVNQGRASDN-----QRLFNNAVIRVQHLHQLAAKMINDFEDSLLPEERRQLSKIFPLSFCNSD
chicken MAPGSWFSPLLIAVVTLGLPQEAAATFPAMPLSNLFANAVLRAQHLHLLAAETYKEFERTYIPEDQRYTNKNSQAAFCYSE

carp    YIEAPAGKDETQKSSMLKLLRISFHLIESWEFPSQSLSGTVSNSLTVGNPNQLTEKLADLKMGISVLIQACLDGQPNMDDN
chicken TIPAPTGKDDAQQKSDMELLRFSLVLIQSWLTPVQYLSKVFTNNLVFGTSDRVFEKLKDLEEGIQALMRELEDRSPR---G

carp    DSLPLP-FEDFYLTM-GENNLRESFRLLACFKKDMHKVETYLRVANCRRSLDSNCTL
chicken PQLLRPTYDKFDIHLRNEDALLKNYGLLSCFKKDLHKVETYLKVMKCRRFGESNCTI
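A global alignment score like the one above comes from dynamic programming with affine gap penalties. The sketch below uses Gotoh's three-matrix recurrence and assumes that a gap of length k costs gap_open + (k - 1) * gap_extend; FASTA-family programs may count the penalties slightly differently. The `toy_score` function is a hypothetical stand-in for BLOSUM50, so this code will not reproduce the score of 487.

```python
from typing import Callable

NEG_INF = float("-inf")


def global_affine_score(
    a: str,
    b: str,
    score: Callable[[str, str], int],
    gap_open: int = -12,
    gap_extend: int = -2,
) -> float:
    """Best global alignment score of a and b with affine gap penalties.

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] opposite a gap; Y: b[j-1] opposite a gap.
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    X = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        X[i][0] = gap_open + (i - 1) * gap_extend  # leading gap in b
    for j in range(1, m + 1):
        Y[0][j] = gap_open + (j - 1) * gap_extend  # leading gap in a
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = score(a[i - 1], b[j - 1]) + max(
                M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]
            )
            # Open a new gap (from M or Y) or extend the gap already in X.
            X[i][j] = max(
                M[i - 1][j] + gap_open, X[i - 1][j] + gap_extend, Y[i - 1][j] + gap_open
            )
            # Open a new gap (from M or X) or extend the gap already in Y.
            Y[i][j] = max(
                M[i][j - 1] + gap_open, Y[i][j - 1] + gap_extend, X[i][j - 1] + gap_open
            )
    return max(M[n][m], X[n][m], Y[n][m])


def toy_score(x: str, y: str) -> int:
    # Hypothetical scorer for illustration only, not BLOSUM50.
    return 5 if x == y else -1


print(global_affine_score("ACGT", "AGT", toy_score))  # one length-1 gap: 15 - 12 = 3
print(global_affine_score("AAAA", "AA", toy_score))   # one length-2 gap: 10 - 14 = -4
```

The second call shows why affine penalties matter: one gap of length 2 (cost 14) beats two separate gaps of length 1 (cost 24).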


7 Biological Neural network

8 Biological neuron

9 Diversity of interactions in a network enables complex calculations. Similar in biological and artificial systems. Excitatory (+) and inhibitory (-) relations between compute units.

10 Biological neuron structure

11 Transfer of biological principles to artificial neural network algorithms: non-linear relation between input and output; massively parallel information processing; data-driven construction of algorithms; ability to generalize to new data items.


14 Simplest non-trivial classification problem: peptides such as CNHSYYP, HIETRRA, NWQSADY, NQYSEPR, WHITRCA, DYHSANY, ... Two categories: positives and negatives. Data described by features, e.g. charge, side-chain volume, molecular weight, number of atoms, ...
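Describing each peptide by numeric features turns it into a point that a classifier can try to separate. A minimal sketch with a single feature, assuming a simplified charge model (K and R count +1, D and E count -1, all other residues 0, histidine treated as neutral); a fuller feature set would add side-chain volume, molecular weight and so on. The slide does not say which peptides are positives, so no labels are assigned here.

```python
from typing import Dict

# Simplified residue charges near neutral pH (assumption: histidine counted as 0).
CHARGE: Dict[str, int] = {"K": 1, "R": 1, "D": -1, "E": -1}


def net_charge(peptide: str) -> int:
    """Sum of the simplified residue charges over the peptide."""
    return sum(CHARGE.get(residue, 0) for residue in peptide)


peptides = ["CNHSYYP", "HIETRRA", "NWQSADY", "NQYSEPR", "WHITRCA", "DYHSANY"]
features = {p: net_charge(p) for p in peptides}
print(features)
# {'CNHSYYP': 0, 'HIETRRA': 1, 'NWQSADY': -1, 'NQYSEPR': 0, 'WHITRCA': 1, 'DYHSANY': -1}
```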

15 Features of phosphorylation sites: PKG (cGMP-dependent kinase), PKC, CaM-II (Ca++/calmodulin-dependent kinase), cdc2 (cyclin-dependent kinase 2), CK-II (casein kinase 2)


18 Neural networks: neural networks can learn higher-order correlations. XOR function: 0 0 => 0, 1 0 => 1, 0 1 => 1, 1 1 => 0. Plotted in the plane, the points (1,0) and (0,1) (output 1) cannot be separated from (0,0) and (1,1) (output 0) by any linear function.

19 Neural networks: linear function (weights v1, v2)
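A single linear threshold unit (output 1 when v1*x1 + v2*x2 + bias > 0) can only draw a straight line through the input plane. The sketch below trains one with the classic perceptron learning rule, which is our choice (the slides do not name a training rule). It reaches zero errors on AND, which is linearly separable, but never on XOR: a zero-error run would itself be a separating line, and none exists.

```python
from typing import List, Tuple

Example = Tuple[Tuple[int, int], int]

AND_DATA: List[Example] = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_DATA: List[Example] = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def train_perceptron(data: List[Example], epochs: int = 100, lr: float = 1.0):
    """Perceptron rule on one threshold unit; returns ((v1, v2, bias), converged)."""
    v1 = v2 = bias = 0.0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in data:
            out = 1 if v1 * x1 + v2 * x2 + bias > 0 else 0
            delta = target - out
            if delta != 0:
                errors += 1
                # Nudge the weights towards the correct side for this example.
                v1 += lr * delta * x1
                v2 += lr * delta * x2
                bias += lr * delta
        if errors == 0:
            return (v1, v2, bias), True  # every example classified correctly
    return (v1, v2, bias), False


print(train_perceptron(AND_DATA)[1])  # True: AND is linearly separable
print(train_perceptron(XOR_DATA)[1])  # False: no line separates XOR
```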

20 Neural networks: higher-order function (weights w11, w12, w21, w22, v1, v2)

21 Neural networks. How does it work? Weights w11, w12, w21, w22, v1, v2, plus bias weights wt1, wt2 and vt on a constant input of 1 (bias).

22 Neural networks, input (0 0). Weights: 6, -9, 4, 6, 9, -2, -6, 4, -4.5; constant input 1 (bias). Hidden units: o1 = -6 gives O1 = 0; o2 = -2 gives O2 = 0. Output: y1 = -4.5 gives Y1 = 0.

23 Neural networks, inputs (1 0) and (0 1). Weights: 6, -9, 4, 6, 9, -2, -6, 4, -4.5; constant input 1 (bias). Hidden units: o1 = -2 gives O1 = 0; o2 = 4 gives O2 = 1. Output: y1 = 4.5 gives Y1 = 1.

24 Neural networks, input (1 1). Weights: 6, -9, 4, 6, 9, -2, -6, 4, -4.5; constant input 1 (bias). Hidden units: o1 = 2 gives O1 = 1; o2 = 10 gives O2 = 1. Output: y1 = -4.5 gives Y1 = 0.
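The three worked cases can be reproduced with a short forward pass. The assignment of the nine weights to connections is inferred from the hidden and output values in these slides rather than stated in them: hidden unit 1 has input weights 4, 4 and bias -6; hidden unit 2 has input weights 6, 6 and bias -2; the output unit has weight -9 from hidden unit 1, 9 from hidden unit 2, and bias -4.5. Each unit is assumed to apply a step function (1 if its summed input is positive, else 0).

```python
from typing import List, Tuple


def step(z: float) -> int:
    """Threshold activation: 1 for positive input, else 0."""
    return 1 if z > 0 else 0


# Weights inferred from the worked examples (bias = weight on the constant input 1).
HIDDEN = [
    {"w": (4, 4), "bias": -6},  # hidden unit 1
    {"w": (6, 6), "bias": -2},  # hidden unit 2
]
OUTPUT = {"w": (-9, 9), "bias": -4.5}


def forward(x1: int, x2: int) -> Tuple[List[float], List[int], float, int]:
    """Return summed hidden inputs o, hidden outputs O, summed output y and output Y."""
    o = [unit["w"][0] * x1 + unit["w"][1] * x2 + unit["bias"] for unit in HIDDEN]
    O = [step(v) for v in o]
    y = OUTPUT["w"][0] * O[0] + OUTPUT["w"][1] * O[1] + OUTPUT["bias"]
    return o, O, y, step(y)


for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(x, forward(*x))
```

Hidden unit 1 fires only for (1 1) and hidden unit 2 fires for any input containing a 1, so the output unit computes "unit 2 and not unit 1", which is exactly XOR.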

25 What is going on? XOR function: 0 0 => 0, 1 0 => 1, 0 1 => 1, 1 1 => 0. Weights: 6, -9, 4, 6, 9, -2, -6, 4, -4.5; constant input 1 (bias); hidden units y1 and y2.

26 What is going on? [Figure: the XOR inputs (0,0), (1,0), (0,1), (1,1) in the input plane (x1, x2), and the points (0,0), (1,0), (2,2) in the hidden-unit plane (y1, y2).]


29 DEMO


31 Training and error reduction

32 Transfer of biological principles to neural network algorithms: non-linear relation between input and output; massively parallel information processing; data-driven construction of algorithms.

33 A network contains a very large set of parameters: a network with 5 hidden neurons predicting binding for 9-mer peptides has 9x20x5 = 900 weights between the input and hidden layers alone. Overfitting is a problem. Stop training when test performance is optimal.
[Figure: neural network training; axis labels "years" and "temperature"]

34 Neural network training: cross validation. Train on 4/5 of the data and test on the remaining 1/5. Rotating which fifth is held out produces 5 different neural networks, each with a different prediction focus.
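Five-fold cross validation partitions the data so that every item is used for testing exactly once. A minimal sketch in plain Python; the function name `k_fold_splits` and the round-robin fold assignment (item i goes to fold i % k) are our own choices, and shuffling the data before splitting, which is common in practice, is omitted.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(data: Sequence[T], k: int = 5) -> List[Tuple[List[T], List[T]]]:
    """Return k (train, test) pairs; item i is in the test set of fold i % k."""
    splits = []
    for fold in range(k):
        test = [item for i, item in enumerate(data) if i % k == fold]
        train = [item for i, item in enumerate(data) if i % k != fold]
        splits.append((train, test))
    return splits


peptides = [f"pep{i}" for i in range(10)]
for train, test in k_fold_splits(peptides):
    # One network would be trained on `train` (4/5) and evaluated on `test` (1/5).
    print(len(train), len(test), test)
```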

35 Neural network training curve: at the point of maximum test set performance, the network is most capable of generalizing.
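Stopping at maximum test set performance amounts to keeping a copy of the model from the epoch with the lowest test error. A minimal sketch, assuming hypothetical callables `train_one_epoch` (updates the model and returns it) and `test_error` (returns the error on held-out data); the `patience` parameter, which ends training after that many epochs without improvement, is an added convention not mentioned on the slide.

```python
import copy
from typing import Callable, Tuple, TypeVar

Model = TypeVar("Model")


def train_with_early_stopping(
    model: Model,
    train_one_epoch: Callable[[Model], Model],
    test_error: Callable[[Model], float],
    max_epochs: int = 100,
    patience: int = 10,
) -> Tuple[Model, int, float]:
    """Return the best model seen, its epoch (0 = untrained) and its test error."""
    best_model, best_epoch, best_error = copy.deepcopy(model), 0, test_error(model)
    for epoch in range(1, max_epochs + 1):
        model = train_one_epoch(model)
        error = test_error(model)
        if error < best_error:
            best_model, best_epoch, best_error = copy.deepcopy(model), epoch, error
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_model, best_epoch, best_error


# Toy stand-in: the "model" is an epoch counter and the test error falls until
# epoch 5, then rises again as the network starts to overfit.
best, epoch, err = train_with_early_stopping(
    0, lambda m: m + 1, lambda m: (m - 5) ** 2 + 1.0, patience=3
)
print(epoch, err)  # 5 1.0
```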

36 Network training: encoding of sequence data. Sparse encoding; BLOSUM encoding; sequence profile encoding.

37 Sparse encoding of amino acid sequence windows

38 Sparse encoding (rows: amino acid; columns: input neurons 1-20)
A  1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
R  0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
N  0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D  0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
C  0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Q  0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E  0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0

39 Sparse encoding of nucleotide sequence windows. Nucleotides: 4-letter alphabet (A, C, G, T).
ACGTAGGCAATCTCAGACGTTTATC
1000 0100 0010 0001 1000 0010 0010 0100 1000 1000 0001 0100 0001 0100 1000 0010 1000 0100 0010 0001 0001 0001 1000 0001 0100
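Sparse (one-hot) encoding gives each possible residue its own input neuron, so a window of n residues over an alphabet of size a becomes n x a inputs. A minimal sketch; the alphabet orders come from the slides (A R N D C Q E G H I L K M F P S T W Y V for amino acids, A C G T for nucleotides), and encoding the nucleotide window above reproduces its bit pattern.

```python
from typing import List

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
NUCLEOTIDES = "ACGT"


def sparse_encode(sequence: str, alphabet: str) -> List[int]:
    """Concatenate one one-hot block of len(alphabet) bits per residue."""
    bits: List[int] = []
    for residue in sequence:
        block = [0] * len(alphabet)
        block[alphabet.index(residue)] = 1  # raises ValueError for unknown letters
        bits.extend(block)
    return bits


window = "ACGTAGGCAATCTCAGACGTTTATC"
print("".join(map(str, sparse_encode(window, NUCLEOTIDES))))
print(len(sparse_encode("ACDEFGHIK", AMINO_ACIDS)))  # 9 residues x 20 = 180 inputs
```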

40 BLOSUM encoding (BLOSUM62 matrix)
   A  R  N  D  C  Q  E  G  H  I  L  K  M  F  P  S  T  W  Y  V
A  4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
R -1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
N -2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
D -2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
C  0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
Q -1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
E -1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
G  0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
H -2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
I -1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
L -1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
K -1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
M -1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
F -2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
P -1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
S  1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
T  0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
W -3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
Y -2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
V  0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4

41 Sequence encoding (continued)
Sparse encoding
V: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
L: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
V · L = 0 (unrelated)
BLOSUM encoding
V: 0 -3 -3 -3 -1 -2 -2 -3 -3 3 1 -2 1 -1 -2 -2 0 -3 -1 4
L: -1 -2 -3 -4 -1 -2 -3 -4 -3 2 4 -2 2 0 -3 -2 -1 -2 -1 1
V · L = 0.88 (highly related)
V · R = -0.08 (close to unrelated)
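BLOSUM encoding replaces each one-hot block with the residue's row of the substitution matrix, so related residues receive similar inputs. A minimal sketch comparing encodings by cosine similarity; that measure is an assumption, since the slide does not say how its values were normalized, and with the matrix rows shown here it gives about 0.87 for V/L and -0.10 for V/R, slightly different from the slide's 0.88 and -0.08. Only the three rows needed are copied in.

```python
import math
from typing import Dict, List, Sequence

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

# Rows copied from the substitution matrix on slide 40 (V, L and R only).
BLOSUM_ROWS: Dict[str, List[int]] = {
    "V": [0, -3, -3, -3, -1, -2, -2, -3, -3, 3, 1, -2, 1, -1, -2, -2, 0, -3, -1, 4],
    "L": [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1],
    "R": [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],
}


def one_hot(residue: str) -> List[int]:
    """Sparse encoding of a single residue."""
    return [1 if aa == residue else 0 for aa in AMINO_ACIDS]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Dot product of u and v divided by the product of their lengths."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


print(round(cosine(one_hot("V"), one_hot("L")), 2))          # 0.0: no shared inputs
print(round(cosine(BLOSUM_ROWS["V"], BLOSUM_ROWS["L"]), 2))  # 0.87
print(round(cosine(BLOSUM_ROWS["V"], BLOSUM_ROWS["R"]), 2))  # -0.1
```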

42 Applications of artificial neural networks: speech recognition; prediction of protein secondary structure; prediction of signal peptides; post-translational modifications (glycosylation, phosphorylation); proteasomal cleavage; MHC:peptide binding.
