Artificial Neural Networks 2 Morten Nielsen Department of Systems Biology, DTU.


Artificial Neural Networks 2 Morten Nielsen Department of Systems Biology, DTU

Outline
Optimization procedures
–Gradient descent (this you already know)
Network training
–back propagation
–cross-validation
–over-fitting
–examples

Neural network. Error estimate. [Figure: a network with inputs I1 and I2, weights w1 and w2, and a linear output unit o.]

Neural networks

Gradient descent (from Wikipedia) Gradient descent is based on the observation that if the real-valued function F(x) is defined and differentiable in a neighborhood of a point a, then F(x) decreases fastest if one goes from a in the direction of the negative gradient of F at a. It follows that, if b = a - ε∇F(a) for ε > 0 a small enough number, then F(b) < F(a).
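As a minimal illustration of this idea (not part of the original slides), the sketch below minimizes the one-dimensional function f(x) = (x - x0)^2 by repeatedly stepping against its derivative; the starting point, step size, and value of x0 are arbitrary choices for the example.

```python
def f(x, x0=3.0):
    """Toy objective: a parabola with its minimum at x0."""
    return (x - x0) ** 2

def dfdx(x, x0=3.0):
    """Analytical derivative of f with respect to x."""
    return 2.0 * (x - x0)

x = 0.0          # arbitrary starting point a
epsilon = 0.1    # small positive step size
for step in range(50):
    x = x - epsilon * dfdx(x)   # b = a - eps * grad F(a), so f(b) < f(a)
print(x)         # close to the minimum at x0 = 3.0
```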

Gradient descent (example)

Gradient descent. Example. Weights are changed in the opposite direction of the gradient of the error. [Figure: the same network with inputs I1 and I2, weights w1 and w2, and a linear output unit o.]
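A minimal sketch of this update for the linear network on the slide, assuming the squared error E = 1/2 (o - t)^2 with o = w1*I1 + w2*I2; the weight and input values are illustrative.

```python
import numpy as np

def linear_update(I, w, t, epsilon=0.05):
    """One gradient-descent step for a linear output unit.

    E = 1/2 * (o - t)^2 with o = w . I, so dE/dw_i = (o - t) * I_i,
    and the weights move opposite to the gradient.
    """
    o = np.dot(w, I)            # linear output
    grad = (o - t) * I          # gradient of E with respect to each weight
    return w - epsilon * grad

w = np.array([0.2, -0.4])       # arbitrary initial weights w1, w2
I = np.array([1.0, 1.0])        # inputs I1, I2
w = linear_update(I, w, t=1.0)  # weights nudged toward the target
```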

What about the hidden layer?

Hidden to output layer

Input to hidden layer

Summary
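The derivations summarised on the preceding slides are shown as images; written out in the standard textbook form, assuming a single sigmoid output, sigmoid hidden units, and the squared error E = 1/2 (O - t)^2 (the notation may differ slightly from the slides), the updates are:

$$
\begin{aligned}
O &= g\Big(\sum_j w_j H_j\Big), \qquad H_j = g\Big(\sum_k v_{jk} I_k\Big), \qquad g(x) = \frac{1}{1+e^{-x}}\\
\Delta w_j &= -\epsilon \frac{\partial E}{\partial w_j} = -\epsilon\,(O - t)\,O(1-O)\,H_j\\
\Delta v_{jk} &= -\epsilon \frac{\partial E}{\partial v_{jk}} = -\epsilon\,(O - t)\,O(1-O)\,w_j\,H_j(1-H_j)\,I_k
\end{aligned}
$$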

Or

I_k = X[0][k], H_j = X[1][j], O_i = X[2][i]
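A minimal sketch of a forward pass using this array layout, where X[0] holds the inputs, X[1] the hidden activations, and X[2] the outputs; the sigmoid activation and the specific weight values are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(I, V, W):
    """Forward pass with layers stored as X[0]=inputs, X[1]=hidden, X[2]=outputs."""
    X = [None, None, None]
    X[0] = np.asarray(I, dtype=float)   # I_k = X[0][k]
    X[1] = sigmoid(V @ X[0])            # H_j = X[1][j]
    X[2] = sigmoid(W @ X[1])            # O_i = X[2][i]
    return X

V = np.array([[0.5, -0.3],              # arbitrary input-to-hidden weights v_jk
              [0.1,  0.8]])
W = np.array([[0.2, -0.7]])             # arbitrary hidden-to-output weights
X = forward([1.0, 1.0], V, W)
```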

Can you do it yourself? A 2-2-1 network with inputs I1 = 1 and I2 = 1, input-to-hidden weights v11 = 1, v12 = 1, v21 = -1, v22 = 1, and hidden-to-output weights w1 = -1, w2 = 1 (hidden units h1/H1 and h2/H2, output o/O). What is the output (O) from the network? What are the Δw_ij and Δv_jk values if the target value is 0 and ε = 0.5?

Can you do it yourself (ε = 0.5)? Has the error decreased?
Before: v11 = 1, v12 = 1, v21 = -1, v22 = 1, w1 = -1, w2 = 1; with inputs I1 = 1, I2 = 1, fill in h1, H1, h2, H2, o, and O.
After: fill in the updated v11, v12, v21, v22, w1, w2, then recompute h1, H1, h2, H2, o, and O for the same inputs.
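For checking your answer, here is a sketch of one gradient-descent update on the 2-2-1 network above. It assumes sigmoid activations in both the hidden and output units and the squared error E = 1/2 (O - t)^2; note that which v_jk connects which input to which hidden unit follows the figure on the slide, so the V matrix below is one plausible reading and may need adjusting.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

I = np.array([1.0, 1.0])            # inputs I1, I2
V = np.array([[1.0,  1.0],          # V[j, k] = v_jk: row j holds v_j1, v_j2 (assumed layout)
              [-1.0, 1.0]])
w = np.array([-1.0, 1.0])           # w1, w2
t, eps = 0.0, 0.5                   # target value and learning rate

def forward(I, V, w):
    H = sigmoid(V @ I)              # hidden activations H1, H2
    O = sigmoid(np.dot(w, H))       # network output O
    return H, O

H, O = forward(I, V, w)
delta_o = (O - t) * O * (1.0 - O)                          # output-layer error signal
dw = -eps * delta_o * H                                    # delta w_j
dV = -eps * np.outer(delta_o * w * H * (1.0 - H), I)       # delta v_jk
H2, O2 = forward(I, V + dV, w + dw)
print(0.5 * (O - t) ** 2, 0.5 * (O2 - t) ** 2)             # error before and after the update
```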

Sequence encoding
The change in a weight is linearly dependent on the input value, so "true" sparse (0/1) encoding is highly inefficient: an input of 0 produces no weight update
Sparse is therefore most often encoded as
–+1/-1 or 0.9/0.05
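A minimal sketch of sparse encoding of a peptide over the 20 amino acids, using 0.9 for the matching residue and 0.05 elsewhere as mentioned above; the alphabet ordering and function name are illustrative choices, not course code.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids, alphabetical one-letter codes

def sparse_encode(peptide, on=0.9, off=0.05):
    """Encode each residue as a 20-vector: 'on' at its own position, 'off' elsewhere."""
    x = np.full((len(peptide), len(ALPHABET)), off)
    for i, aa in enumerate(peptide):
        x[i, ALPHABET.index(aa)] = on
    return x.ravel()                 # a 9-mer becomes a vector of length 9 x 20 = 180

x = sparse_encode("FMIDWILDA")       # one of the peptides used later in these slides
```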

Training and error reduction

Size matters

Neural network training
A network contains a very large set of parameters
–A network with 5 hidden neurons predicting binding for 9-meric peptides has more than 9x20x5 = 900 weights
Overfitting is a problem
Stop training when test performance is optimal
[Figure: temperature versus years.]
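The early-stopping rule above can be sketched as keeping the weights from the epoch with the lowest test-set error; train_epoch and test_error are placeholder functions supplied by the caller, not part of the course material.

```python
def train_with_early_stopping(weights, train_epoch, test_error, max_epochs=500):
    """Train, but keep the weights from the epoch with the lowest test-set error.

    train_epoch(weights) -> new weights after one pass of backpropagation (placeholder)
    test_error(weights)  -> error on the held-out test set (placeholder)
    """
    best_w, best_err = weights, test_error(weights)
    for epoch in range(max_epochs):
        weights = train_epoch(weights)
        err = test_error(weights)
        if err < best_err:            # test error still improving
            best_w, best_err = weights, err
    return best_w                     # roll back to the best point: early stopping
```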

What is going on? [Figure: temperature versus years.]

Examples
Train on 500 A0201 and 60 A0101 binding data
Evaluate on 1266 A0201 peptides
NH=1: PCC = 0.77
NH=5: PCC = 0.72

Neural network training. Cross validation
Train on 4/5 of data
Test on 1/5
=> Produce 5 different neural networks, each with a different prediction focus
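A minimal sketch of how the five 4/5-train, 1/5-test partitions can be generated; the data is assumed to be a plain Python list, and shuffling or homology reduction of the peptides is omitted.

```python
def five_fold_partitions(data, nfold=5):
    """Yield (train, test) splits: each fold is the test set once, the rest is training."""
    folds = [data[i::nfold] for i in range(nfold)]   # simple interleaved split
    for i in range(nfold):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# One network per split => 5 networks, each with a different prediction focus:
# for train, test in five_fold_partitions(peptide_data):
#     net = train_network(train, stop_on=test)       # placeholder training call
```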

Neural network training curve Maximum test set performance Most capable of generalizing

5 fold training Which network to choose?

5 fold training

How many folds?
Cross validation is always good, but how many folds?
–Few folds -> small training data sets
–Many folds -> small test data sets
Example from Tuesday's exercise
–560 peptides for training
50-fold (10 peptides per test set, few data to stop training)
2-fold (280 peptides per test set, few data to train)
5-fold (110 peptides per test set, 450 per training set)

Problems with 5-fold cross validation
Use the test set to stop training, and test set performance to evaluate training
–Over-fitting?
If the test set is small: yes
If the test set is large: no
Confirm using "true" 5-fold cross validation
–1/5 for evaluation
–4/5 for 4-fold cross-validation
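The "true" cross validation described above can be sketched as a nested loop: the outer 1/5 is only ever used for evaluation, while the inner 4-fold split provides the test sets used to stop training; train_network and predict are placeholder functions, not course code.

```python
def true_five_fold(data, train_network, predict, nfold=5):
    """Nested cross-validation: outer folds for evaluation only, inner folds to stop training."""
    folds = [data[i::nfold] for i in range(nfold)]
    evaluations = []
    for i in range(nfold):
        eval_set = folds[i]                  # 1/5 held out, never seen during training
        inner = [x for j, fold in enumerate(folds) if j != i for x in fold]
        nets = []
        for k in range(nfold - 1):           # 4-fold cross-validation on the remaining 4/5
            stop_set = inner[k::nfold - 1]
            train_set = [x for idx, x in enumerate(inner) if idx % (nfold - 1) != k]
            nets.append(train_network(train_set, stop_set))
        # ensemble of the 4 inner networks, evaluated on the untouched outer fold
        evaluations.append([predict(nets, x) for x in eval_set])
    return evaluations
```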

Conventional 5 fold cross validation

“True” 5 fold cross validation

When to be careful
When data is scarce, the difference in performance obtained using "conventional" versus "true" cross validation can be very large
When data is abundant, the difference is small, and "true" cross validation might even score higher than "conventional" cross validation due to the ensemble aspect of the "true" cross validation approach

Do hidden neurons matter? The environment matters NetMHCpan

Context matters
FMIDWILDA YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.89 A0201
FMIDWILDA YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.08 A0101
DSDGSFFLY YFAMYGEKVAHTHVDTLYVRYHYYTWAVLAYTWY 0.08 A0201
DSDGSFFLY YFAMYQENMAHTDANTLYIIYRDYTWVARVYRGY 0.85 A0101

Example

Summary
Gradient descent is used to determine the updates for the synapses in the neural network
Some relatively simple math defines the gradients
–Networks without hidden layers can be solved on the back of an envelope (SMM exercise)
–Hidden layers are a bit more complex, but still ok
Always train networks using a test set to stop training
–Be careful when reporting predictive performance
Use "true" cross-validation for small data sets
And hidden neurons do matter (sometimes)

And some more stuff for the long cold winter nights Could it be made differently?

Predicting accuracy Can it be made differently? Reliability

Identification of position specific receptor ligand interactions by use of artificial neural network decomposition. An investigation of interactions in the MHC:peptide system. Master thesis by Frederik Otzen Bagger. Making sense of ANN weights