Chapter 11 – Neural Nets © Galit Shmueli and Peter Bruce 2010 Data Mining for Business Intelligence Shmueli, Patel & Bruce

Basic Idea
- Mimic learning that takes place in the human brain
- Combine input information in a complex & flexible neural net “model”
- Model “coefficients” are continually tweaked in an iterative process
- The network’s interim performance in classification and prediction informs successive tweaks

Network Structure
- Multiple layers: input layer (raw observations), hidden layers, output layer
- Nodes
- Weights (like coefficients, subject to iterative adjustment)
- Bias values (also like coefficients, and also adjusted iteratively during training)

“Feed-Forward” Model

Example – Using fat & salt content to predict consumer acceptance of cheese

Example - Data

Moving Through the Network

The Input Layer
- For the input layer, input = output
- E.g., for record #1: fat input = output = 0.2; salt input = output = 0.9
- Output of the input layer = input into the hidden layer

The Hidden Layer
- In this example, the hidden layer has 3 nodes
- Each node receives as input the output of all input nodes
- Output of each hidden node is a function of the weighted sum of its inputs
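As a rough illustration of this step, the minimal Python sketch below computes one hidden node's output for record #1 (fat = 0.2, salt = 0.9), using the logistic function g introduced a few slides later. The bias and weight values here are hypothetical stand-ins for the small random initial values, not the book's actual numbers.

```python
import math

def logistic(x):
    """The logistic (sigmoid) function g used as the node's transfer function."""
    return 1.0 / (1.0 + math.exp(-x))

def node_output(bias, weights, inputs):
    """Output of a single node: g(bias + weighted sum of inputs)."""
    weighted_sum = bias + sum(w * x for w, x in zip(weights, inputs))
    return logistic(weighted_sum)

# Record #1 from the cheese example: fat = 0.2, salt = 0.9
inputs = [0.2, 0.9]

# Hypothetical small random initial bias and weights (placeholders, not the book's values)
bias_3 = -0.03
weights_3 = [0.05, 0.01]

print(node_output(bias_3, weights_3, inputs))  # approx. 0.497
```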

The Weights
- The bias value (θ, “theta”) and the weights (w) are typically initialized to small random values (e.g., in the range –0.05 to +0.05)
- Equivalent to a model with random prediction (in other words, no predictive value)
- These initial weights are used in the first round of training
- Weights are then adjusted based on error scores (i.e., back propagation)

Initial Pass of the Network

Output of Node 3, if g is a Logistic Function
Output_3 = g(θ_3 + w_1,3·x_1 + w_2,3·x_2) = 1 / (1 + e^–(θ_3 + w_1,3·x_1 + w_2,3·x_2))
where θ_3 is the bias value (random), w_1,3 and w_2,3 are the initial weights (random), x_1 and x_2 are the input values (fat and salt), and e is Euler's number (from Swiss mathematician Leonhard Euler).

Output Layer
- The output of the last hidden layer becomes input for the output layer
- Uses the same function as above, i.e., a function g of the weighted sum:
Output_6 = 1 / (1 + e^–(θ_6 + w_3,6·Output_3 + w_4,6·Output_4 + w_5,6·Output_5))
where θ_6 is the bias value (random), the w's are the initial weights (random), and the input values are the outputs from the hidden-layer nodes.

The output node

Mapping the Output to a Classification
Output = 0.506. If the cutoff for a “1” is 0.5, then we classify this record as “1”.

Relation to Linear Regression A net with a single output node and no hidden layers, where g is the identity function, takes the same form as a linear regression model
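To make the connection explicit: with no hidden layer and the identity function g(x) = x, a single output node computes Output = θ + w_1·x_1 + … + w_p·x_p, which is exactly the form of a linear regression equation, with θ playing the role of the intercept.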

Training the Model

Preprocessing Steps
- Scale variables to 0–1
- Categorical variables: if equidistant categories, map to equidistant interval points in the 0–1 range; otherwise, create dummy variables
- Transform (e.g., log) skewed variables
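A minimal sketch of these preprocessing steps in Python (pandas), using a toy data frame with the cheese example's numeric predictors plus a hypothetical categorical column; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data (column names and values are illustrative only)
df = pd.DataFrame({
    "fat":    [0.2, 0.1, 0.2, 0.2, 0.4, 0.3],
    "salt":   [0.9, 0.1, 0.4, 0.5, 0.5, 0.8],
    "region": ["east", "west", "east", "south", "west", "south"],
})

# Scale numeric variables to the 0-1 range (min-max scaling)
for col in ["fat", "salt"]:
    df[col] = (df[col] - df[col].min()) / (df[col].max() - df[col].min())

# Nominal categorical variable: create dummy variables
df = pd.get_dummies(df, columns=["region"])

print(df.head())
```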

Initial Pass Through the Network
- Goal: find weights that yield the best predictions
- The process described above is repeated for all records
- At each record, compare the prediction to the actual value; the difference is the error for the output node
- The error is propagated back and distributed to all the hidden nodes and used to update their weights

Back Propagation (“back-prop”)
Output from output node k: Output_k = g(θ_k + Σ_i w_i,k·Output_i)
Error associated with that node: err_k = Output_k (1 – Output_k)(Actual_k – Output_k)
Note: this is like the ordinary error (Actual – Output), multiplied by a correction factor

Adjusting Weights Based on Error
Error = 0.506 (1 – 0.506)(1 – 0.506) = 0.123
where 0.506 is the network's output for this record and 1 is the “actual” output (from the training data).
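A quick check of this arithmetic, using the error formula from the back-propagation slide (a minimal sketch; the values come straight from the slides):

```python
def output_node_error(output, actual):
    """err_k = output_k * (1 - output_k) * (actual_k - output_k)"""
    return output * (1 - output) * (actual - output)

err_6 = output_node_error(output=0.506, actual=1)
print(round(err_6, 3))  # 0.123
```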

Case Updating
- Weights are updated after each record is run through the network
- Completion of all records through the network is one epoch (also called a sweep or iteration)
- After one epoch is completed, return to the first record and repeat the process

Error Is Used to Update the Bias Value and Weights
New weight = old weight + l × error (and similarly for the bias value)
l = a constant between 0 and 1, reflecting the “learning rate” or “weight decay parameter”

Updating Weights Feeding Node 6 (Output)
θ_6 = θ_6(old) + (0.5)(0.123) = 0.047
w_3,6 = 0.01 + (0.5)(0.123) = 0.072
w_4,6 = 0.05 + (0.5)(0.123) = 0.112
w_5,6 = w_5,6(old) + (0.5)(0.123) = 0.077
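A minimal Python sketch of this updating rule, reproducing the arithmetic for the two weights whose old values appear on the slide (w_3,6 and w_4,6); this uses the simplified new = old + l × error form shown on these slides.

```python
learning_rate = 0.5   # l, the learning rate
err_6 = 0.123         # error at output node 6 (from the earlier slide)

def update(old_value, err, lr=learning_rate):
    """Simplified case-updating rule from these slides: new = old + l * error."""
    return old_value + lr * err

w_36_new = update(0.01, err_6)   # 0.01 + 0.5 * 0.123 = 0.0715 (shown as .072 on the slide)
w_46_new = update(0.05, err_6)   # 0.05 + 0.5 * 0.123 = 0.1115 (shown as .112 on the slide)
print(w_36_new, w_46_new)
```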

Batch Updating
- All records in the training set are fed to the network before updating takes place
- In this case, the error used for updating is the sum of the errors from all records

Why It Works
- Big errors lead to big changes in weights; small errors leave weights relatively unchanged
- Over thousands of updates, a given weight keeps changing until the error associated with that weight is negligible, at which point the weight changes little

Common Criteria to Stop the Updating
- When the weights change very little from one iteration to the next
- When the misclassification rate reaches a required threshold
- When a limit on the number of runs is reached

Fat/Salt Example: Final Weights

XLMiner Output: Final Weights Note: XLMiner uses two output nodes (P[1] and P[0]); diagrams show just one output node (P[1])

XLMiner: Final Classifications
XLMiner did not do very well! Why? Six observations are not enough to train the model.

3 Formulas Involved in Neural Networks
1. Calculate the output of a node (p. 226) – feed forward
2. Calculate the output error in comparison to the training data (p. 229) – back propagation
3. Calculate the new bias value and weight values (p. 229) – back propagation

Avoiding Overfitting
- With sufficient iterations, a neural net can easily overfit the data
- To avoid overfitting: track error in validation data, limit iterations, limit the complexity of the network

User Inputs

Specify Network Architecture
- Number of hidden layers (most popular: one hidden layer)
- Number of nodes in hidden layer(s): more nodes capture complexity, but increase the chance of overfitting
- Number of output nodes: for classification, one node per class (in the binary case you can also use one); for numerical prediction, use one

Network Architecture, cont.
- “Learning rate” (l), called Gradient Step Size in XLMiner (range = 0 to 1): low values “downweight” the new information from errors at each iteration; this slows learning, but reduces the tendency to overfit to local structure
- “Momentum”: high values keep weights changing in the same direction as the previous iteration; likewise, this helps avoid overfitting to local structure, but also slows learning
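For readers working outside XLMiner, the same choices discussed on the last two slides (hidden layer size, learning rate, momentum, iteration limit, validation-based stopping) appear as parameters in most neural-net libraries. Below is a hedged sketch using scikit-learn's MLPClassifier; the parameter values are chosen purely for illustration and are not tuned settings.

```python
from sklearn.neural_network import MLPClassifier

# Illustrative settings only; tune them for your own data
net = MLPClassifier(
    hidden_layer_sizes=(3,),   # one hidden layer with 3 nodes, as in the cheese example
    activation="logistic",     # logistic (sigmoid) transfer function g
    solver="sgd",              # stochastic gradient descent (case updating)
    learning_rate_init=0.5,    # learning rate l
    momentum=0.9,              # momentum term
    max_iter=500,              # limit on the number of epochs
    early_stopping=True,       # track error on a held-out validation set
    random_state=1,
)

# X: predictors scaled to 0-1 (e.g., fat and salt); y: 0/1 acceptance labels
# net.fit(X, y)
# predictions = net.predict(X_new)
```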

Automation
- Some software automates the optimal selection of input parameters
- XLMiner automates the number of hidden layers and the number of nodes

Advantages
- Good predictive ability
- Can capture complex relationships
- No need to specify a model

Disadvantages
- Considered a “black box” prediction machine, with no insight into the relationships between predictors and outcome
- No variable-selection mechanism, so you have to exercise care in selecting variables
- Heavy computational requirements if there are many variables (additional variables dramatically increase the number of weights to calculate)

Summary
- Neural networks can be used for classification and prediction
- Can capture a very flexible/complicated relationship between the outcome and a set of predictors
- The network “learns” and updates its model iteratively as more data are fed into it
- Major danger: overfitting
- Requires large amounts of data
- Good predictive performance, yet “black box” in nature