Data mining and statistical learning - lecture 11: Neural networks - a model class providing a joint framework for prediction and classification

Neural networks - a model class providing a joint framework for prediction and classification

- Relationship to other prediction models
- Some simple examples of neural networks
- Parameter estimation
- Joint framework for prediction and classification
- Features of neural networks

Ordinary least squares regression (OLS)

[Diagram: inputs x1, ..., xp connect directly to the response y]

Model: y = β0 + β1x1 + ... + βpxp + ε

Terminology:
- β0: intercept (or bias)
- β1, ..., βp: regression coefficients (or weights)

The response variable responds directly and linearly to changes in the inputs.
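The OLS coefficients can be obtained in closed form from the least-squares problem. A minimal numpy sketch (data and variable names are illustrative, not from the lecture):

```python
import numpy as np

def ols_fit(X, y):
    """Fit y = b0 + b1*x1 + ... + bp*xp by ordinary least squares."""
    Xd = np.column_stack([np.ones(len(X)), X])     # prepend intercept column
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)  # solve min ||Xd b - y||^2
    return beta                                    # [b0, b1, ..., bp]

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.01 * rng.normal(size=50)
print(ols_fit(X, y))  # close to [1, 2, -3]
```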

Principal components regression (PCR)

Extract principal components (linear combinations of the inputs) as derived features, and then model the target (response) as a linear function of these features.

[Diagram: inputs x1, ..., xp feed derived features z1, ..., zM, which feed the response y]

The response variable responds indirectly and linearly to changes in the inputs.
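The two-stage construction can be sketched with an SVD: the principal component directions come from the centered input matrix, and the target is regressed on the first M derived features. Names and data below are illustrative:

```python
import numpy as np

def pcr_fit(X, y, M):
    """Principal components regression: regress y on the first M PCs of X."""
    Xc = X - X.mean(axis=0)                          # center the inputs
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:M].T                                # derived features z1..zM
    Zd = np.column_stack([np.ones(len(Z)), Z])       # add intercept
    theta, *_ = np.linalg.lstsq(Zd, y, rcond=None)   # OLS on the features
    return theta, Vt[:M]

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 10))
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=100)      # strongly correlated inputs
y = X[:, 0] + rng.normal(size=100)
theta, V = pcr_fit(X, y, M=3)
```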

Neural network with a single target

[Diagram: inputs x1, ..., xp; hidden layer of neurons z1, ..., zM; output y]

The response to changes in the inputs is indirect and nonlinear.

Neuron

A neuron computes a sigmoid activation function, e.g. σ(v) = 1/(1 + e^(-v)) or tanh(v), of a weighted sum of its inputs.

Neural networks with a single target

Extract linear combinations of the inputs as derived features, and then model the target (response) as a linear function of a sigmoid function (activation function) of these features.

[Diagram: inputs x1, ..., xp; hidden units z1, ..., zM; output y]
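In formulas, the derived features are z_m = σ(α_0m + α_m'x) and the output is y = β_0 + Σ_m β_m z_m. A forward-pass sketch with made-up dimensions and random weights (everything here is illustrative):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, alpha0, alpha, beta0, beta):
    """One hidden layer: z_m = sigmoid(alpha0_m + alpha_m . x), y = beta0 + beta . z."""
    z = sigmoid(alpha0 + alpha @ x)   # derived features z1..zM
    return beta0 + beta @ z           # linear output for a single target

rng = np.random.default_rng(2)
p, M = 4, 3                           # illustrative sizes
x = rng.normal(size=p)
alpha0, alpha = rng.normal(size=M), rng.normal(size=(M, p))
beta0, beta = 0.5, rng.normal(size=M)
print(forward(x, alpha0, alpha, beta0, beta))
```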

Neural network with one input, one neuron, and one target

[Diagram: input x, hidden neuron z, output y]

Neural network with one input, one neuron, and one target - a simple example

- Select Advanced user interface
- Select 1 hidden node
- Tick Outputs from Training, …

Neural network with one input, one neuron, and one target

Output from proc Neural - one input, one neuron, one target

Parameter Estimates (numeric estimates and gradients not preserved in the transcript):

  N  Parameter
  1  x_H11
  2  BIAS_H11
  3  H11_y
  4  BIAS_y

Value of Objective Function = (not preserved)

H11 = hidden layer 1, neuron 1

Neural network with one input, one neuron, and one target - manual calculation of predicted values

Given the parameter estimates x_H11, BIAS_H11, H11_y, and BIAS_y:
1. Standardize x to mean zero and variance one.
2. Compute xstand*x_H11 + BIAS_H11.
3. Take tanh to compute z.
4. Compute z*H11_y + BIAS_y.
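The four steps can be reproduced by hand. Since the numeric proc Neural estimates did not survive the transcript, the parameter values below are placeholders; only the procedure is illustrated:

```python
import numpy as np

# Placeholder values standing in for the proc Neural estimates (illustrative only)
x_H11, BIAS_H11 = 1.2, -0.3   # input-to-hidden weight and bias
H11_y, BIAS_y = 2.0, 0.5      # hidden-to-output weight and bias

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# 1. Standardize x to mean zero and variance one
xstand = (x - x.mean()) / x.std(ddof=1)
# 2. Linear combination into the hidden neuron
v = xstand * x_H11 + BIAS_H11
# 3. tanh activation gives z
z = np.tanh(v)
# 4. Linear output layer gives the prediction
y_hat = z * H11_y + BIAS_y
print(y_hat)
```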

Neural networks with one input, two neurons, and one target

[Diagram: input x; hidden neurons z1, z2; output y]

Output from proc Neural - one input, two neurons, one target

Parameter Estimates (numeric estimates and gradients not preserved in the transcript):

  N  Parameter
  1  x_H11
  2  x_H12
  3  BIAS_H11
  4  BIAS_H12
  5  H11_y
  6  H12_y
  7  BIAS_y

Value of Objective Function = (not preserved)

Absorbance records for ten samples of chopped meat

- 1 response variable (fat)
- 100 predictors (absorbance at 100 wavelengths or channels)
- The predictors are strongly correlated with each other

Absorbance records for 215 samples of chopped meat

The target is poorly correlated with each individual predictor.

Neural networks with a single target and many inputs - the fat content and absorbance dataset

[Diagram: inputs x1, ..., xp; three hidden neurons z1, z2, z3; output y]

With three hidden neurons, a total of (p+2)*3+1 parameters are estimated.
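The count follows from the general formula M(p+1) + K(M+1): each of the M neurons has p input weights and a bias, and each of the K outputs has M weights and a bias. With M = 3 and K = 1 this collapses to (p+2)*3+1. A quick check for the absorbance data (function name is ours):

```python
def n_params(p, M=3, K=1):
    """Weights in a one-hidden-layer network: M(p+1) hidden-layer
    weights plus K(M+1) output-layer weights."""
    return M * (p + 1) + K * (M + 1)

# Absorbance data: p = 100 channels, M = 3 neurons, one target
print(n_params(100))  # 307 = (100+2)*3+1
```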

Neural networks with a single target and many inputs - parameter estimates for a model with three neurons

Parameter Estimates (last rows shown; numeric values not preserved in the transcript):

  N    Parameter
  291  Channel90_H13
  ...
  300  Channel99_H13
  301  BIAS_H11
  302  BIAS_H12
  303  BIAS_H13
  304  H11_Fat
  305  H12_Fat
  306  H13_Fat
  307  BIAS_Fat

Value of Objective Function = (not preserved)

A total of 307 parameters.

Neural networks with a single target and many inputs - output from a model with three neurons

Neural networks with a single target and many inputs - output from models with 1 to 10 neurons

Convergence problems

Neural networks with multiple targets

Extract linear combinations of the inputs as derived features, and then model each target (response) as a linear function of a sigmoid function (activation function) of these features.

[Diagram: inputs x1, ..., xp; hidden units z1, ..., zM; outputs y1, ..., yK]

Neural networks for K-class classification

With the softmax activation function and the deviance (cross-entropy) error function, the neural network model is exactly a logistic regression model in the hidden units, and all the parameters are estimated by maximum likelihood.

[Diagram: inputs x1, ..., xp; hidden units z1, ..., zM; outputs y1, ..., yK]
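The softmax/cross-entropy pair can be sketched directly: softmax turns the K linear outputs into class probabilities, and the cross-entropy (deviance) is the negative log-likelihood that maximum likelihood estimation minimizes. Function names below are ours:

```python
import numpy as np

def softmax(t):
    """Map K linear outputs to class probabilities."""
    e = np.exp(t - t.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(probs, k):
    """Deviance contribution of one observation with true class k."""
    return -np.log(probs[k])

t = np.array([2.0, 1.0, 0.1])        # linear outputs for K = 3 classes
probs = softmax(t)
print(probs, probs.sum())            # probabilities summing to 1
print(cross_entropy(probs, 0))
```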

Neural networks for regression and K-class classification

For regression, we use the sum-of-squared errors as our measure of fit. For classification, we normally use the deviance (cross-entropy) error function, and the corresponding classifier is G(x) = argmax_k f_k(x).

[Diagram: inputs x1, ..., xp; hidden units z1, ..., zM; outputs y1, ..., yK]

Fitting neural networks

[Diagram: inputs x1, ..., xp; hidden units z1, ..., zM; outputs y1, ..., yK]

The model has M(p+1) + K(M+1) parameters (weights). We do not want the global minimizer of the deviance (cross-entropy) function, since that would likely be an overfit solution; instead we use early stopping or a penalty term.
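One way to see the penalty approach: add λ·Σw² (weight decay) to the error, so that gradient descent is pulled away from the global minimizer of the raw training error. A toy one-neuron sketch; all settings (λ, learning rate, data) are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=40)
y = np.tanh(1.5 * x) + 0.1 * rng.normal(size=40)

w = 0.1 * rng.normal(size=4)   # [a, a0, b, b0]: y_hat = b*tanh(a*x + a0) + b0
lam, lr = 1e-3, 0.1            # weight-decay strength and learning rate

def loss_grad(w, lam):
    """Penalized error mean(r^2) + lam*sum(w^2) and its gradient."""
    a, a0, b, b0 = w
    z = np.tanh(a * x + a0)
    r = b * z + b0 - y                       # residuals
    dz = 1.0 - z**2                          # tanh'(a*x + a0)
    g = np.array([
        np.mean(2 * r * b * dz * x),         # d/da
        np.mean(2 * r * b * dz),             # d/da0
        np.mean(2 * r * z),                  # d/db
        np.mean(2 * r),                      # d/db0
    ]) + 2 * lam * w                         # weight-decay term
    return np.mean(r**2) + lam * np.sum(w**2), g

for _ in range(3000):                        # plain gradient descent
    loss, g = loss_grad(w, lam)
    w -= lr * g

print(round(loss, 3))
```

Early stopping achieves a similar regularizing effect by halting the descent before the training error is fully minimized, typically when the validation error starts to rise.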

Neural networks

- Provide a joint framework for prediction and classification
- Can describe both linear and nonlinear responses
- Can accommodate multidimensional correlated inputs
- Are normally over-fitted - validation is a must
- Are difficult to interpret
- Convergence problems are not uncommon

Some characteristics of different learning methods

Characteristic                                       Neural networks   Trees
Natural handling of data of "mixed" type             Poor              Good
Handling of missing values                           Poor              Good
Robustness to outliers in input space                Poor              Good
Insensitive to monotone transformations of inputs    Poor              Good
Computational scalability (large N)                  Poor              Good
Ability to deal with irrelevant inputs               Poor              Good
Ability to extract linear combinations of features   Good              Poor
Interpretability                                     Poor              Fair/good
Predictive power                                     Good              Poor