
ECE 5424: Introduction to Machine Learning


1 ECE 5424: Introduction to Machine Learning
Topics: Neural Networks, Backprop
Readings: Murphy 16.5
Stefan Lee, Virginia Tech

2 Recap of Last Time (C) Dhruv Batra

3 Not linearly separable data
Some datasets are not linearly separable!

4 Addressing non-linearly separable data – Option 1, non-linear features
Choose non-linear features, e.g.:
Typical linear features: w0 + Σi wi xi
Example of non-linear features: degree-2 polynomials, w0 + Σi wi xi + Σij wij xi xj
The classifier hw(x) is still linear in the parameters w, so it is just as easy to learn
Data can become linearly separable in the higher-dimensional feature space
Can be expressed via kernels
(C) Dhruv Batra Slide Credit: Carlos Guestrin
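To make the degree-2 expansion concrete, here is a minimal NumPy sketch (illustrative code, not from the slides) of the feature map: data that is not linearly separable in x can become separable in the expanded features.

```python
import numpy as np

def poly2_features(x):
    """Map x = (x_1, ..., x_d) to [1, x_i..., x_i * x_j...] (degree <= 2)."""
    d = len(x)
    feats = [1.0]                      # bias term, weight w0
    feats.extend(x)                    # linear terms, weights wi
    for i in range(d):
        for j in range(i, d):
            feats.append(x[i] * x[j])  # quadratic terms, weights wij
    return np.array(feats)

# XOR-style data is not linearly separable in x, but the product term
# x1 * x2 in the expanded features separates it with a linear classifier.
print(poly2_features(np.array([1.0, -1.0])))  # [ 1.  1. -1.  1. -1.  1.]
```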

5 Addressing non-linearly separable data – Option 2, non-linear classifier
Choose a classifier hw(x) that is non-linear in the parameters w, e.g., decision trees, neural networks, ...
More general than linear classifiers
But often harder to learn (non-convex optimization required)
Often very useful (outperforms linear classifiers)
In a way, the two ideas are related
(C) Dhruv Batra Slide Credit: Carlos Guestrin

6 Biological Neuron (C) Dhruv Batra

7 Recall: The Neuron Metaphor
Neurons accept information from multiple inputs and transmit information to other neurons:
Multiply inputs by weights along edges
Apply some function to the set of inputs at each node
Slide Credit: HKUST

8 Types of Neurons
Linear neuron, logistic neuron, perceptron; potentially more.
These require a convex loss function for gradient-descent training.
Slide Credit: HKUST
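For concreteness, a small NumPy sketch (illustrative, not from the slides) of the three neuron types named above; each computes the same weighted sum z = w·x + b and differs only in the output function.

```python
import numpy as np

def pre_activation(w, x, b):
    return np.dot(w, x) + b

def linear_neuron(w, x, b):    # identity output; used for regression
    return pre_activation(w, x, b)

def logistic_neuron(w, x, b):  # smooth squashing of z into (0, 1)
    return 1.0 / (1.0 + np.exp(-pre_activation(w, x, b)))

def perceptron(w, x, b):       # hard threshold: outputs 0 or 1
    return 1.0 if pre_activation(w, x, b) >= 0 else 0.0
```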

9 Limitation A single “neuron” still yields only a linear decision boundary
What to do? Idea: Stack a bunch of them together! (C) Dhruv Batra

10 Multilayer Networks Cascade neurons together
The output from one layer is the input to the next. Each layer has its own set of weights. Slide Credit: HKUST

11 Universal Function Approximators
Theorem: a 3-layer network with linear outputs can uniformly approximate any continuous function (on a compact domain) to arbitrary accuracy, given enough hidden units [Funahashi ’89] (C) Dhruv Batra

12 Plan for Today Neural networks: parameter learning and backpropagation
(C) Dhruv Batra

13 Forward Propagation On board (C) Dhruv Batra

14 Feed-Forward Networks
Inputs are fed forward through the network, layer by layer, to produce a classification. Slide Credit: HKUST
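A minimal sketch of this forward pass, assuming NumPy and sigmoid activations; the 2-3-1 layer sizes and random weights are placeholders chosen only to show the shapes.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, layers):
    """layers: list of (W, b) pairs, one per layer."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)  # each layer's output is the next layer's input
    return a

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 2)), np.zeros(3)),   # hidden layer weights
          (rng.standard_normal((1, 3)), np.zeros(1))]   # output layer weights
print(forward(np.array([0.5, -1.0]), layers))
```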

15–19 Feed-Forward Networks (repeats of the previous slide, stepping through the forward pass one layer at a time) Slide Credit: HKUST

20 Gradient Computation First let’s try: a single neuron for linear regression. (C) Dhruv Batra

21 Gradient Computation First let’s try: a single neuron for linear regression. Now let’s try the general case: Backpropagation! Really efficient. (C) Dhruv Batra
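The sketch below (illustrative, assuming sigmoid hidden units and a squared loss) works out both cases: the closed-form gradient for a single linear neuron, and backprop for a two-layer net, where the backward pass reuses the cached forward activations; that reuse is the source of backprop's efficiency.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Case 1: single linear neuron with loss L = 1/2 (w.x - y)^2,
# so dL/dw = (w.x - y) * x.
def grad_single_neuron(w, x, y):
    return (np.dot(w, x) - y) * x

# Case 2: backprop for a 2-layer net (sigmoid hidden, linear output),
# with squared loss L = 1/2 ||y_hat - y||^2.
def backprop(x, y, W1, b1, W2, b2):
    # forward pass, caching intermediate activations
    h = sigmoid(W1 @ x + b1)                 # hidden activations
    y_hat = W2 @ h + b2                      # linear output
    # backward pass: push the error back one layer at a time
    delta2 = y_hat - y                       # dL/dy_hat
    dW2 = np.outer(delta2, h)
    db2 = delta2
    delta1 = (W2.T @ delta2) * h * (1 - h)   # chain rule through the sigmoid
    dW1 = np.outer(delta1, x)
    db1 = delta1
    return dW1, db1, dW2, db2
```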

22 Neural Nets Best performers on OCR. NetTalk, a text-to-speech system from 1987. Rick Rashid speaks Mandarin.
(C) Dhruv Batra

23 Historical Perspective
(C) Dhruv Batra

24 Convergence of backprop
Perceptron leads to a convex optimization: gradient descent reaches the global minimum.
Multilayer neural nets are not convex: gradient descent gets stuck in local minima, the learning rate is hard to set, and selecting the number of hidden units and layers is a fuzzy process.
NNs had fallen out of fashion in the 90s and early 2000s. They are back with a new name and significantly improved performance: deep networks, with dropout, trained on much larger corpora.
(C) Dhruv Batra Slide Credit: Carlos Guestrin

25 Overfitting Many, many, many parameters. How to avoid overfitting?
More training data. Regularization. Early stopping (see the sketch below). (C) Dhruv Batra
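A hedged sketch of the early-stopping idea; train_epoch and val_loss here are hypothetical stand-ins for one pass of training and a held-out validation evaluation.

```python
def train_with_early_stopping(model, train_epoch, val_loss,
                              patience=5, max_epochs=200):
    best, best_epoch = float("inf"), 0
    for epoch in range(max_epochs):
        train_epoch(model)               # one pass over the training data
        loss = val_loss(model)           # monitor held-out performance
        if loss < best:
            best, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break                        # validation loss stopped improving
    return model
```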

26 A quick note (C) Dhruv Batra Image Credit: LeCun et al. ‘98

27 Rectified Linear Units (ReLU)
(C) Dhruv Batra
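For reference, a NumPy sketch of the ReLU and its (sub)gradient; unlike the sigmoid, its gradient does not shrink toward zero for large positive inputs.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)      # max(0, z), elementwise

def relu_grad(z):
    return (z > 0).astype(float)   # 0 for z < 0, 1 for z > 0
```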

28 Convolutional Nets Basic idea (on board). Assumptions:
Local receptive fields
Weight sharing / translational invariance / stationarity
Each layer is just a convolution! (C) Dhruv Batra Image Credit: Chris Bishop
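A naive NumPy sketch of those two assumptions: one small filter (the local receptive field) slides across the whole input with the same weights at every position (weight sharing), so the layer is just a convolution (implemented, as is conventional for conv nets, as cross-correlation).

```python
import numpy as np

def conv2d(image, kernel):
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kH, j:j + kW]   # local receptive field
            out[i, j] = np.sum(patch * kernel)  # same weights at every location
    return out
```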

29–37 [Figure-only slides] (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato

38 Convolutional Nets Example:
(C) Dhruv Batra Image Credit: Yann LeCun, Kevin Murphy

39–41 [Figure-only slides] (C) Dhruv Batra Slide Credit: Marc'Aurelio Ranzato

42–44 Visualizing Learned Filters
(C) Dhruv Batra Figure Credit: [Zeiler & Fergus ECCV14]

45 Autoencoders Goal: compression. The output tries to predict the input.
(C) Dhruv Batra

46 Autoencoders Goal: learn a low-dimensional “basis” for the data.
(C) Dhruv Batra Image Credit: Andrew Ng
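A toy sketch of that idea with a linear encoder and decoder in NumPy; the weights below are random placeholders, and training would minimize the reconstruction error so that the output predicts the input.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 3                          # input dimension, code dimension (k < d)
W_enc = rng.standard_normal((k, d))   # encoder: compress x to k numbers
W_dec = rng.standard_normal((d, k))   # decoder: reconstruct x from the code

x = rng.standard_normal(d)
code = W_enc @ x                      # coefficients in the low-dim "basis"
x_hat = W_dec @ code                  # reconstruction; train to minimize ||x - x_hat||^2
```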

47 Stacked Autoencoders How about we compress the low-dimensional features further?
(C) Dhruv Batra

48 Sparse DBNs [Lee et al. ICML ‘09] Figure courtesy: Quoc Le (C) Dhruv Batra

49 Stacked Autoencoders Finally, perform classification with these low-dimensional features. (C) Dhruv Batra

50 What you need to know about neural networks
Perceptron: representation, derivation.
Multilayer neural nets: derivation of backprop, learning rule, expressive power.

