Predictive Analytics – Basics of Machine Learning
What Is Machine Learning?
Automating automation: getting computers to program themselves. Writing software is the bottleneck, so let the data do the work instead!
Traditional Programming vs. Machine Learning
Traditional programming: Data + Program → Computer → Output.
Machine learning: Data + Output → Computer → Program.
Sample Applications
Web search, computational biology, finance, e-commerce, space exploration, robotics, information extraction, social networks, debugging, [your favorite area].
ML in a Nutshell
Tens of thousands of machine learning algorithms, and hundreds of new ones every year. Every machine learning algorithm has three components: representation, evaluation, and optimization.
Representation
Decision trees; sets of rules / logic programs; instances; graphical models (Bayes/Markov nets); neural networks; support vector machines; model ensembles; etc.
Evaluation
Accuracy; precision and recall; squared error; likelihood; posterior probability; cost / utility; margin; entropy; K-L divergence; etc.
Optimization
Combinatorial optimization (e.g., greedy search); convex optimization (e.g., gradient descent); constrained optimization (e.g., linear programming).
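For the gradient-descent case named above, a minimal sketch in Python; the quadratic objective, step size, and iteration count are illustrative assumptions:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    """Minimize a function by repeatedly stepping against its gradient."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)   # move downhill along the negative gradient
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
print(x_min)  # converges toward 3.0
```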
Types of Learning
Supervised (inductive) learning: training data includes desired outputs.
Unsupervised learning: training data does not include desired outputs.
Semi-supervised learning: training data includes a few desired outputs.
Reinforcement learning: rewards from a sequence of actions.
ML in Practice
Understanding the domain, prior knowledge, and goals; data integration, selection, cleaning, pre-processing, etc.; learning models; interpreting results; consolidating and deploying discovered knowledge; then looping back to the start.
Taxonomy of Machine Learning
Rote learning; learning from instruction; learning by analogy; and learning from examples, which includes: supervised learning (the most common type: classification and regression), reinforcement learning, unsupervised learning, evolutionary learning, Markov chains (learning with probability), and hidden Markov models (learning with probability and time).
Rote Learning
The simplest form of learning: the input data is not processed in any way, and solutions to problems are stored directly, to be retrieved when the same problem appears again.
If too many examples are stored, searching among them for the solution to a new problem drags down the overall efficiency of the program. Indexing techniques can simplify the search, and heuristics can be used to limit the number of stored examples. This form of learning was already quite mature by the late 1950s in Samuel's CHECKER program, which stored board positions.
Learning from Instruction
Performance is improved using advice from an external teacher, similar to most formal education. The main difficulties are translating high-level human language into an internal knowledge structure the program can use, and integrating that knowledge with the existing knowledge base so it can be applied effectively. The learner must perform some deductive work, but most of the responsibility rests with the teacher, who must provide well-organized knowledge so that the learner's existing knowledge grows step by step. The aim is a system that can accept instructions or advice, and can store and effectively apply the knowledge it acquires.
Learning by Analogy
Existing knowledge is applied to a new problem on the basis of the similarities between the two, and is modified to fit the new case. For example, we might tell a student that a tennis serve is like driving a nail into a point high on a wall. In the same way, an analogy-based learning system might be used to convert an existing computer program into one that performs a related function the original did not provide. Compared with rote learning or learning from instruction, learning by analogy demands more inferential ability from the learner.
Learning from Examples
Given an example set of limited size, find a concise data description. Variants: supervised learning (the most common type: classification and regression), reinforcement learning, unsupervised learning, and evolutionary learning.
Supervised Learning
Classification: assign new inputs to one of a number of discrete classes or categories, e.g., assign a character bitmap the correct letter of the alphabet; other pattern recognition problems are similar.
Regression: the output space is formed by the values of continuous variables, e.g., predicting the value of shares in the stock exchange market.
Reinforcement Learning
Learn how to map situations to actions in order to maximize a given reward. The learning algorithm is not told which actions to take in a given situation; instead, the learner gains information about the actions it takes through a reward, which does not necessarily arrive immediately after the action. Example: playing chess, where the reward is winning or losing.
Unsupervised Learning
Used when the data is only a sample of objects without associated target values.
Clustering algorithms: given a training sample of objects, extract some structure from them, e.g., identify indoor vs. outdoor images.
Dimensionality reduction methods: represent high-dimensional data in a low-dimensional space while trying to preserve the original information in the data.
Evolutionary Learning
Biological organisms adapt to improve their survival rates and their chance of having offspring in their environment.
Least Squares Learning Rule
A Beginning - Artificial Neuron
Dendrites receive signals from other neural units. When the impulses a neuron receives exceed a certain threshold, the neuron fires, producing an impulse that travels down the axon. The branches at the end of the axon are called synapses, the contact points between neurons; a synapse may be inhibitory or excitatory. An inhibitory synapse weakens the transmitted impulse, while an excitatory one strengthens it. (Key parts: cell body, dendrites, axon.)
Artificial Neuron (Perceptron)
[Figure: an artificial neuron with two inputs, plus a bias input x0 = 1 weighted by w0, whose weighted sum passes through an activation (output) function.]
Artificial Neuron Output Functions (Activation Functions)
Binary threshold function; linear threshold function; sigmoidal function; Gaussian function. [The plots of these functions are not captured in the transcript.]
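As a minimal sketch, the four activation functions named above can be written as follows; the threshold, slope, and width parameters are assumptions, since the slides only showed plots:

```python
import math

def binary_threshold(net, theta=0.0):
    # Fires (outputs 1) only when the net input exceeds the threshold.
    return 1.0 if net > theta else 0.0

def linear_threshold(net, slope=1.0):
    # Linear in the middle, clipped to [0, 1] at the extremes.
    return max(0.0, min(1.0, slope * net))

def sigmoid(net):
    # Smooth, differentiable squashing function onto (0, 1).
    return 1.0 / (1.0 + math.exp(-net))

def gaussian(net, sigma=1.0):
    # Bell-shaped response, maximal when the net input is zero.
    return math.exp(-net * net / (2.0 * sigma * sigma))
```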
Least squares learning rule - regression
Given a set of training input vectors with desired outputs, minimize the difference (the mean squared error) between the desired output and the actual output for each input vector.
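The slide's own equations were images; in the usual notation (assumed here: training pairs (x_p, d_p) and weight vector w), the quantity being minimized can be written as:

```latex
E(\mathbf{w}) = \frac{1}{2}\sum_{p=1}^{P}\bigl(d_p - \mathbf{w}^{\mathsf{T}}\mathbf{x}_p\bigr)^2
```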
[Derivation slides: the equations were shown as images and are not captured in the transcript.]
For real-time (online) learning, the weights are updated as each training example arrives; the modification depends on the output error and the inputs.
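A minimal sketch of this online rule in the standard LMS/delta-rule form implied by the statement above; the learning rate, epoch count, and data are illustrative assumptions:

```python
def lms_train(samples, eta=0.1, epochs=50):
    """Online least squares: update the weights after every example."""
    n = len(samples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wi * xi for wi, xi in zip(w, x))  # actual output
            err = d - y                               # output error
            # The modification depends on the output error and the inputs.
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
    return w

# Example: learn y = 2*x1 - x2 (x[0] = 1 serves as the bias input).
data = [([1.0, 1.0, 0.0], 2.0), ([1.0, 0.0, 1.0], -1.0), ([1.0, 1.0, 1.0], 1.0)]
print(lms_train(data))
```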
Back-propagation Learning Rule
Back-propagation learning rule
Improves the learning efficiency. [Figure: a multilayer network with inputs x1 … xn and a bias input of 1, with weights w0 … wn.]
Update equations are derived for the output layer and the hidden layer, for both a linear and a sigmoid output function, and the two cases are compared. [The equations were shown as images and are not captured in the transcript.]
The training procedure:
1. Apply the input vector to the input layer.
2. Calculate the net input of the hidden layer, then the output of the hidden layer.
3. Calculate the net input of the output layer, then the output of the output layer.
4. Update the weights of the output layer, using the linear or the sigmoid form of the update; η, the learning-rate parameter, satisfies 0 < η < 1.
5. Update the weights of the hidden layer, again using the linear or the sigmoid form.
6. If the error term is acceptably small, stop the training; otherwise return to step 1.
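The update equations themselves were images on the slides; as a reconstruction, their standard form for sigmoid units (symbols assumed here: input x_i, hidden output y_j, network output y_k, target d_k) is:

```latex
\delta_k = (d_k - y_k)\,y_k(1 - y_k), \qquad \Delta w_{jk} = \eta\,\delta_k\,y_j
\delta_j = y_j(1 - y_j)\sum_k \delta_k w_{jk}, \qquad \Delta w_{ij} = \eta\,\delta_j\,x_i
```

For a linear output unit the derivative factor drops out, leaving δ_k = d_k − y_k.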
Practical settings: weights should be initialized to small random values between ±0.5; bias terms may improve the learning; the learning-rate parameter should be a small number, on the order of 0.05 to 0.25, to ensure settling to a solution.
Back-propagation learning rule – An Exercise
Exclusive-OR operation
Back-propagation learning rule – An Exercise
Initial weights and threshold levels are set randomly: w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, θ3 = 0.8, θ4 = -0.1, and θ5 = 0.3. [Network figure, with hidden-unit outputs y3 and y4, not captured.]
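As a quick check, one forward pass through this 2-2-1 network with the weights above can be computed in Python; the test input (1, 1) with desired output 0 is an assumed case, and the thresholds are subtracted from the weighted sums:

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

x1, x2 = 1, 1   # assumed XOR training input, desired output 0
w13, w14, w23, w24, w35, w45 = 0.5, 0.9, 0.4, 1.0, -1.2, 1.1
t3, t4, t5 = 0.8, -0.1, 0.3   # threshold levels θ3, θ4, θ5

y3 = sigmoid(x1 * w13 + x2 * w23 - t3)   # hidden neuron 3
y4 = sigmoid(x1 * w14 + x2 * w24 - t4)   # hidden neuron 4
y5 = sigmoid(y3 * w35 + y4 * w45 - t5)   # output neuron 5

print(round(y3, 4), round(y4, 4), round(y5, 4))  # ≈ 0.525 0.8808 0.5097
print("error:", round(0 - y5, 4))                # ≈ -0.5097
```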
[Worked training-step slides: the forward-pass and weight-update calculations were shown as images and are not captured.]
Back-propagation learning rule – An Exercise
The training process is repeated until the sum of squared errors falls below a chosen threshold.
[Figure: sum-squared error learning curve.]
Final results. [Table not captured in the transcript.]
Competitive Learning
Principle of Competitive Learning
Discovery of significant patterns or invariants of the environment without the intervention of a teaching input. The process is self-reinforcing, is based on competition, and may involve cooperation.
Example of Competitive Learning
Three clusters of vectors (denoted by solid dots) are distributed on the unit sphere. Initially randomized codebook vectors (crosses) move under the influence of a competitive learning rule to approximate the centroids of the clusters: competitive learning schemes use codebook vectors to approximate the centroids of data clusters.
Competitive Learning: Learning Rules
Signal Hebbian learning rule; competitive learning rule; linear competitive learning rule. [Equations not captured in the transcript.]
Competitive Learning - Kohonen learning rule
Winner-take-all learning rule
Competitive Learning - Kohonen learning rule
About the winner: the more closely a node's weight vector parallels the input vector, the higher the probability that the node wins.
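A minimal sketch of the winner-take-all update under the usual formulation (an assumption here, since the slide's equations are not captured): the winner is the node whose weight vector is closest to the input, and only its weights move toward the input:

```python
def kohonen_step(weights, x, eta=0.5):
    """One winner-take-all update; weights is a list of weight vectors."""
    # Winner = node whose weight vector is closest to the input x.
    dist = lambda w: sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    winner = min(range(len(weights)), key=lambda i: dist(weights[i]))
    # Only the winner moves its weights toward the input vector.
    weights[winner] = [wi + eta * (xi - wi)
                       for wi, xi in zip(weights[winner], x)]
    return winner

# Example: two nodes competing for a 2-D input.
w = [[1.0, 0.0], [0.0, 1.0]]
print(kohonen_step(w, [0.9, 0.1]), w)  # node 0 wins and moves toward the input
```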
Competitive Learning - Kohonen learning rule
Example: a two-node Kohonen network. [The training patterns, their polar-coordinate form, and the normalized initial weights were given in tables that are not captured in the transcript.]
Self-Organizing Feature Map
Self-Organizing Feature Map: Underlying Ideas
An unsupervised learning process in which clusters of neurons compete and win. The weights of winning neurons are adjusted to bring about a better response to the current input. The final weights specify clusters of network nodes that are topologically close; this correspondence between signal features and response locations on the map preserves the topology of the input.
Distance relations in high-dimensional spaces should be approximated by the network as distances in the two-dimensional neuronal field. Input neurons should be exposed to a sufficient number of different inputs. Only the winning neuron and its neighbours adapt their connections: a similar weight-update procedure is employed on neurons comprising topologically related subsets, and the resulting adjustment enhances the responses to the same or a similar input that occurs subsequently.
Neighbourhood computation: the neighbourhood is a function of time; as epochs of training elapse, the neighbourhood shrinks. [Figure of neighbourhood shapes not captured.]
Some observations:
Ordering phase (the initial period of adaptation): the learning rate should start close to unity and be decreased linearly, exponentially, or inversely with iteration over the first 1000 epochs, while keeping its value above 0.1. During this phase the neighbourhood N_k(IJ) shrinks linearly with k, finally including only a few neurons.
Convergence phase: the learning rate should be maintained at around 0.01 for a large number of epochs, typically many tens of thousands. During this phase N_k(IJ) may comprise only one neighbour or none.
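These schedules can be sketched directly; the linear decay used for both the rate and the radius below is an illustrative assumption within the ranges the slide gives:

```python
def som_schedules(epoch, order_epochs=1000, r0=6):
    """Learning rate and neighbourhood radius as functions of the epoch."""
    if epoch < order_epochs:
        # Ordering phase: rate decays linearly from ~1.0 but stays above 0.1.
        eta = max(0.1, 1.0 - 0.9 * epoch / order_epochs)
        radius = max(1, round(r0 * (1 - epoch / order_epochs)))
    else:
        # Convergence phase: small fixed rate, minimal neighbourhood.
        eta, radius = 0.01, 1
    return eta, radius

for e in (0, 500, 999, 5000):
    print(e, som_schedules(e))
```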
Self-Organizing Feature Map - Simulation Example
The data employed in the experiment comprised 500 points distributed uniformly over the bipolar square [−1, 1] × [−1, 1]; the points thus describe a geometrically square topology. An 8×8 planar array of neurons was applied.
Self-Organizing Feature Map - Simulation Example
Simulation notes: the initial value of the neighbourhood radius is r = 6, so the neighbourhood is initially a square of width 12 centered on the winning neuron IJ. The radius contracts by 1 every 200 epochs; after 1000 epochs it is maintained at 1. The radius can also be taken to zero, in which case only the winning neuron updates its weights.
Evolutionary Learning
Calculus-based Search - Indirect Method
Set the gradient of the objective function equal to zero and solve (fig. 1.1).
Calculus-based Search - Direct Method
Move in a direction related to the local gradient: hill climbing.
Calculus-based Search
Both methods are local in scope: they seek the best point only within a neighborhood of the current point (fig. 1.2).
Both methods also depend on the existence of derivatives.
For real-world search problems, calculus-based methods are insufficient.
State Space Search - Breadth first search
First in, first out (FIFO).
State Space Search - Depth first search
Last in, first out (LIFO).
State Space Search - Best first search
First in, best out: the most promising node is expanded next.
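The three strategies differ only in which element the frontier yields next; a minimal sketch (the graph representation and the scoring function are assumptions):

```python
import heapq
from collections import deque

def search(graph, start, goal, strategy="breadth", score=None):
    """Generic state-space search; strategy picks the frontier discipline."""
    frontier, visited = deque([start]), {start}
    if strategy == "best":
        frontier = [(score(start), start)]
    while frontier:
        if strategy == "breadth":
            node = frontier.popleft()          # first in, first out
        elif strategy == "depth":
            node = frontier.pop()              # last in, first out
        else:
            _, node = heapq.heappop(frontier)  # first in, best out
        if node == goal:
            return node
        for nxt in graph.get(node, []):
            if nxt not in visited:
                visited.add(nxt)
                if strategy == "best":
                    heapq.heappush(frontier, (score(nxt), nxt))
                else:
                    frontier.append(nxt)
    return None

g = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
print(search(g, "A", "D"))                                  # breadth-first
print(search(g, "A", "D", "best", score=lambda n: ord(n)))  # best-first
```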
The Goals of Optimization
Two views of the goal: improving the process (the path toward better solutions) versus attaining the optimum (the destination) itself, i.e., convergence.
Random Search
Search at random and save the best result found so far. Genetic Algorithms (GAs) are not a purely randomized technique in this sense: they use random choice as a tool in a directed search process.
Characteristics of GAs - search information
Which is the best? Smallest, longest, fastest, fittest, ... The best element in the array of points met so far is not always the solution, but it plays a role in guiding the search direction.
Characteristics of GAs - search information
The search direction: breadth-first search expands the first element in the array; depth-first search the last element; best-first search the best element; a winner-take-all neural network search follows the gradient calculation (the gradient of the error function). How does a GA search find its search direction?
Characteristics of GAs - search information
"Winner takes all" becomes "winner group takes all": randomly exchanging the bits of the strings in the winner group yields the next search state.
Characteristics of GAs
GAs work with a coding of the parameter set, not the parameters themselves; they search from a population of points, not a single point; they use objective function information, not derivatives or other auxiliary knowledge; and they use probabilistic transition rules, not deterministic rules.
Characteristics of GAs - coding
Example: find the maximum of f(x) (fig. 1.5). How should the parameter be coded? Code the parameter x as a finite-length string.
A Simple Genetic Algorithm - Initialization
Example initial population: 01101, 11000, 01000, 10011. The number of strings depends on the complexity of the problem.
A Simple Genetic Algorithm - Reproduction
Copying strings according to their fitness values: strings with higher fitness have a higher probability of contributing offspring to the next generation. Example: Table 1.1.
Each string has a reproduction probability proportional to its corresponding area (fitness) on the wheel (fig. 1.7); highly fit strings have a higher number of offspring in the succeeding generation.
A Simple Genetic Algorithm - Crossover
Newly reproduced strings in the mating pool are mated at random. Randomly select a crossover position k from [1, l−1], where l is the string length, then swap all characters between positions k+1 and l.
Example: string 1 = 01101, string 2 = 11000. Select k from [1, 5−1] = [1, 4]; if k = 4, the strings 0110|1 and 1100|0 exchange their final characters, giving the new strings 01100 and 11001.
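A sketch of this single-point crossover on bit strings:

```python
import random

def crossover(s1, s2):
    """Single-point crossover: swap all characters after position k."""
    k = random.randint(1, len(s1) - 1)   # crossover site in [1, l-1]
    return s1[:k] + s2[k:], s2[:k] + s1[k:]

# With k = 4 this reproduces the example above:
# 01101, 11000 -> 01100, 11001
print(crossover("01101", "11000"))
```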
In summary: initialization uses random number generation, reproduction is string copying, and crossover is string exchange. This mix of direction and chance builds new solutions from the best partial solutions.
A Simple Genetic Algorithm - Mutation
Reproduction and crossover may occasionally become overzealous and lose some potentially useful genetic material. Mutation is an occasional random alteration of the value of a string position (for binary coding: 0 ↔ 1): a random walk through the string space that serves as an insurance policy against premature loss, applied with small probability.
A Simulation by Hand
Maximize the objective function f(x). [The function itself was shown on the slide and is not captured in the transcript.]
Coding: the parameter x ∈ [0, 31] is coded as a 5-bit binary string, 00000 to 11111. Initialization: an initial population of strings is generated.
Reproduction: via the weighted roulette wheel.
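A sketch of the weighted roulette wheel; the population is the example from the initialization slide, and the fitness values (x² of each decoded string) are an illustrative assumption, since the objective function is not captured:

```python
import random

def roulette_select(population, fitness):
    """Spin a roulette wheel whose slot sizes are proportional to fitness."""
    total = sum(fitness)
    spin = random.uniform(0, total)
    cumulative = 0.0
    for individual, f in zip(population, fitness):
        cumulative += f
        if spin <= cumulative:
            return individual
    return population[-1]  # guard against floating-point round-off

pop = ["01101", "11000", "01000", "10011"]
fit = [169, 576, 64, 361]  # assumed fitness: x**2 of each decoded string
print(roulette_select(pop, fit))
```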
Crossover: performed on the mated pairs as described above.
Mutation: performed on a bit-by-bit basis, with a small probability per bit.
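A sketch of bit-by-bit mutation; the default rate is an assumed placeholder, since the slide's per-bit probability is not captured:

```python
import random

def mutate(string, rate=0.001):
    """Flip each bit independently with a small per-bit probability."""
    return "".join(
        ("1" if bit == "0" else "0") if random.random() < rate else bit
        for bit in string
    )

print(mutate("01100", rate=0.5))  # high rate used here only to show flips
```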