Predictive Analytics – Basics of Machine Learning


1 Predictive Analytics – Basics of Machine Learning

2 What Is Machine Learning?
Automating automation: getting computers to program themselves. Writing software is the bottleneck, so let the data do the work instead!

3 Traditional Programming
Traditional programming: Data + Program → Computer → Output. Machine learning: Data + Output → Computer → Program.

4 Sample Applications Web search Computational biology Finance
E-commerce Space exploration Robotics Information extraction Social networks Debugging [Your favorite area]

5 ML in a Nutshell Tens of thousands of machine learning algorithms
Hundreds of new ones every year. Every machine learning algorithm has three components: Representation, Evaluation, Optimization.

6 Representation Decision trees Sets of rules / Logic programs Instances
Graphical models (Bayes/Markov nets) Neural networks Support vector machines Model ensembles Etc.

7 Evaluation Accuracy Precision and recall Squared error Likelihood
Posterior probability Cost / Utility Margin Entropy K-L divergence Etc.

8 Optimization
Combinatorial optimization, e.g. greedy search. Convex optimization, e.g. gradient descent. Constrained optimization, e.g. linear programming.
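The convex-optimization case above can be sketched in a few lines: gradient descent on a one-dimensional convex function. The function f(w) = (w − 3)² and the step size are illustrative assumptions, not from the slides.

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)   # step against the local gradient
    return w

w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)  # approaches 3.0
```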

9 Types of Learning Supervised (inductive) learning
Training data includes desired outputs. Unsupervised learning: training data does not include desired outputs. Semi-supervised learning: training data includes a few desired outputs. Reinforcement learning: rewards from a sequence of actions.

10 ML in Practice Understanding domain, prior knowledge, and goals
Data integration, selection, cleaning, pre-processing, etc. Learning models. Interpreting results. Consolidating and deploying discovered knowledge. Then loop back and repeat.

11 Taxonomy of Machine Learning
Rote Learning; Learning from Instruction; Learning by Analogy; Learning from Examples: Supervised Learning (the most common type, covering Classification and Regression), Reinforcement Learning, Unsupervised Learning, Evolutionary Learning; Markov Chain (learning with probability); Hidden Markov Model (learning with probability and time)

12 Rote Learning
The simplest form of learning: the input data is not processed in any way. The solution to a problem is stored directly and retrieved when the same problem appears again.
If too many examples are stored, searching them for the solution to a new problem degrades the program's overall efficiency. Indexing techniques can simplify the search for stored data, and heuristics can limit the number of examples stored. This form of learning was already quite mature in Samuel's late-1950s CHECKER program, which stored checkers board positions.

13 Learning from Instruction
Performance is improved using advice from an external instructor, much as in most formal education. The greatest difficulties are translating the instructor's high-level human language into internal knowledge structures the program can use, and integrating that knowledge with the existing knowledge base so it can be applied effectively. The learner must carry out some deductive work, but most of the responsibility lies with the instructor, who must supply well-organized knowledge that incrementally extends what the learner already knows. A system must be built that can accept instruction or advice, and can store and effectively apply the knowledge it acquires.

14 Learning by Analogy
Existing knowledge is applied to a new problem on the basis of the similarity between the two, modifying that knowledge to fit the new case. Example: we might tell a student that a tennis serve is like driving a nail into a high spot on a wall. In the same way, an analogy-based learning system might be used to convert an existing computer program into one that performs a related function the original did not provide. Compared with rote learning or learning from instruction, learning by analogy demands more inferential ability from the learner.

15 Learning from Examples
Given an example set of limited size, find a concise data description Supervised Learning (the most common type) Classification Regression Reinforcement Learning Unsupervised Learning Evolutionary Learning

16 Supervised Learning Classification Regression
To assign new inputs to one of a number of discrete classes or categories. Ex: assign a character bitmap the correct letter of the alphabet; other pattern-recognition problems. Regression: the output space is formed by the values of continuous variables. Ex: predict the value of shares in the stock-exchange market.
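A hypothetical toy sketch of the two task types above: classification returns a discrete label, regression a continuous value. The 1-D data, class names, and nearest-centroid rule are assumptions for illustration.

```python
def nearest_centroid(x, centroids):
    """Classification: pick the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

centroids = {"A": 1.0, "B": 5.0}             # made-up 1-D class centroids
label = nearest_centroid(1.8, centroids)     # discrete output: a class name

def fit_line(xs, ys):
    """Regression: least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # continuous output: y = 2x + 1
```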

17 Reinforcement Learning
How to map situations to actions in order to maximize a given reward. The learning algorithm is not told which actions to take in a given situation; the learner is assumed to gain information about its actions through a reward that does not necessarily arrive immediately after the action is taken. Ex: playing chess, where the reward is winning or losing.

18 Unsupervised Learning
If the data is only a sample of objects without associated target values. Clustering algorithms: given a training sample of objects, extract some structure from them. Ex: identify indoor vs. outdoor images. Dimensionality-reduction methods: represent high-dimensional data in a low-dimensional space while trying to preserve the original information in the data.
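The clustering idea above can be sketched with a minimal 1-D k-means loop; the data points and two-cluster setup are assumptions for illustration.

```python
# Minimal 1-D k-means: alternate assignment and update steps.
def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        groups = {c: [] for c in range(len(centers))}
        for p in points:
            i = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            groups[i].append(p)
        # update step: each center moves to the mean of its group
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    return centers

centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], centers=[0.0, 10.0])
```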

19 Evolutionary Learning
Biological organisms adapt to improve their survival rates and chance of having offspring in their environment

20 Least Squares Learning Rule

21 A Beginning - Artificial Neuron
Dendrites receive signals from other neurons. When the impulses a neuron receives exceed a certain threshold, the neuron fires and sends an impulse along its axon. The branches at the end of the axon are called synapses; they are the connection points between neurons and may be inhibitory or excitatory. An inhibitory synapse weakens the transmitted impulse; an excitatory synapse strengthens it. Key terms: cell body, dendrite, axon.

22 Artificial neuron (perceptron)
Artificial neuron (two inputs case) w0 x0=1 Output function Activation function

23 Artificial Neuron Output function (Activation function)
Binary threshold function

24 Artificial Neuron Output function (Activation function)
Linear threshold function

25 Artificial Neuron Output function (Activation function)
Sigmoidal function

26 Artificial Neuron Output function (Activation function)
Gaussian function

27 Least squares learning rule - regression
Given a set of input vectors and desired outputs, minimize the difference (mean square error) between the desired output and the actual output for each input vector

28 Least squares learning rule - regression

29 Least squares learning rule - regression
Let then

30 Least squares learning rule - regression

31 Least squares learning rule - regression
Let then

32 Least squares learning rule - regression
For real time learning

33 Least squares learning rule - regression
The modification depends on output error and inputs
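The least-squares (delta-rule) update the slides describe, in which each weight change is proportional to the output error times the corresponding input, can be sketched as follows. The training data, learning rate, and epoch count are assumptions.

```python
# LMS (Widrow-Hoff) learning: w <- w + eta * (d - y) * x for each sample.
def lms_train(samples, w, eta=0.1, epochs=200):
    for _ in range(epochs):
        for x, d in samples:                           # x: inputs, d: desired
            y = sum(wi * xi for wi, xi in zip(w, x))   # linear output
            err = d - y                                # output error
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
    return w

# learn y = 2*x1 + 1 using a constant bias input x0 = 1
samples = [([1, 0], 1), ([1, 1], 3), ([1, 2], 5)]
w = lms_train(samples, w=[0.0, 0.0])   # converges toward [1.0, 2.0]
```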

34 Back-propagation Learning Rule

35 Back-propagation learning rule
Improve the learning efficiency. A bias term (a constant input of 1 with weight w0) is added alongside the inputs x1, ..., xn.

36 Back-propagation learning rule

37 Back-propagation learning rule
linear output function: Output layer Hidden layer

38 Back-propagation learning rule
Sigmoid output function: Output layer Hidden layer

39 Back-propagation learning rule
Output layer (Linear vs Sigmoid) Hidden layer (Linear vs Sigmoid)

40 Back-propagation learning rule
Apply the input vector to the input layer. Calculate the net input and output of the hidden layer. Calculate the net input and output of the output layer. Update the weights of the output layer, using one rule for a linear output unit and another for a sigmoid unit. η: the learning-rate parameter (0 < η < 1)

41 Back-propagation learning rule
Update the weights of the hidden layer, again with one rule for a linear output unit and another for a sigmoid unit. If the error term is acceptably small, stop the training.

42 Back-propagation learning rule
Weights should be initialized to small random values between ±0.5. Bias terms may improve the learning. The learning-rate parameter η should be a small number, on the order of 0.05 to 0.25, to ensure settling to a solution.

43 Back-propagation learning rule – An Exercise
Exclusive-OR operation

44 Back-propagation learning rule – An Exercise
Initial weights and threshold levels are set randomly: w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, θ3 = 0.8, θ4 = -0.1, and θ5 = 0.3.
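The first forward pass of the exercise can be checked numerically with the initial weights above, assuming input x1 = x2 = 1 with desired output 0, the sigmoid activation, and the convention net = Σ wx − θ used in the preceding slides:

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

x1 = x2 = 1
y3 = sigmoid(x1 * 0.5 + x2 * 0.4 - 0.8)      # hidden neuron 3
y4 = sigmoid(x1 * 0.9 + x2 * 1.0 - (-0.1))   # hidden neuron 4
y5 = sigmoid(y3 * (-1.2) + y4 * 1.1 - 0.3)   # output neuron 5
error = 0 - y5                               # desired output is 0
```

This gives y3 ≈ 0.5250, y4 ≈ 0.8808, y5 ≈ 0.5097, so the initial output error is about −0.5097 before any weight updates.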

45 Back-propagation learning rule – An Exercise

46 Back-propagation learning rule – An Exercise

47 Back-propagation learning rule – An Exercise
The training process is repeated until the sum of squared errors falls below a chosen error threshold.

48 Back-propagation learning rule – An Exercise
Sum-Squared Error

49 Back-propagation learning rule – An Exercise
Final results

50 Competitive Learning

51 Competitive Learning Principle of Competitive Learning

52 Competitive Learning Discovery of significant patterns or invariants of the environment without the intervention(介入) of a teaching input self-reinforcing based on competition involve cooperation

53 Competitive Learning Example of CL
Three clusters of vectors (denoted by solid dots) distributed on the unit sphere. Initially randomized codebook vectors (crosses) move under influence of a competitive learning rule to approximate the centroids of the clusters. Competitive learning schemes use codebook vectors to approximate centroids of data clusters.

54 Competitive Learning Hebbian learning rule
Signal Hebbian learning rule; competitive learning rule; linear competitive learning rule

55 Competitive Learning - Kohonen learning rule
Winner-take-all learning rule

56 Competitive Learning - Kohonen learning rule
About the winner: the more nearly parallel a node's weight vector is to the input vector, the more likely that node is to win.
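A minimal sketch of the winner-take-all step: the node whose weight vector has the largest dot product with the input wins (for normalized weights, the most nearly parallel one), and only the winner moves toward the input. The vectors and learning rate are illustrative assumptions.

```python
# One Kohonen winner-take-all step for a small competitive layer.
def kohonen_step(x, weights, eta=0.5):
    # winner: node whose weights have the largest dot product with x
    win = max(range(len(weights)),
              key=lambda i: sum(w * xi for w, xi in zip(weights[i], x)))
    # only the winner's weights move toward the input vector
    weights[win] = [w + eta * (xi - w) for w, xi in zip(weights[win], x)]
    return win, weights

win, weights = kohonen_step([1.0, 0.0], [[0.6, 0.8], [0.8, 0.6]])
```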

57 Competitive Learning - Kohonen learning rule
Example: a two-node Kohonen network training patterns training patterns in polar coordinate form normalized initial weights

58 Competitive Learning - Kohonen learning rule

59 Self-Organizing Feature Map

60 Self-Organizing Feature Map: Underlying Ideas
Unsupervised learning process. Clusters of neurons win the competition. Weights of winning neurons are adjusted to bring about a better response to the current input. Final weights specify clusters of network nodes that are topologically close. Correspondence between signal features and response locations on the map. Preserves the topology of the input.

61 Self-Organizing Feature Map: Underlying Ideas
Distance relations in high dimensional spaces should be approximated by the network as the distances in the two dimensional neuronal field: Input neurons should be exposed to a sufficient number of different inputs. Only the winning neuron and its neighbours adapt their connections. A similar weight update procedure is employed on neurons which comprise topologically related subsets. The resulting adjustment enhances the responses to the same or to a similar input that occurs subsequently

62 Self-Organizing Feature Map
Neighbourhood Computation Neighbourhood is a function of time: as epochs of training elapse, the neighbourhood shrinks. Neighbourhood Shapes

63 Self-Organizing Feature Map
Some Observations. Ordering phase (initial period of adaptation): the learning rate should be close to unity, then decreased linearly, exponentially, or inversely with iteration over the first 1000 epochs while staying above 0.1. Convergence phase: the learning rate should be held at around 0.01 for a large number of epochs, typically many tens of thousands. During the ordering phase the neighbourhood Nk(IJ) shrinks linearly with k until it finally includes only a few neurons; during the convergence phase Nk(IJ) may comprise only one neighbour or none.

64 Self-Organizing Feature Map - Simulation Example
The data employed in the experiment comprised 500 points distributed uniformly over the bipolar square [−1, 1] × [−1, 1]; the points thus describe a geometrically square topology. An 8×8 planar array of neurons was applied.

65 Self-Organizing Feature Map - Simulation Example

66 Self-Organizing Feature Map - Simulation Example
Simulation Notes: Initial value of the neighbourhood radius r = 6, so the neighbourhood is initially a square of width 12 centered on the winning neuron IJ. The neighbourhood radius contracts by 1 every 200 epochs; after 1000 epochs it is maintained at 1. The neighbourhood radius can also be zero, in which case only the winning neuron updates its weights.
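The schedules described in these slides (a learning rate near unity decaying through the ordering phase while staying above 0.1, then held near 0.01 in the convergence phase; a neighbourhood radius starting at 6 and shrinking by 1 every 200 epochs down to 1) can be sketched as functions of the epoch. The exact linear decay form for the learning rate is an assumption.

```python
def learning_rate(epoch):
    if epoch < 1000:                               # ordering phase
        return max(0.1, 0.9 * (1 - epoch / 1000))  # linear decay, floor 0.1
    return 0.01                                    # convergence phase

def radius(epoch, r0=6):
    # shrink by 1 every 200 epochs, then hold at 1
    return max(1, r0 - epoch // 200)
```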

67 Evolutionary Learning

68 Calculus-based Search - Indirect method
Set the gradient of the objective function equal to 0 fig. 1.1

69 Calculus-based Search - Direct method
Move in a direction related to the local gradient (hill-climbing)

70 Calculus-based Search
Both methods are local in scope: they find the best in a neighborhood of the current point (fig. 1.2)

71 Calculus-based Search
Both depend on the existence of derivatives

72 Calculus-based search
In real-world search problems, calculus-based methods are insufficient

73 State Space Search - Breadth first search
First in First out

74 State Space Search - Breadth first search

75 State Space Search - Depth first search
Last in First out

76 State Space Search - Depth first search

77 State Space Search - Best first search
(First in Best out)
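The three strategies above differ only in which element the frontier yields next, which can be shown directly with the corresponding data structures (FIFO queue, LIFO stack, priority queue):

```python
from collections import deque
import heapq

frontier = deque(["a", "b", "c"])
bfs_next = frontier.popleft()          # first in, first out -> "a"

stack = ["a", "b", "c"]
dfs_next = stack.pop()                 # last in, first out -> "c"

heap = [(3, "a"), (1, "b"), (2, "c")]  # (cost, state); lower cost = better
heapq.heapify(heap)
best_next = heapq.heappop(heap)[1]     # first in, best out -> "b"
```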

78 State Space Search - Best first search

79 The Goals of Optimization
Is the goal process improvement along the way (the shortest path), or attainment of the optimum (the destination) itself, i.e., convergence?

80 Random search
Search and save the best. GAs are not a purely randomized technique: Genetic Algorithms (GAs) use random choice as a tool in a directed search process.

81 Characteristics of GAs - search information
Which is the best? Smallest, longest, fastest, fittest, ... The best element in the array (among the elements met so far) is not always the solution, but it plays a role in guiding the search direction.

82 Characteristics of GAs - search information
The search direction: breadth-first search takes the first element in the array; depth-first search takes the last element; best-first search takes the best element (winner takes all); neural-network search uses a gradient calculation, the gradient of the error function. In GA search, how is the search direction found?

83 Characteristics of GAs - search information
From winner-takes-all to winner-group-takes-all: randomly exchange the bits of the strings in the winner group to form the next search state.

84 Characteristics of GAs
Work with a coding of the parameter set, not the parameters themselves Search from a population of points, not a single point Use objective function information, not derivatives or other auxiliary knowledge Use probabilistic transition rules, not deterministic rules

85 Characteristics of GAs - coding
Example: find the maximum of f(x) (fig. 1.5). Code the parameter x as a finite-length string. How should the parameter be coded?

86 A Simple Genetic Algorithm - Initialization
Example: 01101, 11000, 01000, 10011. The number of strings depends on the complexity of the problem.

87 A Simple Genetic Algorithm - Reproduction
Copying strings according to their fitness values: strings with higher values have a higher probability of contributing offspring in the next generation. Example: Table 1.1

88 A Simple Genetic Algorithm - Reproduction
fig. 1.7 Each string has a reproduction probability proportional to its corresponding area (fitness). Highly fit strings have a higher number of offspring in the succeeding generation.

89 A Simple Genetic Algorithm - Crossover
Newly reproduced strings in the mating pool are mated at random. Randomly select a crossover position k from [1, l−1], where l is the string length, then swap all characters between positions k+1 and l.

90 A Simple Genetic Algorithm - Crossover
Example: string 1: 01101, string 2: 11000. Select k from [1, 5−1] = [1, 4]. If k = 4, then 01101 and 11000 become 01100 and 11001 (new strings).
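The crossover example above as code: each parent keeps its first k characters and swaps everything between positions k+1 and l.

```python
# Single-point crossover of two equal-length bit strings at position k.
def crossover(s1, s2, k):
    return s1[:k] + s2[k:], s2[:k] + s1[k:]

child1, child2 = crossover("01101", "11000", k=4)   # -> "01100", "11001"
```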

91 A Simple Genetic Algorithm - Crossover

92 A Simple Genetic Algorithm - Crossover
Initialization uses random number generation; reproduction is string copying; crossover is string exchange. This mix of direction and chance builds new solutions from the best partial solutions.

93 A Simple Genetic Algorithm - Mutation
Reproduction and crossover may occasionally become overzealous and lose some potentially useful genetic material. Mutation is the occasional random alteration of the value of a string position (for binary coding: 0 ↔ 1): a random walk through the string space, and an insurance policy, applied with small probability, against premature loss of useful material.
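Bit-by-bit binary mutation as just described can be sketched as follows; the flip probability p_m is a parameter (the slides only say it should be small).

```python
import random

# Flip each bit of a binary string independently with probability p_m.
def mutate(bits, p_m, rng=random):
    return "".join(b if rng.random() >= p_m else "10"[int(b)]
                   for b in bits)
```

With p_m = 0 the string is unchanged; with p_m = 1 every bit flips; a typical GA would use a small value such as 0.001 per bit.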

94 A Simulation by Hand Maximize Objective function

95 A Simulation by Hand Coding: [0, 31] → 00000~11111. Initialization

96 A Simulation by Hand Reproduction: the weighted roulette wheel
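The weighted roulette wheel can be sketched as follows, using the four example strings from the initialization slide and the hand simulation's objective f(x) = x²; each string is selected with probability proportional to its fitness.

```python
import random

# Fitness-proportionate (roulette-wheel) selection of one string.
def roulette(population, fitness, rng):
    total = sum(fitness(s) for s in population)
    r = rng.random() * total          # spin the wheel: a point in [0, total)
    acc = 0.0
    for s in population:
        acc += fitness(s)             # each string owns a fitness-sized slice
        if r < acc:
            return s
    return population[-1]

pop = ["01101", "11000", "01000", "10011"]   # x = 13, 24, 8, 19

def fit(s):
    return int(s, 2) ** 2                    # f(x) = x^2

pick = roulette(pop, fit, random.Random(1))
```

Over many spins, "11000" (x = 24, fitness 576 out of a total of 1170) is picked most often.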

97 A Simulation by Hand Crossover

98 A Simulation by Hand Mutation: performed on a bit-by-bit basis, with a small probability per bit

