Predictive Analytics – Basics of Machine Learning


1 Predictive Analytics – Basics of Machine Learning

2 What Is Machine Learning?
Automating automation: getting computers to program themselves. Writing software is the bottleneck, so let the data do the work instead!

3 Traditional Programming
Traditional programming: Data + Program → Computer → Output. Machine learning: Data + Output → Computer → Program.

4 Sample Applications Web search Computational biology Finance
E-commerce Space exploration Robotics Information extraction Social networks Debugging [Your favorite area]

5 ML in a Nutshell Tens of thousands of machine learning algorithms
Hundreds of new ones every year. Every machine learning algorithm has three components: Representation, Evaluation, Optimization.

6 Representation Decision trees Sets of rules / Logic programs Instances
Graphical models (Bayes/Markov nets) Neural networks Support vector machines Model ensembles Etc.

7 Evaluation Accuracy Precision and recall Squared error Likelihood
Posterior probability Cost / Utility Margin Entropy K-L divergence Etc.

8 Optimization
Combinatorial optimization, e.g. greedy search. Convex optimization, e.g. gradient descent. Constrained optimization, e.g. linear programming.
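The convex-optimization case above can be sketched in a few lines: gradient descent on a one-dimensional convex function. The function f(w) = (w − 3)² and the step size are illustrative assumptions, not from the slides.

```python
# Gradient descent on f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)   # step against the local gradient
    return w

w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)  # approaches 3.0
```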

9 Types of Learning Supervised (inductive) learning
Training data includes desired outputs. Unsupervised learning: training data does not include desired outputs. Semi-supervised learning: training data includes a few desired outputs. Reinforcement learning: rewards from a sequence of actions.

10 ML in Practice Understanding domain, prior knowledge, and goals
Data integration, selection, cleaning, pre-processing, etc. Learning models. Interpreting results. Consolidating and deploying discovered knowledge. Then loop back and repeat.

11 Taxonomy of Machine Learning
Rote Learning; Learning from Instruction; Learning by Analogy; Learning from Examples: Supervised Learning (the most common type, covering Classification and Regression), Reinforcement Learning, Unsupervised Learning, Evolutionary Learning; Markov Chain (learning with probability); Hidden Markov Model (learning with probability and time)

12 Rote Learning
The simplest form of learning: the input data is not processed in any way. The solution to a problem is stored directly and retrieved when the same problem appears again.
If too many examples are stored, searching them for the solution to a new problem degrades the program's overall efficiency. Indexing techniques can simplify the search for stored data, and heuristics can limit the number of examples stored. This form of learning was already quite mature in Samuel's late-1950s CHECKER program, which stored checkers board positions.

13 Learning from Instruction
Performance is improved using advice from an external instructor, much as in most formal education. The greatest difficulties are translating the instructor's high-level human language into internal knowledge structures the program can use, and integrating that knowledge with the existing knowledge base so it can be applied effectively. The learner must carry out some deductive work, but most of the responsibility lies with the instructor, who must supply well-organized knowledge that incrementally extends what the learner already knows. A system must be built that can accept instruction or advice, and can store and effectively apply the knowledge it acquires.

14 Learning by Analogy
Existing knowledge is applied to a new problem on the basis of the similarity between the two, modifying that knowledge to fit the new case. Example: we might tell a student that a tennis serve is like driving a nail into a high spot on a wall. In the same way, an analogy-based learning system might be used to convert an existing computer program into one that performs a related function the original did not provide. Compared with rote learning or learning from instruction, learning by analogy demands more inferential ability from the learner.

15 Learning from Examples
Given an example set of limited size, find a concise data description Supervised Learning (the most common type) Classification Regression Reinforcement Learning Unsupervised Learning Evolutionary Learning

16 Supervised Learning Classification Regression
To assign new inputs to one of a number of discrete classes or categories. Ex: assign a character bitmap the correct letter of the alphabet; other pattern-recognition problems. Regression: the output space is formed by the values of continuous variables. Ex: predict the value of shares in the stock-exchange market.
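A hypothetical toy sketch of the two task types above: classification returns a discrete label, regression a continuous value. The 1-D data, class names, and nearest-centroid rule are assumptions for illustration.

```python
def nearest_centroid(x, centroids):
    """Classification: pick the class whose centroid is closest to x."""
    return min(centroids, key=lambda c: abs(x - centroids[c]))

centroids = {"A": 1.0, "B": 5.0}             # made-up 1-D class centroids
label = nearest_centroid(1.8, centroids)     # discrete output: a class name

def fit_line(xs, ys):
    """Regression: least-squares slope a and intercept b for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # continuous output: y = 2x + 1
```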

17 Reinforcement Learning
How to map situations to actions in order to maximize a given reward. The learning algorithm is not told which actions to take in a given situation; the learner is assumed to gain information about its actions through a reward that does not necessarily arrive immediately after the action is taken. Ex: playing chess, where the reward is winning or losing.

18 Unsupervised Learning
If the data is only a sample of objects without associated target values. Clustering algorithms: given a training sample of objects, extract some structure from them. Ex: identify indoor vs. outdoor images. Dimensionality-reduction methods: represent high-dimensional data in a low-dimensional space while trying to preserve the original information in the data.
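The clustering idea above can be sketched with a minimal 1-D k-means loop; the data points and two-cluster setup are assumptions for illustration.

```python
# Minimal 1-D k-means: alternate assignment and update steps.
def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        groups = {c: [] for c in range(len(centers))}
        for p in points:
            i = min(range(len(centers)), key=lambda c: abs(p - centers[c]))
            groups[i].append(p)
        # update step: each center moves to the mean of its group
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in groups.items()]
    return centers

centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], centers=[0.0, 10.0])
```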

19 Evolutionary Learning
Biological organisms adapt to improve their survival rates and chance of having offspring in their environment

20 Least Squares Learning Rule

21 A Beginning - Artificial Neuron
Dendrites receive signals from other neurons. When the impulses a neuron receives exceed a certain threshold, the neuron fires and sends an impulse along its axon. The branches at the end of the axon are called synapses; they are the connection points between neurons and may be inhibitory or excitatory. An inhibitory synapse weakens the transmitted impulse; an excitatory synapse strengthens it. Key terms: cell body, dendrite, axon.

22 Artificial neuron (perceptron)
Artificial neuron (two inputs case) w0 x0=1 Output function Activation function

23 Artificial Neuron Output function (Activation function)
Binary threshold function

24 Artificial Neuron Output function (Activation function)
Linear threshold function

25 Artificial Neuron Output function (Activation function)
Sigmoidal function

26 Artificial Neuron Output function (Activation function)
Gaussian function

27 Least squares learning rule - regression
Given a set of input vectors and desired outputs, minimize the difference (mean square error) between the desired output and the actual output for each input vector

28 Least squares learning rule - regression

29 Least squares learning rule - regression
Let then

30 Least squares learning rule - regression

31 Least squares learning rule - regression
Let then

32 Least squares learning rule - regression
For real time learning

33 Least squares learning rule - regression
The modification depends on output error and inputs
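The least-squares (delta-rule) update the slides describe, in which each weight change is proportional to the output error times the corresponding input, can be sketched as follows. The training data, learning rate, and epoch count are assumptions.

```python
# LMS (Widrow-Hoff) learning: w <- w + eta * (d - y) * x for each sample.
def lms_train(samples, w, eta=0.1, epochs=200):
    for _ in range(epochs):
        for x, d in samples:                           # x: inputs, d: desired
            y = sum(wi * xi for wi, xi in zip(w, x))   # linear output
            err = d - y                                # output error
            w = [wi + eta * err * xi for wi, xi in zip(w, x)]
    return w

# learn y = 2*x1 + 1 using a constant bias input x0 = 1
samples = [([1, 0], 1), ([1, 1], 3), ([1, 2], 5)]
w = lms_train(samples, w=[0.0, 0.0])   # converges toward [1.0, 2.0]
```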

34 Back-propagation Learning Rule

35 Back-propagation learning rule
Improve the learning efficiency. A bias term (a constant input of 1 with weight w0) is added alongside the inputs x1, ..., xn.

36 Back-propagation learning rule

37 Back-propagation learning rule
linear output function: Output layer Hidden layer

38 Back-propagation learning rule
Sigmoid output function: Output layer Hidden layer

39 Back-propagation learning rule
Output layer (Linear vs Sigmoid) Hidden layer (Linear vs Sigmoid)

40 Back-propagation learning rule
Apply the input vector to the input layer. Calculate the net input and output of the hidden layer. Calculate the net input and output of the output layer. Update the weights of the output layer, using one rule for a linear output unit and another for a sigmoid unit. η: the learning-rate parameter (0 < η < 1)

41 Back-propagation learning rule
Update the weights of the hidden layer, again with one rule for a linear output unit and another for a sigmoid unit. If the error term is acceptably small, stop the training.

42 Back-propagation learning rule
Weights should be initialized to small random values between ±0.5. Bias terms may improve the learning. The learning-rate parameter η should be a small number, on the order of 0.05 to 0.25, to ensure settling to a solution.

43 Back-propagation learning rule – An Exercise
Exclusive-OR operation

44 Back-propagation learning rule – An Exercise
Initial weights and threshold levels are set randomly: w13 = 0.5, w14 = 0.9, w23 = 0.4, w24 = 1.0, w35 = -1.2, w45 = 1.1, θ3 = 0.8, θ4 = -0.1, and θ5 = 0.3.
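The first forward pass of the exercise can be checked numerically with the initial weights above, assuming input x1 = x2 = 1 with desired output 0, the sigmoid activation, and the convention net = Σ wx − θ used in the preceding slides:

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

x1 = x2 = 1
y3 = sigmoid(x1 * 0.5 + x2 * 0.4 - 0.8)      # hidden neuron 3
y4 = sigmoid(x1 * 0.9 + x2 * 1.0 - (-0.1))   # hidden neuron 4
y5 = sigmoid(y3 * (-1.2) + y4 * 1.1 - 0.3)   # output neuron 5
error = 0 - y5                               # desired output is 0
```

This gives y3 ≈ 0.5250, y4 ≈ 0.8808, y5 ≈ 0.5097, so the initial output error is about −0.5097 before any weight updates.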

45 Back-propagation learning rule – An Exercise

46 Back-propagation learning rule – An Exercise

47 Back-propagation learning rule – An Exercise
The training process is repeated until the sum of squared errors falls below a chosen error threshold.

48 Back-propagation learning rule – An Exercise
Sum-Squared Error

49 Back-propagation learning rule – An Exercise
Final results

50 Competitive Learning

51 Competitive Learning Principle of Competitive Learning

52 Competitive Learning Discovery of significant patterns or invariants of the environment without the intervention(介入) of a teaching input self-reinforcing based on competition involve cooperation

53 Competitive Learning Example of CL
Three clusters of vectors (denoted by solid dots) distributed on the unit sphere. Initially randomized codebook vectors (crosses) move under influence of a competitive learning rule to approximate the centroids of the clusters. Competitive learning schemes use codebook vectors to approximate centroids of data clusters.

54 Competitive Learning Hebbian learning rule
Signal Hebbian learning rule; competitive learning rule; linear competitive learning rule

55 Competitive Learning - Kohonen learning rule
Winner-take-all learning rule

56 Competitive Learning - Kohonen learning rule
About the winner: the more nearly parallel a node's weight vector is to the input vector, the more likely that node is to win.
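A minimal sketch of the winner-take-all step: the node whose weight vector has the largest dot product with the input wins (for normalized weights, the most nearly parallel one), and only the winner moves toward the input. The vectors and learning rate are illustrative assumptions.

```python
# One Kohonen winner-take-all step for a small competitive layer.
def kohonen_step(x, weights, eta=0.5):
    # winner: node whose weights have the largest dot product with x
    win = max(range(len(weights)),
              key=lambda i: sum(w * xi for w, xi in zip(weights[i], x)))
    # only the winner's weights move toward the input vector
    weights[win] = [w + eta * (xi - w) for w, xi in zip(weights[win], x)]
    return win, weights

win, weights = kohonen_step([1.0, 0.0], [[0.6, 0.8], [0.8, 0.6]])
```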

57 Competitive Learning - Kohonen learning rule
Example: a two-node Kohonen network training patterns training patterns in polar coordinate form normalized initial weights

58 Competitive Learning - Kohonen learning rule

59 Self-Organizing Feature Map

60 Self-Organizing Feature Map: Underlying Ideas
Unsupervised learning process. Clusters of neurons win the competition. Weights of winning neurons are adjusted to bring about a better response to the current input. Final weights specify clusters of network nodes that are topologically close. Correspondence between signal features and response locations on the map. Preserves the topology of the input.

61 Self-Organizing Feature Map: Underlying Ideas
Distance relations in high dimensional spaces should be approximated by the network as the distances in the two dimensional neuronal field: Input neurons should be exposed to a sufficient number of different inputs. Only the winning neuron and its neighbours adapt their connections. A similar weight update procedure is employed on neurons which comprise topologically related subsets. The resulting adjustment enhances the responses to the same or to a similar input that occurs subsequently

62 Self-Organizing Feature Map
Neighbourhood Computation Neighbourhood is a function of time: as epochs of training elapse, the neighbourhood shrinks. Neighbourhood Shapes

63 Self-Organizing Feature Map
Some Observations. Ordering phase (initial period of adaptation): the learning rate should be close to unity, then decreased linearly, exponentially, or inversely with iteration over the first 1000 epochs while staying above 0.1. Convergence phase: the learning rate should be held at around 0.01 for a large number of epochs, typically many tens of thousands. During the ordering phase the neighbourhood Nk(IJ) shrinks linearly with k until it finally includes only a few neurons; during the convergence phase Nk(IJ) may comprise only one neighbour or none.

64 Self-Organizing Feature Map - Simulation Example
The data employed in the experiment comprised 500 points distributed uniformly over the bipolar square [−1, 1] × [−1, 1]; the points thus describe a geometrically square topology. An 8×8 planar array of neurons was applied.

65 Self-Organizing Feature Map - Simulation Example

66 Self-Organizing Feature Map - Simulation Example
Simulation Notes: Initial value of the neighbourhood radius r = 6, so the neighbourhood is initially a square of width 12 centered on the winning neuron IJ. The neighbourhood radius contracts by 1 every 200 epochs; after 1000 epochs it is maintained at 1. The neighbourhood radius can also be zero, in which case only the winning neuron updates its weights.
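The schedules described in these slides (a learning rate near unity decaying through the ordering phase while staying above 0.1, then held near 0.01 in the convergence phase; a neighbourhood radius starting at 6 and shrinking by 1 every 200 epochs down to 1) can be sketched as functions of the epoch. The exact linear decay form for the learning rate is an assumption.

```python
def learning_rate(epoch):
    if epoch < 1000:                               # ordering phase
        return max(0.1, 0.9 * (1 - epoch / 1000))  # linear decay, floor 0.1
    return 0.01                                    # convergence phase

def radius(epoch, r0=6):
    # shrink by 1 every 200 epochs, then hold at 1
    return max(1, r0 - epoch // 200)
```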

67 Evolutionary Learning

68 Calculus-based Search - Indirect method
Set the gradient of the objective function equal to 0 fig. 1.1

69 Calculus-based Search - Direct method
Move in a direction related to the local gradient (hill-climbing)

70 Calculus-based Search
Both methods are local in scope: they find the best in a neighborhood of the current point (fig. 1.2)

71 Calculus-based Search
Both depend on the existence of derivatives

72 Calculus-based search
In real-world search problems, calculus-based methods are insufficient

73 State Space Search - Breadth first search
First in First out

74 State Space Search - Breadth first search

75 State Space Search - Depth first search
Last in First out

76 State Space Search - Depth first search

77 State Space Search - Best first search
(First in Best out)
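The three strategies above differ only in which element the frontier yields next, which can be shown directly with the corresponding data structures (FIFO queue, LIFO stack, priority queue):

```python
from collections import deque
import heapq

frontier = deque(["a", "b", "c"])
bfs_next = frontier.popleft()          # first in, first out -> "a"

stack = ["a", "b", "c"]
dfs_next = stack.pop()                 # last in, first out -> "c"

heap = [(3, "a"), (1, "b"), (2, "c")]  # (cost, state); lower cost = better
heapq.heapify(heap)
best_next = heapq.heappop(heap)[1]     # first in, best out -> "b"
```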

78 State Space Search - Best first search

79 The Goals of Optimization
Is the goal process improvement along the way (the shortest path), or attainment of the optimum (the destination) itself, i.e., convergence?

80 Random search
Search and save the best. GAs are not a purely randomized technique: Genetic Algorithms (GAs) use random choice as a tool in a directed search process.

81 Characteristics of GAs - search information
Which is the best? Smallest, longest, fastest, fittest, ... The best element in the array (among the elements met so far) is not always the solution, but it plays a role in guiding the search direction.

82 Characteristics of GAs - search information
The search direction: breadth-first search takes the first element in the array; depth-first search takes the last element; best-first search takes the best element (winner takes all); neural-network search uses a gradient calculation, the gradient of the error function. In GA search, how is the search direction found?

83 Characteristics of GAs - search information
From winner-takes-all to winner-group-takes-all: randomly exchange the bits of the strings in the winner group to form the next search state.

84 Characteristics of GAs
Work with a coding of the parameter set, not the parameters themselves Search from a population of points, not a single point Use objective function information, not derivatives or other auxiliary knowledge Use probabilistic transition rules, not deterministic rules

85 Characteristics of GAs - coding
Example: find the maximum of f(x) (fig. 1.5). Code the parameter x as a finite-length string. How should the parameter be coded?

86 A Simple Genetic Algorithm - Initialization
Example: 01101, 11000, 01000, 10011. The number of strings depends on the complexity of the problem.

87 A Simple Genetic Algorithm - Reproduction
Copying strings according to their fitness values: strings with higher values have a higher probability of contributing offspring in the next generation. Example: Table 1.1

88 A Simple Genetic Algorithm - Reproduction
fig. 1.7 Each string has a reproduction probability proportional to its corresponding area (fitness). Highly fit strings have a higher number of offspring in the succeeding generation.

89 A Simple Genetic Algorithm - Crossover
Newly reproduced strings in the mating pool are mated at random. Randomly select a crossover position k from [1, l−1], where l is the string length, then swap all characters between positions k+1 and l.

90 A Simple Genetic Algorithm - Crossover
Example: string 1: 01101, string 2: 11000. Select k from [1, 5−1] = [1, 4]. If k = 4, then 01101 and 11000 become 01100 and 11001 (new strings).
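The crossover example above as code: each parent keeps its first k characters and swaps everything between positions k+1 and l.

```python
# Single-point crossover of two equal-length bit strings at position k.
def crossover(s1, s2, k):
    return s1[:k] + s2[k:], s2[:k] + s1[k:]

child1, child2 = crossover("01101", "11000", k=4)   # -> "01100", "11001"
```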

91 A Simple Genetic Algorithm - Crossover

92 A Simple Genetic Algorithm - Crossover
Initialization uses random number generation; reproduction is string copying; crossover is string exchange. This mix of direction and chance builds new solutions from the best partial solutions.

93 A Simple Genetic Algorithm - Mutation
Reproduction and crossover may occasionally become overzealous and lose some potentially useful genetic material. Mutation is the occasional random alteration of the value of a string position (for binary coding: 0 ↔ 1): a random walk through the string space, and an insurance policy, applied with small probability, against premature loss of useful material.
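Bit-by-bit binary mutation as just described can be sketched as follows; the flip probability p_m is a parameter (the slides only say it should be small).

```python
import random

# Flip each bit of a binary string independently with probability p_m.
def mutate(bits, p_m, rng=random):
    return "".join(b if rng.random() >= p_m else "10"[int(b)]
                   for b in bits)
```

With p_m = 0 the string is unchanged; with p_m = 1 every bit flips; a typical GA would use a small value such as 0.001 per bit.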

94 A Simulation by Hand Maximize Objective function

95 A Simulation by Hand Coding: [0, 31] → 00000~11111. Initialization

96 A Simulation by Hand Reproduction: the weighted roulette wheel
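The weighted roulette wheel can be sketched as follows, using the four example strings from the initialization slide and the hand simulation's objective f(x) = x²; each string is selected with probability proportional to its fitness.

```python
import random

# Fitness-proportionate (roulette-wheel) selection of one string.
def roulette(population, fitness, rng):
    total = sum(fitness(s) for s in population)
    r = rng.random() * total          # spin the wheel: a point in [0, total)
    acc = 0.0
    for s in population:
        acc += fitness(s)             # each string owns a fitness-sized slice
        if r < acc:
            return s
    return population[-1]

pop = ["01101", "11000", "01000", "10011"]   # x = 13, 24, 8, 19

def fit(s):
    return int(s, 2) ** 2                    # f(x) = x^2

pick = roulette(pop, fit, random.Random(1))
```

Over many spins, "11000" (x = 24, fitness 576 out of a total of 1170) is picked most often.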

97 A Simulation by Hand Crossover

98 A Simulation by Hand Mutation: performed on a bit-by-bit basis, with a small probability per bit

