 # CES 514 – Data Mining Lecture 8 classification (contd…)


Example: PEBLS

- PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg)
  - Works with both continuous and nominal features
    - For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM)
  - Each record is assigned a weight factor
  - Number of nearest neighbors: k = 1

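Example: PEBLS

Distance between nominal attribute values (MVDM), where $n_{1i}$ is the number of records with value $V_1$ that belong to class $i$ and $n_1$ is the total number of records with value $V_1$:

$$d(V_1, V_2) = \sum_i \left| \frac{n_{1i}}{n_1} - \frac{n_{2i}}{n_2} \right|$$

Marital Status counts:

| Class | Single | Married | Divorced |
|---|---|---|---|
| Yes | 2 | 0 | 1 |
| No | 2 | 4 | 1 |

Refund counts:

| Class | Refund = Yes | Refund = No |
|---|---|---|
| Yes | 0 | 3 |
| No | 3 | 4 |

- $d(\text{Single}, \text{Married}) = |2/4 - 0/4| + |2/4 - 4/4| = 1$
- $d(\text{Single}, \text{Divorced}) = |2/4 - 1/2| + |2/4 - 1/2| = 0$
- $d(\text{Married}, \text{Divorced}) = |0/4 - 1/2| + |4/4 - 1/2| = 1$
- $d(\text{Refund=Yes}, \text{Refund=No}) = |0/3 - 3/7| + |3/3 - 4/7| = 6/7$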
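A minimal Python sketch of the MVDM computation, assuming the formula $d(V_1, V_2) = \sum_i |n_{1i}/n_1 - n_{2i}/n_2|$ and the class counts from the Marital Status and Refund tables above. The dictionary names `MARITAL` and `REFUND` and the function name `mvdm` are our own; exact fractions make it easy to compare with the hand-computed values.

```python
from fractions import Fraction

# Class counts per attribute value, from the tables above:
# each value maps to (count with Class=Yes, count with Class=No).
MARITAL = {"Single": (2, 2), "Married": (0, 4), "Divorced": (1, 1)}
REFUND = {"Yes": (0, 3), "No": (3, 4)}


def mvdm(counts, v1, v2):
    """Modified value difference metric between two nominal values.

    Sums |n1i/n1 - n2i/n2| over the classes i, where n1i is the number of
    records with value v1 in class i and n1 is the total count for v1.
    """
    c1, c2 = counts[v1], counts[v2]
    n1, n2 = sum(c1), sum(c2)
    return sum(abs(Fraction(a, n1) - Fraction(b, n2)) for a, b in zip(c1, c2))


for a, b in [("Single", "Married"), ("Single", "Divorced"), ("Married", "Divorced")]:
    print(f"d({a},{b}) = {mvdm(MARITAL, a, b)}")
print(f"d(Refund=Yes,Refund=No) = {mvdm(REFUND, 'Yes', 'No')}")
```

Running it prints the same four distances as the slide: 1, 0, 1 and 6/7.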

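Example: PEBLS

Distance between record X and record Y:

$$\Delta(X, Y) = w_X \, w_Y \sum_{i=1}^{d} d(X_i, Y_i)^2$$

where:

$$w_X = \frac{\text{number of times } X \text{ is used for prediction}}{\text{number of times } X \text{ predicts correctly}}$$

- $w_X \approx 1$ if X makes accurate predictions most of the time
- $w_X > 1$ if X is not reliable for making predictions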
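A sketch of the PEBLS record distance, assuming the usual textbook form $\Delta(X, Y) = w_X w_Y \sum_i d(X_i, Y_i)^2$ with $w_X$ = (times X was used for prediction) / (times X predicted correctly). The per-attribute distances are the MVDM values from the previous slide; the two records, their usage counts, and the helper names (`value_distance`, `record_weight`, `pebls_distance`) are hypothetical.

```python
from fractions import Fraction

# Pairwise MVDM values from the previous slide, per attribute position:
# position 0 = Refund, position 1 = Marital Status.
MVDM_TABLE = {
    0: {frozenset({"Yes", "No"}): Fraction(6, 7)},
    1: {
        frozenset({"Single", "Married"}): Fraction(1),
        frozenset({"Single", "Divorced"}): Fraction(0),
        frozenset({"Married", "Divorced"}): Fraction(1),
    },
}


def value_distance(i, a, b):
    """MVDM distance between two values of attribute i (0 for identical values)."""
    if a == b:
        return Fraction(0)
    return MVDM_TABLE[i][frozenset({a, b})]


def record_weight(times_used, times_correct):
    """Hypothetical helper: w_X = times used for prediction / times correct."""
    if times_correct <= 0:
        raise ValueError("record must have predicted correctly at least once")
    return Fraction(times_used, times_correct)


def pebls_distance(x, y, w_x, w_y):
    """Delta(X, Y) = w_X * w_Y * sum_i d(X_i, Y_i)^2."""
    return w_x * w_y * sum(
        value_distance(i, a, b) ** 2 for i, (a, b) in enumerate(zip(x, y))
    )


X = ("Yes", "Single")   # hypothetical record: Refund=Yes, Marital Status=Single
Y = ("No", "Married")   # hypothetical record: Refund=No, Marital Status=Married
w_x = record_weight(times_used=4, times_correct=4)  # always right: weight 1
w_y = record_weight(times_used=4, times_correct=2)  # right half the time: weight 2
print(pebls_distance(X, Y, w_x, w_y))
```

Here the attribute distances are 6/7 and 1, so the unweighted sum of squares is 85/49; because Y is only half as reliable, the weighted distance doubles to 170/49.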

Support Vector Machines

- Find a linear hyperplane (decision boundary) that separates the data
- There are many candidates: one possible solution, another possible solution, other possible solutions (figures: candidate decision boundaries, including B1 and B2)
- Which one is better, B1 or B2? How do you define "better"?
- Find the hyperplane that maximizes the margin (e.g., B1 is better than B2)
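To make "better" concrete, a small sketch that scores each candidate boundary by its distance to the closest training point (half the width of the band between the two parallel margin hyperplanes). The toy data and the two hyperplanes labelled `B1` and `B2` are made up for illustration; they are not the ones drawn in the slides' figures.

```python
import math

# Toy 2-D training data (hypothetical): three points per class.
POINTS = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
LABELS = [1, 1, 1, -1, -1, -1]


def geometric_margin(w, b, points, labels):
    """Smallest signed distance from the hyperplane w.x + b = 0 to a training point.

    The result is positive only if every point lies on its correct side.
    """
    norm = math.hypot(w[0], w[1])
    return min(y * (w[0] * x[0] + w[1] * x[1] + b) for x, y in zip(points, labels)) / norm


B1 = ((1.0, 1.0), 0.0)   # boundary x1 + x2 = 0
B2 = ((1.0, 0.2), 0.0)   # boundary x1 + 0.2*x2 = 0, which also separates the data

for name, (w, b) in [("B1", B1), ("B2", B2)]:
    print(name, round(geometric_margin(w, b, POINTS, LABELS), 3))
```

Both boundaries classify every point correctly, but B1 keeps the closest points farther away, so under the maximum-margin criterion it is the better choice.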

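Support Vector Machines

The decision boundary is $\mathbf{w} \cdot \mathbf{x} + b = 0$, with the margin hyperplanes $\mathbf{w} \cdot \mathbf{x} + b = 1$ and $\mathbf{w} \cdot \mathbf{x} + b = -1$ on either side.

- We want to maximize: $\text{Margin} = \dfrac{2}{\|\mathbf{w}\|}$
  - which is equivalent to minimizing: $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2}$
  - but subject to the following constraints: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$ for every training record $(\mathbf{x}_i, y_i)$, where $y_i \in \{-1, +1\}$
- This is a constrained optimization problem
  - Numerical approaches can solve it (e.g., quadratic programming)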
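A worked instance of this optimization on a hypothetical two-point training set, $\mathbf{x}_1 = (1, 1)$ with $y_1 = +1$ and $\mathbf{x}_2 = (-1, -1)$ with $y_2 = -1$, small enough to solve by hand instead of with a quadratic-programming solver. The constraints $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1$ become the two inequalities in the first line:

$$
\begin{aligned}
&\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\left(w_1^2 + w_2^2\right)
\quad\text{s.t.}\quad w_1 + w_2 + b \ge 1, \qquad w_1 + w_2 - b \ge 1 \\
&\text{Adding the constraints: } w_1 + w_2 \ge 1,
\quad\text{so}\quad w_1^2 + w_2^2 \ge \tfrac{1}{2}(w_1 + w_2)^2 \ge \tfrac{1}{2} \\
&\text{Equality needs } w_1 = w_2 = \tfrac{1}{2}\text{, and then both constraints force } b = 0 \\
&\text{Margin} = \frac{2}{\|\mathbf{w}\|} = \frac{2}{1/\sqrt{2}} = 2\sqrt{2} = \|\mathbf{x}_1 - \mathbf{x}_2\|
\end{aligned}
$$

As expected, the margin equals the distance between the two points, and the optimal boundary $x_1 + x_2 = 0$ is their perpendicular bisector.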

Overview of optimization

The simplest optimization problem: maximize $f(x)$ (one variable). If the function has nice properties (e.g., it is twice differentiable), then we can use calculus to solve the problem: solve the equation $f'(x) = 0$. If $a$ is a root and $f''(a) < 0$, then $a$ is a local maximum.

Tricky issues:
- How do we solve the equation $f'(x) = 0$?
- What if there are many solutions? Each is a "local" optimum.
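A one-variable illustration (the function is our own choice) showing both the second-derivative test and why a critical point may be only a local optimum:

$$
f(x) = x^3 - 3x, \qquad f'(x) = 3x^2 - 3 = 0 \;\Rightarrow\; x = \pm 1, \qquad f''(x) = 6x
$$

$$
f''(-1) = -6 < 0 \;\Rightarrow\; \text{local maximum } f(-1) = 2, \qquad
f''(1) = 6 > 0 \;\Rightarrow\; \text{local minimum } f(1) = -2
$$

The local maximum is not global: $f(3) = 18 > 2$, and $f$ grows without bound as $x \to \infty$.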

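How to solve g(x) = 0

Even polynomial equations are hard to solve exactly. Quadratics have a closed-form solution. What about higher degrees? Cubics and quartics also have closed-form formulas, but they are complicated, and for degree five and higher there is no general formula in radicals.

Numerical (iterative) techniques:
- bisection
- secant
- Newton-Raphson
- etc.

Challenges: choosing the initial guess; the rate of convergence.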
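A sketch of two of the iterative methods listed above, bisection and Newton-Raphson, applied to $g(x) = x^2 - 2$ (root $\sqrt{2}$). The function names, tolerances, bracketing interval, and starting point are illustrative choices. Both functions return the root and the number of iterations, which makes the difference in rate of convergence visible.

```python
import math


def bisection(g, lo, hi, tol=1e-12, max_iter=200):
    """Halve [lo, hi] until it is narrower than tol; g(lo) and g(hi) must differ in sign."""
    if g(lo) * g(hi) > 0:
        raise ValueError("g(lo) and g(hi) must have opposite signs")
    iterations = 0
    while hi - lo >= tol and iterations < max_iter:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:   # sign change in the left half
            hi = mid
        else:                     # sign change in the right half
            lo = mid
        iterations += 1
    return (lo + hi) / 2, iterations


def newton_raphson(g, dg, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - g(x)/g'(x); needs the derivative and a reasonable initial guess."""
    x = x0
    for iterations in range(1, max_iter + 1):
        step = g(x) / dg(x)   # raises ZeroDivisionError if g'(x) == 0
        x -= step
        if abs(step) < tol:
            return x, iterations
    raise RuntimeError("Newton-Raphson did not converge")


def g(x):
    return x * x - 2


def dg(x):
    return 2 * x


print(bisection(g, 1.0, 2.0))
print(newton_raphson(g, dg, x0=1.0))
print(math.sqrt(2))
```

Bisection only needs a sign change and always converges, but it gains one bit of accuracy per step; Newton-Raphson converges much faster near the root but depends on the derivative and on the initial guess.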

Functions of several variables

Consider a function such as $F(x, y)$. To find the maximum of $F(x, y)$, we solve the equations
$$\frac{\partial F}{\partial x} = 0 \quad\text{and}\quad \frac{\partial F}{\partial y} = 0.$$
A solution of this system is a critical point of F: a local maximum, a local minimum, or a saddle point. We can solve the equations using numerical techniques similar to the one-dimensional case.
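A sketch of Newton's method for the system $\partial F/\partial x = 0$, $\partial F/\partial y = 0$: the matrix of second derivatives plays the role of the derivative, and each 2×2 linear step is solved with Cramer's rule. The test function $F(x, y) = xy - x^2 - y^4/4 + x$, the starting point, and the function names are our own choices; this $F$ has exactly one critical point, $(1, 1)$.

```python
def newton_2d(grad, hess, x, y, tol=1e-12, max_iter=50):
    """Solve grad(x, y) = (0, 0) by Newton's method; hess(x, y) is the Jacobian of grad."""
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        (a, b), (c, d) = hess(x, y)
        det = a * d - b * c
        if det == 0:
            raise ZeroDivisionError("singular Hessian; try another starting point")
        # Newton step: solve [[a, b], [c, d]] @ (dx, dy) = (-gx, -gy) by Cramer's rule
        dx = (-gx * d + b * gy) / det
        dy = (-a * gy + c * gx) / det
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) < tol:
            return x, y
    raise RuntimeError("Newton's method did not converge")


# Example function F(x, y) = x*y - x**2 - y**4 / 4 + x
def grad_F(x, y):
    return (y - 2 * x + 1, x - y ** 3)


def hess_F(x, y):
    return ((-2.0, 1.0), (1.0, -3.0 * y ** 2))


print(newton_2d(grad_F, hess_F, 2.0, 2.0))
```

At $(1, 1)$ the Hessian is $\begin{pmatrix} -2 & 1 \\ 1 & -3 \end{pmatrix}$, with determinant 5 and $F_{xx} < 0$, so this critical point is a local maximum.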

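When is the solution a maximum or a minimum? Look at the Hessian of F at the critical point $a$:
$$H = \begin{pmatrix} F_{xx} & F_{xy} \\ F_{yx} & F_{yy} \end{pmatrix}$$
- If the Hessian is positive definite at $a$, then $a$ is a local minimum.
- If the Hessian is negative definite at $a$, then $a$ is a local maximum.
- If it is indefinite (it has both positive and negative eigenvalues), then $a$ is a saddle point.
- If it is only semidefinite (singular), the test is inconclusive.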
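For two variables, the definiteness of the symmetric Hessian can be read off $F_{xx}$ and $\det H = F_{xx}F_{yy} - F_{xy}^2$: positive definite if $F_{xx} > 0$ and $\det H > 0$, negative definite if $F_{xx} < 0$ and $\det H > 0$, indefinite if $\det H < 0$. A minimal sketch of that test (the function name is ours):

```python
def classify_critical_point(fxx, fxy, fyy):
    """Second-derivative test for F(x, y) at a critical point, given the Hessian entries."""
    det = fxx * fyy - fxy * fxy
    if det > 0 and fxx > 0:
        return "local minimum"      # Hessian positive definite
    if det > 0 and fxx < 0:
        return "local maximum"      # Hessian negative definite
    if det < 0:
        return "saddle point"       # eigenvalues of opposite sign
    return "inconclusive"           # det == 0: Hessian is singular


# Hessians at the origin for a few example functions
print(classify_critical_point(2, 0, 2))    # x**2 + y**2
print(classify_critical_point(-2, 0, -2))  # -x**2 - y**2
print(classify_critical_point(2, 0, -2))   # x**2 - y**2
print(classify_critical_point(0, 0, 0))    # x**4 + y**4
```

The last case shows why "inconclusive" is needed: $x^4 + y^4$ does have a minimum at the origin, but its Hessian there is the zero matrix.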

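Application - linear regression

Problem: given $(x_1, y_1), \ldots, (x_n, y_n)$, find the best linear relation between x and y. Assume $y = Ax + B$. To find A and B, we minimize the sum of squared errors
$$E(A, B) = \sum_{i=1}^{n} (y_i - A x_i - B)^2.$$
Since this is a function of two variables, we can solve it by setting
$$\frac{\partial E}{\partial A} = 0 \quad\text{and}\quad \frac{\partial E}{\partial B} = 0.$$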
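Setting both partial derivatives of $E(A, B) = \sum_i (y_i - A x_i - B)^2$ to zero gives two linear "normal equations", $A\sum x_i^2 + B\sum x_i = \sum x_i y_i$ and $A\sum x_i + nB = \sum y_i$, whose solution is $A = \dfrac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - (\sum x_i)^2}$ and $B = \dfrac{\sum y_i - A\sum x_i}{n}$. A direct Python sketch of that closed form (the function name and the data are illustrative):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = A x + B using the closed-form normal-equation solution."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 2.0]
A, B = fit_line(xs, ys)
print(A, B)
# Both partial derivatives of E vanish at (A, B), so these sums are (numerically) zero:
print(sum(y - A * x - B for x, y in zip(xs, ys)))
print(sum(x * (y - A * x - B) for x, y in zip(xs, ys)))
```

Checking that the residual sums vanish is a quick way to confirm that the computed (A, B) really is the stationary point of E.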

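Constrained optimization

Maximize $f(x, y)$ subject to $g(x, y) = c$. Using a Lagrange multiplier $\lambda$, we form
$$h(x, y, \lambda) = f(x, y) + \lambda\,(g(x, y) - c)$$
and look for its stationary points. Now, solve the equations:
$$\frac{\partial h}{\partial x} = 0, \qquad \frac{\partial h}{\partial y} = 0, \qquad \frac{\partial h}{\partial \lambda} = g(x, y) - c = 0.$$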
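A small worked example of the Lagrange multiplier method (the choice of $f$ and $g$ is ours): maximize $f(x, y) = xy$ subject to $x + y = 10$, using $h = f + \lambda(g - c)$.

$$
h(x, y, \lambda) = xy + \lambda\,(x + y - 10)
$$

$$
\frac{\partial h}{\partial x} = y + \lambda = 0, \qquad
\frac{\partial h}{\partial y} = x + \lambda = 0, \qquad
x + y = 10
\;\Rightarrow\; x = y = 5,\quad \lambda = -5,\quad f(5, 5) = 25
$$

Substituting $y = 10 - x$ confirms this is the maximum: $x(10 - x)$ peaks at $x = 5$.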

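Support Vector Machines (contd)

- What if the problem is not linearly separable?
  - Introduce slack variables $\xi_i \ge 0$
  - Need to minimize: $L(\mathbf{w}) = \dfrac{\|\mathbf{w}\|^2}{2} + C \sum_{i=1}^{N} \xi_i$, where the constant $C > 0$ sets the penalty for margin violations
  - Subject to: $y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$, for $i = 1, \ldots, N$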
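The lecture solves this problem with quadratic programming. As a swapped-in, simpler technique, the sketch below minimizes the equivalent unconstrained objective $\tfrac{1}{2}\|\mathbf{w}\|^2 + C\sum_i \max(0,\, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$ by full-batch subgradient descent; for fixed $(\mathbf{w}, b)$ the best slack is exactly $\xi_i = \max(0,\, 1 - y_i(\mathbf{w} \cdot \mathbf{x}_i + b))$. The data (six separable points plus two deliberately mislabelled ones), $C$, the learning rate, and the epoch count are assumptions chosen for illustration.

```python
def train_soft_margin_svm(points, labels, C=1.0, lr=0.01, epochs=2000):
    """Full-batch subgradient descent on 0.5*||w||^2 + C * sum_i max(0, 1 - y_i*(w.x_i + b))."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        gw = [w[0], w[1]]  # gradient of the 0.5*||w||^2 term
        gb = 0.0           # the bias b is not regularized
        for (x1, x2), y in zip(points, labels):
            if y * (w[0] * x1 + w[1] * x2 + b) < 1:  # slack xi_i > 0: hinge term active
                gw[0] -= C * y * x1
                gw[1] -= C * y * x2
                gb -= C * y
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b


def slack(w, b, x, y):
    """xi = max(0, 1 - y (w.x + b)); xi > 1 means x is on the wrong side of the boundary."""
    return max(0.0, 1 - y * (w[0] * x[0] + w[1] * x[1] + b))


def predict(w, b, x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else -1


# Six separable points plus two mislabelled ones: (-1, -0.5) and (1, 0.5).
POINTS = [(2, 2), (3, 3), (2, 3), (-1, -0.5), (-2, -2), (-3, -3), (-2, -3), (1, 0.5)]
LABELS = [1, 1, 1, 1, -1, -1, -1, -1]

w, b = train_soft_margin_svm(POINTS, LABELS)
for x, y in zip(POINTS, LABELS):
    print(x, y, predict(w, b, x), round(slack(w, b, x, y), 3))
```

The two mislabelled points end up with slack greater than 1 (misclassified), while the six clean points are classified correctly. With a fixed step size the weights keep oscillating slightly around the optimum instead of settling on it exactly; a decreasing step size or a QP solver avoids that.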

Nonlinear Support Vector Machines

- What if the decision boundary is not linear?
- Transform the data into a higher-dimensional space, where a linear decision boundary may be able to separate it
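A minimal illustration of the transformation idea, assuming a circular class boundary. With the hypothetical mapping $\varphi(x_1, x_2) = (x_1^2, x_2^2)$, the curve $x_1^2 + x_2^2 = 1$ becomes the straight line $z_1 + z_2 = 1$ in the new space. In the original space no line can work: $(0, 0)$ is the midpoint of $(1.5, 0)$ and $(-1.5, 0)$, so any linear score that puts both outer points on one side also puts the origin there. The points are made up and the weights are set by hand, not learned by an SVM.

```python
def phi(x):
    """Hypothetical feature map (x1, x2) -> (x1^2, x2^2)."""
    return (x[0] ** 2, x[1] ** 2)


def linear_score(w, b, z):
    return w[0] * z[0] + w[1] * z[1] + b


INSIDE = [(0, 0), (0.5, 0), (0, -0.5), (-0.3, 0.4)]              # class +1
OUTSIDE = [(1.5, 0), (-1.5, 0), (0, 1.5), (0, -1.5), (1.2, 1.2)]  # class -1
points = INSIDE + OUTSIDE
labels = [1] * len(INSIDE) + [-1] * len(OUTSIDE)

# Hand-picked linear boundary in the transformed space: 1 - z1 - z2 = 0
w, b = (-1.0, -1.0), 1.0
predictions = [1 if linear_score(w, b, phi(x)) > 0 else -1 for x in points]
print(predictions == labels)
```

A nonlinear SVM would learn w and b in the transformed space instead of having them picked by hand; the point here is only that the transformed data becomes linearly separable.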

Artificial Neural Networks (ANN)

Output Y is 1 if at least two of the three inputs are equal to 1, and 0 otherwise.


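- Model is an assembly of interconnected nodes and weighted links
- Output node sums up each of its input values according to the weights of its links
- Compare the output node against some threshold t

Perceptron Model:
$$Y = I\left(\sum_i w_i X_i - t > 0\right) \quad\text{or}\quad Y = \operatorname{sign}\left(\sum_i w_i X_i - t\right)$$
where $I(z) = 1$ if $z$ is true and 0 otherwise.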
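A sketch of the perceptron model for the three-input example (Y = 1 if at least two inputs are 1). The weights $w_i = 0.3$ and threshold $t = 0.4$ are one hand-picked setting that works, an assumption rather than values given in the text; any common weight $w > 0$ with $w \le t < 2w$ would also work.

```python
from itertools import product


def perceptron(x, w, t):
    """Perceptron output Y = I(sum_i w_i * X_i - t > 0)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - t > 0 else 0


W = (0.3, 0.3, 0.3)  # hand-picked weights (assumption)
T = 0.4              # hand-picked threshold (assumption)

# Enumerate all 8 input combinations and show the perceptron's output
for x in product((0, 1), repeat=3):
    print(x, perceptron(x, W, T))
```

One active input gives 0.3, below the threshold, while two or three give 0.6 or 0.9, above it, so the output matches the "at least two of three" rule on every input.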

General Structure of ANN

Training an ANN means learning the weights of the neurons.

Algorithm for learning ANN

- Initialize the weights $(w_0, w_1, \ldots, w_k)$
- Adjust the weights so that the output of the ANN is consistent with the class labels of the training examples
  - Objective function: $E = \sum_i \left[ Y_i - f(\mathbf{w}, X_i) \right]^2$, where $f(\mathbf{w}, X_i)$ is the network's output for training example $X_i$
  - Find the weights $w_i$ that minimize the above objective function
    - e.g., the backpropagation algorithm
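A minimal sketch of weight learning by gradient descent on the squared-error objective $E = \sum_i [Y_i - f(\mathbf{w}, X_i)]^2$, for a single sigmoid output unit with no hidden layer. Backpropagation applies the same chain-rule gradient layer by layer; with a single layer it reduces to the update below. The sigmoid activation, learning rate, epoch count, and zero initialization are assumptions; the training data is the three-input "at least two of three" example.

```python
import math
from itertools import product


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output(w, w0, x):
    """f(w, X): sigmoid of the weighted input sum plus bias w0."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w0)


def squared_error(w, w0, data):
    """E = sum_i (Y_i - f(w, X_i))^2."""
    return sum((y - output(w, w0, x)) ** 2 for x, y in data)


def train(data, lr=0.5, epochs=2000):
    w = [0.0] * len(data[0][0])
    w0 = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_w0 = 0.0
        for x, y in data:
            out = output(w, w0, x)
            # dE/dz for one example, where out = sigmoid(z)
            delta = -2.0 * (y - out) * out * (1.0 - out)
            for j, xj in enumerate(x):
                grad_w[j] += delta * xj
            grad_w0 += delta
        # Full-batch gradient descent step
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        w0 -= lr * grad_w0
    return w, w0


# Three-input majority function: Y = 1 if at least two inputs are 1
DATA = [(x, 1 if sum(x) >= 2 else 0) for x in product((0, 1), repeat=3)]

w, w0 = train(DATA)
print("E before:", squared_error([0.0, 0.0, 0.0], 0.0, DATA))
print("E after: ", squared_error(w, w0, DATA))
print([1 if output(w, w0, x) > 0.5 else 0 for x, _ in DATA])
```

Thresholding the trained output at 0.5 recovers the class labels; the learned bias plays the role of $-t$ in the perceptron model.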

WEKA implementations

WEKA has implementations of the major data mining algorithms, including:
- decision trees (CART, C4.5, etc.)
- naïve Bayes and its variants
- nearest-neighbor classifiers
- linear classifiers
- support vector machines
- clustering algorithms
- boosting algorithms
- etc.