

1
CES 514 – Data Mining, Lecture 8: Classification (contd.)

2
Example: PEBLS
• PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg)
  – Works with both continuous and nominal features; for nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM)
  – Each record is assigned a weight factor
  – Number of nearest neighbors, k = 1

3
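Example: PEBLS
Distance between nominal attribute values, using the MVDM:
$$ d(V_1, V_2) = \sum_i \left| \frac{n_{1i}}{n_1} - \frac{n_{2i}}{n_2} \right| $$
where $n_{1i}$ is the number of records with value $V_1$ that belong to class $i$ and $n_1$ is the total number of records with value $V_1$ (likewise for $V_2$).

Class counts for Marital Status:
Class | Single | Married | Divorced
Yes   |   2    |    0    |    1
No    |   2    |    4    |    1

Class counts for Refund:
Class | Refund=Yes | Refund=No
Yes   |     0      |     3
No    |     3      |     4

d(Single, Married) = |2/4 - 0/4| + |2/4 - 4/4| = 1
d(Single, Divorced) = |2/4 - 1/2| + |2/4 - 1/2| = 0
d(Married, Divorced) = |0/4 - 1/2| + |4/4 - 1/2| = 1
d(Refund=Yes, Refund=No) = |0/3 - 3/7| + |3/3 - 4/7| = 6/7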
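The MVDM calculation can be sketched in Python as follows. The dictionary layout and the function name `mvdm` are illustrative choices, not part of PEBLS itself; exact fractions are used so the results match the hand calculation.

```python
from fractions import Fraction

# Class counts per attribute value, copied from the tables above:
# attribute value -> {class label: count}
marital = {
    "Single":   {"Yes": 2, "No": 2},
    "Married":  {"Yes": 0, "No": 4},
    "Divorced": {"Yes": 1, "No": 1},
}
refund = {
    "Yes": {"Yes": 0, "No": 3},
    "No":  {"Yes": 3, "No": 4},
}

def mvdm(counts, v1, v2):
    """Modified value difference metric between two nominal values.

    Sums, over all classes, the absolute difference between the fraction
    of v1's records in that class and the fraction of v2's records in it.
    """
    n1 = sum(counts[v1].values())
    n2 = sum(counts[v2].values())
    classes = set(counts[v1]) | set(counts[v2])
    return sum(
        abs(Fraction(counts[v1].get(c, 0), n1) - Fraction(counts[v2].get(c, 0), n2))
        for c in classes
    )

print(mvdm(marital, "Single", "Married"))   # 1
print(mvdm(marital, "Single", "Divorced"))  # 0
print(mvdm(marital, "Married", "Divorced")) # 1
print(mvdm(refund, "Yes", "No"))            # 6/7
```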

4
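Example: PEBLS
Distance between record X and record Y:
$$ \Delta(X, Y) = w_X \, w_Y \sum_{i=1}^{d} d(X_i, Y_i)^2 $$
where the sum runs over the d attributes and
$$ w_X = \frac{\text{number of times } X \text{ is used for prediction}}{\text{number of times } X \text{ predicts correctly}} $$
so that:
• $w_X \approx 1$ if X makes accurate predictions most of the time
• $w_X > 1$ if X is not reliable for making predictions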
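A minimal Python sketch of the weighted record distance, assuming the per-attribute distances have already been computed (here they are hard-coded from the MVDM example). The helper names `record_weight`, `pebls_distance` and `table_dist`, and the usage counts 10/10 and 10/5, are hypothetical and chosen only for illustration.

```python
def record_weight(times_used, times_correct):
    """w_X = (#times X is used for prediction) / (#times X predicts correctly)."""
    return times_used / times_correct

def pebls_distance(x, y, w_x, w_y, attr_dist):
    """Delta(X, Y) = w_X * w_Y * sum_i d(X_i, Y_i)**2.

    attr_dist(i, a, b) returns the per-attribute distance d(X_i, Y_i),
    e.g. the MVDM for a nominal attribute.
    """
    return w_x * w_y * sum(attr_dist(i, a, b) ** 2 for i, (a, b) in enumerate(zip(x, y)))

# Hypothetical per-attribute distance table (values from the MVDM example).
TABLE = {
    0: {frozenset(["Yes", "No"]): 6 / 7},                 # attribute 0: Refund
    1: {frozenset(["Single", "Married"]): 1.0,            # attribute 1: Marital Status
        frozenset(["Single", "Divorced"]): 0.0,
        frozenset(["Married", "Divorced"]): 1.0},
}

def table_dist(i, a, b):
    # Identical values are at distance 0.
    return 0.0 if a == b else TABLE[i][frozenset([a, b])]

x = ("Yes", "Single")    # (Refund, Marital Status)
y = ("No", "Married")
w_x = record_weight(times_used=10, times_correct=10)  # reliable record: 1.0
w_y = record_weight(times_used=10, times_correct=5)   # unreliable record: 2.0
print(pebls_distance(x, y, w_x, w_y, table_dist))     # 2 * ((6/7)**2 + 1) = 170/49
```

Because the weights multiply the whole sum, an unreliable exemplar (w > 1) looks farther away from every query and is therefore less likely to be chosen as the single nearest neighbor.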

5
Support Vector Machines
• Find a linear hyperplane (decision boundary) that will separate the data

6
Support Vector Machines
• One possible solution

7
Support Vector Machines
• Another possible solution

8
Support Vector Machines
• Other possible solutions

9
Support Vector Machines
• Which one is better, B1 or B2?
• How do you define "better"?

10
Support Vector Machines
• Find the hyperplane that maximizes the margin (e.g., B1 is better than B2)

11
Support Vector Machines

12
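• We want to maximize the margin:
$$ \text{Margin} = \frac{2}{\lVert \mathbf{w} \rVert} $$
  – Which is equivalent to minimizing:
$$ L(\mathbf{w}) = \frac{\lVert \mathbf{w} \rVert^2}{2} $$
  – But subject to the following constraints, for every training record $(\mathbf{x}_i, y_i)$ with $y_i \in \{-1, +1\}$:
$$ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 $$
• This is a constrained optimization problem
  – Numerical approaches are used to solve it (e.g., quadratic programming)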
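The sketch below does not solve the quadratic program; it only checks the constraints and evaluates the margin 2/||w|| for a few candidate hyperplanes, which is enough to make the "which one is better" comparison concrete. The four data points and the candidates `b1` and `b2` are hypothetical stand-ins for the B1 and B2 in the figures.

```python
import math

def margin_if_separating(w, b, X, y):
    """Return the margin 2/||w|| if (w, b) satisfies y_i (w . x_i + b) >= 1
    for every training record, otherwise None (a constraint is violated)."""
    for xr, yr in zip(X, y):
        if yr * (sum(wj * xj for wj, xj in zip(w, xr)) + b) < 1:
            return None
    return 2 / math.sqrt(sum(wj * wj for wj in w))

# Hypothetical 2-D training set: two records per class.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [+1, +1, -1, -1]

# Two hypothetical candidate hyperplanes w . x + b = 0.
b1 = ((0.5, 0.5), -1.0)
b2 = ((1.0, 1.0), -3.0)

print(margin_if_separating(*b1, X, y))                 # about 2.83
print(margin_if_separating(*b2, X, y))                 # about 1.41
print(margin_if_separating((0.1, 0.1), 0.0, X, y))     # None: constraints violated
```

Both candidates separate the data, but `b1` has the smaller ||w|| and hence the wider margin, which is exactly what minimizing ||w||²/2 rewards.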

13
Overview of optimization
Simplest optimization problem: maximize f(x) (one variable).
If the function has nice properties (such as being differentiable), then we can use calculus to solve the problem: solve the equation f'(x) = 0. Suppose a is a root. Then if f''(a) < 0, a is a local maximum (and if f''(a) > 0, a is a local minimum).
Tricky issues:
• How do we solve the equation f'(x) = 0?
• What if there are many solutions? Each is a "local" optimum.
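A small worked instance of this recipe, using an arbitrarily chosen quadratic f(x) = -x² + 4x + 1 (not taken from the slides):

```latex
\begin{aligned}
f(x)   &= -x^2 + 4x + 1 \\
f'(x)  &= -2x + 4 = 0 \quad\Rightarrow\quad a = 2 \\
f''(a) &= -2 < 0 \quad\Rightarrow\quad a = 2 \text{ is a local maximum, with } f(2) = -4 + 8 + 1 = 5
\end{aligned}
```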

14
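How to solve g(x) = 0
• Even polynomial equations are hard to solve. Quadratics have a closed-form solution (and, with more effort, so do cubics and quartics), but for degree five and higher there is no general formula in radicals.
• Numerical (iterative) techniques: bisection, secant, Newton-Raphson, etc.
• Challenges: choosing the initial guess; rate of convergence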
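Two of the iterative techniques listed above, bisection and Newton-Raphson, can be sketched in a few lines of Python. The test function g(x) = x³ - 2x - 5, the bracket [2, 3], the starting guess 2.0 and the tolerances are illustrative choices.

```python
def bisection(g, lo, hi, tol=1e-12, max_iter=200):
    """Find a root of g in [lo, hi]; g(lo) and g(hi) must have opposite signs."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("g(lo) and g(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        # Keep the half-interval on which g changes sign.
        if g_lo * g_mid < 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return (lo + hi) / 2

def newton_raphson(g, dg, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - g(x)/g'(x) starting from the initial guess x0."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)   # raises ZeroDivisionError if g'(x) == 0
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

def g(x):
    return x**3 - 2*x - 5     # one real root, near 2.0946

def dg(x):
    return 3*x**2 - 2

r1 = bisection(g, 2.0, 3.0)
r2 = newton_raphson(g, dg, 2.0)
print(r1, r2)
```

The two methods illustrate the challenges on the slide: bisection always converges once a sign-changing bracket is known but only halves the interval per step, while Newton-Raphson converges much faster near the root but needs the derivative and a good initial guess.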

15
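Functions of several variables
Consider a function of two variables such as F(x, y). To find the maximum of F(x, y), we solve the equations
$$ \frac{\partial F}{\partial x} = 0 \quad\text{and}\quad \frac{\partial F}{\partial y} = 0 $$
If we can solve this system of equations, then we have found a critical point of F, which may be a local maximum, a local minimum, or a saddle point. We can solve the equations using numerical techniques similar to the one-dimensional case.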
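One such technique is Newton's method applied to the system ∂F/∂x = 0, ∂F/∂y = 0, where each step solves a 2×2 linear system built from the second derivatives. The function F(x, y) = -x⁴ - y² + 2xy and the starting point (1, 1) are hypothetical examples chosen for this sketch.

```python
def grad(x, y):
    # Partial derivatives of the example F(x, y) = -x**4 - y**2 + 2*x*y
    return (-4 * x**3 + 2 * y, -2 * y + 2 * x)

def hessian(x, y):
    # Second partial derivatives of the same F
    return ((-12 * x**2, 2.0), (2.0, -2.0))

def newton_2d(x, y, tol=1e-12, max_iter=50):
    """Solve dF/dx = 0, dF/dy = 0 with Newton's method: H * step = -grad."""
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        (a, b), (c, d) = hessian(x, y)
        det = a * d - b * c
        # Cramer's rule for [[a, b], [c, d]] @ (dx, dy) = (-gx, -gy)
        dx = (-gx * d + gy * b) / det
        dy = (-gy * a + gx * c) / det
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) < tol:
            return x, y
    raise RuntimeError("Newton's method did not converge")

x, y = newton_2d(1.0, 1.0)
print(x, y)   # approximately (0.7071, 0.7071), i.e. x = y = 1/sqrt(2)
```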

16
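When is the solution a maximum or a minimum?
Hessian of F:
$$ H = \begin{pmatrix} \dfrac{\partial^2 F}{\partial x^2} & \dfrac{\partial^2 F}{\partial x \, \partial y} \\[6pt] \dfrac{\partial^2 F}{\partial y \, \partial x} & \dfrac{\partial^2 F}{\partial y^2} \end{pmatrix} $$
• If the Hessian is positive definite in the neighborhood of a, then a is a local minimum.
• If the Hessian is negative definite in the neighborhood of a, then a is a local maximum.
• If the Hessian at a is indefinite (it has both positive and negative eigenvalues), then a is a saddle point; if it is only semidefinite, the test is inconclusive.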
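For two variables the definiteness test reduces to the sign of the determinant and of ∂²F/∂x², so the classification can be sketched as below. The example function F(x, y) = -x⁴ - y² + 2xy, whose critical points are (0, 0) and ±(1/√2, 1/√2), is a hypothetical choice for illustration.

```python
def classify_2x2(fxx, fxy, fyy):
    """Second-derivative test for a critical point of F(x, y),
    given the Hessian entries evaluated at that point."""
    det = fxx * fyy - fxy * fxy
    if det > 0 and fxx > 0:
        return "local minimum"    # Hessian positive definite
    if det > 0 and fxx < 0:
        return "local maximum"    # Hessian negative definite
    if det < 0:
        return "saddle point"     # Hessian indefinite
    return "inconclusive"         # det == 0: semidefinite / degenerate case

# Critical points of the example F(x, y) = -x**4 - y**2 + 2*x*y,
# whose Hessian is [[-12 x^2, 2], [2, -2]].
r = 0.5 ** 0.5
for x, y in [(0.0, 0.0), (r, r), (-r, -r)]:
    print((x, y), classify_2x2(-12 * x * x, 2.0, -2.0))
```

Here (0, 0) comes out as a saddle point and the other two critical points as local maxima.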

17
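Application - linear regression
Problem: given (x_1, y_1), … (x_n, y_n), find the best linear relation between x and y. Assume y = Ax + B. To find A and B, we minimize the sum of squared errors
$$ E(A, B) = \sum_{i=1}^{n} (y_i - A x_i - B)^2 $$
Since this is a function of two variables, we can solve it by setting
$$ \frac{\partial E}{\partial A} = 0 \quad\text{and}\quad \frac{\partial E}{\partial B} = 0 $$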
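Setting the two partial derivatives to zero gives two linear equations in A and B (the normal equations), which can be solved in closed form. A minimal sketch, with a small made-up data set:

```python
def fit_line(xs, ys):
    """Least-squares fit of y = A x + B.

    Setting dE/dA = 0 and dE/dB = 0 gives the normal equations
        A * sum(x^2) + B * sum(x) = sum(x*y)
        A * sum(x)   + B * n      = sum(y)
    which are solved here in closed form.
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx      # zero if all x values are equal
    A = (n * sxy - sx * sy) / denom
    B = (sy - A * sx) / n
    return A, B

# Hypothetical noisy data lying roughly along y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
A, B = fit_line(xs, ys)
print(A, B)   # approximately 1.94 and 1.09
```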

18
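Constrained optimization
Maximize f(x, y) subject to g(x, y) = c.
Using a Lagrange multiplier λ, the problem is reformulated as finding the stationary points of:
$$ h(x, y, \lambda) = f(x, y) + \lambda \, (g(x, y) - c) $$
Now, solve the equations:
$$ \frac{\partial h}{\partial x} = 0, \quad \frac{\partial h}{\partial y} = 0, \quad \frac{\partial h}{\partial \lambda} = g(x, y) - c = 0 $$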
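A short worked example of the method, using an illustrative problem not taken from the slides: maximize f(x, y) = xy subject to x + y = 10.

```latex
\begin{aligned}
&h(x, y, \lambda) = xy + \lambda \, (x + y - 10) \\
&\frac{\partial h}{\partial x} = y + \lambda = 0, \qquad
 \frac{\partial h}{\partial y} = x + \lambda = 0, \qquad
 \frac{\partial h}{\partial \lambda} = x + y - 10 = 0 \\
&\Rightarrow\; x = y = -\lambda, \quad 2x = 10, \quad x = y = 5, \quad \lambda = -5, \quad f(5, 5) = 25
\end{aligned}
```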

19
Support Vector Machines (contd.)
• What if the problem is not linearly separable?

20
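Support Vector Machines
• What if the problem is not linearly separable?
  – Introduce slack variables $\xi_i \ge 0$
  – Need to minimize:
$$ L(\mathbf{w}) = \frac{\lVert \mathbf{w} \rVert^2}{2} + C \sum_{i=1}^{N} \xi_i $$
  – Subject to:
$$ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0 $$
where C is a user-chosen parameter that trades off the width of the margin against the training errors.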
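For a fixed candidate hyperplane (w, b), the smallest slack values that satisfy the constraints are ξ_i = max(0, 1 - y_i(w·x_i + b)), so the soft-margin objective can be evaluated directly. This sketch only evaluates the objective; it does not minimize it. The data set (with one deliberately misplaced record), the candidate (w, b) and C = 1 are hypothetical.

```python
def slacks(w, b, X, y):
    """Smallest feasible slack per record: xi_i = max(0, 1 - y_i (w . x_i + b))."""
    return [max(0.0, 1 - yr * (sum(wj * xj for wj, xj in zip(w, xr)) + b))
            for xr, yr in zip(X, y)]

def soft_margin_objective(w, b, X, y, C):
    """||w||^2 / 2 + C * sum(xi_i) for a candidate hyperplane (w, b)."""
    return 0.5 * sum(wj * wj for wj in w) + C * sum(slacks(w, b, X, y))

# Hypothetical data: the last record is labeled -1 but lies on the +1 side,
# so no hyperplane separates the classes perfectly.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0), (1.5, 1.5)]
y = [+1, +1, -1, -1, -1]

w, b = (0.5, 0.5), -1.0
print(slacks(w, b, X, y))                        # only the last record needs slack (1.5)
print(soft_margin_objective(w, b, X, y, C=1.0))  # 0.25 + 1.0 * 1.5 = 1.75
```

A larger C penalizes the slack more heavily and pushes the optimizer toward fewer margin violations at the cost of a narrower margin.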

21
Nonlinear Support Vector Machines
• What if the decision boundary is not linear?

22
Nonlinear Support Vector Machines
• Transform the data into a higher-dimensional space
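A tiny illustration of why the transformation helps: one-dimensional data with class +1 in the middle and class -1 on both sides cannot be split by a single threshold on x, but after the (hypothetical, illustrative) mapping φ(x) = (x, x²) a straight line in the new space separates them. The data points and the hyperplane w = (0, -1), b = 1 are chosen by hand for this sketch, not learned.

```python
# Hypothetical 1-D data: class +1 near the origin, class -1 far from it.
# No single threshold on x separates these two classes.
data = [(-3.0, -1), (-2.0, -1), (-0.5, +1), (0.0, +1), (0.5, +1), (2.0, -1), (3.0, -1)]

def phi(x):
    """Map a scalar x to the 2-D point (x, x^2)."""
    return (x, x * x)

# In the transformed space, the line 0*z1 - 1*z2 + 1 = 0 (i.e. x^2 = 1)
# separates the classes: w = (0, -1), b = 1.
w, b = (0.0, -1.0), 1.0

def predict(x):
    z = phi(x)
    return 1 if w[0] * z[0] + w[1] * z[1] + b > 0 else -1

print(all(predict(x) == label for x, label in data))  # True
```

The linear boundary in the (x, x²) space corresponds to the nonlinear boundary |x| = 1 in the original space.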

23
Artificial Neural Networks (ANN) Output Y is 1 if at least two of the three inputs are equal to 1.

24
Artificial Neural Networks (ANN)

25
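Perceptron Model
• The model is an assembly of interconnected nodes and weighted links
• The output node sums up each of its input values, weighted by the weights of its links
• The output node's sum is compared against some threshold t:
$$ Y = I\left(\sum_i w_i X_i - t > 0\right) \quad\text{or}\quad Y = \operatorname{sign}\left(\sum_i w_i X_i - t\right) $$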
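A single perceptron with equal weights can implement the "at least two of three inputs equal 1" rule from the earlier example. The weights 0.3 and threshold t = 0.4 below are hypothetical values picked so that one active input stays below the threshold and two exceed it.

```python
from itertools import product

def perceptron(inputs, weights, t):
    """Y = I(sum_i w_i * X_i - t > 0): 1 if the weighted sum exceeds t, else 0."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) - t > 0 else 0

# Hypothetical weights and threshold for "Y = 1 if at least two of X1, X2, X3 are 1".
weights, t = (0.3, 0.3, 0.3), 0.4

for x in product([0, 1], repeat=3):
    print(x, perceptron(x, weights, t))
```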

26
General Structure of ANN
Training an ANN means learning the weights of the neurons.

27
Algorithm for learning ANN
• Initialize the weights (w_0, w_1, …, w_k)
• Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples
  – Objective function:
$$ E = \sum_i \left[ Y_i - f(\mathbf{w}, X_i) \right]^2 $$
  – Find the weights w_i that minimize the above objective function, e.g., with the backpropagation algorithm
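Backpropagation is needed for multi-layer networks; as a much smaller sketch of "adjust the weights until the outputs agree with the labels", the code below uses the single-layer perceptron learning rule (a different, simpler technique than backpropagation) on the three-input majority example. The learning rate of 1.0, zero initial weights and the epoch limit are illustrative choices.

```python
from itertools import product

def predict(w, x):
    # w[0] acts as a bias term (the negated threshold -t).
    return 1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) > 0 else 0

def train_perceptron(data, lr=1.0, max_epochs=1000):
    """Perceptron learning rule: w_j <- w_j + lr * (y - y_hat) * x_j.

    Stops after the first epoch with no mistakes; this is guaranteed to
    happen if the training data are linearly separable."""
    w = [0.0] * (len(data[0][0]) + 1)      # initialize the weights
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in data:
            err = y - predict(w, x)
            if err != 0:
                mistakes += 1
                w[0] += lr * err           # bias input is the constant 1
                for j, xj in enumerate(x, start=1):
                    w[j] += lr * err * xj
        if mistakes == 0:
            return w
    raise RuntimeError("did not converge; data may not be linearly separable")

# Training set: Y = 1 if at least two of the three inputs are 1.
data = [(x, int(sum(x) >= 2)) for x in product([0, 1], repeat=3)]
w = train_perceptron(data)
print(w)
```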

28
WEKA

29
WEKA implementations
WEKA has implementations of all the major data mining algorithms, including:
• decision trees (CART, C4.5, etc.)
• the naïve Bayes algorithm and its variants
• nearest-neighbor classifiers
• linear classifiers
• support vector machines
• clustering algorithms
• boosting algorithms, etc.
