1
CES 514 – Data Mining, Lecture 8: Classification (contd.)
2
Example: PEBLS
- PEBLS: Parallel Exemplar-Based Learning System (Cost & Salzberg)
  - Works with both continuous and nominal features
    - For nominal features, the distance between two nominal values is computed using the modified value difference metric (MVDM)
  - Each record is assigned a weight factor
  - Number of nearest neighbors, k = 1
3
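Example: PEBLS
Distance between nominal attribute values (MVDM):
$$ d(V_1, V_2) = \sum_{i} \left| \frac{n_{1i}}{n_1} - \frac{n_{2i}}{n_2} \right| $$
where $n_{1i}$ is the number of records with value $V_1$ in class $i$, and $n_1$ is the total number of records with value $V_1$.

| Class | Single | Married | Divorced |
|-------|--------|---------|----------|
| Yes   | 2      | 0       | 1        |
| No    | 2      | 4       | 1        |
(Marital Status)

| Class | Refund = Yes | Refund = No |
|-------|--------------|-------------|
| Yes   | 0            | 3           |
| No    | 3            | 4           |
(Refund)

d(Single, Married) = |2/4 − 0/4| + |2/4 − 4/4| = 1
d(Single, Divorced) = |2/4 − 1/2| + |2/4 − 1/2| = 0
d(Married, Divorced) = |0/4 − 1/2| + |4/4 − 1/2| = 1
d(Refund=Yes, Refund=No) = |0/3 − 3/7| + |3/3 − 4/7| = 6/7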
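The MVDM arithmetic above can be checked with a short Python sketch. It uses exact fractions so the results match the hand computation; the function name `mvdm` and the dictionary layout of the contingency tables (outer key = attribute value, inner key = class label) are our own choices for illustration, not part of PEBLS itself.

```python
from fractions import Fraction

def mvdm(counts, v1, v2):
    """Modified value difference metric d(V1, V2) = sum_i |n1i/n1 - n2i/n2|.

    counts[value][cls] is the number of training records that have the
    given attribute value and class label.
    """
    n1 = sum(counts[v1].values())  # total records with value v1
    n2 = sum(counts[v2].values())  # total records with value v2
    classes = set(counts[v1]) | set(counts[v2])
    return sum(
        abs(Fraction(counts[v1].get(c, 0), n1) - Fraction(counts[v2].get(c, 0), n2))
        for c in classes
    )

# Contingency tables from the slide: attribute value -> {class label: count}.
marital_status = {
    "Single": {"Yes": 2, "No": 2},
    "Married": {"Yes": 0, "No": 4},
    "Divorced": {"Yes": 1, "No": 1},
}
refund = {
    "Yes": {"Yes": 0, "No": 3},
    "No": {"Yes": 3, "No": 4},
}

print(mvdm(marital_status, "Single", "Married"))   # 1
print(mvdm(marital_status, "Single", "Divorced"))  # 0
print(mvdm(refund, "Yes", "No"))                   # 6/7
```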
4
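Example: PEBLS
Distance between record X and record Y:
$$ \Delta(X, Y) = w_X \, w_Y \sum_{i=1}^{d} d(X_i, Y_i)^2 $$
where:
$$ w_X = \frac{\text{number of times } X \text{ is used for prediction}}{\text{number of times } X \text{ predicts correctly}} $$
- $w_X \cong 1$ if X makes accurate predictions most of the time
- $w_X > 1$ if X is not reliable for making predictions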
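A minimal Python sketch of the PEBLS record distance and the k = 1 classification step. It assumes the per-attribute MVDM distances computed on the previous slide, the usual PEBLS weight w_X = (times X is used for prediction) / (times X predicts correctly), and a weight of 1 for the query record. The training records, their usage counts, and the helper names (`record_weight`, `pebls_distance`, `classify_1nn`) are hypothetical.

```python
from fractions import Fraction

# Per-attribute MVDM value distances, copied from the previous slide.
# Attribute 0 = Refund, attribute 1 = Marital Status.
VALUE_DISTANCES = [
    {frozenset({"Yes", "No"}): Fraction(6, 7)},
    {
        frozenset({"Single", "Married"}): Fraction(1),
        frozenset({"Single", "Divorced"}): Fraction(0),
        frozenset({"Married", "Divorced"}): Fraction(1),
    },
]

def value_distance(attr, a, b):
    """d(a, b) for attribute `attr`; identical values are at distance 0."""
    if a == b:
        return Fraction(0)
    return VALUE_DISTANCES[attr][frozenset({a, b})]

def record_weight(times_used, times_correct):
    """w_X = (# times X is used for prediction) / (# times X predicts correctly)."""
    return Fraction(times_used, times_correct)

def pebls_distance(x, y, w_x, w_y):
    """Delta(X, Y) = w_X * w_Y * sum_i d(X_i, Y_i)^2."""
    return w_x * w_y * sum(value_distance(i, a, b) ** 2 for i, (a, b) in enumerate(zip(x, y)))

def classify_1nn(query, training, w_query=1):
    """k = 1: return the class label of the closest training record (query weight assumed 1)."""
    _, label, _ = min(training, key=lambda t: pebls_distance(query, t[0], w_query, t[2]))
    return label

# Hypothetical training records: ((Refund, Marital Status), class label, record weight).
training = [
    (("Yes", "Single"), "No", record_weight(4, 4)),    # always correct, w = 1
    (("No", "Married"), "No", record_weight(3, 2)),    # less reliable, w = 3/2
    (("No", "Divorced"), "Yes", record_weight(5, 5)),  # always correct, w = 1
]
print(classify_1nn(("No", "Single"), training))  # Yes
```

A less reliable record (w_X > 1) looks farther away than it really is, so it is less likely to be chosen as the single nearest neighbor.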
5
Support Vector Machines
- Find a linear hyperplane (decision boundary) that will separate the data
6
Support Vector Machines
- One possible solution
7
Support Vector Machines
- Another possible solution
8
Support Vector Machines
- Other possible solutions
9
Support Vector Machines
- Which one is better, B1 or B2?
- How do you define "better"?
10
Support Vector Machines
- Find the hyperplane that maximizes the margin (so B1 is better than B2)
11
Support Vector Machines
12
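- We want to maximize the margin:
  $$ \text{Margin} = \frac{2}{\lVert \mathbf{w} \rVert} $$
  - which is equivalent to minimizing:
  $$ L(\mathbf{w}) = \frac{\lVert \mathbf{w} \rVert^2}{2} $$
  - subject to the following constraints:
  $$ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \quad i = 1, \dots, N $$
- This is a constrained optimization problem
  - Numerical approaches are used to solve it (e.g., quadratic programming)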
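To make "maximize the margin" concrete, here is a small Python sketch that takes a candidate separating hyperplane (w, b), rescales it so the closest points satisfy y_i(w·x_i + b) = 1, and reports the margin 2/||w||. The four data points and the two candidate hyperplanes (standing in for B1 and B2) are hypothetical, and the sketch only evaluates candidates; it does not solve the quadratic program.

```python
import math

def margin_width(w, b, X, y):
    """Return 2/||w|| after rescaling (w, b) so that min_i y_i (w . x_i + b) = 1.

    Returns None if (w, b) does not strictly separate the data.
    """
    scores = [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi, yi in zip(X, y)]
    m = min(scores)
    if m <= 0:
        return None  # some point lies on the wrong side of (or on) the hyperplane
    w_canonical = [wj / m for wj in w]  # closest points now satisfy y_i f(x_i) = 1
    return 2 / math.sqrt(sum(wj * wj for wj in w_canonical))

# Hypothetical data: two points per class.
X = [(1, 1), (2, 2), (-1, -1), (-2, -2)]
y = [1, 1, -1, -1]

print(margin_width((1, 1), 0, X, y))  # boundary x1 + x2 = 0: margin 2*sqrt(2)
print(margin_width((1, 0), 0, X, y))  # boundary x1 = 0: margin 2
```

Both hyperplanes separate the data, but the first has the wider margin, so by the SVM criterion it plays the role of B1.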
13
Overview of optimization
Simplest optimization problem: maximize f(x) (one variable).
If the function has nice properties (e.g., it is differentiable), then we can use calculus: solve the equation f'(x) = 0. Suppose a is a root. If f''(a) < 0, then a is a (local) maximum.
Tricky issues:
- How do we solve the equation f'(x) = 0?
- What if there are many solutions? Each is a "local" optimum.
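As a worked example (the function is our own choice, not from the slides), take f(x) = x^3 − 3x. The equation f'(x) = 0 has two roots, and the second-derivative test tells them apart:

```latex
\begin{aligned}
f(x) &= x^3 - 3x, \qquad f'(x) = 3x^2 - 3, \qquad f''(x) = 6x \\
f'(x) = 0 &\;\Rightarrow\; x = -1 \text{ or } x = 1 \\
f''(-1) = -6 < 0 &\;\Rightarrow\; x = -1 \text{ is a local maximum, } f(-1) = 2 \\
f''(1) = 6 > 0 &\;\Rightarrow\; x = 1 \text{ is a local minimum, } f(1) = -2
\end{aligned}
```

Since f(3) = 18 > 2, the maximum at x = −1 is only local, which is exactly the "many solutions" issue.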
14
How to solve g(x) = 0
Even polynomial equations are hard to solve: quadratics have a closed-form solution, but what about higher degrees?
Numerical (iterative) techniques:
- bisection
- secant
- Newton-Raphson
- etc.
Challenges:
- choosing the initial guess
- rate of convergence
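A minimal Python sketch of two of the iterative techniques listed above, bisection and Newton-Raphson, applied to the example g(x) = x^2 − 2 (our own choice; its positive root is √2). The function names and tolerances are illustrative assumptions.

```python
import math

def bisection(g, lo, hi, tol=1e-12, max_iter=200):
    """Find a root of g in [lo, hi]; g(lo) and g(hi) must have opposite signs."""
    g_lo = g(lo)
    if g_lo * g(hi) > 0:
        raise ValueError("g(lo) and g(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        g_mid = g(mid)
        if g_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        if g_lo * g_mid < 0:
            hi = mid  # root lies in the left half
        else:
            lo, g_lo = mid, g_mid  # root lies in the right half
    return (lo + hi) / 2

def newton_raphson(g, dg, x0, tol=1e-12, max_iter=50):
    """Iterate x <- x - g(x)/g'(x) from the initial guess x0."""
    x = x0
    for _ in range(max_iter):
        step = g(x) / dg(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

def g(x):
    return x * x - 2

def dg(x):
    return 2 * x

print(bisection(g, 0.0, 2.0), newton_raphson(g, dg, 1.0), math.sqrt(2))
```

Bisection only needs a sign change but halves the interval once per step (roughly 40 halvings for this tolerance), while Newton-Raphson converges in a handful of steps from a good initial guess but needs g'(x) and can fail from a poor one.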
15
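Functions of several variables
Consider a function of two variables, such as F(x, y). To find the maximum of F(x, y), we solve the equations
$$ \frac{\partial F}{\partial x} = 0 \quad \text{and} \quad \frac{\partial F}{\partial y} = 0 $$
If we can solve this system of equations, we have found a critical point of F: a local maximum, a local minimum, or a saddle point. We can solve the equations using numerical techniques similar to those for the one-dimensional case.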
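A sketch of the two-variable analogue of Newton-Raphson, applied to the system ∂F/∂x = 0, ∂F/∂y = 0. Each step solves the 2×2 linear system H·Δ = −∇F by Cramer's rule. The example function F(x, y) = x^3 − 3x + y^2 and all function names are our own illustration.

```python
def newton_2d(grad, hess, x, y, tol=1e-12, max_iter=50):
    """Solve grad F(x, y) = (0, 0) with Newton's method.

    grad(x, y) -> (Fx, Fy); hess(x, y) -> ((Fxx, Fxy), (Fyx, Fyy)).
    """
    for _ in range(max_iter):
        gx, gy = grad(x, y)
        (a, b), (c, d) = hess(x, y)
        det = a * d - b * c
        if det == 0:
            raise ZeroDivisionError("singular Hessian")
        # Newton step: solve [[a, b], [c, d]] (dx, dy) = -(gx, gy) by Cramer's rule.
        dx = (-gx * d + gy * b) / det
        dy = (-gy * a + gx * c) / det
        x, y = x + dx, y + dy
        if abs(dx) + abs(dy) < tol:
            return x, y
    raise RuntimeError("did not converge")

# Example: F(x, y) = x**3 - 3*x + y**2, with critical points (1, 0) and (-1, 0).
def grad_F(x, y):
    return 3 * x * x - 3, 2 * y

def hess_F(x, y):
    return (6 * x, 0.0), (0.0, 2.0)

print(newton_2d(grad_F, hess_F, 2.0, 1.0))   # converges to the critical point (1, 0)
print(newton_2d(grad_F, hess_F, -2.0, 1.0))  # converges to the critical point (-1, 0)
```

Different starting points reach different critical points, which is the multivariate form of the "initial guess" challenge.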
16
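When is the solution a maximum or a minimum?
Let a be a critical point of F. Hessian:
$$ H = \begin{pmatrix} \dfrac{\partial^2 F}{\partial x^2} & \dfrac{\partial^2 F}{\partial x \, \partial y} \\[2ex] \dfrac{\partial^2 F}{\partial y \, \partial x} & \dfrac{\partial^2 F}{\partial y^2} \end{pmatrix} $$
- If the Hessian is positive definite in the neighborhood of a, then a is a minimum.
- If the Hessian is negative definite in the neighborhood of a, then a is a maximum.
- If it is indefinite (it has both positive and negative eigenvalues), then a is a saddle point; if it is only semidefinite, the test is inconclusive.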
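For two variables, definiteness of the symmetric Hessian can be read off its determinant and top-left entry. A small Python sketch of this second-derivative test, evaluated at the critical point itself; the example F(x, y) = x^3 − 3x + y^2 (Hessian ((6x, 0), (0, 2)), critical points (1, 0) and (−1, 0)) and the function name are our own illustration.

```python
def classify_critical_point(hessian):
    """Second-derivative test for a 2x2 symmetric Hessian at a critical point."""
    (fxx, fxy), (_, fyy) = hessian
    det = fxx * fyy - fxy * fxy
    if det > 0 and fxx > 0:
        return "minimum"       # positive definite
    if det > 0 and fxx < 0:
        return "maximum"       # negative definite
    if det < 0:
        return "saddle point"  # indefinite: eigenvalues of opposite sign
    return "inconclusive"      # det == 0: semidefinite, the test says nothing

print(classify_critical_point(((6.0, 0.0), (0.0, 2.0))))    # minimum      (F at (1, 0))
print(classify_critical_point(((-6.0, 0.0), (0.0, 2.0))))   # saddle point (F at (-1, 0))
print(classify_critical_point(((-2.0, 0.0), (0.0, -2.0))))  # maximum      (-x^2 - y^2 at (0, 0))
```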
17
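Application - linear regression
Problem: given (x_1, y_1), …, (x_n, y_n), find the best linear relation between x and y.
Assume y = Ax + B. To find A and B, we minimize
$$ E(A, B) = \sum_{i=1}^{n} (y_i - A x_i - B)^2 $$
Since this is a function of two variables, we can solve it by setting
$$ \frac{\partial E}{\partial A} = 0 \quad \text{and} \quad \frac{\partial E}{\partial B} = 0 $$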
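Setting ∂E/∂A = −2 Σ x_i (y_i − A x_i − B) = 0 and ∂E/∂B = −2 Σ (y_i − A x_i − B) = 0 gives two linear equations in A and B, whose solution is A = (n Σ x_i y_i − Σ x_i Σ y_i) / (n Σ x_i^2 − (Σ x_i)^2) and B = (Σ y_i − A Σ x_i) / n. A minimal Python sketch of that closed form (the function name `fit_line` and the sample data are our own):

```python
def fit_line(xs, ys):
    """Least-squares A, B for y = A x + B, from dE/dA = 0 and dE/dB = 0."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx * sx  # zero when all x_i are equal (no unique line)
    A = (n * sxy - sx * sy) / denom
    B = (sy - A * sx) / n
    return A, B

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0): the points lie exactly on y = 2x + 1
```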
18
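Constrained optimization
Maximize f(x, y) subject to g(x, y) = c.
Using a Lagrange multiplier λ, the problem is reformulated as finding the stationary points of
$$ h(x, y, \lambda) = f(x, y) + \lambda \, (g(x, y) - c) $$
Now, solve the equations:
$$ \frac{\partial h}{\partial x} = 0, \quad \frac{\partial h}{\partial y} = 0, \quad \frac{\partial h}{\partial \lambda} = g(x, y) - c = 0 $$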
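A short worked example of the Lagrange multiplier method (the choice of f and g is ours): maximize the product xy subject to x + y = 10.

```latex
\begin{aligned}
& h(x, y, \lambda) = xy + \lambda\,(x + y - 10) \\
& \frac{\partial h}{\partial x} = y + \lambda = 0, \qquad
  \frac{\partial h}{\partial y} = x + \lambda = 0, \qquad
  \frac{\partial h}{\partial \lambda} = x + y - 10 = 0 \\
& \Rightarrow\; x = y = -\lambda, \quad 2x = 10
  \;\Rightarrow\; x = y = 5, \quad \lambda = -5, \quad f(5, 5) = 25
\end{aligned}
```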
19
Support Vector Machines (contd.)
- What if the problem is not linearly separable?
20
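Support Vector Machines
- What if the problem is not linearly separable?
  - Introduce slack variables $\xi_i \ge 0$
  - Need to minimize:
  $$ L(\mathbf{w}) = \frac{\lVert \mathbf{w} \rVert^2}{2} + C \sum_{i=1}^{N} \xi_i $$
  - Subject to:
  $$ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N $$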
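At the optimum each slack equals ξ_i = max(0, 1 − y_i(w·x_i + b)), so the soft-margin problem is the same as minimizing ||w||^2/2 + C Σ max(0, 1 − y_i(w·x_i + b)) with no constraints. The Python sketch below minimizes that form by plain subgradient descent; this is a swapped-in solver chosen for brevity (the slides point to quadratic programming), and the data set (with one deliberate outlier), C, learning rate, and epoch count are all assumptions.

```python
def decision(w, b, x):
    """f(x) = w . x + b"""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_soft_margin_svm(X, y, C=1.0, lr=0.001, epochs=5000):
    """Subgradient descent on ||w||^2 / 2 + C * sum_i max(0, 1 - y_i f(x_i))."""
    dim = len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of ||w||^2 / 2 is w; b is not regularized
        for xi, yi in zip(X, y):
            if yi * decision(w, b, xi) < 1:  # slack 1 - y_i f(x_i) > 0: hinge term is active
                for j in range(dim):
                    grad_w[j] -= C * yi * xi[j]
                grad_b -= C * yi
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def slack(w, b, x, label):
    """xi_i = max(0, 1 - y_i f(x_i))"""
    return max(0.0, 1 - label * decision(w, b, x))

# Hypothetical data; the last +1 point sits inside the -1 cluster, so no hyperplane separates them.
X = [(2, 0), (3, 1), (3, -1), (-2, 0), (-3, 1), (-3, -1), (-2.5, 0)]
y = [1, 1, 1, -1, -1, -1, 1]
w, b = train_soft_margin_svm(X, y)
for x, label in zip(X, y):
    print(x, label, round(decision(w, b, x), 2), round(slack(w, b, x, label), 2))
```

The outlier ends up with slack greater than 1 (it is misclassified), while the six regular points stay on the correct side of the boundary.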
21
Nonlinear Support Vector Machines
- What if the decision boundary is not linear?
22
Nonlinear Support Vector Machines
- Transform the data into a higher-dimensional space
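A toy Python illustration with made-up 1-D data: the +1 points sit between the −1 points, so no single threshold on x separates them, but after the assumed mapping φ(x) = (x, x^2) the straight line z2 = 1 in the new space does. The data, the feature map, and the hyperplane (w, b) are hand-picked for illustration, not learned.

```python
def phi(x):
    """Assumed feature map: a 1-D point x becomes the 2-D point (x, x^2)."""
    return (x, x * x)

def separable_by_threshold(xs, ys):
    """Brute force: can a single cut point t on the x-axis split the two classes?"""
    s = sorted(xs)
    cuts = [s[0] - 1] + [(a + b) / 2 for a, b in zip(s, s[1:])] + [s[-1] + 1]
    for t in cuts:
        for side in (1, -1):  # class label assigned to points right of t
            if all((side if x > t else -side) == label for x, label in zip(xs, ys)):
                return True
    return False

def predict(x, w=(0.0, -1.0), b=1.0):
    """Linear classifier in the transformed space: sign(w . phi(x) + b).

    With w = (0, -1) and b = 1 the boundary is the line z2 = 1, i.e. x^2 = 1.
    """
    z = phi(x)
    return 1 if w[0] * z[0] + w[1] * z[1] + b > 0 else -1

xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [-1, -1, 1, 1, 1, -1, -1]
print(separable_by_threshold(xs, ys))                # False
print(all(predict(x) == y for x, y in zip(xs, ys)))  # True
```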
23
Artificial Neural Networks (ANN)
Output Y is 1 if at least two of the three inputs are equal to 1.
24
Artificial Neural Networks (ANN)
25
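- Model is an assembly of inter-connected nodes and weighted links
- Output node sums up each of its input values according to the weights of its links
- Compare the output node against some threshold t

Perceptron Model:
$$ Y = I\left(\sum_i w_i X_i - t > 0\right) $$
or
$$ Y = \operatorname{sign}\left(\sum_i w_i X_i - t\right) $$
where I(z) = 1 if z is true and 0 otherwise.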
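A sketch of the perceptron model in Python, checked against the rule from the earlier slide (Y = 1 if at least two of the three inputs are 1). The weights w_i = 0.3 and threshold t = 0.4 are an assumed choice that happens to reproduce that rule, not values taken from the slides.

```python
from itertools import product

def perceptron(inputs, weights, t):
    """Y = I(sum_i w_i X_i - t > 0)"""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) - t > 0 else 0

weights = (0.3, 0.3, 0.3)  # assumed weights
t = 0.4                    # assumed threshold

# Print the full truth table next to the target "at least two inputs are 1".
for x in product((0, 1), repeat=3):
    print(x, perceptron(x, weights, t), 1 if sum(x) >= 2 else 0)
```

One input contributes 0.3 < 0.4, while any two contribute 0.6 > 0.4, so the weighted sum crosses the threshold exactly when at least two inputs are on.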
26
General Structure of ANN
Training an ANN means learning the weights of the neurons.
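To show what "learning the weights" refers to, here is a forward pass through a small fully connected network in Python. The layer sizes (3 inputs, 2 hidden neurons, 1 output), the sigmoid activation, and every weight value are hypothetical; training would adjust exactly these numbers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: neuron i outputs sigmoid(sum_j w_ij * x_j + b_i)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b) for row, b in zip(weights, biases)]

def forward(x, network):
    """Apply each (weights, biases) layer in turn: input -> hidden -> output."""
    for weights, biases in network:
        x = layer(x, weights, biases)
    return x

# Hypothetical weights: 3 inputs -> 2 hidden neurons -> 1 output neuron.
network = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),  # hidden layer
    ([[1.2, -0.7]], [-0.3]),                             # output layer
]
print(forward([1, 0, 1], network))
```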
27
Algorithm for learning ANN
- Initialize the weights (w_0, w_1, …, w_k)
- Adjust the weights in such a way that the output of the ANN is consistent with the class labels of the training examples
  - Objective function:
  $$ E = \sum_i \left[ Y_i - f(\mathbf{w}, X_i) \right]^2 $$
  - Find the weights w_i that minimize the above objective function, e.g., with the backpropagation algorithm
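A minimal Python sketch of minimizing the squared-error objective by gradient descent. For simplicity it trains a single linear unit (the delta rule) on the three-input "at least two inputs are 1" example rather than running full backpropagation through hidden layers. The learning rate, epoch count, the use of the mean instead of the sum of squared errors (same minimizer), and the 0.5 decision threshold are all assumptions.

```python
from itertools import product

def train_linear_unit(X, Y, lr=0.1, epochs=2000):
    """Batch gradient descent on E = mean_i (Y_i - (w . X_i + b))^2."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(X, Y):
            err = (sum(wj * xj for wj, xj in zip(w, x)) + b) - y  # f(w, X_i) - Y_i
            for j in range(dim):
                grad_w[j] += 2 * err * x[j] / n
            grad_b += 2 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

X = list(product((0, 1), repeat=3))
Y = [1 if sum(x) >= 2 else 0 for x in X]
w, b = train_linear_unit(X, Y)

def predict(x):
    """Threshold the unit's output at 0.5 to get a class label."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0.5 else 0

print(w, b, [predict(x) for x in X] == Y)
```

For this data the weights settle near w_i = 0.5 and b = −0.25 (the least-squares solution), and thresholding the output at 0.5 reproduces the target column.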
28
WEKA
29
WEKA implementations
WEKA has implementations of the major data mining algorithms, including:
- decision trees (CART, C4.5, etc.)
- naïve Bayes and its variants
- nearest-neighbor classifiers
- linear classifiers
- support vector machines
- clustering algorithms
- boosting algorithms
- etc.