Classification and Prediction


Presentation on theme: "Classification and Prediction"— Presentation transcript:

1 Classification and Prediction
Fuzzy

2 Fuzzy Set Approaches Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (e.g., via a fuzzy membership graph). Attribute values are converted to fuzzy values: for example, income is mapped into the discrete categories {low, medium, high}, and a fuzzy membership value is calculated for each. For a given new sample, more than one fuzzy value may apply. Each applicable rule contributes a vote for membership in the categories, and typically the truth values for each predicted category are summed.
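The fuzzify-then-vote scheme above can be sketched in Python. The triangular membership functions, the income break-points, and the rule table below are illustrative assumptions, not values from the slides:

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], peaking at 1 when x == b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical fuzzy categories for income (in thousands).
income_mfs = {
    "low":    lambda x: triangular(x, 0, 20, 50),
    "medium": lambda x: triangular(x, 30, 60, 90),
    "high":   lambda x: triangular(x, 70, 100, 200),
}

def fuzzify(x):
    """Map a crisp income to its degree of membership in each category."""
    return {name: mf(x) for name, mf in income_mfs.items()}

def vote(memberships, rules):
    """Each rule (category -> class) votes with strength equal to the matched
    membership; truth values per predicted class are summed, max wins."""
    scores = {}
    for category, predicted_class in rules:
        scores[predicted_class] = scores.get(predicted_class, 0.0) + memberships[category]
    return max(scores, key=scores.get)
```

For an income of 55, both "low" (0.0) and "medium" (≈0.83) are evaluated, and the summed votes decide the class.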

3 Fuzzy Sets Sets with fuzzy boundaries, e.g., A = set of tall people
[Figure: membership vs. height for a crisp set A (step function at 5'10'') and a fuzzy set A (smooth curve, about .5 at 5'10'' and .9 at 6'2'').] A fuzzy set is a set with a fuzzy boundary. Suppose that A is the set of tall people. In a conventional set, or crisp set, an element either belongs or does not belong to the set; there is nothing in between. Therefore, to define a crisp set A, we need to find a number, say 5'10'', such that a person taller than this is in the set of tall people. For a fuzzy version of set A, we allow the degree of belonging to vary between 0 and 1. So for a person of height 5'10'' we can say that he or she is tall to the degree of 0.5, and a person around six feet is tall to the degree of .9. Everything is a matter of degree in fuzzy sets. If we plot the degree of belonging against height, the curve is called a membership function. Because of its smooth transition, a fuzzy set is a better representation of our mental model of "tall". Moreover, if a fuzzy set has a step-function-like membership function, it reduces to the common crisp set. 2019/4/25
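The crisp-versus-fuzzy contrast on this slide can be sketched directly. The 5'10'' threshold is from the slide; the linear ramp and its 5'4''–6'4'' break-points are illustrative assumptions (heights in inches):

```python
def crisp_tall(height_in):
    """Crisp set: full member at or above 5'10'' (70 in), otherwise not a member."""
    return 1.0 if height_in >= 70 else 0.0

def fuzzy_tall(height_in):
    """Fuzzy set: membership rises linearly from 0 at 5'4'' (64 in)
    to 1 at 6'4'' (76 in), so belonging is a matter of degree."""
    return min(1.0, max(0.0, (height_in - 64) / 12.0))
```

A 5'10'' person is fully "tall" in the crisp set but tall only to degree 0.5 in the fuzzy one; clamping the ramp to {0, 1} would recover the crisp step function.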

4 Membership Functions (MFs)
Characteristics of MFs: subjective measures; not probability functions. [Figure: three MFs for "tall" plotted against height: "tall" in Asia, "tall" in the US (around .8/.5), and "tall" in the NBA (around .1).] Here I would like to emphasize some important properties of membership functions. First, an MF is a subjective measure; my membership function for "tall" is likely to be different from yours. It is also context sensitive. For example, at my height I am considered pretty tall in Taiwan, but in the States I would only be considered medium build, so maybe tall to the degree of .5. And as an NBA player I would be considered pretty short, unable even to do a slam dunk! So, as you can see, we have three different MFs for "tall" in different contexts. Although they are different, they do share some common characteristics: for one thing, they are all monotonically increasing from 0 to 1. Because the membership function represents a subjective measure, it is not a probability function at all.

5 Fuzzy Sets
Formal definition: a fuzzy set A in X is expressed as a set of ordered pairs A = {(x, mA(x)) | x in X}, where mA is the membership function (MF) and X is the universe, or universe of discourse. A fuzzy set is totally characterized by its membership function (MF).

6 Fuzzy Sets with Discrete Universes
Fuzzy set A = “sensible number of children” X = {0, 1, 2, 3, 4, 5, 6} (discrete universe) A = {(0, .1), (1, .3), (2, .7), (3, 1), (4, .6), (5, .2), (6, .1)}
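Over a discrete universe, the fuzzy set from this slide is just a finite table of (element, membership) pairs, which maps naturally onto a dictionary:

```python
# Slide 6's fuzzy set "sensible number of children" as {element: membership}.
A = {0: .1, 1: .3, 2: .7, 3: 1.0, 4: .6, 5: .2, 6: .1}

def membership(fuzzy_set, x):
    """Degree of membership of x; elements outside the listed pairs get 0."""
    return fuzzy_set.get(x, 0.0)

def core(fuzzy_set):
    """Elements with full membership (degree 1)."""
    return [x for x, m in fuzzy_set.items() if m == 1.0]
```

Here 3 children is "sensible" to degree 1, while 6 children only to degree .1.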

7 Fuzzy Sets with Continuous Universes
Fuzzy set B = “about 50 years old” X = set of positive real numbers (continuous universe) B = {(x, mB(x)) | x in X}
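Over a continuous universe the set is given by a formula rather than a table. A bell-shaped MF is a common choice for "about 50"; the centre 50 comes from the slide, while the width 10 and the exponent 4 below are illustrative assumptions:

```python
def about_50(x):
    """Bell-shaped membership function mB(x) = 1 / (1 + ((x - 50) / 10)^4):
    1 exactly at 50, falling off smoothly and symmetrically on either side."""
    return 1.0 / (1.0 + ((x - 50.0) / 10.0) ** 4)
```

Membership is 1 at age 50, 0.5 at ages 40 and 60, and near 0 far from 50.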

8 Fuzzy Partition Fuzzy partitions formed by the linguistic values “young”, “middle aged”, and “old”: lingmf.m

9 Set-Theoretic Operations
Subset: A ⊆ B iff mA(x) ≤ mB(x) for all x in X. Complement: m¬A(x) = 1 − mA(x). Union: mA∪B(x) = max(mA(x), mB(x)). Intersection: mA∩B(x) = min(mA(x), mB(x)).
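The four set-theoretic operations above (with the classical max/min semantics) are one-liners over membership functions:

```python
def complement(mu_a):
    """m_notA(x) = 1 - mA(x)."""
    return lambda x: 1.0 - mu_a(x)

def union(mu_a, mu_b):
    """m_AuB(x) = max(mA(x), mB(x))."""
    return lambda x: max(mu_a(x), mu_b(x))

def intersection(mu_a, mu_b):
    """m_AnB(x) = min(mA(x), mB(x))."""
    return lambda x: min(mu_a(x), mu_b(x))

def is_subset(mu_a, mu_b, universe):
    """A is a subset of B iff mA(x) <= mB(x) everywhere on the universe."""
    return all(mu_a(x) <= mu_b(x) for x in universe)
```

With step-function MFs these reduce exactly to the crisp complement, union, intersection, and subset tests.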

10 Set-Theoretic Operations
subset.m fuzsetop.m

11 MF Formulation disp_mf.m

12 Fuzzy If-Then Rules General format: If x is A then y is B
Examples: If pressure is high, then volume is small. If the road is slippery, then driving is dangerous. If a tomato is red, then it is ripe. If the speed is high, then apply the brake a little.
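A common (Mamdani-style) reading of such a rule clips the consequent's MF at the truth value of the antecedent. The two membership functions below, for "If pressure is high, then volume is small", are illustrative assumptions:

```python
def high_pressure(p_kpa):
    """Illustrative MF: rises linearly from 0 at 100 kPa to 1 at 300 kPa."""
    return min(1.0, max(0.0, (p_kpa - 100.0) / 200.0))

def small_volume(v_litres):
    """Illustrative MF: falls linearly from 1 at 0 L to 0 at 10 L."""
    return min(1.0, max(0.0, 1.0 - v_litres / 10.0))

def fire_rule(antecedent_degree, consequent_mf):
    """Clip the consequent MF at the antecedent's truth value."""
    return lambda y: min(antecedent_degree, consequent_mf(y))

# "If pressure is high, then volume is small", evaluated at 250 kPa:
rule_output = fire_rule(high_pressure(250.0), small_volume)
```

At 250 kPa the antecedent is true to degree 0.75, so the fuzzy conclusion "volume is small" holds to at most that degree for every volume.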


21 Classification and Prediction
Fuzzy Support Vector Machine

22 Support Vector Machine
Searches for the optimal separating hyperplane, i.e., the one that maximizes the margin between the two classes.

23 Support Vector Machine
Training an SVM is equivalent to solving a quadratic programming problem. Test phase: f(x) = sign( Σi αi yi K(si, x) + b ), where si are the support vectors, yi is the class of si, K(·) is the kernel function, and αi, b are the learned parameters.
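The test-phase decision function can be written down directly once the support vectors and parameters are available. The toy support vectors, labels, and coefficients in the test below are made up for illustration, not learned by a QP solver:

```python
def linear_kernel(u, v):
    """Plain dot product: the simplest kernel function K(u, v)."""
    return sum(ui * vi for ui, vi in zip(u, v))

def svm_decision(x, support_vectors, labels, alphas, b, kernel):
    """SVM test phase: sign( sum_i alpha_i * y_i * K(s_i, x) + b )."""
    s = sum(a * y * kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1
```

Only the support vectors (not the full training set) are needed at test time, which is what makes the SVM prediction compact.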

24 Support Vector Machine
Kernel function: K(x, y) = Φ(x) · Φ(y), where x, y are vectors in the input space and Φ(x), Φ(y) are vectors in the feature space. dim(feature space) >> dim(input space), yet there is no need to compute Φ(x) explicitly. Example (tree kernel): Tr(x, y) = sub(x) · sub(y), where sub(x) is a vector representing all the sub-trees of x.
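The "no need to compute Φ(x) explicitly" point can be verified on a small example: a degree-2 polynomial kernel on 2-d inputs equals an explicit dot product in a 3-d feature space. This standard textbook example is a sketch, not the tree kernel from the slide:

```python
import math

def poly_kernel(x, y):
    """K(x, y) = (x . y)^2 for 2-d inputs: computed entirely in input space."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    """Explicit feature map for that kernel: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

The kernel does in 2 dimensions what the explicit map does in 3; for higher degrees or tree kernels the feature space grows far faster, which is the whole point of the kernel trick.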

25 Classification and Prediction
Fuzzy Support Vector Machine Prediction

26 What Is Prediction? Prediction is similar to classification
First, construct a model; second, use the model to predict unknown values. The major method for prediction is regression: linear and multiple regression; non-linear regression. Prediction is different from classification: classification predicts a categorical class label, while prediction models continuous-valued functions.

27 Regression Analysis and Log-Linear Models in Prediction
Linear regression: Y = α + β X. The two parameters α and β specify the line and are estimated from the data at hand by applying the least squares criterion to the known values (X1, Y1), (X2, Y2), …. Multiple regression: Y = b0 + b1 X1 + b2 X2. Many nonlinear functions can be transformed into the above. Log-linear models: the multi-way table of joint probabilities is approximated by a product of lower-order tables, e.g., p(a, b, c, d) = αab βac χad δbcd, where each factor is a lower-order table over the subscripted variables.
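The least squares estimates for α and β have the standard closed form, which can be coded in a few lines:

```python
def fit_line(xs, ys):
    """Least squares fit of Y = alpha + beta * X.
    beta = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2); alpha = ybar - beta * xbar."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    beta = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
            / sum((x - xbar) ** 2 for x in xs))
    alpha = ybar - beta * xbar
    return alpha, beta
```

On data lying exactly on Y = 2X, the fit recovers α = 0 and β = 2.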

28 Locally Weighted Regression
Construct an explicit approximation to f over a local region surrounding the query instance xq. Locally weighted linear regression: the target function f is approximated near xq by a linear function f^(x) = w0 + w1 a1(x) + … + wn an(x). Minimize the squared error over the neighbourhood with a distance-decreasing weight K: E(xq) = ½ Σx K(d(xq, x)) (f(x) − f^(x))². The gradient descent training rule is Δwj = η Σx K(d(xq, x)) (f(x) − f^(x)) aj(x). In most cases, the target function is approximated by a constant, linear, or quadratic function.
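The simplest of the three cases on this slide, a locally constant model, reduces to a kernel-weighted mean around the query point. The Gaussian kernel and its bandwidth tau below are illustrative choices:

```python
import math

def lwr_predict(xq, xs, ys, tau=1.0):
    """Locally weighted prediction at query xq with a constant local model:
    each training point gets a distance-decreasing Gaussian weight
    K(d) = exp(-d^2 / (2 tau^2)), and the prediction is the weighted mean."""
    weights = [math.exp(-((x - xq) ** 2) / (2 * tau ** 2)) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)
```

For a query equidistant from two training points, the weights are equal and the prediction is their plain mean; points far from xq contribute almost nothing.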

29 Classification and Prediction
Fuzzy Support Vector Machine Prediction Classification accuracy

30 Classification Accuracy: Estimating Error Rates
Partition (training-and-testing): use two independent data sets, e.g., a training set (2/3) and a test set (1/3); used for data sets with a large number of samples. Cross-validation: divide the data set into k subsamples; use k − 1 subsamples as training data and one subsample as test data (k-fold cross-validation); for data sets of moderate size. Leave-one-out (k-fold with k = n) and bootstrapping (sampling with replacement): for small data sets.
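The k-fold scheme described above can be sketched as a generator over (train, test) partitions; this round-robin assignment of examples to folds is one simple choice among many:

```python
def k_fold_splits(data, k):
    """Yield k (train, test) pairs: each fold serves as the test set once,
    the remaining k-1 folds form the training set."""
    folds = [data[i::k] for i in range(k)]   # round-robin fold assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test
```

Every example appears in exactly one test fold, so the k error estimates together cover the whole data set.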

31 Boosting and Bagging Boosting increases classification accuracy
Applicable to decision trees or Bayesian classifiers. Learn a series of classifiers, where each classifier in the series pays more attention to the examples misclassified by its predecessor. Boosting requires only linear time and constant space.

32 Boosting Technique (II) — Algorithm
Assign every example an equal weight 1/N. For t = 1, 2, …, T do: obtain a hypothesis (classifier) h(t) under the weights w(t); calculate the error of h(t) and re-weight the examples based on the error; normalize w(t+1) to sum to 1. Output a weighted vote of all the hypotheses, with each hypothesis weighted according to its accuracy on the training set.
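One re-weighting round of this algorithm can be sketched as follows. The slide does not fix the re-weighting formula; the AdaBoost-style rule below (up-weight mistakes by e^α, down-weight correct examples by e^−α) is one standard instantiation:

```python
import math

def boost_round(weights, predictions, labels):
    """One boosting round: weighted error of the current hypothesis,
    AdaBoost-style re-weighting of the examples, then normalization."""
    err = sum(w for w, p, y in zip(weights, predictions, labels) if p != y)
    err = min(max(err, 1e-10), 1 - 1e-10)        # guard against err in {0, 1}
    alpha = 0.5 * math.log((1 - err) / err)      # hypothesis weight (accuracy-based)
    new_w = [w * math.exp(alpha if p != y else -alpha)
             for w, p, y in zip(weights, predictions, labels)]
    z = sum(new_w)
    return [w / z for w in new_w], alpha
```

Misclassified examples gain weight, so the next hypothesis "pays more attention" to them, exactly as the previous slide describes; the returned alpha is the hypothesis's vote in the final weighted sum.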

33 Is Accuracy Enough to Judge?
Sensitivity: t_pos / pos. Specificity: t_neg / neg. Precision: t_pos / (t_pos + f_pos). Here pos and neg are the numbers of actually positive and negative samples, and t_pos, t_neg, f_pos are the counts of true positives, true negatives, and false positives.
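The three measures follow directly from the confusion-matrix counts:

```python
def metrics(t_pos, f_neg, t_neg, f_pos):
    """Sensitivity, specificity, and precision from confusion-matrix counts."""
    pos = t_pos + f_neg          # all actually-positive samples
    neg = t_neg + f_pos          # all actually-negative samples
    return {
        "sensitivity": t_pos / pos,          # true positive rate
        "specificity": t_neg / neg,          # true negative rate
        "precision":   t_pos / (t_pos + f_pos),
    }
```

For a skewed class distribution, plain accuracy can look high while sensitivity is poor, which is why these measures are reported separately.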

34 Classification and Prediction
Decision tree Bayesian Classification ANN KNN GA Fuzzy SVM Prediction Some issues

35 Summary Classification is an extensively studied problem (mainly in statistics, machine learning, and neural networks). Classification is probably one of the most widely used data mining techniques, with many extensions. Scalability is still an important issue for database applications; combining classification with database techniques should be a promising topic. Research directions: classification of non-relational data, e.g., text, spatial, and multimedia data.

