Probabilistic Classification using Fuzzy Support Vector Machines (PFSVM)
Marzieh Parandehgheibi, ORC - MIT
INFORMS DM-HI, 11/12/2011

Content: Motivation, Problem, Methodology, Simulation Results, Conclusion

Motivation
Is Cancer Misdiagnosis More Common Than You Thought?
It is estimated that nearly 12 percent of all cancer diagnoses may be in error.
When a positive cancer diagnosis is missed, the consequences can be deadly. For example, a woman who is diagnosed with breast cancer in its early stages will survive at least 5 years longer.
Being misdiagnosed with cancer can be devastating. Patients who are misdiagnosed are often subjected to unnecessary, harmful, painful and expensive treatments.
A diagnosis can be confirmed via methods such as seeking second opinions, consulting specialists, getting further medical tests, and researching information about the medical condition.

When can we trust a diagnosis? When do we need to have additional tests?

Problem
What we do: Given data, is it a benign cancer or malignant?
What we need to do: Is the given data enough to decide on the type of cancer?
– YES: What's the type of cancer?
– NO: Do more tests

Problem/Solution
Find the criteria under which most errors occur.
Find the probability of error ($P_e$).
If $P_e > \alpha$, wait for more tests.

PFSVM Methodology
Probabilistic Fuzzy Support Vector Machine (PFSVM) is a two-phase classification method which probabilistically assigns the uncertain points to each of the classes.
1- Apply FSVM to the whole training data such that most of the uncertain points are placed in the margin, while the certain points are assigned to the appropriate classes.
2- Define a fuzzy membership function and an appropriate rule to classify the points located in the margin. This assigns each uncertain point to each of the classes with a specific probability.

SVM
Training data: N pairs $(X_1, Y_1), \dots, (X_N, Y_N)$, where $Y_i \in \{-1, 1\}$.
Separable data: the separating hyperplane $\{X : f(X) = X^T\beta + \beta_0 = 0\}$ separates the data, with $X^T\beta + \beta_0 > 0$ on one side and $X^T\beta + \beta_0 < 0$ on the other.
Classification rule: $g(X) = \operatorname{sign}(X^T\beta + \beta_0)$
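As an illustration of the classification rule on this slide, here is a minimal Python sketch. The function names and the example hyperplane are hypothetical and not taken from the talk; mapping a decision value of exactly zero to +1 is also an assumption made only to keep the output binary.

```python
def decision_value(x, beta, beta0):
    """Compute f(x) = x^T beta + beta0 for a single feature vector x."""
    return sum(xj * bj for xj, bj in zip(x, beta)) + beta0


def classify(x, beta, beta0):
    """Apply g(x) = sign(f(x)); f(x) == 0 is mapped to +1 by assumption."""
    return 1 if decision_value(x, beta, beta0) >= 0 else -1


# Hypothetical hyperplane, chosen only for illustration.
beta, beta0 = [1.0, -1.0], 0.5
print(classify([2.0, 0.0], beta, beta0))  # f = 2.5, positive side
print(classify([0.0, 2.0], beta, beta0))  # f = -1.5, negative side
```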

SVM
Non-separable data: SVM maximizes the margin M between the training points of class 1 and class -1, but allows some points to be on the wrong side of the margin.
[Figure: points on the wrong side of the margin, labeled $\xi_i$]

FSVM
In many real-world applications, the effects of the training points are different, i.e. some training points are more important than others.
A training point does not necessarily belong to exactly one of the two classes. It may belong 90% to one class and 10% to the other.
There is a fuzzy membership $0 < s_i \le 1$ associated with each training point $X_i$.

FSVM
Suppose that out of N training points, $N_1$ points are in class 1 and the remaining $N_2$ points are in class 2.
Define the weight for each point as follows [equation not preserved in transcript], where $\mu_{jk}$ and $\sigma_{jk}$ refer to the mean and standard deviation of the $j$th feature of all points in class $k$, respectively, and $x_{ij}$ is the $j$th feature value of the $i$th point.
Normalize the weights such that the total sum of the weights equals N, which is the sum of error costs for the classic SVM.
The weights show up in the objective function.

Points near the center of each class have a higher weight than those farther away. Therefore, near points are classified with certainty, and the points midway between the two classes, called uncertain points, fall in the margin.
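The normalization step for the FSVM weights (rescaling so that they sum to N) can be sketched as follows. The weight formula itself is not preserved in this transcript, so the raw weights below are hypothetical placeholder values, and the function name is an assumption.

```python
def normalize_weights(raw_weights):
    """Rescale positive weights so that they sum to N = len(raw_weights).

    With all weights equal this gives 1.0 per point, i.e. the total
    error cost of N that the classic (unweighted) SVM uses.
    """
    total = sum(raw_weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    n = len(raw_weights)
    return [w * n / total for w in raw_weights]


# Hypothetical raw weights: larger means closer to the class center.
raw = [0.2, 0.9, 0.4, 0.5]
weights = normalize_weights(raw)
print(weights, sum(weights))
```

Rescaling keeps the ratios between weights unchanged, so only the relative importance of the points differs from the classic SVM, not the overall cost budget.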

Fuzzy Classification
Apply a fuzzy classification to the marginal points.
Define a Gaussian fuzzy membership function $A_{ik}$ for every test point $Y_i$ located in the margin [equation not preserved in transcript], where $\mu_{jk}$ and $\sigma_{jk}$ are the mean and standard deviation of the training points of class $k$ located in the margin, respectively. This membership shows the closeness of element $Y_i$ to the center of the $k$th class.
To measure the relative closeness of a point to both centers, a "membership probability" is defined for each marginal point [equation not preserved in transcript].

Points with a membership probability of more than 90% in a class are assigned to that class. Otherwise, the given information is not sufficient to make a decision.
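The rule for marginal points can be sketched in Python as below. The exact formulas for $A_{ik}$ and the membership probability are not preserved in this transcript, so this sketch assumes a common form: a Gaussian membership $\exp(-\tfrac{1}{2}\sum_j ((y_{ij}-\mu_{jk})/\sigma_{jk})^2)$ and a probability obtained by dividing each membership by the sum over classes. Both forms, the function names, and the class statistics are assumptions; only the 90% threshold comes from the slide.

```python
import math


def gaussian_membership(y, mu, sigma):
    """ASSUMED form of A_ik: exp(-0.5 * sum_j ((y_j - mu_j) / sigma_j)^2)."""
    z2 = sum(((yj - m) / s) ** 2 for yj, m, s in zip(y, mu, sigma))
    return math.exp(-0.5 * z2)


def membership_probabilities(y, class_stats):
    """ASSUMED normalization: P_ik = A_ik / sum over classes of A_ik.

    class_stats maps a class label to (mu, sigma), the per-feature mean
    and standard deviation of that class's marginal training points.
    """
    a = {k: gaussian_membership(y, mu, sigma)
         for k, (mu, sigma) in class_stats.items()}
    total = sum(a.values())
    if total == 0.0:
        # Numerical underflow: the point is far from every center.
        return {k: 1.0 / len(a) for k in a}
    return {k: v / total for k, v in a.items()}


def classify_marginal(y, class_stats, threshold=0.9):
    """Return the class whose probability exceeds the threshold, else None."""
    probs = membership_probabilities(y, class_stats)
    label, p = max(probs.items(), key=lambda kv: kv[1])
    return label if p > threshold else None


# Hypothetical statistics for two classes with two features each.
stats = {"B": ([0.0, 0.0], [1.0, 1.0]), "M": ([4.0, 4.0], [1.0, 1.0])}
print(classify_marginal([0.0, 0.0], stats))  # at the center of class B
print(classify_marginal([2.0, 2.0], stats))  # midway: undetermined (None)
```

Returning `None` corresponds to the "not sufficient to make a decision" outcome, which the talk treats as a signal to request more tests.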

DATA SET
Wisconsin breast cancer diagnostic dataset: 569 instances in two classes, Malignant (M) and Benign (B), with 32 features per instance.
Reduce the number of features from 32 to 23 by keeping just one feature out of every set of features with correlation more than 0.95.
Determine the training and test sets by 10-fold cross-validation.
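One way to carry out the correlation-based feature reduction is a greedy pass that keeps a feature only if it is not too correlated with any feature already kept. The talk does not say how the retained feature in each correlated set was chosen, so the greedy order, the use of absolute Pearson correlation, and the function names below are all assumptions.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length columns (0.0 if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_correlated(columns, threshold=0.95):
    """Return indices of features kept by a greedy, order-dependent filter.

    A feature is kept only if its absolute correlation with every
    previously kept feature is at most `threshold` (assumed criterion).
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept


# Toy data: feature 1 is almost a multiple of feature 0.
columns = [[1, 2, 3, 4], [2, 4, 6, 8.1], [4, 1, 3, 2]]
print(drop_correlated(columns))
```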

Widen the Margin by FSVM
SVM - Width of Margin: 0.895
FSVM - Width of Margin: 1.931
[Figure: SVM and FSVM margin plots]

Error Location in FSVM Methods
On average, more than 80% of errors are inside the margin.
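A statistic like the share of errors inside the margin could be computed along the following lines. This is only a sketch: it assumes the usual canonical SVM scaling, in which the margin boundaries are $f(X) = \pm 1$, and the function names and sample values are hypothetical. The talk does not state how the figure was actually measured.

```python
def inside_margin(f_value):
    """True if the decision value lies strictly between the margin
    boundaries f = -1 and f = +1 (ASSUMES canonical SVM scaling)."""
    return abs(f_value) < 1.0


def margin_error_fraction(f_values, labels):
    """Fraction of misclassified points whose decision value is inside
    the margin; returns None when there are no errors."""
    errors = [f for f, y in zip(f_values, labels)
              if (1 if f >= 0 else -1) != y]
    if not errors:
        return None
    return sum(1 for f in errors if inside_margin(f)) / len(errors)


# Hypothetical decision values and true labels.
f_values = [0.3, -0.2, 1.5, -2.0, 0.8]
labels = [-1, 1, -1, -1, 1]
print(margin_error_fraction(f_values, labels))
```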

Comparison of different classification methods

Method\Run    1    2    3    4    5    6    7    8    9    10   Percent ave
SVM err       1    1    5    3    4    1    2    4    0    1    3.86
FSVM err      4    4    5    7    7    3    2    1    3    4    7.02
Fuzzy err     3    3    5    8    4    6    5    3    3    1    7.19
PFSVM err     1+1  1    0    0+2  3+1  0+2  0    1    2+1  0+2  1.63
PFSVM undet   1    1    1    2    0    1    2    1    0    0    1.58

Double Cost PFSVM
1) Misdiagnosis of positive cancer is deadly.
2) Most errors happen in positive cancer diagnosis.
Double the cost of error for positive cancer diagnosis.
On average, more than 98% of errors are inside the margin.

Comparison of different classification methods

Method\Run    1    2    3    4    5    6    7    8    9    10   Percent ave
SVM err       4    2    4    2    1    2    3    1    2    3    4.29
FSVM err      6    2    3    2    1    2    4    1    4    5    5.36
Fuzzy err     7    2    5    4    1    3    6    3    4    4    6.96
PFSVM err     0    2    1    0    0    2    0    0    0+1  1    1.23
PFSVM undet   1    0    3    1    0    1    1    1    3    1    2.14

QUESTIONS?
