
1 Pattern Recognition and Machine Learning. Lucy Kuncheva, School of Computer Science, Bangor University. Part 2

2 Pattern Recognition – DIY using WEKA

3 The weka (also known as Maori hen or woodhen) (Gallirallus australis) is a flightless bird species of the rail family. It is endemic to New Zealand, where four subspecies are recognized. Weka are sturdy brown birds, about the size of a chicken. As omnivores, they feed mainly on invertebrates and fruit.

4 WEKA. “WEKA is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. WEKA contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.”

5 WEKA. And we will be using only the hammer...

6 Data set: a table of OBJECTS (rows 1, 2, 3, ..., N; object #) by FEATURES (columns 1, 2, 3, ..., n; feature #; also called attributes, variables, covariates...). Your data sets are of the WIDE type: small number of objects, large number of features. PROBLEM.

7 WEKA. Prepare the .arff file: 1. Open in an ASCII editor. 2. Add a "name NUMERIC" line for each feature and a "class {1,2}" line for the class label. 3. Paste the data underneath. (A minimal example of such a file is sketched below.)
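
To make step 2 concrete, here is a minimal sketch of what such a .arff file could look like. This is my own illustration: the relation name and the data rows are invented for the example (the feature names echo those used later in these slides), while the layout follows the standard WEKA ARFF conventions (@relation, one @attribute line per feature, then @data).

@relation my_wide_data
@attribute GRIP_TEST_Left NUMERIC
@attribute GRIP_TEST_Right NUMERIC
@attribute HEIGHT_Standing_cm NUMERIC
@attribute class {1,2}
@data
% one object per line: feature values in the order of the @attribute lines, then the class label
31.5,33.0,168.4,1
28.0,27.5,155.0,2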

8 Feature selection. (b) Feature subsets. Two questions: How do we select the subsets? How do we evaluate the worth of a subset?

9 Feature selection. (b) Feature subsets. Two questions: How do we select the subsets? (Not our problem now.) How do we evaluate the worth of a subset? Wrapper: the classification accuracy itself. Filter: some easier-to-calculate proxy for the classification accuracy. Embedded: the selection is built into a classifier, e.g., a decision tree classifier or an SVM.

10 Feature selection. (b) Feature subsets. Two questions: How do we select the subsets? Options: Ranker; Greedy, e.g., Sequential Forward Selection (SFS); Random; Heuristic search, e.g., Genetic Algorithms (GA) and Swarm optimisation; Bespoke. How do we evaluate the worth of a subset? Wrapper, Filter, or Embedded.

12 Feature selection methods. FCBF (Fast Correlation-Based Filter), originally proposed for microarray data analysis (Yu and Liu, 2003). The idea of FCBF is that the features that are worth keeping should be correlated with the class variable but not correlated among themselves. WEKA: CfsSubsetEval. 1. L. Yu and H. Liu (2003), Feature selection for high-dimensional data: A fast correlation-based filter solution.
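
To illustrate the relevance-versus-redundancy idea, here is a simplified MATLAB sketch of my own. It is not the actual FCBF algorithm (which uses symmetrical uncertainty rather than linear correlation); it assumes a data matrix X (objects by features), a numeric class vector L, and an arbitrary redundancy threshold of 0.9, and corr requires the Statistics and Machine Learning Toolbox.

% Simplified sketch of the FCBF idea: keep features correlated with the class,
% but not strongly correlated with features already kept. (Not the real FCBF.)
relevance = abs(corr(X, L));               % correlation of each feature with the class
[~, order] = sort(relevance, 'descend');   % consider the most relevant features first
kept = [];
for j = order'                             % order' makes a row vector for the loop
    if isempty(kept) || all(abs(corr(X(:,j), X(:,kept))) < 0.9)   % 0.9 is an illustrative threshold
        kept(end+1) = j;                   %#ok<AGROW> keep feature j
    end
end
disp(kept)                                 % indices of the retained features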

13 Feature selection methods. Relief-F (Kira and Rendell, 1992; Kononenko et al., 1997). For each object in the data set, find the nearest neighbour from the same class (NearHit) and the nearest neighbour from the opposite class (NearMiss) using all features. The relevance score of a feature increases if the feature value in the current object is closer to that in the NearHit compared to that in the NearMiss. Otherwise, the relevance score of the feature decreases. WEKA: ReliefFAttributeEval. 1. K. Kira and L. Rendell (1992), The Feature Selection Problem: Traditional Methods and a New Algorithm, AAAI-92 Proceedings. 2. I. Kononenko et al. (1997), Overcoming the myopia of inductive learning algorithms with RELIEFF, Applied Intelligence, 7(1), pp. 39-55.

14 Feature selection methods. Relief-F (illustration): for the current object, the NearHit and NearMiss are marked in a two-feature scatter plot; the relevance score for feature x increases while the relevance score for feature y decreases.
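
Below is a minimal MATLAB sketch of the basic Relief idea as described on the previous slide (a single NearHit and NearMiss per object, two classes). It is my own illustration rather than the Relief-F variant implemented in WEKA, and it assumes a matrix X of objects by features scaled to comparable ranges, and numeric labels L.

% Basic Relief sketch: relevance scores from NearHit / NearMiss comparisons.
[n, d] = size(X);
w = zeros(1, d);                               % relevance score per feature
for i = 1:n
    dist = sum((X - X(i,:)).^2, 2);            % squared distances to all objects (implicit expansion)
    dist(i) = inf;                             % exclude the object itself
    same  = find(L == L(i));                   % candidates for NearHit
    other = find(L ~= L(i));                   % candidates for NearMiss
    [~, h] = min(dist(same));   nearHit  = X(same(h),  :);
    [~, m] = min(dist(other));  nearMiss = X(other(m), :);
    w = w - abs(X(i,:) - nearHit) + abs(X(i,:) - nearMiss);
end
[~, ranked] = sort(w, 'descend');              % features from most to least relevant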

15 Feature selection methods. SVM. This classifier builds a linear function that separates the classes. The hyperplane is calculated so as to maximise the distance to the nearest points. The absolute values of the coefficients in front of the features can be interpreted as “importance”. SVM-RFE. RFE stands for “Recursive Feature Elimination” (Guyon et al., 2006). Starting with an SVM on the entire feature set, a fraction of the features with the lowest weights is dropped. A new SVM is trained with the remaining features, and the set is subsequently reduced in the same way. The procedure stops when a set of the desired cardinality is reached. While SVM-RFE has been found to be extremely useful for wide data such as functional magnetic resonance imaging (fMRI) data (DeMartino et al., 2008), it has also been observed that the RFE step is not always needed (Abeel et al., 2010; Geurts et al., 2005). WEKA: SVMAttributeEval.
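
As a small illustration of the "weights as importance" idea (and of one elimination step), here is a hedged MATLAB sketch. It is not WEKA's SVMAttributeEval; it assumes the Statistics and Machine Learning Toolbox (fitcsvm), a data matrix X, and a two-class label vector L.

% Linear SVM weights as feature importance (a sketch, not WEKA's SVMAttributeEval).
mdl = fitcsvm(X, L, 'KernelFunction', 'linear', 'Standardize', true);
[~, ranked] = sort(abs(mdl.Beta), 'descend');   % largest |coefficient| = most important
worst = ranked(end);                            % one RFE step would drop this feature
fprintf('Least important feature: %d\n', worst)

Repeating the train-rank-drop cycle until the desired number of features remains gives the recursive elimination described above.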

16 Feature selection methods (WEKA screenshot of the SVMAttributeEval options). SVM-RFE: eliminate one feature at each iteration (the default). SVM: set this value to 0.

17 Feature selection methods.
SVM and SVM-RFE ranked attributes (for this example, both SVM and SVM-RFE give the same result):
6 2 GRIP_TEST_Right
5 5 HEIGHT_Standing_cm
4 1 GRIP_TEST_Left
3 4 HEIGHT_Seated_cm
2 3 WEIGHT_Kg
1 6 ARM_SPAN_cm
FCBF, selected attributes: 1,2,5 : 3 (GRIP_TEST_Left, GRIP_TEST_Right, HEIGHT_Standing_cm)
Relief-F, ranked attributes: GRIP_TEST_Right, HEIGHT_Standing_cm, HEIGHT_Seated_cm, GRIP_TEST_Left, WEIGHT_Kg, ARM_SPAN_cm

18 Feature selection methods. (The same SVM / SVM-RFE, FCBF, and Relief-F results as on the previous slide.) PROBLEM: While these results are (probably) curious, there is no statistical significance we can attach to them...

19 Time for a coffee-break

20 Feature selection methods. Permutation test. Feature of interest: X. Class label variable: Y (say, G/N). Let X_G be the sample from class G, and X_N the sample from class N. A two-sample t-test can be used to test the hypothesis of equal means when X_G and X_N come from approximately normal distributions. If we cannot ascertain this condition, use PERMUTATION tests. Quantity of interest: V = |mean(X_G) - mean(X_N)| (the difference between the two means). Observed value for our data: V*. Question: What is the probability of observing V* (or a larger value) if there were no relationship between X and the class label Y? (Illustration: a column of X values paired with their Y labels, G/N.)

21 Feature selection methods. Permutation test (illustration): a histogram of V computed for permuted labels, with the observed value V* marked. The p-value is the proportion of the histogram at or above V*: a very small chance to obtain the observed V* or larger.

22 Feature selection methods. Permutation test: p-value per feature (table):
1. ANTHRO - HEIGHT - Standing (cm)
1. ANTHRO - HEIGHT - Seated (cm)
1. ANTHRO - GRIP TEST Right
2.1 DT PACE BOWL - Average MPH
1. ANTHRO - WEIGHT (Kg)
1. ANTHRO - GRIP TEST - Left
1. ANTHRO - ARM SPAN (cm)
2.1 DT PACE BOWL - max MPH
8.1 FT - SPRINT (40m)
8.1 FT - SPRINT (30m)

23 The Dead Salmon. Neuroscientist Craig Bennett purchased a whole Atlantic salmon, took it to a lab at Dartmouth, and put it into an fMRI machine used to study the brain. The beautiful fish was to be the lab’s test object as they worked out some new methods. So, as the fish sat in the scanner, they showed it “a series of photographs depicting human individuals in social situations.” To maintain the rigor of the protocol (and perhaps because it was hilarious), the salmon, just like a human test subject, “was asked to determine what emotion the individual in the photo must have been experiencing.” Lo and behold! Brain activity responding to the stimuli!

24 Bonferroni correction for multiple comparisons: the simplest and most conservative method to control the family-wise error rate. If we increase the number of hypotheses in a test, we also increase the likelihood of witnessing a rare event, and therefore of declaring a difference when there is none. So, if the desired significance level for the whole family of n tests should be (at most) α, then the Bonferroni correction tests each individual hypothesis at a significance level of α/n. In our case, we have n = 50 and significance level 0.05, so each individual test is carried out at 0.05/50 = 0.001.
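
The same arithmetic as a trivial MATLAB sketch (the variable names are mine):

alpha = 0.05;                % desired family-wise significance level
n = 50;                      % number of hypotheses (features) tested
alpha_per_test = alpha / n   % Bonferroni-corrected level for each individual test: 0.001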

25 Feature selection methods. Permutation test p-values per feature (the same table as on slide 22). PROBLEM: None of the features survives the Bonferroni correction (p < 0.001 for significance level 0.05).

26 Feature selection methods. Permutation test. More PROBLEMs: 1. If there are permutation tests in WEKA, they are hidden very well... 2. If there is a Bonferroni correction in WEKA, it is hidden very well too... Solution? DIY...

27 Feature selection methods. Permutation test. Here is an algorithm for those of you with some programming experience (the null hypothesis is “no difference”, hence V = 0; assume that the greater the V, the larger the difference):
1. Calculate the observed value V*. Choose the number of iterations, e.g., T = 10.
2. for i = 1:T
   a) Permute the labels randomly.
   b) Calculate and store V(i) with the permuted labels.
3. end (for)
4. Calculate the p-value as the proportion of V greater than or equal to V*.
5. If you do this experiment for n features, compare p with alpha/n, where alpha is your chosen significance level (typically alpha = 0.05).
(A self-contained MATLAB sketch of these steps follows below; the script for the talent data is on the next two slides.)
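
Here is the procedure above wrapped as a MATLAB function (my own illustration; the function name permtest and the argument names are invented). It handles one feature at a time and assumes numeric class labels with exactly two distinct values; the dataset-specific script on the next slides does the same for all features at once.

function p = permtest(x, y, T)
% Permutation test for the difference between two class means (sketch).
% x: feature values, y: numeric class labels (two distinct values), T: number of permutations.
c = unique(y);
Vstar = abs(mean(x(y == c(1))) - mean(x(y == c(2))));    % observed value V*
V = zeros(T, 1);
for i = 1:T
    yp = y(randperm(numel(y)));                          % randomly permuted labels
    V(i) = abs(mean(x(yp == c(1))) - mean(x(yp == c(2))));
end
p = mean(V >= Vstar);   % proportion of permuted V greater than or equal to V*
end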

28 Feature selection methods. Permutation test. And here is a MATLAB script:

% Permutation test (assume that there are no missing values)
clear, close, clc
X = xlsread('ECB U talent testing data.xlsx', ...
    'U13 Talent Test Raw Data', 'G2:L27');
[~, Y] = xlsread('ECB U talent testing data.xlsx', ...
    'U13 Talent Test Raw Data', 'F2:F27');      % symbolic label
[~, Names] = xlsread('ECB U talent testing data.xlsx', ...
    'U13 Talent Test Raw Data', 'G1:L1');       % feature names

% Convert Y to numbers (1 selected, 2 not selected)
u = unique(Y);
L = ones(size(Y));
L(strcmp(u(1), Y)) = 2;

T = 20000;
% (continues on the next slide)

29 Feature selection methods. Permutation test. The MATLAB script, continued from the previous slide:

for i = 1:T
    la = L(randperm(numel(L)));
    for j = 1:size(X,2)                  % for each feature
        fe = X(:,j);
        V(i,j) = abs(mean(fe(la == 1)) - mean(fe(la == 2)));
    end
end

% p-values for the features
for j = 1:size(X,2)
    V_star(j) = abs(mean(X(L == 1,j)) - mean(X(L == 2,j)));
    p(j) = mean(V(:,j) >= V_star(j));    % proportion of permuted V greater than or equal to V*
    fprintf('%35s %.4f\n', Names{j}, p(j))
end

30 Feature selection methods. Permutation test. MATLAB output (feature name, p-value):
1. ANTHRO - GRIP TEST - Left
1. ANTHRO - GRIP TEST Right
1. ANTHRO-WEIGHT (Kg)
1. ANTHRO- HEIGHT - Seated (cm)
1. ANTHRO- HEIGHT - Standing (cm)
1. ANTHRO - ARM SPAN (cm)
The numbers may vary slightly from one run to the next because of the random number generator. However, the larger the iteration number (T), the better. The p-values are not corrected (Bonferroni); the correction should be applied if necessary.
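
Following the note above, the Bonferroni step could be applied directly to the p vector produced by the script; a small sketch (0.05 is the usual significance level, and p and Names come from the script on the previous slides):

% Bonferroni correction applied to the permutation-test p-values
alpha = 0.05;
significant = p < alpha / numel(p);   % compare each p-value with alpha/n
disp(Names(significant))              % features that survive the correction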

31 Time for a coffee-break

32 Time for our classifiers!!! (WEKA screenshot: the Classification tab.) Choose a classifier (SVM), choose a training-testing protocol, and when ready (all chosen) click the start button.

33 Where to find the results (WEKA screenshot): the confusion matrix.

34 Where to find the results (WEKA screenshot): the classification accuracy (and classification error).

35 And a lot, lot more...

36 Thank you!

