Usman Roshan Machine Learning

Slides:



Advertisements
Similar presentations
Contingency Table Analysis Mary Whiteside, Ph.D..
Advertisements

Uncertainty in fall time surrogate Prediction variance vs. data sensitivity – Non-uniform noise – Example Uncertainty in fall time data Bootstrapping.
R OBERTO B ATTITI, M AURO B RUNATO The LION Way: Machine Learning plus Intelligent Optimization. LIONlab, University of Trento, Italy, Feb 2014.
Basic Statistics The Chi Square Test of Independence.
Multiple Regression. Outline Purpose and logic : page 3 Purpose and logic : page 3 Parameters estimation : page 9 Parameters estimation : page 9 R-square.
Hypothesis Testing IV Chi Square.
COMPUTER AIDED DIAGNOSIS: FEATURE SELECTION Prof. Yasser Mostafa Kadah –
Chapter 11 Contingency Table Analysis. Nonparametric Systems Another method of examining the relationship between independent (X) and dependant (Y) variables.
Genome-wide association studies Usman Roshan. Recap Single nucleotide polymorphism Genome wide association studies –Relative risk, odds risk (or odds.
Statistical hypothesis testing – Inferential statistics II. Testing for associations.
1 of 27 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2013, Michael Kalsher Michael J. Kalsher Department of Cognitive Science Adv. Experimental.
Cross Tabulation and Chi-Square Testing. Cross-Tabulation While a frequency distribution describes one variable at a time, a cross-tabulation describes.
Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242.
Linear hyperplanes as classifiers Usman Roshan. Hyperplane separators.
Feature selection LING 572 Fei Xia Week 4: 1/29/08 1.
Chapter 20 For Explaining Psychological Statistics, 4th ed. by B. Cohen 1 These tests can be used when all of the data from a study has been measured on.
Filter + Support Vector Machine for NIPS 2003 Challenge Jiwen Li University of Zurich Department of Informatics The NIPS 2003 challenge was organized to.
Usman Roshan Machine Learning, CS 698
Social Science Research Design and Statistics, 2/e Alfred P. Rovai, Jason D. Baker, and Michael K. Ponton Pearson Chi-Square Contingency Table Analysis.
+ Get Rich and Cure Cancer with Support Vector Machines (Your Summer Projects)
CHI SQUARE TESTS.
Inferential Statistics. The Logic of Inferential Statistics Makes inferences about a population from a sample Makes inferences about a population from.
Correlation. Correlation is a measure of the strength of the relation between two or more variables. Any correlation coefficient has two parts – Valence:
Chapter Eight: Using Statistics to Answer Questions.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 11 Analyzing the Association Between Categorical Variables Section 11.2 Testing Categorical.
Feature Selction for SVMs J. Weston et al., NIPS 2000 오장민 (2000/01/04) Second reference : Mark A. Holl, Correlation-based Feature Selection for Machine.
Combining multiple learners Usman Roshan. Decision tree From Alpaydin, 2010.
Genome-wide association studies
Gist 2.3 John H. Phan MIBLab Summer Workshop June 28th, 2006.
Chapter 11: Categorical Data n Chi-square goodness of fit test allows us to examine a single distribution of a categorical variable in a population. n.
NURS 306, Nursing Research Lisa Broughton, MSN, RN, CCRN RESEARCH STATISTICS.
Machine Learning Usman Roshan Dept. of Computer Science NJIT.
Chi Square Test of Homogeneity. Are the different types of M&M’s distributed the same across the different colors? PlainPeanutPeanut Butter Crispy Brown7447.
Basic statistics Usman Roshan.
Comparing Counts Chi Square Tests Independence.
Applied statistics Usman Roshan.
Basic Statistics The Chi Square Test of Independence.
Analysis and Interpretation: Multiple Variables Simultaneously
T-test calculations.
BINARY LOGISTIC REGRESSION
Applied statistics Usman Roshan.
Chi-Square hypothesis testing
Chapter 9: Non-parametric Tests
Parallel chi-square test
Basic statistics Usman Roshan.
Dr. Siti Nor Binti Yaacob
Background on Classification
Trees, bagging, boosting, and stacking
Making Use of Associations Tests
Roberto Battiti, Mauro Brunato
Feature selection Usman Roshan.
Contingency Tables.
Usman Roshan Machine Learning
Analyzing the Association Between Categorical Variables
Least squares linear classifier
Chapter 26 Comparing Counts.
Chapter 7: Transformations
Feature Selection Methods
FEATURE WEIGHTING THROUGH A GENERALIZED LEAST SQUARES ESTIMATOR
Chapter 26 Comparing Counts Copyright © 2009 Pearson Education, Inc.
Inference for Two Way Tables
Chapter Nine: Using Statistics to Answer Questions
Making Use of Associations Tests
Chapter 18: The Chi-Square Statistic
Chapter 26 Comparing Counts.
Inference for Two-way Tables
Karl L. Wuensch Department of Psychology East Carolina University
Quadrat sampling & the Chi-squared test
Hypothesis Testing - Chi Square
Quadrat sampling & the Chi-squared test
Presentation transcript:

Usman Roshan Machine Learning Feature selection Usman Roshan Machine Learning

What is feature selection? Consider our training data as a matrix where each row is a vector and each column is a dimension. For example consider the matrix for the data x1=(1, 10, 2), x2=(2, 8, 0), and x3=(1, 9, 1) We call each dimension a feature or a column in our matrix.

Feature selection Useful for high dimensional data such as genomic DNA and text documents. Methods Univariate (looks at each feature independently of others) Pearson correlation coefficient F-score Chi-square Signal to noise ratio And more such as mutual information, relief Multivariate (considers all features simultaneously) Dimensionality reduction algorithms Linear classifiers such as support vector machine Recursive feature elimination

Feature selection Methods are used to rank features by importance Ranking cut-off is determined by user Univariate methods measure some type of correlation between two random variables. We apply them to machine learning by setting one variable to be the label (yi) and the other to be a fixed feature (xij for fixed j)

Pearson correlation coefficient Measures the correlation between two variables Formulas: Covariance(X,Y) = E((X-μX)(Y-μY)) Correlation(X,Y)= Covariance(X,Y)/σXσY Pearson correlation = The correlation r is between -1 and 1. A value of 1 means perfect positive correlation and -1 in the other direction

Pearson correlation coefficient From Wikipedia

F-score From Lin and Chen, Feature extraction, 2006

Chi-square test Contingency table We have two random variables: Label (L): 0 or 1 Feature (F): Categorical Null hypothesis: the two variables are independent of each other (unrelated) Under independence P(L,F)= P(D)P(G) P(L=0) = (c1+c2)/n P(F=A) = (c1+c3)/n Expected values E(X1) = P(L=0)P(F=A)n We can calculate the chi-square statistic for a given feature and the probability that it is independent of the label (using the p-value). Features with very small probabilities deviate significantly from the independence assumption and therefore considered important. Observed=c4 Expected=X4 Observed=c3 Expected=X3 Label=1 Observed=c2 Expected=X2 Observed=c1 Expected=X1 Label=0 Feature=B Feature=A

Signal to noise ratio Difference in means divided by difference in standard deviation between the two classes S2N(X,Y) = (μX - μY)/(σX – σY) Large values indicate a strong correlation

Multivariate feature selection Consider the vector w for any linear classifier. Classification of a point x is given by wTx+w0. Small entries of w will have little effect on the dot product and therefore those features are less relevant. For example if w = (10, .01, -9) then features 0 and 2 are contributing more to the dot product than feature 1. A ranking of features given by this w is 0, 2, 1.

Multivariate feature selection The w can be obtained by any of linear classifiers we have see in class so far A variant of this approach is called recursive feature elimination: Compute w on all features Remove feature with smallest wi Recompute w on reduced data If stopping criterion not met then go to step 2

Feature selection in practice NIPS 2003 feature selection contest Contest results Reproduced results with feature selection plus SVM Effect of feature selection on SVM Comprehensive gene selection study comparing feature selection methods Ranking genomic causal variants with SVM and chi-square

Limitations Unclear how to tell in advance if feature selection will work Only known way is to check but for very high dimensional data (at least half a million features) it helps most of the time How many features to select? Perform cross-validation