Feature Selection
Which features work best? One way to rank features:
–Make a contingency table for each feature F:

                Madison   Hamilton
    F              a          b
    Not F          c          d

–Compute abs(log(ad/bc))
–Rank the log values
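As an illustration, a minimal R sketch of this ranking step, assuming the four counts for each feature have already been gathered into a data frame with columns a, b, c, d (the data frame layout and the function name are assumptions, not the authors' code):

    rank_features <- function(counts) {
      # counts: one row per feature, columns a, b, c, d from its contingency table
      counts$score <- abs(log((counts$a * counts$d) / (counts$b * counts$c)))
      counts[order(counts$score, decreasing = TRUE), ]  # largest |log odds ratio| first
    }

Features with the largest score are the ones whose presence most strongly separates the Madison papers from the Hamilton papers.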

49 Ranked Features

Linear Discriminant Analysis
A technique for classifying data
Available in the R statistics package
Input:
–Table of training data
–Table of test data
Output:
–Classification of test data

Linear Discriminant Analysis: example
Input training data: columns upon, 2-letter, 3-letter, plus the known author of each paper (five rows labeled M, five labeled H)
Input test data: the same feature columns for the disputed papers
Output: m m m m h
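In R this step might look roughly like the sketch below; the data frame and column names (train, test, two_letter, three_letter, author) are illustrative assumptions, not the code actually used:

    library(MASS)   # provides lda()

    # train: data frame with columns upon, two_letter, three_letter and author (M or H)
    # test:  data frame with the same feature columns for the disputed papers
    fit  <- lda(author ~ upon + two_letter + three_letter, data = train)
    pred <- predict(fit, newdata = test)
    pred$class   # predicted author for each test paper, e.g. M M M M H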

Some more LDA results
12 to Madison:
–upon, 1-letter, 2-letter
–upon, enough, there
–upon, there
11 to Madison:
–upon, 2-letter, 3-letter
< 6 to Madison:
–2-letter, 3-letter
–there, 1-letter, 2-letter

Some more LDA results
[Table: for each pair of features tested (upon & apt, to & upon, on & there, an & by, particularly & probability, also & of, always & of, of & work, there & language, consequently & direction), the class and the lda output (m or h) for each disputed paper.]

Feature Selection Part II
Which combinations of features are best for LDA? Are the features independent?
We did some random sampling:
–Choose features a, b, c, d
–Compute x = log a + log b + log c + log d
–Compute y = log(a + b + c + d)
–Plot x versus y
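A sketch of that sampling check in R; the vector name scores and the number of samples are assumptions, since the slides do not say exactly which per-feature quantity a, b, c, d denote:

    # scores: assumed numeric vector with one value per feature
    set.seed(1)
    n_samples <- 1000
    x <- numeric(n_samples)
    y <- numeric(n_samples)
    for (i in seq_len(n_samples)) {
      s <- sample(scores, 4)   # features a, b, c, d chosen at random
      x[i] <- sum(log(s))      # log a + log b + log c + log d
      y[i] <- log(sum(s))      # log(a + b + c + d)
    }
    plot(x, y)                 # plot x versus y, as on the slide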

Selecting more features
What happens when more than 4 features are used for the lda?
Greedy approach:
–Add features one at a time from two lists
–Perform lda on all features chosen so far
Is overfitting a problem?
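One way such a greedy loop might be written in R. This is only a sketch under assumptions: a single candidate list rather than the two lists mentioned on the slide, a data frame train with an author column, and leave-one-out error as the selection criterion.

    library(MASS)

    greedy_lda <- function(train, candidates, max_features = 10) {
      chosen <- character(0)
      for (k in seq_len(max_features)) {
        best_f   <- NULL
        best_err <- Inf
        for (f in setdiff(candidates, chosen)) {
          # try lda on all features chosen so far plus the candidate f
          form <- reformulate(c(chosen, f), response = "author")
          fit  <- lda(form, data = train, CV = TRUE)   # leave-one-out predictions
          err  <- mean(fit$class != train$author)
          if (err < best_err) { best_f <- f; best_err <- err }
        }
        chosen <- c(chosen, best_f)
      }
      chosen
    }

Each pass adds the single feature that most reduces the leave-one-out error, which is one concrete way to monitor overfitting as the feature set grows.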

First few greedy iterations
–2-letter words: 6 M, 6 H (h m h h m h m m h m h m)
–upon: 12 M, 0 H (m m m m m m)
–1-letter words: 12 M, 0 H (m m m m m m)
–5-letter words: 12 M, 0 H (m m m m m m)
–4-letter words: 11 M, 1 H (m m m m m h m m m m m m)
–there: 12 M, 0 H (m m m m m m)
–enough: 12 M, 0 H (m m m m m m)
–whilst: 11 M, 1 H (m m m m m m h m m m m m)
–3-letter words: 12 M, 0 H (m m m m m m)
–15-letter words: 11 M, 1 H (m m m m m m h m m m m m)