Linear Discriminant Function Classification vs. Prediction Classification & ANOVA Classification Cutoffs, Errors, etc. Multivariate Classification & LDF.

Presentation transcript:

Linear Discriminant Function Classification vs. Prediction Classification & ANOVA Classification Cutoffs, Errors, etc. Multivariate Classification & LDF Geometric example of LDF & multivariate power k-group LDF Concentrated vs. Diffuse Structure Follow-up LDF Analyses

Follow-up Analyses for Multiple-group linear discriminant function evaluating & reporting k-group ldf

Let’s start by reviewing what “prediction” is…
Using a person’s scores on one or more variables to make a “best guess” of that person’s score on another variable (the value of which isn’t known).
Classification is very similar…
Using a person’s scores on one or more variables to make a “best guess” of the category to which that person belongs (when the category isn’t known).
The difference is a language “convention”:
if the “unknown variable” is quantitative -- it’s called prediction
if the “unknown variable” is qualitative -- it’s called classification

How does classification work???
Let’s start with an “old friend” -- ANOVA. In its usual form there are…
two qualitatively different IV groups, naturally occurring or “created” by manipulation
a quantitative DV
H0: Mean G1 = Mean G2
Rejecting H0 tells us:
there is a relationship between the grouping and the DV
the groups represent populations with different means on the DV
knowing what group a person is in allows us to guess their DV score -- the mean of that group

Let’s review in a little more detail… Remember the formula for the ANOVA F-test:

$$F = \frac{\text{variation between groups}}{\text{variation within groups}} = \frac{\text{size of the mean difference}}{\text{variation within groups}}$$

In words -- F compares the mean difference to the variability around each of those means.
Which of the following will produce the larger F-test? Why?
Data #1 (n = 50): group 1 mean = 30, std dev = 5; group 2 mean = 50, std dev = 5
Data #2 (n = 50): group 1 mean = 30, std dev = 15; group 2 mean = 50, std dev = 15
Remember -- about 95% of scores are within 2 std dev of the mean
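To check the comparison numerically, here is a minimal sketch that computes the two-group F from the summary statistics above. It assumes n = 50 in each group and equal group sizes; the function name `two_group_f` is made up for this illustration.

```python
def two_group_f(mean1, sd1, mean2, sd2, n_per_group):
    """One-way ANOVA F for two equal-sized groups, from summary statistics."""
    grand_mean = (mean1 + mean2) / 2
    # Between-groups mean square (df = 2 groups - 1 = 1)
    ms_between = n_per_group * ((mean1 - grand_mean) ** 2 + (mean2 - grand_mean) ** 2)
    # Within-groups mean square: with equal n, the average of the two variances
    ms_within = (sd1 ** 2 + sd2 ** 2) / 2
    return ms_between / ms_within


# Assumption: n = 50 per group in both data sets
f_data1 = two_group_f(30, 5, 50, 5, 50)
f_data2 = two_group_f(30, 15, 50, 15, 50)
print(f_data1, f_data2)
```

The mean difference is identical in both data sets, so the only thing that changes F is the within-group variability in the denominator.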

Graphical depictions of these data show that the size of F relates to the amount of overlap between the groups.
Notice: since all the distributions have n = 50, those with more variability are not as tall -- all 4 distributions have the same area.
[Figure: overlapping group distributions for Data #1 and Data #2]
Data #1: larger F = more consistent group difference
Data #2: smaller F = less consistent group difference

Let’s consider that last one “in reverse”… Could knowing the person’s score help tell us what qualitative group they are in? …to “assign” them to the proper group?
An example… Research has revealed a statistical relationship between the number of times a person laughs out loud each day (quant variable) and whether they are depressed or schizophrenic (qual grouping variable).
Mean laughs Depressed = 4.0
Mean laughs Schizophrenic = 7.0
F(1,34) = 7.00, p < .05
A new (as yet undiagnosed) patient laughs 11 times the first day -- what’s your “assignment”: depressed or schizophrenic?
Another patient laughs 1 time -- your “assignment”?
A third new patient laughs 5 times -- your “assignment”?

Why were the first two “gimmes” and the last one not?
When the groups have a mean difference, a score beyond one of the group means is more likely to belong to that group than to the other group (unless stds are huge):
someone who laughs more than the mean for the schizophrenic group is more likely to be schizophrenic than to be depressed
someone who laughs less than the mean of the depressed group is more likely to be depressed than to be schizophrenic
Even when the groups have a mean difference, a score between the group means is harder to correctly assign (unless stds are minuscule):
someone with 5-6 laughs is hardest to classify, because several depressed and schizophrenic folks have this score

Here’s a graphical depiction of the clinical data...
[Figure: dot plots along a “laughs” axis for 18 depressed patients (x, mean laughs = 4.0) and 18 schizophrenic patients (o, mean laughs = 7.0)]
Looking at this, it’s easy to see why we would be...
confident in an assignment based on 11 laughs -- no depressed patients had a score that high
confident in an assignment based on 1 laugh -- no schizophrenic patients had a score that low
lacking confidence in an assignment based on 5 or 6 laughs -- several depressed & schizophrenic patients had 5 or 6

The process of prediction required two things…
that there be a linear relationship between the predictor and the criterion (reject H0: r = 0)
a formula (y’ = bx + a) to “translate” a predictor score into an estimate of a criterion variable score
Similarly, the process of classification requires two things…
a statistical relationship between the predictor (DV) & criterion (reject H0: M1 = M2)
a cutoff to “translate” a person’s score on the predictor (DV) into an assignment to one group or the other
Where should we place the cutoff??? Wherever gives us the most accurate classification!!

[Figure: dot plots along a “laughs” axis for 18 depressed patients (x, mean laughs = 4.0) and 18 schizophrenic patients (o, mean laughs = 7.0)]
When your groups are the same size and your group score distributions are symmetrical, things are pretty easy…
place the cutoff at a position equidistant from the group means
here, the cutoff would be equidistant between 4.0 and 7.0
anyone who laughs more than 5.5 times would be “assigned” as schizophrenic
anyone who laughs fewer than 5.5 times would be “assigned” as depressed
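The midpoint rule can be sketched in a few lines. The function names are invented for illustration, and sending a score exactly at the cutoff to the lower group is an arbitrary assumption (it cannot happen here, since laugh counts are whole numbers).

```python
def midpoint_cutoff(mean_low, mean_high):
    """Cutoff equidistant from the two group means (equal n, symmetric distributions)."""
    return (mean_low + mean_high) / 2


def assign(laughs, cutoff, low_group="depressed", high_group="schizophrenic"):
    """Assign a person to a group by comparing their score to the cutoff."""
    # Assumption: a score exactly at the cutoff goes to the lower-mean group
    return high_group if laughs > cutoff else low_group


cutoff = midpoint_cutoff(4.0, 7.0)
assignments = [assign(laughs, cutoff) for laughs in (11, 1, 5)]
print(cutoff, assignments)
```

The three scores passed in are those of the three new patients from the earlier laughs example.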

[Figure: dot plots along a “laughs” axis for 18 depressed patients (x, mean laughs = 4.0) and 18 schizophrenic patients (o, mean laughs = 7.0)]
We can assess the accuracy of the assignments by building a “reclassification table”:

                      Actual Diagnosis
Assignment        Depressed   Schizophrenic
Depressed             14            4
Schizophrenic          4           14

reclassification accuracy would be 28/36 = 77.78%
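As a small sketch, the accuracy figure can be recomputed from the table counts. The dictionary layout, keyed by (assigned, actual), is just one convenient representation chosen for this example.

```python
# Reclassification table keyed by (assigned group, actual diagnosis)
table = {
    ("depressed", "depressed"): 14,
    ("depressed", "schizophrenic"): 4,
    ("schizophrenic", "depressed"): 4,
    ("schizophrenic", "schizophrenic"): 14,
}

# Correct assignments are the cells where assignment matches diagnosis
correct = sum(count for (assigned, actual), count in table.items() if assigned == actual)
total = sum(table.values())
accuracy = correct / total
print(f"{correct}/{total} = {accuracy:.2%}")
```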

Getting ready for ldf…
multiple regression works better than simple regression because a y’ based on multiple predictors is a better estimate of y than a y’ based on a single predictor
similarly, classification based on multiple predictors will do better than classification based on a single predictor
but, how to incorporate multiple predictors into a classification??
As with multiple regression, multiple variables (Xs) are each given a weighting and a constant is added:

$$\text{ldf} = b_1 X_1 + b_2 X_2 + b_3 X_3 + a$$

The composite variable is called a linear discriminant function:
function -- constructed from other variables
linear -- linear combination of linearly weighted vars
discriminant -- weights are chosen so that the resulting composite has the maximum possible F-test between the groups
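To make the composite concrete, here is a minimal sketch of computing one person's ldf score. The weights, constant, and predictor values are hypothetical numbers picked for the illustration, not estimates from any data.

```python
def ldf_score(values, weights, constant):
    """Weighted sum of predictor values plus a constant: b1*X1 + b2*X2 + ... + a."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per predictor")
    return sum(b * x for b, x in zip(weights, values)) + constant


# Hypothetical weights (b1, b2, b3) and constant (a)
weights = [0.5, -0.25, 1.0]
constant = -2.0

# Hypothetical scores for one person on X1, X2, X3
score = ldf_score([4.0, 8.0, 1.5], weights, constant)
print(score)
```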

So, how does this all work???
We start with a grouping variable and a set of quantitative (or binary) predictors (what would be DVs if doing ANOVAs).
Using an algorithm much like multiple regression, the bivariate relationship of each predictor to the grouping variable & the collinearities among the predictors are all taken into account, and the weights for the ldf formula are derived.
remember -- this ldf will have the largest possible F value between the groups
A cutoff value for the ldf is chosen.
the cutoff is chosen (more fancy computation) to maximize % correct reclassification
To “use” the formula:
a person’s values on the variables are put into the formula & their ldf score is computed
their score is compared to the cutoff, and they are assigned to one group or the other
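The steps above can be sketched for two groups and two predictors in plain Python. This uses Fisher's classic two-group solution (weights = inverse pooled covariance times the centroid difference), which is an assumption about the "algorithm much like multiple regression". The cutoff is placed at the midpoint of the projected centroids, which suits equal group sizes and does not reproduce the "fancy computation" the slide mentions. The data are made up.

```python
def centroid(rows):
    """Mean of each variable for one group."""
    return [sum(col) / len(col) for col in zip(*rows)]


def pooled_cov(g1, g2):
    """Pooled within-group 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for rows in (g1, g2):
        c = centroid(rows)
        for r in rows:
            d = [r[0] - c[0], r[1] - c[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    df = len(g1) + len(g2) - 2
    return [[v / df for v in row] for row in s]


def fisher_weights(g1, g2):
    """Weights w = S^-1 (centroid2 - centroid1), using the closed-form 2x2 inverse."""
    c1, c2 = centroid(g1), centroid(g2)
    (a, b), (c, d) = pooled_cov(g1, g2)
    det = a * d - b * c
    diff = [c2[0] - c1[0], c2[1] - c1[1]]
    return [(d * diff[0] - b * diff[1]) / det, (-c * diff[0] + a * diff[1]) / det]


def score(row, w):
    return w[0] * row[0] + w[1] * row[1]


# Made-up data: each row is (Var1, Var2) for one participant
group1 = [(1, 2.2), (2, 2.9), (3, 4.1), (4, 4.8), (5, 6.0)]
group2 = [(1.5, 1.2), (2.5, 2.1), (3.5, 2.9), (4.5, 4.2), (5.5, 4.8)]

w = fisher_weights(group1, group2)
# Assumption: cutoff halfway between the two projected centroids (equal n)
cutoff = (score(centroid(group1), w) + score(centroid(group2), w)) / 2

# Reclassify every participant: above the cutoff -> group 2, otherwise group 1
hits = sum(score(r, w) <= cutoff for r in group1) + sum(score(r, w) > cutoff for r in group2)
accuracy = hits / (len(group1) + len(group2))
print(w, cutoff, accuracy)
```

No constant is added here, because shifting every ldf score by the same amount shifts the cutoff by the same amount and leaves the assignments unchanged.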

How ldf works -- two groups and 2 vars
[Figure: scatterplot with Var #1 and Var #2 as the axes]
Plot each participant’s position in this “2-space”, keeping track of group membership. Mark each group’s “centroid”.

Look at the group difference on each variable, separately.
[Figure: scatterplot with Var #1 and Var #2 as the axes]
The dash/dot lines show the mean difference on each variable -- these are small relative to the within-group differences, so the Fs are small.

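The ldf variate is “positioned” to maximize F.
[Figure: scatterplot with Var #1 and Var #2 as the axes, with the ldf variate drawn through the group centroids]
In this way, two non-discriminating variables can combine to “work”.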
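A rough numeric illustration of that last point, using made-up data: each variable alone gives a small F, while a simple composite of the two gives a large one. The composite Var1 - Var2 is hand-picked for readability and is not the optimal ldf weighting, which would do at least as well.

```python
def f_two_groups(g1, g2):
    """One-way ANOVA F for two groups, from raw scores."""
    n1, n2 = len(g1), len(g2)
    m1, m2 = sum(g1) / n1, sum(g2) / n2
    grand = (sum(g1) + sum(g2)) / (n1 + n2)
    ss_between = n1 * (m1 - grand) ** 2 + n2 * (m2 - grand) ** 2
    ss_within = sum((x - m1) ** 2 for x in g1) + sum((x - m2) ** 2 for x in g2)
    # df between = 1 (two groups), df within = N - 2
    return ss_between / (ss_within / (n1 + n2 - 2))


# Made-up data: each row is (Var1, Var2) for one participant
group1 = [(1, 2.2), (2, 2.9), (3, 4.1), (4, 4.8), (5, 6.0)]
group2 = [(1.5, 1.2), (2.5, 2.1), (3.5, 2.9), (4.5, 4.2), (5.5, 4.8)]

f_var1 = f_two_groups([r[0] for r in group1], [r[0] for r in group2])
f_var2 = f_two_groups([r[1] for r in group1], [r[1] for r in group2])

# Hand-picked composite (weights +1 and -1), NOT the F-maximizing ldf weights
f_composite = f_two_groups([r[0] - r[1] for r in group1], [r[0] - r[1] for r in group2])
print(f_var1, f_var2, f_composite)
```

The composite works because the two variables are strongly correlated within each group, so subtracting one from the other removes most of the within-group spread while keeping the group difference.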

Like ANOVA, ldf can be applied to more than two groups. When we have multiple groups there may be an advantage to using multiple discriminant functions to maximally discriminate between the groups. That is, we must decide whether the multiple groups “line up” on a single dimension (called a concentrated structure), or whether they are best described by their position in a multidimensional “space” (called a diffuse structure).
Maximum # dimensions for a given analysis: the smaller of
# groups - 1
# predictor variables
e.g., 4 groups with 6 predictor variables? Max # ldfs = _____
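The rule for the maximum number of dimensions is a one-liner; the function name below is made up for this sketch.

```python
def max_ldfs(n_groups, n_predictors):
    """Maximum number of discriminant functions: the smaller of (groups - 1) and predictors."""
    return min(n_groups - 1, n_predictors)


# The slide's example: 4 groups with 6 predictor variables
print(max_ldfs(4, 6))
# With few predictors, the number of predictors becomes the limit
print(max_ldfs(5, 2))
```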

“Anticipating” the number of dimensions (ldfs)
By inspecting the “group profiles” (means of each group on each of the predictor variables) you can often anticipate whether there will be more than one ldf…
if the groups have similar patterns of differences (similar profiles) for each predictor variable (for which there are differences), then you would expect a single discriminant function
if the groups have different profiles for different predictor variables, then you would expect more than one ldf
[Tables: group means on Var1-Var4 for a “Concentrated” example and a “Diffuse” example; the values were lost in extraction]

Determining the number of dimensions (ldfs)
Like other “determinations”, there is a significance test involved.
Each ldf is tested as to whether it “contributes to the model” using the X²-test of the λ-value.
The first ldf will always account for the most between-group variation (have the largest X² and Rc) -- subsequent ldfs are “orthogonal” (providing independent information), and will account for successively less between-group variation.
If there is a single ldf, then the model is said to have a concentrated structure.
If there are 2 or more ldfs, then the model has a diffuse structure.
The distinction between a concentrated and a diffuse structure is considered the “fundamental multivariate question” in a multiple group analysis.
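One common way this test is carried out, sketched under stated assumptions: the value tested is taken to be Wilks' lambda, converted to a chi-square with Bartlett's approximation, which is how many statistics packages report it. The slides do not spell out the formula, and the eigenvalues and sample size below are hypothetical.

```python
import math


def ldf_tests(eigenvalues, n_total, n_predictors, n_groups):
    """Sequential tests: ldfs 1..s together, then 2..s, and so on."""
    results = []
    for m in range(len(eigenvalues)):
        # Wilks' lambda for the functions remaining after the first m are removed
        wilks = 1.0
        for ev in eigenvalues[m:]:
            wilks *= 1.0 / (1.0 + ev)
        # Bartlett's chi-square approximation and its degrees of freedom
        chi_sq = -(n_total - 1 - (n_predictors + n_groups) / 2) * math.log(wilks)
        df = (n_predictors - m) * (n_groups - 1 - m)
        # Canonical correlation (Rc) of the (m+1)-th ldf
        r_c = math.sqrt(eigenvalues[m] / (1.0 + eigenvalues[m]))
        results.append({"wilks_lambda": wilks, "chi_square": chi_sq, "df": df, "r_c": r_c})
    return results


# Hypothetical analysis: 3 groups, 3 predictors, N = 100, two ldf eigenvalues
results = ldf_tests([1.5, 0.25], n_total=100, n_predictors=3, n_groups=3)
for row in results:
    print(row)
```

Each chi-square would then be compared to the critical value for its df. That lookup is left out here to keep the sketch dependency-free.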

Follow-up analyses
Within Psychology, ldf developed in areas of research that traditionally used large samples (e.g., measurement theory & clinical diagnostic research). With such large samples, “almost everything is significant”. So, an emphasis on “substantial effects” developed, based on “cutoffs” and “relative size” rather than significance tests:
using % variance to determine if additional ldfs “contribute”
a .3-.4 cutoff for structure weights when interpreting the ldfs
using % classification to discuss “what the model does”
As ldf was “adopted” into research areas with strong traditions of significance testing, more tests were incorporated into ldf, most commonly “follow-ups”.
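The two “relative size” checks can be sketched as follows. The sketch assumes that “% variance” means each ldf's share of the total of the eigenvalues; the eigenvalues, structure weights, and variable names are hypothetical.

```python
def percent_of_variance(eigenvalues):
    """Each ldf's share of the between-group variation captured by all ldfs."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


def salient_predictors(structure_weights, cutoff=0.3):
    """Predictors whose structure weight meets the cutoff in absolute value."""
    return [name for name, w in structure_weights.items() if abs(w) >= cutoff]


# Hypothetical eigenvalues for two ldfs
shares = percent_of_variance([1.5, 0.25])

# Hypothetical structure weights for the first ldf
first_ldf = {"var1": 0.72, "var2": -0.45, "var3": 0.12}
salient = salient_predictors(first_ldf, cutoff=0.3)
print(shares, salient)
```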

There are three major types of follow-ups:
Univariate follow-ups -- abandoning the multivariate analysis, simply describe the results of the ANOVA (with pairwise comparisons) for each of the predictors (DVs)
ldf follow-ups -- use the ldf(s) as DVs in ANOVA (with pairwise comparisons) to explicate which ldfs discriminate between which groups
this nicely augments the spatial & re-classification depictions
if you have a concentrated structure, it tells you exactly which groups can be significantly discriminated
if you have a diffuse structure, it tells you whether the second ldf provides discriminatory power the 1st doesn’t

pairwise ldf follow-ups -- separate ldf analyses for each pair of groups, to explicate which variables maximally discriminate between which groups
this is just what it sounds like: compare groups 1 & 2, then 1 & 3, then 2 & 3
interpret the ldf and report the discriminatory power for each
might produce pairwise discriminations not provided by the overall analysis
Different texts/researchers seem to have strong opinions about which of these is the “true multivariate follow-up” or which is “more multivariate”. I’d suggest that usually one of them is a more direct test of the way you have conceptualized your research question or analysis. Trying both probably won’t hurt!