Multivariate Analysis

Name: Multivariate Analysis
Uploaded: 2017-09-09T05:05:56+00:00
Duration: PTM9S5
Channel: Precious Staker
Description: Multivariate Analysis

Multivariate Analysis
Pattern Analysis
Finding patterns among objects on which two or more independent variables have been measured:
- Principal Coordinates Analysis (PCO)
- Principal Components Analysis (PCA) (Flury 1988)
- Cluster analysis (Everitt 1992)
These methods allow the projection of multivariate phenotypic or genotypic measurements into lower-dimensional spaces so that the underlying patterns or structures can be described and visually displayed.
The ‘genetic’ patterns among a set of entities (genetic materials) are difficult to discern from DNA fingerprints (raw multivariate data).
Patterns among the entities can be ‘extracted’ by PCA, PCO or cluster analyses of pairwise genetic distance matrices.

Principal Components Analysis (PCA)
Neighbor-Joining Cladogram

Similarity and Dissimilarity (Genetic Distance) Measures
Applications include:
- Assessment of genetic relationships
- Prediction of heterosis
- Heterotic group definition
- Identification of duplicates in collections
- Assessment of genetic diversity
- Plant variety protection

Similarity and Dissimilarity (Genetic Distance) Measures
Choice of distance measure is affected by:
- Properties of marker system
- Genealogy of germplasm
- Lines or populations
- Objectives of study
- Subsequent multivariate analysis

Genetic distance (Dissimilarity) measures based on allele frequency data
The first step is to build a matrix of pairwise measures of dissimilarity.
Multiple indices can be used to estimate dissimilarity.

Genetic distance measures based on allele frequency data
(Reif et al., Crop Science 45(1), 1-7)

- Euclidean (dE): No underlying genetic concept. Can be used with multivariate methods that require Euclidean distances
- Rogers (1972) (dR): Linearly related to the coefficient of coancestry
- Modified Rogers' (dW): dW² is linearly related to panmictic-midparent heterosis
- Cavalli-Sforza and Edwards (1967) (dCE): Based on Kimura's (1954) model of selective drift

- Reynolds et al. (1983) (dRE): Based on a model where mutation and selection can be neglected and drift is the major evolutionary force
- Nei (1972) (dN72): Based on the infinite-allele model (Kimura and Crow, 1964)
- Nei et al. (1983) (dN83): For homozygous inbred lines, dN83 = dR and, hence, dN83 is also linearly related to the coancestry coefficient
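As a rough illustration of how some of these measures are computed from allele frequencies, the sketch below implements four of them. The formulas are the commonly cited textbook forms and are an assumption here, since the slides do not spell them out; the notation in Reif et al. may differ. Each genotype is represented as a list of loci, each locus being a list of allele frequencies, and the two inbred lines are hypothetical.

```python
import math

def euclidean(p, q):
    """dE: square root of the summed squared frequency differences."""
    return math.sqrt(sum((a - b) ** 2
                         for lp, lq in zip(p, q) for a, b in zip(lp, lq)))

def rogers(p, q):
    """dR (assumed form): per-locus sqrt(0.5 * sum of squares), averaged over loci."""
    m = len(p)
    return sum(math.sqrt(0.5 * sum((a - b) ** 2 for a, b in zip(lp, lq)))
               for lp, lq in zip(p, q)) / m

def modified_rogers(p, q):
    """dW (assumed form): sqrt of the summed squared differences divided by 2m."""
    m = len(p)
    return math.sqrt(sum((a - b) ** 2
                         for lp, lq in zip(p, q) for a, b in zip(lp, lq)) / (2 * m))

def nei83(p, q):
    """dN83 (assumed form): 1 minus the mean over loci of sum(sqrt(p * q))."""
    m = len(p)
    return 1.0 - sum(math.sqrt(a * b)
                     for lp, lq in zip(p, q) for a, b in zip(lp, lq)) / m

# Hypothetical homozygous inbred lines: 3 loci, 2 alleles each.
line_1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
line_2 = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

for name, fn in [("dE", euclidean), ("dR", rogers),
                 ("dW", modified_rogers), ("dN83", nei83)]:
    print(name, round(fn(line_1, line_2), 4))
```

For homozygous lines such as these, dN83 and dR both reduce to the proportion of loci at which the two lines differ, which is the equality noted in the list above.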

Similarity Measures for Binary Data
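Entity i       Entity j       Count     Condition
Present (1)    Present (1)    a (vij)   Positive match
Present (1)    Absent (0)     b (wij)   Mismatch
Absent (0)     Present (1)    c (xij)   Mismatch
Absent (0)     Absent (0)     d (yij)   Negative match

Simple matching: (a + d) / (a + b + c + d)
Jaccard (1908): a / (a + b + c)
Dice (1945): 2a / (2a + b + c)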

Shared allele distance
S = No. of shared alleles
u = No. of loci
(Bowcock et al. 1994)

Similarity Measures for Binary Data
Example: Individuals 1 to 3 scored at Marker 1 to Marker 10. Pairwise similarities:

Similarity   Simple matching   Rank   Shared alleles   Jaccard's   Rank   Dice's
s12          0.70              2                       0.40        1      0.57
s13          0.50              3                       0.00        3
s23          0.80              1                       0.33        2
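The three coefficients for binary data can be sketched in a few lines, using the counts a, b, c and d from the match table above. The two marker profiles below are hypothetical (the original marker scores are not available); they were chosen to give a = 2, b + c = 3 and d = 5 over ten markers.

```python
def match_counts(x, y):
    """Return (a, b, c, d): 1/1 matches, 1/0 and 0/1 mismatches, 0/0 matches."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d

def simple_matching(x, y):
    """Counts both positive and negative matches as agreement."""
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)

def jaccard(x, y):
    """Ignores negative matches (shared absences)."""
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 0.0

def dice(x, y):
    """Like Jaccard, but gives positive matches double weight."""
    a, b, c, _ = match_counts(x, y)
    return 2 * a / (2 * a + b + c) if (2 * a + b + c) else 0.0

# Hypothetical band presence/absence profiles at ten markers.
ind_1 = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
ind_2 = [1, 1, 0, 1, 1, 0, 0, 0, 0, 0]

print(round(simple_matching(ind_1, ind_2), 2))
print(round(jaccard(ind_1, ind_2), 2))
print(round(dice(ind_1, ind_2), 2))
```

Because simple matching counts shared absences (d) as agreement while Jaccard's and Dice's do not, the coefficients can rank the same pairs differently.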

PRINCIPAL COORDINATES ANALYSIS (PCO or PCoA)
Distance between Oregon towns (miles)
Genetic distance between barley varieties (Nei et al., 1983 index)

Principal Coordinates Analysis is a method for visualizing similarities or dissimilarities in data.
It starts from a distance (dissimilarity) matrix and assigns each item a location in a two- or three-dimensional space, so that the distances between items in that space approximate the original dissimilarities.
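A minimal sketch of the classical procedure, assuming the usual formulation (square the distances, double-center them, then take the leading eigenvectors scaled by the square roots of their eigenvalues). To stay dependency-free it uses plain power iteration with deflation, which is a simplification; a real analysis would use a linear algebra library. The four points are hypothetical and lie in a plane, so two axes recover their distances.

```python
import math

def double_center(dist):
    """Gower centering of squared distances: B = -1/2 * J * D^2 * J."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    return [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
             for j in range(n)] for i in range(n)]

def dominant_eigenpair(mat, iters=500):
    """Power iteration for the largest eigenpair of a symmetric PSD matrix."""
    n = len(mat)
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * mat[i][j] * v[j] for i in range(n) for j in range(n))
    return lam, v

def pcoa(dist, k=2):
    """Return (coordinates, eigenvalues) for the first k principal coordinates."""
    b = double_center(dist)
    n = len(b)
    coords = [[0.0] * k for _ in range(n)]
    eigvals = []
    for axis in range(k):
        lam, v = dominant_eigenpair(b)
        eigvals.append(lam)
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = v[i] * scale
        # Deflate so the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords, eigvals

# Hypothetical items located at the corners of a 3 x 4 rectangle.
points = [(0, 0), (3, 0), (0, 4), (3, 4)]
dist = [[math.dist(p, q) for q in points] for p in points]
coords, eigvals = pcoa(dist, k=2)
for row in coords:
    print([round(x, 3) for x in row])
```

The recovered coordinates match the original layout only up to rotation and reflection, since a distance matrix carries no information about orientation.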

PRINCIPAL COMPONENTS ANALYSIS (PCA)
PCA transforms a number of possibly correlated variables (in this case allelic states) into a smaller number of uncorrelated variables called principal components. The first principal component accounts for as much of the variability in the data as possible, and each succeeding component accounts for as much of the remaining variability as possible.

The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variance of the observed variables:
- PCA reduces the number of observed variables to a smaller number of principal components that account for most of the variance.
- Observed variables are standardized (mean = 0, standard deviation = 1), so each contributes one unit of variance. The total amount of variance in PCA is therefore equal to the number of observed variables being analyzed.
- The first principal component accounts for the largest share of the variance in the data. The second component accounts for the second largest share and is uncorrelated with the first, and so on.
- Components accounting for the most variance are retained; components accounting for a trivial amount of variance are not.
- Eigenvalues indicate the amount of variance explained by each component.
- Eigenvectors are the weights used to calculate component scores.
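The points above can be checked numerically with a small sketch. It standardizes three hypothetical variables, builds their correlation matrix and extracts eigenvalues and eigenvectors by power iteration with deflation (a dependency-free simplification, not how a statistics package would do it). The data and function names are illustrative assumptions.

```python
import math

def standardize(columns):
    """Scale each variable to mean 0 and sample standard deviation 1."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        out.append([(x - mean) / sd for x in col])
    return out

def correlation_matrix(z):
    """Correlation matrix of already standardized variables."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]

def eigen_decompose(sym, iters=2000):
    """All eigenpairs of a symmetric PSD matrix, largest eigenvalue first."""
    p = len(sym)
    m = [row[:] for row in sym]
    pairs = []
    for _ in range(p):
        v = [float(i + 1) for i in range(p)]
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-15:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * m[i][j] * v[j] for i in range(p) for j in range(p))
        pairs.append((lam, v))
        # Remove the component just found before looking for the next one.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    return pairs

def scores(z, weights):
    """Component scores: weighted sum of the standardized variables."""
    return [sum(w * z[k][obs] for k, w in enumerate(weights))
            for obs in range(len(z[0]))]

# Hypothetical data: three variables measured on six entities.
data = [[1, 2, 3, 4, 5, 6],
        [2, 4, 5, 4, 5, 7],
        [9, 3, 6, 2, 8, 1]]

z = standardize(data)
pairs = eigen_decompose(correlation_matrix(z))
total = sum(lam for lam, _ in pairs)
for lam, vec in pairs:
    print(round(lam, 3), round(lam / total, 3), [round(w, 3) for w in vec])
```

Each printed row gives an eigenvalue, the proportion of total variance it explains, and the eigenvector of weights. The eigenvalues sum to 3, the number of standardized variables.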

Distance-based methods (starting from a distance matrix)
Cluster Analysis: Individuals with similar descriptions are mathematically gathered into a cluster.
- Distance-based methods (starting from a distance matrix)
  - UPGMA (Unweighted Pair Group Method with Arithmetic Mean)
  - Neighbor-Joining
- Model-based methods
Neighbor-Joining Cladogram
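To make the distance-based approach concrete, here is a naive UPGMA sketch: at each step it merges the two clusters with the smallest average pairwise distance between their members. It recomputes averages from the original matrix every time, which is simple but slow, and the labels and distances are hypothetical.

```python
from itertools import combinations

def upgma(labels, dist):
    """Return the merge history as (cluster_1, cluster_2, height) tuples."""
    index = {label: i for i, label in enumerate(labels)}
    clusters = [(label,) for label in labels]  # each cluster is a tuple of labels

    def avg_dist(c1, c2):
        # Unweighted average over all member pairs between the two clusters.
        total = sum(dist[index[a]][index[b]] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: avg_dist(*pair))
        merges.append((c1, c2, avg_dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges

# Hypothetical distance matrix for four entities.
labels = ["A", "B", "C", "D"]
dist = [[0, 2, 10, 10],
        [2, 0, 10, 10],
        [10, 10, 0, 4],
        [10, 10, 4, 0]]

for c1, c2, height in upgma(labels, dist):
    print(c1, "+", c2, "at", height)
```

The merge heights are the levels at which branches join in the resulting dendrogram. Neighbor-Joining uses a different joining criterion and is not shown here.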

POPULATION STRUCTURE
Data: Individual 1 to Individual 20 (rows) scored at Marker 1 to Marker 10 (columns).
Hypothesis 1: There is one population that has intermediate frequencies at all loci, and all individuals are from that population.
Hypothesis 2: There are two populations, blue and pink, with high allele frequency at some loci and low allele frequency at other loci.

POPULATION STRUCTURE
It is important to estimate:
- How many subpopulations there are
- To which subpopulation each individual belongs (%)
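The second quantity can be illustrated with a deliberately simplified sketch: if the allele frequencies of each subpopulation were already known, an individual's membership could be estimated from the likelihood of its marker profile under each subpopulation. This is only a likelihood-based assignment under assumed frequencies, equal priors and independent loci. It is not the model-based clustering itself, which must also estimate the frequencies and the number of subpopulations. All frequencies and profiles below are hypothetical.

```python
import math

def log_likelihood(profile, freqs):
    """Log-probability of a 0/1 marker profile given frequencies of allele 1."""
    return sum(math.log(p if g == 1 else 1.0 - p)
               for g, p in zip(profile, freqs))

def membership(profile, pop_freqs):
    """Posterior membership probabilities, assuming equal prior weights."""
    logs = [log_likelihood(profile, freqs) for freqs in pop_freqs]
    top = max(logs)
    weights = [math.exp(v - top) for v in logs]  # subtract max for stability
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical frequencies at ten loci: high at some loci, low at others.
blue = [0.9] * 5 + [0.1] * 5
pink = [0.1] * 5 + [0.9] * 5

typical_blue = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
ambiguous = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

for profile in (typical_blue, ambiguous):
    shares = membership(profile, [blue, pink])
    print([round(100 * s, 1) for s in shares])
```

The output is a percentage membership per subpopulation for each individual, the kind of estimate referred to in the second bullet above.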
