Applied Multivariate Quantitative Methods
Introduction and Examples of Multivariate Data
By Jen-pei Liu, PhD, Division of Biometry, Department of Agronomy, National Taiwan University
and Wei-chu Chie, MD, PhD, Department of Public Health
2018/9/22
Copyright by Jen-pei Liu, PhD and Wei-chu Chie, MD, PhD

- Most statistical methods focus on the analysis of one random variable at a time (univariate analysis)
- Ordinary regression analysis investigates the empirical relationship between a random variable (the dependent variable) and a set of explanatory variables (nonrandom independent variables), and is still a univariate analysis
- However, univariate analysis cannot provide a comprehensive, overall description of most of the phenomena we are interested in

Clinical trials
- Efficacy (stroke)
  - NIH Stroke Scale
  - Barthel score
  - Modified Rankin score
- Safety
  - Adverse events
  - Laboratory evaluations
  - Vital signs

Quality of life
- Physical, mental, social, well-being
Microarray
- Intensity of tens of thousands of transcripts

Continuous endpoints
- Numerical discrete data
  - Heart beats per minute
  - Total NIHSS
  - Total Hamilton Rating Scale for Depression
  - Total Alzheimer's Disease Assessment Scale
- Numerical continuous data
  - Age
  - Weight
  - ALT
  - Peak flow rate (liters per minute)
  - FEV1 (% of predicted value)

Categorical endpoints
- Nominal scale data: classification of patients according to their attributes
  - Gender
  - Race
  - Occurrence of a particular adverse reaction
  - Occurrence of ALT > 3 times the upper normal limit
- Ordinal scale data: a certain order among different categories

Ordered categorical data
- Symptom score: 0 = no symptom, 1 = mild, 2 = moderate, 3 = severe
- Questionnaire scores (Likert scale): not at all, slightly, moderately, quite a bit, greatly
Censored endpoints
- Time to the occurrence of a pre-defined event
- The occurrence of the event may not be observed for some patients; the time to the occurrence of the event for these subjects is then censored

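A censored endpoint is usually stored as a pair: the observed time and an indicator of whether the event was actually seen. The sketch below is only an illustration of that bookkeeping; the class `TimeToEvent`, the helper `make_record`, the day unit, and all the numbers are hypothetical and do not come from any trial.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TimeToEvent:
    """One subject's time-to-event record."""
    time: float      # observed time in days (hypothetical unit)
    observed: bool   # True if the event occurred, False if censored


def make_record(event_time: Optional[float], follow_up_end: float) -> TimeToEvent:
    """Build the record for one subject.

    event_time is None when the event never occurred. If the event did not
    occur before follow-up ended, the time is censored at follow_up_end.
    """
    if event_time is not None and event_time <= follow_up_end:
        return TimeToEvent(time=event_time, observed=True)
    return TimeToEvent(time=follow_up_end, observed=False)


# Hypothetical subjects: (event time, end of follow-up)
subjects = [(120.0, 365.0), (None, 365.0), (400.0, 365.0), (30.0, 200.0)]
records = [make_record(e, f) for e, f in subjects]
n_censored = sum(not r.observed for r in records)

for r in records:
    print(r)
print("censored subjects:", n_censored)
```

For a censored subject, the stored time is a lower bound on the true event time rather than the event time itself.
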
Cross-sectional vs. longitudinal data
- Cross-sectional data: clinical data are collected and evaluated at a particular time point during the trial
- Longitudinal data (repeated measurements): clinical data are collected and evaluated over a series of time points during the trial

Dataset: Storm survival of sparrows
- After a severe storm on Feb. 1, 1898, 49 moribund sparrows were taken to Brown University
- Professor Bumpus took morphological measurements and the weight of each of the 21 surviving female birds and the 28 female birds that died
- Relationship between survival and morphology of female birds: shorter, lighter, longer wing bones, longer legs, greater brain capacity
- Natural selection?

Dataset: Egyptian skulls
- Four measurements of male skulls
- Early pre-dynastic period (4000 B.C.)
- Late pre-dynastic period (3300 B.C.)
- The 12th and 13th dynasties (1850 B.C.)
- Ptolemaic period (200 B.C.)

Questions
- How are the four measurements related?
- Are there statistically significant differences in the sample means, and if so, do they reflect gradual changes over time in the shape and size of skulls?
- Are there statistically significant differences in the sample standard deviations, and if so, do they reflect gradual changes over time in the amount of variation?
- Is it possible to construct a function of the four variables that describes the changes over time?

Dataset: Distribution of the butterfly Euphydryas editha over 16 colonies in California and Oregon
- Four environmental variables: altitude, annual precipitation, minimum and maximum temperature
- Six genetic variables: percent frequencies of different Pgi genes determined by electrophoresis

Questions:
- Are the Pgi frequencies similar for colonies that are close in space?
- To what extent, if any, are the Pgi frequencies related to the environmental variables?

Prehistoric dogs from Thailand
- A collection of dog bones covering a period from about 3500 B.C. to the present, from excavations of prehistoric sites in northern Thailand
- Mandible (lower jaw) measurements to clarify the ancestor of the prehistoric dog: golden jackal, Chinese wolf, Indian wolf, cuon, or dingo?

Dataset: Employment in European countries
- Percentages of the labor force in nine different types of industry for 30 European countries
- Nine types of industry: agriculture, mining and quarrying, manufacturing, power and water supplies, construction, finance, services, social and personal services, transport and communications

Questions:
- Which countries have similar patterns of employment?
- Understanding the relationships between countries
- Are the differences between countries related to political grouping: European Union, European Free Trade Area, Eastern European countries, and others?

Quality of life in the treatment of cancer patients: adjuvant breast cancer trial
- The Eastern Cooperative Oncology Group (ECOG) compared a 16-week dose-intensive therapy with a conventional 24-week CAF therapy
- Dose-intensive therapy: increased physical symptoms and impact on psychosocial aspects of patients' lives, caused by inconvenience and fatigue
- Health-related quality of life (HRQoL) under CAF is hypothesized to be superior to that under dose-intensive therapy

Quality of life in the treatment of cancer patients: advanced non-small cell lung cancer (NSCLC)
- ECOG compared two new paclitaxel-cisplatin regimens with the traditional etoposide-cisplatin regimen in the treatment of stage IIIB-IV NSCLC
- Objective: to compare HRQoL among the three regimens and to correlate quality of life with toxicity (Common Toxicity Criteria, CTC)
- Both HRQoL and CTC are multivariate in nature

Functional Assessment of Cancer Therapy (FACT) - Lung (v 2)
Response options for each item: Not at all / A little bit / Somewhat / Quite a bit / Very much
34. I have been short of breath
35. I am losing weight
37. My thinking is clear
38. I have been bothered by hair loss
39. I have a good appetite
40. I feel tightness in my chest
41. Breathing is easy for me
If you have ever smoked, answer item 42:
42. I regret my smoking

- Effects of four different teaching methods on the reading proficiency and analytical ability of pupils in primary schools
- Student evaluations for the fall and spring semesters over five academic years of instructor teaching ability and overall course quality, rated from 1 (poor) to 5 (excellent)

Course and instructor evaluation
- Courses: Accounting, Decision Science, Finance, Management, and Marketing
- Instructors: from 17 in Accounting to 27 in Finance

Course and instructor evaluation: evaluation items (0-5)
- Ability to present course material
- Ability to stimulate student interest
- Knowledge of subject matter
- Apparent interest in students
- Objectivity and fairness
- Overall evaluation of the instructor
- Course contribution to technical or analytical skills
- Course contribution to understanding of complex phenomena
- Value of reading and assigned work
- Overall evaluation of the course

Measurements of Iris species
- Species: Iris setosa, Iris versicolor, Iris virginica
- Measurements: sepal length, sepal width, petal length, petal width

Example: Microarray chip
- The Roche AmpliChip CYP450 Test, approved by the US FDA on Dec. 23, 2004, was the first DNA microarray diagnostic test to be approved
- It analyzes one of the genes from a family of genes called cytochrome P450 genes, which are active in the liver and metabolize drugs and other compounds, such as those in grapefruit

- Cytochrome P450 (CYP450) is a family of genes found in all living creatures
- CYP450 plays a primary role in metabolism
- CYP450 genes have been in existence for more than 3.5 billion years
- In humans, enzymes encoded by the CYP450 genes are found primarily in the liver, where they metabolize drugs, toxins, and other foreign substances that enter the body

Phenotypes of CYP2D6 can be classified as
- Poor (no enzyme activity)
- Intermediate (reduced enzyme activity)
- Extensive ("normal" enzyme activity)
- Ultra-rapid (higher than normal enzyme activity)
Phenotypes of CYP2C19 can be classified as
- Poor (no enzyme activity)

Unequal genetic variations (polymorphisms) of CYP2C19 and CYP2D6
- CYP2D6 has more than 80 distinct allelic variants
- Poor metabolizers of CYP2D6
  - Caucasians: 7%
  - African-Americans: 2-4%
  - Asians: 1-2%
- CYP2D6*10 allele (50% allele frequency)
- CYP2D6*17 and CYP2D6*29 alleles (~30%)

Figure from Caraco Y. NEJM 2005

CYP2D6 gene duplications (ultra-rapid metabolizers)
- Ethiopians: 29%
- Southern Europeans: 10%
- Northern Europeans: 1-2%

CYP2C19*2 and CYP2C19*3 alleles for poor metabolizers
- Null alleles are caused by a SNP that creates either a splice site or a stop codon
- Frequency
  - Asians: 13-23%
  - Caucasians: 3-5%
  - African-Americans: 3-5%

- Poor or intermediate metabolizers: toxic adverse events (side effects) even at lower doses
- Ultra-rapid metabolizers: no efficacy even at higher doses
- Necessity of microarray-based pharmacogenomic tests to provide genotyping of CYP2D6 and CYP2C19 and the predicted phenotype of the associated enzyme activities
- Prevention of harmful drug interactions and assurance of the use of the optimal dose

Characteristics of the AmpliChip CYP450 microarray
- 15,129 probes, each with ~10^7 copies of the specific oligonucleotide probe
- Length of probe sequence: bases
- A single Probe Set consists of 4 Probes (or Features) that share a fixed target sequence except at the substitution position, where A, C, G, and T are each included to generate four unique probes: one Perfect Match (PM) and three Mismatch (MM)
- The data have many variables (10000) and very few cases (10)
Source: Decision summary of k (US FDA, 2005)

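The probe-set construction described above can be sketched in a few lines. This is only a toy illustration: the function `make_probe_set`, the 9-base sequence, and the substitution position are hypothetical, and the sketch ignores strand complementarity and everything else about real probe design.

```python
BASES = "ACGT"


def make_probe_set(target: str, position: int) -> dict:
    """Build the four probes of one probe set.

    target   -- the sequence carried by the perfect-match probe
    position -- 0-based index of the substitution position
    Returns a dict mapping each probe sequence to "PM" or "MM".
    """
    probe_set = {}
    for base in BASES:
        # Same sequence everywhere except at the substitution position
        probe = target[:position] + base + target[position + 1:]
        probe_set[probe] = "PM" if probe == target else "MM"
    return probe_set


# Hypothetical 9-base sequence, used only to keep the output readable
probe_set = make_probe_set("ACGTTGCAA", position=4)
for probe, kind in probe_set.items():
    print(probe, kind)
```

Exactly one of the four probes matches the target at the substitution position, which gives the one PM and three MM probes per set.
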
Characteristics of the AmpliChip CYP450 microarray
- A Probe Set Pair consists of a Wild-type Probe Set (for the wild-type allele) and a Mutant Probe Set (for a known polymorphism)
- To distinguish 29 polymorphisms in CYP2D6, including gene duplications and gene deletion
- To identify 27 distinct alleles, including 7 CYP2D6 gene duplications
Source: Decision summary of k (US FDA, 2005)

- Class discovery: finding new classes (unsupervised)
- Class comparison: identification of differentially expressed genes
- Class prediction: selection of features and a summary measure for prediction of a new member

Class discovery
- Multidimensional scaling
- Cluster analysis
- Factor analysis
- Pattern recognition
- Self-organizing maps

Class comparison
- Unpaired t-test
- Analysis of variance
- Permutation tests
- Paired comparisons
- Control of the experimentwise error rate and the false discovery rate

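When thousands of genes are tested at once, the two kinds of error control listed above lead to different rejection rules. The sketch below contrasts a Bonferroni cut-off (one common way to control the experimentwise error rate) with the Benjamini-Hochberg step-up procedure (one common way to control the false discovery rate). Neither procedure is named in the slides, so treat both as illustrative choices; the p-values are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    by the Benjamini-Hochberg procedure at false discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * q / m
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical per-gene p-values, e.g. from unpaired t-tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.042,
            0.060, 0.074, 0.205, 0.212, 0.216]

fdr_rejected = benjamini_hochberg(p_values, q=0.05)
bonferroni_rejected = [p <= 0.05 / len(p_values) for p in p_values]

print("rejected with FDR control:       ", sum(fdr_rejected))
print("rejected with Bonferroni control:", sum(bonferroni_rejected))
```

False discovery rate control is the less strict criterion, so it never rejects fewer hypotheses than Bonferroni at the same nominal level.
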
Class prediction
- Discriminant analysis
- Nearest neighbor classification
- Classification tree and logistic regression
- Multinomial regression analysis
- Support vector machines

Evaluation of classification error rates
- Cross-validation
- Training set and test set
- Leave-one-out method
- Jackknife method
- Bootstrap method

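As a minimal sketch of the leave-one-out method, the code below estimates the error rate of a 1-nearest-neighbor classifier: each observation is held out in turn, classified from the remaining ones, and the misclassifications are counted. The classifier choice, the function names, and the eight data points are assumptions made for illustration.

```python
import math


def nearest_neighbor_label(train, query):
    """Return the label of the training point closest to query (1-NN)."""
    closest = min(train, key=lambda item: math.dist(item[0], query))
    return closest[1]


def leave_one_out_error(data):
    """Estimate the misclassification rate by leave-one-out cross-validation."""
    errors = 0
    for i, (features, label) in enumerate(data):
        train = data[:i] + data[i + 1:]   # hold out observation i
        if nearest_neighbor_label(train, features) != label:
            errors += 1
    return errors / len(data)


# Hypothetical two-variable measurements with a class label
data = [
    ((1.0, 1.0), "A"), ((1.5, 1.0), "A"), ((1.0, 1.5), "A"), ((4.0, 4.0), "A"),
    ((5.0, 5.0), "B"), ((5.5, 5.0), "B"), ((5.0, 5.5), "B"), ((4.5, 4.4), "B"),
]
loo_error = leave_one_out_error(data)
print("leave-one-out error rate:", loo_error)
```

Because every held-out point is classified by a rule fitted without it, the estimate avoids the optimism of evaluating a classifier on its own training data.
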
- All variables are continuous and in the same unit
- All variables are continuous and in different units
- All categorical data with the same scoring system
- All categorical data with different numbers of categories and different scoring systems
- Combinations of all of the above

Preview of multivariate methods: summarization and comparison of mean characteristics
- Health-related quality of life questionnaires
- Identification of differentially expressed genes
- Effectiveness of teaching methods
- Bumpus's female sparrows
- Egyptian skulls

Preview of multivariate methods: reduction of the number of variables
Principal components analysis (female sparrows)
- The sum of the 5 variables, reflecting the general size of the birds: Y1 = X1 + X2 + X3 + X4 + X5
- The difference between the sum of the first three variables and the sum of the last two: Y2 = X1 + X2 + X3 - X4 - X5

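The two index variables above use fixed weights of +1 and -1, whereas principal components analysis estimates the weights from the data. The sketch below computes both on a small made-up data matrix. The numbers are not Bumpus's measurements, and extracting the components from the correlation matrix (rather than the covariance matrix) is an assumption of this example.

```python
import numpy as np

# Hypothetical measurements (rows: birds, columns: X1..X5)
X = np.array([
    [150.0, 240.0, 31.0, 18.0, 20.0],
    [155.0, 245.0, 32.0, 19.0, 21.0],
    [160.0, 250.0, 32.0, 19.0, 22.0],
    [152.0, 238.0, 30.0, 18.0, 20.0],
    [158.0, 247.0, 33.0, 20.0, 21.0],
])

# Fixed-weight index variables from the text
Y1 = X @ np.array([1, 1, 1, 1, 1])      # sum of all five variables
Y2 = X @ np.array([1, 1, 1, -1, -1])    # first three minus last two

# Principal components: eigenvectors of the correlation matrix
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]       # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1 = eigvecs[:, 0]
if pc1.sum() < 0:                       # the sign of an eigenvector is arbitrary
    pc1 = -pc1
explained = eigvals / eigvals.sum()

print("Y1:", Y1)
print("Y2:", Y2)
print("first component weights:", np.round(pc1, 3))
print("proportion of variance:", np.round(explained, 3))
```

In this made-up data all five variables are positively correlated, so the first component has weights of one sign and behaves like a size index similar to Y1.
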
Preview of multivariate methods: reduction of the number of variables
Factor analysis: to account for the variation of the original variables using a smaller number of index variables, or factors
Female sparrows, a two-factor model:
X1 = a11 F1 + a12 F2 + e1
X2 = a21 F1 + a22 F2 + e2
X3 = a31 F1 + a32 F2 + e3
X4 = a41 F1 + a42 F2 + e4
X5 = a51 F1 + a52 F2 + e5

Preview of multivariate methods: reduction of the number of variables
Factor analysis
- F1 and F2 are factors
- ei represents the variation in Xi that is independent of the variation in the other X variables
- aij are constants
- F1 (size): the ai1 are all positive
- F2 (shape): some ai2 are positive and some are negative

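One consequence of the two-factor model is that the loadings determine the covariances among the X variables. The sketch below assumes, beyond what the slides state, that F1 and F2 are uncorrelated with unit variance and that the e terms are uncorrelated with the factors and with each other. Under those assumptions Cov(X) = A A^T + diag(psi), where A holds the aij and psi holds the variances of the ei. All numerical values are hypothetical.

```python
import numpy as np

# Hypothetical loadings a_ij (rows: X1..X5, columns: F1, F2).
# F1 loadings are all positive ("size"); F2 loadings have mixed signs ("shape").
A = np.array([
    [0.8,  0.3],
    [0.7,  0.4],
    [0.9, -0.2],
    [0.6, -0.5],
    [0.7, -0.4],
])

# Hypothetical variances of the specific terms e1..e5
psi = np.array([0.27, 0.35, 0.15, 0.39, 0.35])

# Covariance matrix implied by the model under the stated assumptions
Sigma = A @ A.T + np.diag(psi)

# Communality: the part of each variable's variance explained by the factors
communality = (A ** 2).sum(axis=1)

print(np.round(Sigma, 2))
print("communalities:", np.round(communality, 2))
```

The values were chosen so that each variable has total variance 1, which makes `Sigma` a correlation matrix and each communality the proportion of variance accounted for by the two factors.
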
Preview of multivariate methods: classification
Discriminant analysis
- Classification of different groups based on the available measurements
- Surviving vs. non-surviving sparrows
- Prediction of new specimens
- Supervised analysis
- The number of classes is pre-specified

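A two-group linear discriminant rule can be sketched as follows. It uses Fisher's linear discriminant with a pooled covariance matrix and a cut-off halfway between the group means, which implicitly assumes equal prior probabilities and equal misclassification costs. The slides do not specify a particular discriminant method, so this is one illustrative choice, and the data are hypothetical.

```python
import numpy as np


def fit_fisher_discriminant(g1, g2):
    """Return the weight vector and cut-off of Fisher's linear discriminant."""
    m1, m2 = g1.mean(axis=0), g2.mean(axis=0)
    n1, n2 = len(g1), len(g2)
    # Pooled within-group covariance matrix
    pooled = ((n1 - 1) * np.cov(g1, rowvar=False)
              + (n2 - 1) * np.cov(g2, rowvar=False)) / (n1 + n2 - 2)
    w = np.linalg.solve(pooled, m1 - m2)
    cutoff = w @ (m1 + m2) / 2          # midpoint between projected means
    return w, cutoff


def classify(x, w, cutoff):
    """Assign x to group 1 if its discriminant score exceeds the cut-off."""
    return 1 if w @ x > cutoff else 2


# Hypothetical measurements on two variables for two known groups
group1 = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0], [2.0, 2.0]])
group2 = np.array([[5.0, 6.0], [6.0, 7.0], [7.0, 7.0], [6.0, 6.0]])

w, cutoff = fit_fisher_discriminant(group1, group2)
new_specimen = np.array([3.0, 2.5])
predicted_group = classify(new_specimen, w, cutoff)
print("weights:", w, "cut-off:", cutoff)
print("new specimen assigned to group", predicted_group)
```

The group labels are known in advance, which is what makes the analysis supervised; the fitted rule is then applied to specimens whose group is unknown.
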
Preview of multivariate methods: classification
Cluster analysis
- Identification of groups with similar characteristics
- The number of classes may or may not be pre-specified
- Unsupervised analysis
- Exploratory
- Grouping subjects based on their gene expression data

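As one example of an unsupervised method, the sketch below runs a bare-bones k-means clustering. Note that k-means belongs to the case where the number of classes is pre-specified; hierarchical methods are the usual choice when it is not. The function `k_means`, its naive initialization from the first k rows, and the data are all assumptions made for illustration.

```python
import numpy as np


def k_means(X, k, n_iter=20):
    """Cluster the rows of X into k groups with a minimal k-means loop."""
    centers = X[:k].copy()              # naive initialization: first k rows
    for _ in range(n_iter):
        # Distance from every observation to every center
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)    # assign each row to its nearest center
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break                       # assignments have stabilized
        centers = new_centers
    return labels, centers


# Hypothetical two-variable profiles for six subjects
X = np.array([
    [1.0, 1.0], [8.0, 8.0], [1.5, 2.0],
    [9.0, 8.5], [2.0, 1.0], [8.5, 9.0],
])
labels, centers = k_means(X, k=2)
print("cluster labels:", labels)
print("cluster centers:", centers)
```

No group labels are supplied; the groups emerge from the distances between observations alone.
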
Preview of multivariate methods: canonical correlations
- Variables are divided into groups
- Interest centers on the relationship between these groups
- The relationship between the genetic variables and the 4 environmental variables at the different colonies of the butterfly Euphydryas editha

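Canonical correlations measure the strongest linear relationships between two sets of variables. The sketch below computes them from the singular values of Qx^T Qy, where Qx and Qy are orthonormal bases of the centered data matrices. This is one standard numerical route, not a method taken from the slides. The data are simulated and only mimic the layout of 16 colonies with 4 variables in one group and 3 in the other.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Return the canonical correlations between the columns of X and Y.

    Assumes more rows than columns and full column rank in both matrices.
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)            # orthonormal basis of centered X
    Qy, _ = np.linalg.qr(Yc)            # orthonormal basis of centered Y
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)         # guard against rounding above 1


rng = np.random.default_rng(0)
n = 16                                  # hypothetical number of colonies
X = rng.normal(size=(n, 4))             # stand-in for 4 environmental variables
Y = np.column_stack([
    X[:, 0] + 0.1 * rng.normal(size=n), # strongly related to the first X variable
    rng.normal(size=n),                 # unrelated noise
    rng.normal(size=n),                 # unrelated noise
])

rho = canonical_correlations(X, Y)
print("canonical correlations:", np.round(rho, 3))
```

The number of canonical correlations equals the size of the smaller group, here three, and they are returned from largest to smallest.
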
Preview of multivariate methods: multidimensional scaling
- Starts from the distances between a number of objects
- Produces a map of how the objects are related
- A map of the relationships of prehistoric dogs to the golden jackal, Chinese wolf, Indian wolf, cuon, and dingo
- A map of European countries based on their employment patterns

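The idea of turning a distance table into a map can be sketched with classical (metric) multidimensional scaling, which double-centers the squared distances and takes the leading eigenvectors. Classical scaling is one of several MDS variants and is an assumption here; the four "objects" are simply the corners of a hypothetical 3-by-4 rectangle, so that the recovered map can be checked against known distances.

```python
import numpy as np


def classical_mds(D, n_dims=2):
    """Return coordinates whose pairwise distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                 # double-centered squared distances
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1][:n_dims]  # largest eigenvalues first
    lam = np.clip(eigvals[order], 0.0, None)    # drop tiny negative rounding errors
    return eigvecs[:, order] * np.sqrt(lam)


# Hypothetical objects: the four corners of a 3-by-4 rectangle
points = np.array([[0.0, 0.0], [3.0, 0.0], [3.0, 4.0], [0.0, 4.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)

coords = classical_mds(D)
D_map = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
print(np.round(coords, 3))
```

The map is determined only up to rotation, reflection, and translation, so `coords` need not equal `points`, but the distances between the mapped objects reproduce the input distances.
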
The multivariate normal distribution
- Most multivariate methods were derived under the assumption of a multivariate normal distribution, characterized by a mean vector and a covariance matrix
- Transformation
Computer software
- SAS, SPSS, MINITAB, STATA, STATISTICA
Graphical methods
- A picture is worth a thousand words
- 2- and 3-dimensional graphs

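A multivariate normal distribution is fully described by its mean vector and covariance matrix, and both are estimated from data in the obvious way. The sketch below draws a simulated sample and recovers the two quantities. The values of `mu` and `Sigma`, the sample size, and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

# Hypothetical population mean vector and covariance matrix (3 variables)
mu = np.array([10.0, 20.0, 30.0])
Sigma = np.array([
    [4.0, 1.2, 0.0],
    [1.2, 1.0, 0.3],
    [0.0, 0.3, 2.0],
])

# Simulated sample: one row per subject, one column per variable
sample = rng.multivariate_normal(mu, Sigma, size=5000)

mean_vector = sample.mean(axis=0)           # estimate of mu
cov_matrix = np.cov(sample, rowvar=False)   # estimate of Sigma

print("sample mean vector:", np.round(mean_vector, 2))
print("sample covariance matrix:")
print(np.round(cov_matrix, 2))
```

The off-diagonal entries of the covariance matrix are what distinguish a genuinely multivariate description from a set of separate univariate summaries.
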
Summary
- Examples
- Applications
- Datasets
- Preview of methods
- Multivariate normal distribution
- Graphical methods