
1 The Department of Economic and Management Sciences. Prof. Andy Mauromoustakos. Analysis of multivariate data using JMP: “Heavy metals in marine sediments, differences between locations”. Presented by: Tareq Altamimi, econ0306

2 Slides Index
* Presentation overview.
* Why MANOVA.
* JMP Analysis.
* Database Variables definition.
* Database Overview & Objectives.
* MANOVA Tests.
* Correlation between Variables.
* MANOVA Analysis.
* Repeated measures Response.
* Analysis Results.
* Conclusion.
* References.
* Principal components.

3 Presentation Overview: This presentation describes an important statistical subject, multivariate analysis, using the JMP program. The analysis is done on a database collected by scientists in the Department of Biology at the University of Melbourne. It concerns heavy metals in marine sediments and the differences between locations. We take three different locations with four observations (experiments) each. JMP makes it easy to see how the heavy metal levels in marine sediments change by presenting the data graphically. These graphics are called charts, and statisticians can interpret them easily. This presentation also includes an interpretation of the analysis results for the three locations covered by the database.
The analysis in this project uses multivariate analysis of variance (MANOVA), which is used to see the main and interaction effects of categorical variables on multiple dependent interval variables. MANOVA uses one or more categorical independents as predictors. MANOVA tests the differences in the centroid (vector) of means of the multiple interval dependents, for various categories of the independent(s). There are multiple potential purposes for MANOVA. All of the analysis here is done by MANOVA using the JMP program.

4 Why MANOVA, why Repeated Measures? Why JMP?
JMP dynamically links statistics and graphics so you can easily explore data, make discoveries, and gain the knowledge you need to make better decisions. Click on a point in a graph to highlight the corresponding observation everywhere it is represented in JMP: in other graphs, in 3-D spinning plots, and in the data tables. JMP provides a comprehensive set of statistical tools as well as Design of Experiments (DOE) and advanced quality control (QC and SPC) tools for Six Sigma in a single package. Advanced modelling techniques include ANOVA and MANOVA, stepwise, log-linear, ordinal logistic regression, survival/reliability, true non-linear modelling, partitioning (decision trees), neural networks, time series, multivariate, cluster, discriminant, and partial least squares (PLS). The JMP Scripting Language (JSL) lets you capture the results of your work in automatically generated scripts, and offers all the power of a programming language, complete with matrix algebra support, so you can create custom analyses, interactive graphics, and more.
Multivariate statistics were developed to handle situations where multiple variables or measures are involved. Any analysis of more than two variables or measures can loosely be considered a multivariate statistical analysis. One of the primary goals of multivariate statistical analysis is to describe the relationships among a set of variables. Multivariate analysis is widely used in various fields, such as agriculture, food and life sciences, business and engineering.
Repeated measures analysis (also called longitudinal data analysis) applies when repeated measurements are taken on each subject and you want to analyse effects both between subjects and within subjects across the measurements. This multivariate approach is especially important when the correlation structure across the measurements is arbitrary.

5 The JMP way of doing things is best summarised in the following four points:
* Variables are assigned to one of the three levels of measurement: nominal, ordinal or continuous. This assignment is under user control and may be changed at will. It is used to allow JMP to decide what summary statistics to provide and what techniques are suitable for analysis using the variable.
* In any analysis, each variable may be assigned one of the roles: X, Y, weight, frequency or label.
* Using a combination of the above two pieces of information, JMP is able to decide on an appropriate analysis if the user chooses one of the activities listed below, referred to as 'personalities'.
* Most of the analysis 'personalities' produce graphs, many of them dynamic, as part of their standard output. There is always a range of additional outputs, both textual and graphical, available from analyses.
The personalities are:
* Distribution of Y: single variable summaries and plots.
* Fit Y by X: one response and one explanatory variable. The techniques employed are ANOVA, LS regression, logistic regression or contingency table analysis, depending on the levels of the X and Y variables.
* Fit Model: variable numbers of responses and explanatory variables. A range of techniques is available under this heading, including ANOVA, ANCOVA, MANOVA, LS regression and stepwise procedures, logistic and ordinal regression, log-linear models, proportional hazard models, screening models and D-optimal designs.
* Non-linear Fit: non-linear models specified by the user and fit using Gauss-Newton or Newton-Raphson, with either one of a range of built-in loss functions or one specified by the user.
* Correlation of Ys: examination of the correlation or covariance structure of a set of variables, including scatter plot matrices, PCA and factor analysis.
* Cluster: cluster analysis using hierarchical and K-means approaches.
* Survival: survival analysis using Kaplan-Meier, Cox regression and non-linear survival models.
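As a rough illustration of how the level of measurement of X and Y could select a technique in Fit Y by X, here is a minimal Python sketch. The slide names the four techniques but not the exact pairing, so the mapping below is an assumption, and the function name `fit_y_by_x_technique` is hypothetical (it is not a JMP or JSL call).

```python
# Assumed pairing of modeling types to techniques for "Fit Y by X".
# The slide lists the four techniques; the pairing is our reading of it.
CONTINUOUS = "continuous"
CATEGORICAL = {"nominal", "ordinal"}


def fit_y_by_x_technique(y_type, x_type):
    """Return the technique name for one response (Y) and one factor (X)."""
    valid = CATEGORICAL | {CONTINUOUS}
    if y_type not in valid or x_type not in valid:
        raise ValueError("type must be nominal, ordinal or continuous")
    y_continuous = y_type == CONTINUOUS
    x_continuous = x_type == CONTINUOUS
    if y_continuous and x_continuous:
        return "LS regression"
    if y_continuous:
        return "ANOVA"
    if x_continuous:
        return "logistic regression"
    return "contingency table analysis"


# A continuous response such as CU against the nominal factor SITE.
print(fit_y_by_x_technique("continuous", "nominal"))
```

Ordinal variables are treated like nominal ones here purely to keep the sketch short.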

6 MANOVA Tests: MANOVA has four main tests, described as follows:
* Hotelling's T-Square is the most common, traditional test where there are two groups formed by the independent variables. One may also see the related statistic, Hotelling's Trace (a.k.a. Lawley-Hotelling or Hotelling-Lawley Trace). To convert from the Trace coefficient to the T-Square coefficient, multiply the Trace coefficient by (N-g), where N is the sample size across all groups and g is the number of groups. The T-Square result will still have the same F value, degrees of freedom, and significance level as the Trace coefficient.
* Wilks' lambda, U. This is the most common, traditional test where there are more than two groups formed by the independent variables. It is a measure of the difference between groups of the centroid (vector) of means on the dependent variables. The smaller the lambda, the greater the differences. The Bartlett's V transformation of lambda is then used to compute the significance of lambda. Wilks' lambda is used, in conjunction with Bartlett's V, as a multivariate significance test of mean differences in MANOVA, for the case of multiple interval dependents and multiple (>2) groups formed by the independent(s). The t-test, Hotelling's T, and the F test are special cases of Wilks' lambda.
* Pillai-Bartlett trace, V. Multiple discriminant analysis (MDA) is the part of MANOVA where canonical roots are calculated. Each significant root is a dimension on which the vector of group means is differentiated. The Pillai-Bartlett trace is the sum of explained variances on the discriminant variates, which are the variables computed from the canonical coefficients for a given root. Olson (1976) found V to be the most robust of the four tests, and it is sometimes preferred for this reason.
* Roy's greatest characteristic root (GCR) is similar to the Pillai-Bartlett trace but is based only on the first (and hence most important) root. Specifically, let lambda be the largest eigenvalue, then GCR = lambda/(1 + lambda).
There are multiple potential purposes for MANOVA:
* To compare groups formed by categorical independent variables on group differences in a set of interval dependent variables.
* To use lack of difference for a set of dependent variables as a criterion for reducing a set of independent variables to a smaller, more easily modeled number of variables.
* To identify the independent variables which differentiate a set of dependent variables the most.
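The four test statistics above can be sketched in a few lines of Python, assuming the eigenvalues of E^-1 H (the canonical roots) are already available. The GCR formula and the Trace to T-Square conversion come from the slide; the eigenvalue forms of Wilks' lambda and the Pillai-Bartlett trace are the standard textbook definitions, not something the slide states. The function names and the example eigenvalues are hypothetical, and no significance test (F approximation) is attempted.

```python
import math


def manova_statistics(eigenvalues):
    """Compute the four MANOVA statistics from the eigenvalues of E^-1 H."""
    if not eigenvalues:
        raise ValueError("need at least one eigenvalue")
    largest = max(eigenvalues)
    return {
        # Product of 1/(1 + lambda_i): smaller means larger group differences.
        "wilks_lambda": math.prod(1.0 / (1.0 + lam) for lam in eigenvalues),
        # Sum of lambda_i/(1 + lambda_i) over all roots.
        "pillai_trace": sum(lam / (1.0 + lam) for lam in eigenvalues),
        # Sum of the eigenvalues themselves.
        "hotelling_lawley_trace": sum(eigenvalues),
        # GCR = lambda/(1 + lambda) for the largest eigenvalue, as on the slide.
        "roy_gcr": largest / (1.0 + largest),
    }


def trace_to_t_square(trace, n_total, n_groups):
    """Convert Hotelling's Trace to T-Square by multiplying by (N - g)."""
    return trace * (n_total - n_groups)


# Hypothetical eigenvalues, chosen only to make the arithmetic easy to follow.
for name, value in manova_statistics([3.0, 1.0]).items():
    print(name, value)
```

Working from eigenvalues keeps the sketch independent of any matrix library; in practice the roots would come from the software's MANOVA output.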

7 Heavy metals in marine sediments, differences between locations
Research objectives: The University of Melbourne is doing research on marine sediments. The main aim is to find out whether the levels of heavy metals in these sediments change when the location changes. This question is addressed by the database study in this presentation, and the analysis is done using JMP software.
One objective of this particular study was to determine whether the percentage of heavy metals in marine sediments differs between locations.
Treatment Design: The treatments included the quantities of copper (CU), lead (PB), nickel (NI) and manganese (MN). In every location we took 4 different samples and ran the experiment on them. We did this in the three locations, which are Delray Beach, Seaspray and Woodside.
Experiment Design: Samples of marine sediments were randomly assigned to the four treatments in a completely randomized design. The treated marine sediment samples were placed in airtight containers and incubated under conditions conducive to microbial activity.
This experiment was done to discover whether the percentage of heavy metals in marine sediments differs when the sample is taken from different locations, that is, whether location has any effect on these percentages. These differences can be assessed by measuring the heavy metals listed above. The heavy metals were measured in three different locations. In each location we have four different samples, so that we can reach a general conclusion about these marine sediments. The heavy metal quantity in each marine sediment sample was recorded on an idealized experiment area. The data is already shown in the previous slides and a profile plot from the Fit Model’s MANOVA personality is shown on the right.

8 Database Variables Definition:
Factor variable:
* Site: the place where the experiment was done, that is, the area from which the marine sediments were taken. The database has three sites (Delray Beach, Seaspray, Woodside). This variable is important because it divides the data into three groups according to the area where the experiment was done.
Responses (CU, PB, NI and MN are continuous dependent variables, which is one of the main conditions of MANOVA):
* CU: the concentration of copper in marine sediment. Every site has four tests.
* PB: the concentration of lead in marine sediment.
* NI: the concentration of nickel in marine sediment.
* MN: the concentration of manganese in marine sediment.
These four variables are considered the most important because they show the differences or similarities of the marine sediments across the three sites.
Other variables in this database are the log10 transformations of the variables above, in addition to the log10 transformation of FE:
* LCU - log10 transformation of CU
* LPB - log10 transformation of PB
* LNI - log10 transformation of NI
* LMN - log10 transformation of MN
* LFE - log10 transformation of FE
The data meet the MANOVA conditions because the dependent variables are continuous and the independent variable is categorical.

9 Database of “heavy metals in marine sediments, differences between locations”
* SITE - sites from which data were collected (Delray Beach, Seaspray, Woodside)
* CU - concentration of copper
* PB - concentration of lead
* NI - concentration of nickel
* MN - concentration of manganese
* LCU - log10 transformation of CU
* LPB - log10 transformation of PB
* LNI - log10 transformation of NI
* LMN - log10 transformation of MN
* LFE - log10 transformation of FE
Figure labels: Woodside, Seaspray, Delray Beach (marine sediments).
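The L-prefixed variables are plain base-10 logarithms of the concentrations. A minimal Python sketch of that transformation follows; the concentration values are made up for illustration (they are not taken from the study), and the helper name `add_log10_columns` is hypothetical.

```python
import math

# Hypothetical concentrations for one sediment sample (not study data).
sample = {"CU": 10.0, "PB": 100.0, "NI": 1.0, "MN": 1000.0}


def add_log10_columns(row):
    """Return a copy of row with an L-prefixed log10 value for each metal."""
    out = dict(row)
    for name, value in row.items():
        if value <= 0:
            # log10 is undefined for zero or negative concentrations.
            raise ValueError(f"{name} must be positive to take log10")
        out["L" + name] = math.log10(value)
    return out


transformed = add_log10_columns(sample)
print(transformed["LCU"], transformed["LPB"], transformed["LNI"], transformed["LMN"])
```

The same function would produce LFE from an FE column if one were present in the row.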

10 The database in the JMP program: This picture shows how the data look in JMP during the analysis. For the meaning of each column, see the previous slide. My models (scripts)

11 Correlation between variables: The Correlations table is a matrix of correlation coefficients that summarizes the strength of the linear relationships between each pair of responses, which here are the heavy metal variables (CU, PB, NI, MN, LCU, LFE, LNI, LMN). The scatter plot matrix on the left shows that the variables are related to one another, so the data are correlated, although the correlation is not high. The plot of LFE against LNI, however, shows a strong relationship between the two.
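A correlation table of this kind can be reproduced by computing the Pearson coefficient for every pair of columns. The following is a minimal Python sketch; the column values are invented for illustration and are not the study data, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return a nested dict of pairwise correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Invented values: LFE and LNI move together, LMN does not follow them.
data = {
    "LFE": [1.0, 2.0, 3.0, 4.0],
    "LNI": [2.1, 3.9, 6.2, 7.8],
    "LMN": [4.0, 1.0, 3.0, 2.0],
}
for name, row in correlation_matrix(data).items():
    print(name, [round(value, 3) for value in row.values()])
```

The diagonal of such a matrix is always 1, and a coefficient near 1 or -1 marks a strong linear relationship.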

12 Description of how some analyses work in JMP: In MANOVA analysis you can select a response design that indicates whether you want to use the response variables individually or in some linear combination. JMP, like most software, supports several response designs, but it also allows you to “build your own”. Included designs:
* Rep. Measures: automatic analysis of a repeated measures design. This is the one used in this database analysis.
* Sum: the sum of the responses, one value
* Identity: each response, the identity matrix (no transformation, M=I)
* Contrast: each response (except the first) minus the first
* Polynomial: orthogonal polynomials
* Helmert: each response versus the ones after it, except the last
* Profile: each response versus all others, except the last
* Mean: each response versus the mean of the others, except the last
* Compound: for responses forming a compound of more than one effect
* Custom: any M matrix you want to enter and edit yourself.
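To make the idea of a response design concrete, here is a minimal Python sketch of three of the listed designs (Identity, Sum and Contrast) written as M matrices and applied to one observation's responses. The orientation (one row per linear combination) and the absence of any scaling are assumptions made for readability; JMP may display or normalise its M matrix differently. All function names and the sample values are hypothetical.

```python
def identity_design(k):
    """Each response on its own: the k x k identity matrix (M = I)."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]


def sum_design(k):
    """The sum of the responses: a single row of ones."""
    return [[1.0] * k]


def contrast_design(k):
    """Each response (except the first) minus the first."""
    rows = []
    for i in range(1, k):
        row = [0.0] * k
        row[0] = -1.0
        row[i] = 1.0
        rows.append(row)
    return rows


def apply_design(design, responses):
    """Form the linear combinations of the responses defined by the design."""
    return [sum(w * y for w, y in zip(row, responses)) for row in design]


# Hypothetical responses for one sample, in the order CU, PB, NI, MN.
responses = [1.0, 2.0, 4.0, 8.0]
print(apply_design(sum_design(4), responses))
print(apply_design(contrast_design(4), responses))
```

The other designs (Helmert, Profile, Mean, Polynomial) follow the same pattern with different rows.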

13 Principal components: The Spinning Plot platform displays a three-dimensional spinnable plot. The third column shows the cumulative percent of variation represented by the eigenvalues. The first three principal components account for 93.6277% of the variation in the sample.
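The cumulative percent column is the running sum of the eigenvalues divided by their total. A minimal Python sketch follows; the eigenvalues are hypothetical and do not reproduce the 93.6277% figure from the slide.

```python
def cumulative_percent(eigenvalues):
    """Cumulative percent of variation, largest eigenvalue first."""
    total = sum(eigenvalues)
    running = 0.0
    result = []
    for value in sorted(eigenvalues, reverse=True):
        running += value
        result.append(100.0 * running / total)
    return result


# Hypothetical eigenvalues for four principal components.
print(cumulative_percent([4.0, 2.0, 1.0, 1.0]))
```

Reading the third entry of the result gives the share of variation explained by the first three components.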

14 Analysis with MANOVA. Least Squares Means Report: For each pure nominal effect, this report gives the overall least squares means of all the heavy metals and their log transformations, together with profile plots of the means. The first graph shows the profile plot of the metals and their log transformations, with the table of least squares means. The second graph shows the mean for every site, so it splits the overall means by the location from which each sample was taken. It also includes the table of least squares means organized by site (location). From here we can see that the least squares mean for Woodside is higher than for Seaspray, and the Seaspray result is in turn higher than for Delray Beach. This can also be seen in the table of least squares means for every variable shown under the graph.

15 Partial covariance and correlation tables: The Partial Correlation table here shows the covariance matrix and the partial correlation matrix of residuals from the initial fit. The partial correlation table shows the partial correlation of each pair of variables after adjusting for all the other variables, so we can see how each heavy metal relates to the others. Notice that the diagonal of the partial correlation matrix is always 1.
The main ingredients of multivariate tests are the E and the H matrices. The elements of the E matrix are the cross products of the residuals. The H matrices correspond to hypothesis sums of squares and cross products. There is an H matrix for the whole model and for each effect in the model. Diagonal elements of the H and E matrices correspond to the hypothesis (numerator) and error (denominator) sums of squares for the univariate F tests. New E and H matrices for any given response design are formed from these initial matrices, and the multivariate test statistics are computed from them.
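For a one-way layout such as site versus metal concentrations, the E and H matrices can be built directly from the data. The sketch below assumes a single factor with no covariates, uses invented numbers with two responses and two groups, and uses hypothetical helper names; it is not how JMP computes the matrices internally.

```python
def column_means(rows):
    """Mean of each response across a list of observation vectors."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]


def add_outer(target, vector, scale=1.0):
    """Add scale * (vector outer vector) to target in place."""
    for i, a in enumerate(vector):
        for j, b in enumerate(vector):
            target[i][j] += scale * a * b


def e_and_h(groups):
    """Return (E, H) for a one-way design; groups maps site -> observations."""
    all_rows = [row for rows in groups.values() for row in rows]
    p = len(all_rows[0])
    grand = column_means(all_rows)
    E = [[0.0] * p for _ in range(p)]
    H = [[0.0] * p for _ in range(p)]
    for rows in groups.values():
        centre = column_means(rows)
        # H: group centroid minus grand centroid, weighted by group size.
        shift = [c - g for c, g in zip(centre, grand)]
        add_outer(H, shift, scale=len(rows))
        # E: cross products of the residuals within each group.
        for row in rows:
            add_outer(E, [y - c for y, c in zip(row, centre)])
    return E, H


# Invented data: two sites, two responses per observation.
groups = {
    "site_a": [[1.0, 2.0], [3.0, 4.0]],
    "site_b": [[5.0, 6.0], [7.0, 10.0]],
}
E, H = e_and_h(groups)
print("E =", E)
print("H =", H)
```

The diagonal of H holds the between-site sums of squares and the diagonal of E the within-site sums of squares for each response, which is why their sum matches the total sum of squares of that response.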

16 The MANOVA Analysis: In this MANOVA analysis we chose the Repeated Measures response because the data have several observations at every site. Looking at the F test, the intercept test has prob>F = 0.0001, which is less than 0.05 (α). Among the multivariate tests, Roy's max root test has prob>F = 0.0353, which is also less than 0.05. From this we can conclude that the level of heavy metals in marine sediments differs depending on the location from which the sample is taken. The result of this test shows that the level of heavy metals varies when we change the location.

17 Conclusion: After applying a statistical analysis to the data on marine sediments from three different locations (Woodside, Seaspray, Delray Beach), we find that the heavy metal levels in the marine sediments depend on the area where the sediments are located.

18 References:
* Gerry Quinn & Mick Keough, Experimental Design and Data Analysis for Biologists, Chapter 16: Multivariate analysis of variance and discriminant analysis. Cambridge University Press, 2002. The database is a study of the University of Melbourne.
* http://www.jmp.com/ and other alternative websites.
* PowerPoint presentation on “Marketing Research Part B Continuous Data Applications Multivariate Analysis” and other PDF and printed material. Dr. Andy Mauromoustakos.
* JMP version 5 help.

