Controlling the two kinds of error rate in selecting an appropriate asymmetric MDS model Naohito Chino & Shingo Saburi Aichi Gakuin University, Japan March 26 The 6th International Conference on Multiple Comparison Procedures, Tokyo, Japan, March 24-27, 2009
Organization of my talk (1) What is “asymmetric MDS”? (2) extant asymmetric MDS models (3) What is “ASYMMAXSCAL”? (4) advantages of ASYMMAXSCAL (5) shortfalls of ASYMMAXSCAL (6) our proposals to overcome the defects
(1) What is “asymmetric MDS”? Asymmetric MDS is a method specifically designed to analyze asymmetric relationships among members and to display them graphically by plotting each member in a space of a certain dimensionality, given asymmetric data.
Examples of symmetric and asymmetric relational data. Symmetric data: flight mileages among 10 cities => We can recover the map of these cities. Asymmetric data: degrees of sentiment relationships among 17 members measured by a 7-point rating scale => We can estimate the configuration of members in a space of a certain dimensionality.
Extant asymmetric MDS models: Chino (1978, 1990), Chino and Shiraiwa (1993), Constantine and Gower (1978), Escoufier and Grorud (1980), Gower (1977), Harshman (1978), Harshman et al. (1982), Kiers and Takane (1994), Krumhansl (1978), Okada and Imaizumi (1987, 1997), Rocci and Bove (2002), Saburi and Chino (2008), Saito (1991), Saito and Takeda (1990), Sato (1988), ten Berge (1997), Tobler (1976-77), Trendafilov (2002), Weeks and Bentler (1982), Young (1975), and Zielman and Heiser (1996).
Examples of the extant models O-I model (Okada & Imaizumi, 1987) -- an augmented distance model GIPSCAL (Chino, 1990) -- a non-distance model
What is “ASYMMAXSCAL”? Although various asymmetric MDS methods have been proposed, these methods remained descriptive until recently. By contrast, Saburi & Chino (2008) proposed a maximum likelihood method for asymmetric MDS called ASYMMAXSCAL, which extends MAXSCAL by Takane (1981) to asymmetric relational data. MAXSCAL is the name of a multidimensional successive categories scaling method.
ASYMMAXSCAL revisited (1) Parameters in ASYMMAXSCAL As with MAXSCAL by Takane (1981), it has three kinds of parameters pertaining to (1) the representation model (2) the error model (3) the response model
ASYMMAXSCAL revisited (2) As for the representation model, the proximity model of O_i to O_j, say g_ij, can generally be written as g_ij = f(x_i, x_j, c), where f() is any asymmetric MDS model, x_i and x_j are the coordinate vectors of O_i and O_j, respectively, and c is the remaining parameter vector.
ASYMMAXSCAL revisited (3) As regards the error model, the error-perturbed proximities are written as
ASYMMAXSCAL revisited (4) As for the response model, we assume that subjects place error-perturbed proximities in one of the M rating scale categories, C_1, …, C_M. Thus, these categories are represented by a set of ordered intervals with upper and lower boundaries on a psychological continuum.
ASYMMAXSCAL revisited (5) Accordingly, the probability that the error-perturbed proximity of O i to O j falls in C m is given by We assume that
ASYMMAXSCAL revisited (6) based on Torgerson’s law of categorical judgment. Here, Ф(τ_ij) denotes the density of the standard normal distribution. (For computational convenience, we approximate it by the logistic distribution.)
ASYMMAXSCAL revisited (7) We estimate all the parameters pertaining to ASYMMAXSCAL by maximizing the following joint likelihood of the total observations where Y ijm denotes the frequency in category C m, in which subjects placed the error-perturbed proximity of O i to O j.
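The response model and the joint likelihood described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: it assumes an additive error with a common scale `scale`, interior category boundaries `inner_bounds` with the outermost boundaries at minus and plus infinity, and the logistic approximation mentioned on the slide. All function and parameter names are hypothetical.

```python
import math


def logistic_cdf(z):
    """Numerically stable logistic distribution function (handles +/- infinity)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def category_probs(g_ij, inner_bounds, scale=1.0):
    """Probability that the error-perturbed proximity falls in each of C_1..C_M.

    Assumption: additive error around g_ij; M - 1 ordered interior boundaries.
    """
    bounds = [-math.inf] + list(inner_bounds) + [math.inf]
    cdf = [logistic_cdf((b - g_ij) / scale) for b in bounds]
    return [cdf[m + 1] - cdf[m] for m in range(len(bounds) - 1)]


def log_likelihood(g, inner_bounds, freq, scale=1.0):
    """Multinomial log-likelihood: sum over i, j, m of Y_ijm * log p_ijm."""
    total = 0.0
    for i, row in enumerate(freq):
        for j, counts in enumerate(row):
            p = category_probs(g[i][j], inner_bounds, scale)
            for m, y in enumerate(counts):
                if y > 0:
                    total += y * math.log(p[m])
    return total


# Hypothetical toy input: 2 objects, M = 4 categories.
g = [[0.0, 0.8], [-0.5, 0.0]]
freq = [[[1, 2, 5, 2], [0, 1, 3, 6]], [[4, 4, 2, 0], [2, 3, 3, 2]]]
ll = log_likelihood(g, [-1.0, 0.0, 1.0], freq)
```

In an actual fit, the coordinates behind `g`, the boundaries, and the scale would all be estimated jointly by maximizing this quantity.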
As for the details of ASYMMAXSCAL, see Saburi, S. and Chino, N. (2008). A maximum likelihood method for an asymmetric MDS model. Computational Statistics and Data Analysis, 52, 4673-4684.
Advantages of ASYMMAXSCAL over the extant descriptive models (1) (1) Determination of the appropriate scaling level of the data (ordinal, interval, or ratio) by AIC. (2) Determination of the appropriate dimensionality of the model under study by AIC.
Advantages of ASYMMAXSCAL over the extant descriptive models (2) (3) Examination of whether the data are sufficiently asymmetric, i) prior to the scaling of objects, by applying some tests for symmetry, ii) during the scaling, by selecting a model among several candidates, including some symmetry models, using AIC.
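As a reminder of how such AIC comparisons work, here is a minimal sketch. It uses the standard definition AIC = -2 log L + 2k; the candidate names, log-likelihoods, and parameter counts are made-up placeholders, not results from the talk.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2 k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params


# Hypothetical candidates: (maximized log-likelihood, number of free parameters).
candidates = {
    "ordinal, 2 dimensions": (-512.3, 40),
    "interval, 2 dimensions": (-518.9, 36),
    "interval, 3 dimensions": (-509.1, 52),
}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
```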
Tests for symmetry, prior to the scaling of objects. The data obtained by the above method are as follows. We call them the Type A design data, or the Type A data. One of the tests is the test for a special conditional symmetry hypothesis for this design data.
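Type A (design) data in ASYMMAXSCAL (rows: proximity judgments; columns: rating categories; cells: frequencies)
proximity judgment     C_1     C_2     …    C_M     number of samples
from O_1 to O_1        Y_111   Y_112   …    Y_11M   n_11
from O_1 to O_2        Y_121   Y_122   …    Y_12M   n_12
…                      …       …       …    …       …
from O_n to O_n        Y_nn1   Y_nn2   …    Y_nnM   n_nn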
Type B (design) data in ASYMMAXSCAL. Type B (design) data are obtained by rearranging the Type A data per rating category. (It is a bit different from traditional designs for the n×n×M table (Agresti, 2002; Bishop et al., 1975).)
The Type B design data:
C_1:     Y_111 … Y_1n1
         …
         Y_n11 … Y_nn1
…
C_M:     Y_11M … Y_1nM
         …
         Y_n1M … Y_nnM
total:   n_11 … n_1n
         …
         n_n1 … n_nn
Tests for symmetry in ASYMMAXSCAL (2) According to Saburi and Chino (2008), under the null hypothesis, the likelihood ratio test statistic asymptotically follows the central χ²-distribution with (M-1)n(n-1)/2 degrees of freedom.
Tests for symmetry in ASYMMAXSCAL (4) By the way, the test statistic of the traditional conditional symmetry test for the special n×n×M contingency table has Mn(n-1)/2 degrees of freedom under the null hypothesis.
Shortfalls of ASYMMAXSCAL (1) (1) Shortfall of the model selection method. At present, ASYMMAXSCAL enables us to select the most appropriate model, using AIC, among several candidate models which include some variants of symmetry models. However, model selection by an information criterion does not consider the nature of the data. It will be necessary to select the representation model that best reflects the nature of the data.
Shortfalls of ASYMMAXSCAL (2) To do this job, it might be necessary to utilize various symmetry-related tests which have been developed in mathematical statistics. Chino and Saburi (2006) attempted to administer these tests prior to the scaling step of ASYMMAXSCAL. Figures 1 and 2 show this. However, the inclusion relations among these tests shown in Figure 1 are very complicated. => Chino, N. & Saburi, S. (2006). Tests of symmetry in asymmetric MDS. Paper presented at the 2nd German-Japanese Symposium on Classification, Berlin, Germany.
Shortfalls of ASYMMAXSCAL (3) (2) Failure to take overall statistical errors into account. In performing the sequential tests shown in Figure 2, we did not take overall statistical errors into account. It will be interesting and useful to examine whether these tests are mutually statistically independent or not. For simplicity, we shall, at present, exclusively consider a two-dimensional square contingency table.
Our proposal to overcome the shortfalls. We have recently conjectured that at least two of these tests are statistically independent. These are (1) the test of the quasi-symmetry hypothesis, and (2) the test of the symmetry hypothesis under the condition that the quasi-symmetry hypothesis holds.
Parameter spaces for these tests (1) (1) total parameter space of the log-linear model (2) parameter space for the quasi-symmetry hypothesis
Parameter spaces for these tests (2) (3) parameter space for the symmetry hypothesis (4) parameter space for the equality of the row and column effects
Null and nonnull hypotheses of these tests (1) the quasi-symmetry hypothesis, (2) the symmetry hypothesis under
Relation of inclusion of the two hypotheses [Figure: the parameter spaces Ω = ω_0, ω_QS (Θ_ij^(12) = Θ_ji^(12)), ω_S, and ω_ERC (Θ_i^(1) = Θ_i^(2)), drawn to show their inclusion relations.] It should be noticed that
Decomposition of a likelihood ratio statistic (1) The likelihood ratio λ_S for testing the usual symmetry hypothesis can be written, for example, as Therefore, we have or
Decomposition of a likelihood ratio statistic (2) This statistic is due to Caussinus (1965), and asymptotically follows the χ² distribution with r-1 degrees of freedom.
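The equations on these two slides did not survive in the transcript. The following is a sketch of the standard partition of the symmetry statistic for an r×r table, which is presumably what the slides showed; the symbols λ_QS, λ_{S|QS} and G² are notation assumed here, not taken from the slides.

```latex
% Assumed notation: \lambda_{QS} is the likelihood ratio for quasi-symmetry,
% \lambda_{S|QS} the likelihood ratio for symmetry given quasi-symmetry.
\begin{align}
  \lambda_S &= \lambda_{QS}\,\lambda_{S|QS}, \\
  -2\log\lambda_S &= -2\log\lambda_{QS} - 2\log\lambda_{S|QS}, \\
  G^2(S) &= G^2(QS) + G^2(S \mid QS), \\
  \frac{r(r-1)}{2} &= \frac{(r-1)(r-2)}{2} + (r-1)
  \quad\text{(degrees of freedom).}
\end{align}
```

The degrees of freedom add up consistently with the values quoted on the slides: (r-1)(r-2)/2 for quasi-symmetry and r-1 for symmetry given quasi-symmetry.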
Test statistic for the quasi-symmetry hypothesis. It is well known that, under the hypothesis, the test statistic asymptotically follows the χ² distribution with (r-1)(r-2)/2 degrees of freedom. Here, the estimated expected frequencies satisfy the following equations:
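To make the two statistics concrete, the sketch below computes the likelihood-ratio statistics for symmetry and quasi-symmetry on an r×r table of counts. It is an illustration, not the authors' code: the quasi-symmetry fit uses a simple iterative proportional fitting scheme that matches row totals, column totals, and the sums n_ij + n_ji, which is a technique swapped in here for convenience. It assumes strictly positive counts, and the example table is made up.

```python
import math


def g2(observed, fitted):
    """Likelihood-ratio statistic G^2 = 2 * sum n_ij * log(n_ij / m_ij)."""
    total = 0.0
    for row_n, row_m in zip(observed, fitted):
        for n, m in zip(row_n, row_m):
            if n > 0:
                total += n * math.log(n / m)
    return 2.0 * total


def fit_symmetry(table):
    """Fitted values under symmetry: m_ij = (n_ij + n_ji) / 2."""
    r = len(table)
    return [[(table[i][j] + table[j][i]) / 2.0 for j in range(r)] for i in range(r)]


def fit_quasi_symmetry(table, sweeps=500):
    """Fitted values under quasi-symmetry via iterative proportional fitting."""
    r = len(table)
    m = [[1.0] * r for _ in range(r)]
    for _ in range(sweeps):
        for i in range(r):  # match row totals
            factor = sum(table[i]) / sum(m[i])
            for j in range(r):
                m[i][j] *= factor
        for j in range(r):  # match column totals
            factor = sum(table[i][j] for i in range(r)) / sum(m[i][j] for i in range(r))
            for i in range(r):
                m[i][j] *= factor
        for i in range(r):  # match symmetric sums n_ij + n_ji
            for j in range(i, r):
                factor = (table[i][j] + table[j][i]) / (m[i][j] + m[j][i])
                m[i][j] *= factor
                if j != i:
                    m[j][i] *= factor
    return m


# Hypothetical 3x3 table of counts.
table = [[20, 10, 5], [3, 30, 15], [9, 6, 25]]
r = len(table)
g2_s = g2(table, fit_symmetry(table))          # df = r(r-1)/2
g2_qs = g2(table, fit_quasi_symmetry(table))   # df = (r-1)(r-2)/2
g2_s_given_qs = g2_s - g2_qs                   # df = r-1
df_s, df_qs, df_s_given_qs = r * (r - 1) // 2, (r - 1) * (r - 2) // 2, r - 1
```

The difference `g2_s_given_qs` is the statistic for symmetry given quasi-symmetry; each statistic would then be referred to the χ² distribution with the corresponding degrees of freedom.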
Joint sufficient statistic for the nuisance parameters. The statistics corresponding to the nuisance parameters under the quasi-symmetry hypothesis give their joint sufficient statistics.
Ancillary statistic for these nuisance parameters. Moreover, under the hypothesis of quasi-symmetry, the statistic is free of these nuisance parameters, because the terms are estimated as functions of the data. In other words, the statistic is ancillary for these nuisance parameters.
Independence of the two statistics. As a result, due to the theorem by Hogg & Craig (1956), the two test statistics are statistically independent of each other. Hogg, R. V. & Craig, A. T. (1956). Sufficient statistics in elementary distribution theory. Sankhyā, 17, 209-216.
Control of the error rates of the two kinds. Since the two statistics are mutually stochastically independent, we may set the error rate of each of the test statistics to 1-(1-α)^(1/2) if we want to control the error rate of the first kind at α. Furthermore, we can construct a more powerful test than Dunn’s and Holm’s, if we set the error rate of the quasi-symmetry test to α and set that of the symmetry test under the quasi-symmetry hypothesis to α/2, according to Hochberg (1988).
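The arithmetic behind the per-test level can be checked with a short sketch. The first function computes 1-(1-α)^(1/k) for k independent tests, so that the overall error rate of the first kind is exactly α. The second function is the standard Hochberg (1988) step-up rule for two p-values, shown for comparison; it may differ in detail from the fixed assignment of levels described on the slide. Function names and the p-values are hypothetical.

```python
def per_test_level(alpha, k=2):
    """Level for each of k independent tests so the overall level is alpha."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)


def familywise_level(level, k=2):
    """Probability of at least one false rejection among k independent tests."""
    return 1.0 - (1.0 - level) ** k


def hochberg_two(p_a, p_b, alpha=0.05):
    """Standard Hochberg step-up rule for two hypotheses.

    Returns (reject_a, reject_b).
    """
    if max(p_a, p_b) <= alpha:
        return True, True          # larger p-value passes alpha: reject both
    if min(p_a, p_b) <= alpha / 2:
        return p_a < p_b, p_b < p_a  # only the smaller p-value is rejected
    return False, False


level = per_test_level(0.05)       # slightly above 0.05 / 2
overall = familywise_level(level)  # recovers alpha under independence
decision = hochberg_two(0.20, 0.02)
```

Because the per-test level exceeds α/2, this allocation is slightly less conservative than a Bonferroni split, which is the gain that independence of the two statistics buys.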
Order of testing two hypotheses and selection of MDS models [Flowchart: (1) Test the quasi-symmetry hypothesis (multiple CP). If rejected, use the non-QS family of MDS models. If accepted, go to (2). (2) Test the symmetry hypothesis under QS (multiple CP). If accepted, use symmetric MDS. If rejected, use the QS family of MDS models.]
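The testing order can be written as a small decision function. This is only a sketch of the flowchart's logic; the function name and the returned labels are illustrative, and the two boolean inputs are assumed to come from tests carried out at levels chosen as discussed above.

```python
def select_model_family(reject_qs, reject_s_given_qs):
    """Map the outcomes of the two sequential tests to a family of MDS models.

    reject_qs: quasi-symmetry hypothesis rejected?
    reject_s_given_qs: symmetry under quasi-symmetry rejected?
        (only consulted when quasi-symmetry is accepted)
    """
    if reject_qs:
        return "non-QS family of MDS models"
    if reject_s_given_qs:
        return "QS family of MDS models"
    return "symmetric MDS"


choice = select_model_family(reject_qs=False, reject_s_given_qs=True)
```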