Multivariate Ranking, Prioritization, and Selection Using Partial Order for Comparative Knowledge Discovery in Multi-Indicator Information Fusion Systems.

1 Multivariate Ranking, Prioritization, and Selection Using Partial Order for Comparative Knowledge Discovery in Multi-Indicator Information Fusion Systems with Multi-Disciplinary Applications: Recent Past, Present, and Near Future. Statistics seminar, G. P. Patil

2 Figure: Surveillance Geoinformatics of Hotspot Detection, Prioritization and Early Warning.
Federal agency partnership: CDC, DOD, EPA, NASA, NIH, NOAA, USFS, USGS.
Diagram components: Agency Databases; Thematic Databases; Other Databases; Data Sharing, Interoperable Middleware; Standard or De Facto Data Model, Data Format, Data Access; Arbitrary Data Model, Data Format, Data Access; Application Specific De Facto Data/Information Standard; Statistical Processing: Hotspot Detection, Prioritization, etc.; Homeland Security, Disaster Management, Public Health, Ecosystem Health, Other Case Studies; Cellular Surface.
NSF Digital Government Project #0307010. PI: G. P. Patil, gpp@stat.psu.edu
Websites: http://www.stat.psu.edu/~gpp/ ; http://www.stat.psu.edu/hotspots/ ; http://www.stat.psu.edu/%7Egpp/DGOnlineNews2006.mht
NSF Digital Government surveillance geoinformatics project, federal agency partnership, and national applications for digital governance.
National and International Applications: Biosurveillance, Carbon Management, Coastal Management, Community Infrastructure, Crop Surveillance, Disaster Management, Disease Surveillance, Ecosystem Health, Environmental Justice, Environmental Management, Environmental Policy, Homeland Security, Invasive Species, Poverty Policy, Public Health, Public Health and Environment, Robotic Networks, Sensor Networks, Social Networks, Syndromic Surveillance, Tsunami Inundation, Urban Crime, Water Management.

3 We present a prioritization innovation: the ability to rank and prioritize objects and indicators based on their intrinsic multi-indicator structural characteristics, without having to integrate the indicators into a single index, using partially ordered sets (posets) and related novel concepts, methods, techniques, and tools. This leads us to early warning systems, and also to the selection of investigational entities. (Figure labels: Prioritization Innovation; Partial Order Set; Ranking.)

4 Preliminaries. Data matrix: a multivariate data set. The indicator data matrix $[x_{ij}]$ has n rows (objects $a_1, \dots, a_n$) and m columns (indicators $I_1, \dots, I_m$). Objects may be entities such as individuals, units, pixels, areas, regions, patients, genes, drugs, documents, clients, products, or tools, whose relevant characteristics serve as potential indicators for one or more outcomes, endpoints, concepts, or domains. The result is an m-dimensional data set of n data points, with no measurement column available for a response variable y. To begin with, we assume a latent (abstract) concept for the objects, indicated by indicator values/measurements that share a common orientation.

5 As a simple example, consider the size of an individual as the abstract concept. Consider the height, weight, and volume of the individual as indicators of size, with an assumed common orientation of positive monotonicity/positive correlation. Generally speaking, the larger the size, the larger the indicator; the larger the indicator, the larger the size. The three indicator measurements may have a three-dimensional elliptical distribution with pairwise positive correlations.

6 The multivariate data set is usually a nonlinear partially ordered set: not all pairs of objects are comparable. (Figures: a two-indicator setup.) Ranking usually amounts to linearizing the poset, that is, assigning the objects scalar rank-scores consistent with the comparabilities in the data matrix. Rank-scores need to inherit the comparabilities in the data set, while incomparable pairs may become comparable in either direction.
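
A minimal sketch of this comparability structure, assuming the usual componentwise (product) order: object $a_i$ is above $a_j$ when it is at least as large on every indicator and strictly larger on at least one. The helper names (`dominates`, `split_pairs`, `inherits_comparabilities`) and the toy two-indicator data are illustrative, not taken from the slides.

```python
from itertools import combinations


def dominates(x, y):
    """True if x is at least y on every indicator and strictly larger on one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def split_pairs(X):
    """Partition unordered object pairs into comparable and incomparable ones."""
    comparable, incomparable = [], []
    for i, j in combinations(range(len(X)), 2):
        if dominates(X[i], X[j]) or dominates(X[j], X[i]):
            comparable.append((i, j))
        else:
            incomparable.append((i, j))
    return comparable, incomparable


def inherits_comparabilities(X, scores):
    """True if every dominance a_i > a_j is kept as scores[i] > scores[j]."""
    n = len(X)
    return all(
        scores[i] > scores[j]
        for i in range(n)
        for j in range(n)
        if dominates(X[i], X[j])
    )


# Toy two-indicator data (hypothetical): objects 1 and 2 are incomparable.
X = [(1, 1), (2, 3), (3, 2), (4, 4)]
comparable, incomparable = split_pairs(X)
print(incomparable)                               # [(1, 2)]
print(inherits_comparabilities(X, [0, 1, 2, 3]))  # True
print(inherits_comparabilities(X, [0, 2, 1, 3]))  # True
print(inherits_comparabilities(X, [1, 0, 2, 3]))  # False
```

Because objects 1 and 2 are incomparable, both score orders [0, 1, 2, 3] and [0, 2, 1, 3] are acceptable linearizations, while a score vector that puts object 0 above object 1 breaks a comparability.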

7 On which line is the linearized set to lie? Without loss of generality, on which axis passing through the origin? And with what separations between successive objects? Projections on a ray through the origin have been popular. The ray is determined by $w = (w_1, \dots, w_m)$, where $w_j > 0$ and $\sum_j w_j = 1$: a differential weight vector measuring the relative importance of the indicators for the abstract concept. The projection is a fixed scalar multiple of what is popularly called the weighted composite index with weight vector w.
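
As a small numerical check of the last statement: the signed length of the projection of x on the ray of w is $w \cdot x / |w|$, a fixed multiple of the weighted composite index, so both produce the same ordering. The weight vector, data, and function names below are hypothetical.

```python
import math


def weighted_composite_index(x, w):
    """WCI: w . x for a weight vector with w_j > 0 summing to 1."""
    if any(wj <= 0 for wj in w) or abs(sum(w) - 1.0) > 1e-9:
        raise ValueError("w must be strictly positive and sum to 1")
    return sum(wj * xj for wj, xj in zip(w, x))


def projection_length(x, w):
    """Signed length of the projection of x on the ray through the origin along w."""
    norm_w = math.sqrt(sum(wj * wj for wj in w))
    return weighted_composite_index(x, w) / norm_w


w = (0.5, 0.3, 0.2)  # hypothetical weights
X = [(1.0, 4.0, 2.0), (3.0, 1.0, 1.0), (2.0, 2.0, 5.0)]
wci = [weighted_composite_index(x, w) for x in X]
proj = [projection_length(x, w) for x in X]
# Both scores order the objects identically, since proj = wci / |w|.
print(sorted(range(len(X)), key=lambda i: wci[i]))   # [1, 0, 2]
print(sorted(range(len(X)), key=lambda i: proj[i]))  # [1, 0, 2]
```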

8 The choice of w involves subjective trade-offs/compensation among indicators, and it becomes a sensitive issue between stakeholders. Reconciliation in view of the data matrix evidence becomes a practical challenge and a scientific/statistical opportunity. Can we think of a data-based w intrinsic to the data matrix? Relative to such a w and its corresponding ray, can we think of alternative ways of computing appropriate rank-scores that do not involve indicator trade-offs? And if we can think of several methods of rank-scores and resultant rankings, is it possible to measure their individual performance, to help find the best method among them for the given data set? Interestingly, all of these are frontier questions that we wish to address. Fortunately, we now have some initial answers to share on these challenging issues, which multivariate ranking has faced over the past several decades.

9 Intrinsic differential weight vector $w_I$ for the data-matrix-based indicator set, measuring the relative importance of indicators.
Method 1: $L_0$-distance: pairwise object comparisons, and indicator agreements among object-comparison disagreements.
Method 2: $L_1$-distance: pairwise indicator ranking comparisons.
Method 3: $L_2$-distance: pairwise indicator ranking comparisons.

10 Method 1: Consider the multivariate zeta matrix, an n × n object-by-object comparability matrix. Each cell (i, j) holds an m-variate bit string (binary digits such as 111…, 000…, 101100…01), whose k-th bit is 1 if object $a_i$ exceeds $a_j$ on indicator $I_k$, and 0 otherwise. A comparability cell has all 1's or all 0's in its bit string, indicating collective agreement among the indicators. An incomparability cell has some 1's and some 0's, indicating collective disagreement among the indicators. For each incomparability cell, count for each indicator the number of agreements with the collectivity of indicators. Add these counts up for each indicator over all incomparability cells, and normalize (unitize) them to give the intrinsic $w_I$ we are looking for. Incidentally, and importantly, this intrinsic $w_I$ also provides a powerful basis for the comparison and selection of indicators. Methods 2 and 3: we will come back to these if time permits.
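
A minimal sketch of Method 1, under one reading of the slide: for an incomparability cell, an indicator's "number of agreements with the collectivity" is taken to be the number of other indicators whose bit matches its own. Ties on an indicator yield bit 0, following the slide's "0, otherwise". Note that with m = 2 this reading gives no agreements at all. The function names and the 4 × 3 data matrix are hypothetical.

```python
from itertools import permutations


def zeta_bits(X, i, j):
    """m-variate bit of cell (i, j): bit k is 1 if a_i > a_j on indicator k, else 0."""
    return [1 if xi > xj else 0 for xi, xj in zip(X[i], X[j])]


def is_incomparability_cell(bits):
    """Mixed 1's and 0's signal disagreement among the indicators."""
    return 0 < sum(bits) < len(bits)


def intrinsic_weights(X):
    """Sketch of w_I: per-indicator agreement counts over incomparability cells, unitized."""
    m = len(X[0])
    counts = [0] * m
    for i, j in permutations(range(len(X)), 2):  # all off-diagonal cells
        bits = zeta_bits(X, i, j)
        if not is_incomparability_cell(bits):
            continue
        for k in range(m):
            # Assumed reading: agreements = other indicators sharing indicator k's bit.
            counts[k] += sum(1 for l in range(m) if l != k and bits[l] == bits[k])
    total = sum(counts)
    if total == 0:
        raise ValueError("no agreement counts: no incomparability cells, or m < 3")
    return [c / total for c in counts]


# Hypothetical 4 x 3 data matrix (rows = objects, columns = indicators).
X = [(1, 2, 4), (2, 3, 1), (3, 1, 2), (4, 4, 3)]
print(intrinsic_weights(X))  # [0.375, 0.375, 0.25]
```

Here the third indicator is most often the odd one out in incomparability cells, so it receives the smallest weight.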

11 Conceptualizing and computing the performance measure of a comparability-invariant partial order ranking method: Consider the multivariate zeta matrix as before, but this time each cell holds an (m+1)-variate bit, with the first m bits as before and the (m+1)-th bit corresponding to the ranking R. For each incomparability cell, count for each indicator its agreement with the ranking. Add these up for each indicator over all incomparability cells, and normalize (unitize) them to give the $w_R$ induced by the ranking R. Define its performance measure $PM_R$ as the correlation (or a generalized correlation) $\operatorname{corr}(w_I, w_R)$.
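
The performance measure can be sketched in the same spirit. Assumptions not on the slide: $w_I$ counts, for each indicator, the other indicators sharing its bit in an incomparability cell; the ranking R is supplied as a vector of rank-scores whose (m+1)-th bit is 1 when object i scores strictly higher than object j; and "corr" is the ordinary Pearson correlation. The helper names and the score vector are hypothetical.

```python
import math
from itertools import permutations


def zeta_bits(X, i, j):
    return [1 if xi > xj else 0 for xi, xj in zip(X[i], X[j])]


def incomparable(bits):
    return 0 < sum(bits) < len(bits)


def _unitize(counts):
    total = sum(counts)
    if total == 0:
        raise ValueError("no agreement counts to normalize")
    return [c / total for c in counts]


def intrinsic_weights(X):
    """w_I: agreements of each indicator with the other indicators, over incomparability cells."""
    m = len(X[0])
    counts = [0] * m
    for i, j in permutations(range(len(X)), 2):
        bits = zeta_bits(X, i, j)
        if incomparable(bits):
            for k in range(m):
                counts[k] += sum(1 for l in range(m) if l != k and bits[l] == bits[k])
    return _unitize(counts)


def ranking_weights(X, scores):
    """w_R: agreement of each indicator with the ranking's bit, over incomparability cells."""
    m = len(X[0])
    counts = [0] * m
    for i, j in permutations(range(len(X)), 2):
        bits = zeta_bits(X, i, j)
        if incomparable(bits):
            r = 1 if scores[i] > scores[j] else 0  # the (m+1)-th bit
            for k in range(m):
                counts[k] += int(bits[k] == r)
    return _unitize(counts)


def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        raise ValueError("correlation undefined for a constant weight vector")
    return cov / (su * sv)


def performance_measure(X, scores):
    """PM_R = corr(w_I, w_R), here with Pearson correlation."""
    return pearson(intrinsic_weights(X), ranking_weights(X, scores))


X = [(1, 2, 4), (2, 3, 1), (3, 1, 2), (4, 4, 3)]  # hypothetical data matrix
scores = [1, 2, 3, 4]                             # hypothetical ranking R
print(ranking_weights(X, scores))      # approx [0.571, 0.286, 0.143]
print(performance_measure(X, scores))  # approx 0.756
```

For this matrix the sketch gives $w_I = (0.375, 0.375, 0.25)$, $w_R = (4/7, 2/7, 1/7)$, and $PM_R = 12/\sqrt{252} \approx 0.756$.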

12 Some comparability-invariant / partial-order-based ranking methods:
Method 1: Weighted Composite Index for rank-score: WCI.
Method 2: Comparability Weighted Net Superiority Index for rank-score: CWNSI.
Method 3: MCMC-based Weighted Indicator Average Rank for rank-score: WIARI.
Method 4: MCMC-based Weighted Indicator Cumulative Rank Frequency Distribution for stochastic rank-score: WICRFDI.

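13 Method 1: Weighted Composite Index for rank-score. Given the existence of an intrinsic differential weight vector w, the correspondingly weighted composite index is $w \cdot x = |w|\,|x|\cos(w, x) = |w| \times (\text{projection of } x \text{ on } w)$. For two objects $x_1$ and $x_2$ with $d = x_1 - x_2$, the cases $w \cdot d = 0$, $w \cdot d > 0$, and $w \cdot d < 0$ rank $x_1$ tied with, above, or below $x_2$, respectively.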
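
A small sketch of the sign argument: for any strictly positive w, a dominating object gives $d \geq 0$ with at least one strict inequality, hence $w \cdot d > 0$, so the weighted composite index preserves comparabilities; for an incomparable pair the sign of $w \cdot d$ depends on the chosen w. The points, weights, and function name are hypothetical.

```python
def order_by_wci(x1, x2, w):
    """Place x1 relative to x2 by the sign of w . d, with d = x1 - x2."""
    s = sum(wj * (a - b) for wj, a, b in zip(w, x1, x2))
    if s > 0:
        return "above"
    if s < 0:
        return "below"
    return "tied"


x1, x2, x3 = (3, 1), (1, 2), (4, 3)  # x3 dominates x2; x1 and x2 are incomparable
for w in [(0.5, 0.5), (0.2, 0.8), (0.9, 0.1)]:
    print(w, order_by_wci(x3, x2, w), order_by_wci(x1, x2, w))
# The dominating pair is "above" for every positive w; the incomparable pair flips with w.
```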

14 An illustration in a two-indicator space (figure).

15 Method 2: Comparability Weighted Net Superiority Index for rank-score:
$\text{Rank-score}(x) = \bigl(O(x) - F(x)\bigr)\,\dfrac{O(x) + F(x)}{n - 1}$ = Net Superiority × Comparability.
(Figure.)
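
A minimal sketch of CWNSI, assuming O(x) counts the objects strictly below x and F(x) the objects strictly above x in the componentwise order, so that O(x) − F(x) is the net superiority. The function names and toy data are illustrative.

```python
def dominates(x, y):
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def cwnsi(X):
    """Rank-score(x) = (O(x) - F(x)) * (O(x) + F(x)) / (n - 1); needs n >= 2."""
    n = len(X)
    scores = []
    for i in range(n):
        below = sum(dominates(X[i], X[j]) for j in range(n) if j != i)  # O(x), assumed
        above = sum(dominates(X[j], X[i]) for j in range(n) if j != i)  # F(x), assumed
        scores.append((below - above) * (below + above) / (n - 1))
    return scores


X = [(1, 1), (2, 3), (3, 2), (4, 4), (0, 5)]  # hypothetical two-indicator data
print(cwnsi(X))  # [-2.25, 0.0, 0.0, 2.25, 0.0]
```

The object (0, 5) is incomparable with every other object, so both its net superiority and its comparability are zero and its score sits in the middle.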

16 Method 3: MCMC-based Weighted Indicator Average Rank for rank-score
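
The slide gives no details, so this is only a sketch of the unweighted core that an MCMC-based average rank can rest on: a lazy adjacent-transposition Markov chain whose stationary distribution is uniform over the linear extensions of the componentwise order, with each object's rank (1 = lowest) averaged over the sampled extensions. How indicator weights enter Method 3 is not specified here and is omitted; the step counts, seed, and function names are assumptions.

```python
import random


def dominates(x, y):
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def mcmc_average_rank(X, n_steps=20000, burn_in=2000, seed=0):
    """Average rank of each object over sampled linear extensions (rank 1 = lowest)."""
    n = len(X)
    if n < 2:
        raise ValueError("need at least two objects")
    rng = random.Random(seed)
    # Valid starting linear extension: dominance strictly increases the coordinate sum.
    order = sorted(range(n), key=lambda i: sum(X[i]))
    totals = [0] * n
    kept = 0
    for step in range(n_steps):
        p = rng.randrange(n - 1)
        # Lazy step; swap neighbors only if incomparable, keeping a linear extension.
        if rng.random() < 0.5 and not dominates(X[order[p + 1]], X[order[p]]):
            order[p], order[p + 1] = order[p + 1], order[p]
        if step >= burn_in:
            for pos, obj in enumerate(order):
                totals[obj] += pos + 1
            kept += 1
    return [t / kept for t in totals]


X = [(1, 1), (2, 3), (3, 2), (4, 4), (0, 5)]  # hypothetical data
print([round(r, 2) for r in mcmc_average_rank(X)])
```

Every sampled extension respects dominance, so a dominated object always has a strictly lower average rank than the objects above it, whatever the seed.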

19 Method 4: MCMC-based Weighted Indicator Cumulative Rank Frequency Distribution for stochastic rank-score
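
A sketch of a cumulative rank frequency distribution, assuming linear extensions of the componentwise order are sampled with a lazy adjacent-transposition Markov chain: tally how often each object occupies each rank, then accumulate, giving for each object the estimated probability that its rank is at most k. The indicator weighting of Method 4 is not shown on the slide and is omitted; all names and tuning values are hypothetical.

```python
import random


def dominates(x, y):
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def sample_linear_extensions(X, n_steps, burn_in, seed):
    """Lazy adjacent-transposition chain over linear extensions (lowest object first)."""
    n = len(X)
    if n < 2:
        raise ValueError("need at least two objects")
    rng = random.Random(seed)
    order = sorted(range(n), key=lambda i: sum(X[i]))  # valid start: dominance raises the sum
    for step in range(n_steps):
        p = rng.randrange(n - 1)
        # Swap neighbors only if they are incomparable, so the order stays a linear extension.
        if rng.random() < 0.5 and not dominates(X[order[p + 1]], X[order[p]]):
            order[p], order[p + 1] = order[p + 1], order[p]
        if step >= burn_in:
            yield order


def cumulative_rank_frequencies(X, n_steps=20000, burn_in=2000, seed=0):
    """cdf[x][k - 1] estimates P(rank of x <= k), with rank 1 the lowest."""
    n = len(X)
    counts = [[0] * n for _ in range(n)]  # counts[obj][rank - 1]
    kept = 0
    for order in sample_linear_extensions(X, n_steps, burn_in, seed):
        for pos, obj in enumerate(order):
            counts[obj][pos] += 1
        kept += 1
    cdf = []
    for row in counts:
        running, cum = 0, []
        for c in row:
            running += c  # accumulate integer counts before dividing
            cum.append(running / kept)
        cdf.append(cum)
    return cdf


X = [(1, 1), (2, 3), (3, 2), (4, 4), (0, 5)]  # hypothetical data
for obj, row in enumerate(cumulative_rank_frequencies(X)):
    print(obj, [round(p, 2) for p in row])
```

Since (1, 1) lies below three other objects, its rank is always 1 or 2, so its cumulative frequency reaches 1 at k = 2; (4, 4) lies above three objects, so its cumulative frequency stays 0 up to k = 3.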

26 Multivariate Nonparametrics with Partial Order: Multivariate Ranking with a Multivariate Data Set as Data Matrix. Data matrix: $[x_{ij}]$, n × m, with columns as variables that are not necessarily known to be indicators with a common orientation.

27 Consider the $2^m$ transforms of the data matrix, with each column either retained or reversed. The transform ID is an m-dimensional bit string, 0011011001…01, with 0 meaning retain and 1 meaning reverse. Multivariate median: for each transform, compute CWNSI to provide a triplet consisting of its median object and the objects immediately above and below it in rank.

28 Consider the frequency distribution of objects over the $3 \times 2^m$ objects thus centrally discovered. Declare the modal object of this frequency distribution to be the multivariate median estimate we are looking for. There may be several maximal modes, in which case their centroid may be declared the estimate.
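
A minimal sketch of the modal-triplet median described on slides 27 and 28. Assumptions not stated on the slides: reversing a column means negating it, bit k of the transform ID refers to column k, CWNSI uses the componentwise order with O(x) and F(x) counting objects strictly below and above x, and ties in CWNSI are broken by row index when picking the median object and its neighbors. Function names are hypothetical.

```python
from collections import Counter


def dominates(x, y):
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def cwnsi(X):
    n = len(X)
    out = []
    for i in range(n):
        o = sum(dominates(X[i], X[j]) for j in range(n) if j != i)
        f = sum(dominates(X[j], X[i]) for j in range(n) if j != i)
        out.append((o - f) * (o + f) / (n - 1))
    return out


def transform(X, mask):
    """Reverse (negate) column k when bit k of the transform ID is 1; retain it otherwise."""
    return [tuple(-v if (mask >> k) & 1 else v for k, v in enumerate(row)) for row in X]


def modal_median(X):
    n, m = len(X), len(X[0])
    if n < 3:
        raise ValueError("need at least three objects for a median triplet")
    mid = (n - 1) // 2
    tally = Counter()
    for mask in range(2 ** m):  # all 2^m retain/reverse transforms
        s = cwnsi(transform(X, mask))
        ranked = sorted(range(n), key=lambda i: (s[i], i))  # ties broken by row index
        tally.update(ranked[mid - 1 : mid + 2])  # median object and its two neighbors
    top = max(tally.values())
    modes = [i for i, c in tally.items() if c == top]
    # Several maximal modes: return their centroid.
    return tuple(sum(X[i][k] for i in modes) / len(modes) for k in range(m))


X = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]  # hypothetical data
print(modal_median(X))  # (2.0, 2.0)
```

For five points on a diagonal line, each transform contributes the same three central points; the three tie as modes, and their centroid is (2.0, 2.0).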

29 Alternatively, assign to each object the minimum among its $2^m$ CWNSI rank values from the $2^m$ transforms, and call this minimum rank its data depth. The maximum data depth then yields the multivariate median estimate. Several objects may share the maximum data depth, in which case their centroid plays that role. We conjecture approximate affine invariance.
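
A sketch of the data-depth alternative. It includes its own CWNSI and column-reversal helpers (reversal taken as negation, bit k of the transform ID referring to column k) so that it runs on its own. Tied CWNSI values receive average ranks (midranks), which is an assumption; with ties broken arbitrarily, depth would depend on row order. Function names are hypothetical.

```python
def dominates(x, y):
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def cwnsi(X):
    n = len(X)
    out = []
    for i in range(n):
        o = sum(dominates(X[i], X[j]) for j in range(n) if j != i)
        f = sum(dominates(X[j], X[i]) for j in range(n) if j != i)
        out.append((o - f) * (o + f) / (n - 1))
    return out


def transform(X, mask):
    """Reverse (negate) column k when bit k of the transform ID is 1."""
    return [tuple(-v if (mask >> k) & 1 else v for k, v in enumerate(row)) for row in X]


def midranks(scores):
    """1-based ranks, lowest score first; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for t in range(start, end + 1):
            ranks[order[t]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def data_depth(X):
    """Depth = minimum CWNSI midrank of each object over the 2^m transforms."""
    depth = [float("inf")] * len(X)
    for mask in range(2 ** len(X[0])):
        r = midranks(cwnsi(transform(X, mask)))
        depth = [min(d, ri) for d, ri in zip(depth, r)]
    return depth


def depth_median(X):
    """Centroid of the object(s) with maximum data depth."""
    depth = data_depth(X)
    top = max(depth)
    deepest = [i for i, d in enumerate(depth) if d == top]
    return tuple(sum(X[i][k] for i in deepest) / len(deepest) for k in range(len(X[0])))


X = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]  # hypothetical data
print(data_depth(X))    # [1.0, 2.0, 3.0, 2.0, 1.0]
print(depth_median(X))  # (2.0, 2.0)
```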

30 Multivariate order statistics relative to the multivariate median: Construct the n × m data matrix of coordinate-wise (column-wise) separations of each object from the estimated multivariate median. Consider the $2^m$ transforms, yielding $2^m$ rank values for each object.

31 Now choose the maximum rank and call it the outlyingness measure of the object; it gives the object's rank-score as a multivariate order statistic. Appropriately weighted linear combinations of these multivariate order statistics will help improve and sharpen the multivariate median. Iterative moving windows on grids will help with image enhancement and spatial data smoothing.
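
A sketch of the outlyingness measure, assuming signed coordinate-wise separations $x_{ij} - \text{median}_j$, column reversal as negation, and midranks for tied CWNSI values. The median is passed in (for example, from a depth-based estimate); the helpers are repeated so the block runs on its own, and all names are hypothetical.

```python
def dominates(x, y):
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))


def cwnsi(X):
    n = len(X)
    out = []
    for i in range(n):
        o = sum(dominates(X[i], X[j]) for j in range(n) if j != i)
        f = sum(dominates(X[j], X[i]) for j in range(n) if j != i)
        out.append((o - f) * (o + f) / (n - 1))
    return out


def transform(X, mask):
    """Reverse (negate) column k when bit k of the transform ID is 1."""
    return [tuple(-v if (mask >> k) & 1 else v for k, v in enumerate(row)) for row in X]


def midranks(scores):
    """1-based ranks, lowest score first; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        for t in range(start, end + 1):
            ranks[order[t]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def outlyingness(X, median):
    """Maximum CWNSI midrank over the 2^m transforms of the separation matrix."""
    D = [tuple(v - c for v, c in zip(row, median)) for row in X]  # signed separations (assumed)
    out = [float("-inf")] * len(X)
    for mask in range(2 ** len(X[0])):
        r = midranks(cwnsi(transform(D, mask)))
        out = [max(o, ri) for o, ri in zip(out, r)]
    return out


X = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]  # hypothetical data
print(outlyingness(X, (2.0, 2.0)))  # [5.0, 4.0, 3.0, 4.0, 5.0]
```

The central point gets the smallest outlyingness and the two endpoints the largest, so sorting by this measure yields order statistics running outward from the median.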

32 Some applications: Genome-Wide Association Studies; Knut Wittkowski; Eli Lilly; Debashis Ghosh; Human-Environment Interface; Ashbindu Singh; Myers and Patil; Bruggemann and Patil; several other applications. Four current monographs here.

