1 Multidimensional Detective Alfred Inselberg, Multidimensional Graphs Ltd Tel Aviv University, Israel Presented by Yimeng Dou 04-24-2002

Presentation transcript:

1 Multidimensional Detective Alfred Inselberg, Multidimensional Graphs Ltd Tel Aviv University, Israel Presented by Yimeng Dou

2 Parallel Coordinates We can use parallel coordinates to model relations among multiple variables, turning a multivariate problem into a 2-D pattern recognition problem. This makes them very useful for visual data mining. Two examples: a VLSI chip data set and a model of a country’s economy. The model can be used to do trade-off analyses, discover sensitivities, do approximate optimizations, and provide monitoring and decision support.
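To make the 2-D reduction concrete, here is a minimal sketch of how a single N-dimensional point becomes a polygonal line across N parallel vertical axes. The function name `to_polyline`, the axis spacing, and the per-axis rescaling to [0, 1] are assumptions made for illustration; the slides do not prescribe them.

```python
def to_polyline(point, lows, highs, spacing=1.0):
    """Map an N-dimensional point to the N vertices of its polygonal line."""
    vertices = []
    for i, (value, lo, hi) in enumerate(zip(point, lows, highs)):
        # Axis i is the vertical line x = i * spacing. The height is the
        # value rescaled to [0, 1] within that axis's range (assumed scaling).
        height = 0.5 if hi == lo else (value - lo) / (hi - lo)
        vertices.append((i * spacing, height))
    return vertices


# Hypothetical 3-dimensional point with hypothetical axis ranges.
point = [2.0, 50.0, 0.0]
lows = [0.0, 0.0, 0.0]
highs = [4.0, 100.0, 10.0]
polyline = to_polyline(point, lows, highs)
print(polyline)
```

Each variable contributes exactly one vertex, so the representation grows linearly with the number of dimensions.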

3 Goals of The Program Represent the data without any loss of information. Keep the representational complexity low: O(N), where N is the number of dimensions. Work for any N. Treat every variable uniformly. Allow objects to be recognized under transformations (rotation, translation, scaling, etc.). Convey information on the properties of the N-dimensional object easily and intuitively. Be based on rigorous mathematical and algorithmic results.

4 In order to discover patterns from a large data set… We must use parallel coordinates effectively, with proper geometrical understanding and queries (hence the notion of “Multidimensional Detective”). Instead of mimicking the experience derived from standard displays, a good model should exploit the special strengths of the methodology and avoid its weaknesses. The task is similar to accurately cutting complicated portions of an N-dimensional watermelon. The cutting tools should be well chosen and intuitive.

5 The VLSI Chip Problem Figure 1 shows the full real data set: 473 batches, 16 process parameters (X1 to X16). X1: yield (the percentage of useful chips produced in the batch). X2: quality (speed performance). X3 through X12: 10 different types of defects; zero defects appears at the top of each axis. X13 through X16: physical parameters. The author didn’t specify where high yield or high quality appears. I think high values appear on top, judging from hints in his later descriptions.

6 Objective Raise the yield (X1) while maintaining high quality (X2). This is a multiobjective optimization problem. It was believed that the presence of defects hindered high yield and quality, so the goal was to achieve zero defects. (But is that really the case? Let’s see.)
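One standard way to read “multiobjective” is through Pareto dominance: a batch is worth keeping if no other batch is at least as good on both yield and quality and strictly better on one. The sketch below is not from the slides or the paper; the function names and the (X1, X2) pairs are invented purely for illustration.

```python
def dominates(a, b):
    """True if a is at least as good as b on both objectives and better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (yield X1, quality X2) pairs, not taken from the data set.
candidates = [(90, 70), (85, 80), (60, 95), (80, 75), (50, 50)]
front = pareto_front(candidates)
print(front)
```

Here (80, 75) and (50, 50) drop out because (85, 80) beats them on both objectives, while the remaining three represent different trade-offs between yield and quality.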

7 Observations From Figure 2 Figure 2 isolates the batches having the highest X1 and X2. Notice also the two clusters in X15. The selection doesn’t include some batches that are high on the X3 axis (nearly 0 defects), which casts doubt on the goal of “achieving zero defects”. Is it the right aim? To answer this question, we construct Figure 3, which includes the batches having 0 defects in at least 9 categories (they are really close to the aim of zero defects). Do they have high yield and quality?

8 Figure 3: Our assumption is challenged. The nine batches have poor yields and low quality. Here’s another visual cue: X6. The process is much more sensitive to variations in X6 than in the other defects. So we treat X6 differently and select the batches with 0 X6 defects: the very best batch is included (as shown in Figure 4).
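The queries behind Figures 2 to 4 can be thought of as simple filters over the batches. The sketch below is only an illustration: the dictionary layout, the function names, the thresholds, and all four batches are invented, and the real tool performs these selections interactively on the parallel-coordinates display.

```python
DEFECT_KEYS = [f"X{i}" for i in range(3, 13)]  # X3..X12, the ten defect types


def make_batch(x1, x2, defects):
    """Build one hypothetical batch record from yield, quality and 10 defect counts."""
    batch = {"X1": x1, "X2": x2}
    batch.update(zip(DEFECT_KEYS, defects))
    return batch


def high_yield_and_quality(batches, min_x1, min_x2):
    """Figure 2 style query: batches that are high in both X1 and X2."""
    return [b for b in batches if b["X1"] >= min_x1 and b["X2"] >= min_x2]


def nearly_zero_defects(batches, min_zero_types=9):
    """Figure 3 style query: zero defects in at least `min_zero_types` categories."""
    return [b for b in batches
            if sum(b[k] == 0 for k in DEFECT_KEYS) >= min_zero_types]


def zero_in(batches, key):
    """Figure 4 style query: batches with zero defects of one given type."""
    return [b for b in batches if b[key] == 0]


# Synthetic batches (assumed values, NOT the real 473-batch data set).
batches = [
    make_batch(92, 88, [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]),
    make_batch(40, 35, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]),
    make_batch(45, 30, [0, 0, 0, 0, 0, 0, 0, 2, 0, 0]),
    make_batch(85, 60, [2, 1, 0, 0, 3, 0, 0, 0, 1, 0]),
]

top = high_yield_and_quality(batches, min_x1=80, min_x2=80)
near_zero = nearly_zero_defects(batches)
zero_x6 = zero_in(batches, "X6")
```

Comparing the yield and quality of `near_zero` against `top` is the kind of check that challenges the zero-defect assumption.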

9 Figure 5 and Figure 6: Test The Assumption Figure 5 shows the batches that do not have zeros for X3 and X6. Figure 6 shows the cluster of batches with top yields (notice the gap in X1 between them and the remaining batches, as seen in Figure 1). The finding: small amounts of X3 and X6 type defects are essential for high yield and quality. Going back to Fig. 2, we can also see X15’s relationship with X1 and X2.

10 Our Conclusion For VLSI Chip Problem Small ranges of X3 and X6 close to (but not equal to) zero, together with the lower range of X15, provide necessary conditions for high yield and quality. Fig. 9 shows the result of constraining only X1, and the resulting gap in X15. Fig. 10 shows that constraining only X2 does not produce a gap in X15.

11 Other Insights and The Lesson We Learned From VLSI Example Fig. 11 shows that all but two batches have very high X2. We isolate these two batches in Fig. 12 and find that their high yield but lower quality may be due to the ranges of X6, X13, X14 and X15. This suggests that we can further partition the multivariate problem into sub-problems pertaining to the individual objectives.

12 The Economic Model Example This example illustrates how to use an interior point algorithm with the model to do trade-off analyses, understand the impact of constraints, and in some cases do optimizations. Interior point algorithm: it finds a point that is interior to a region and satisfies all the constraints simultaneously; in this case, such a point represents a feasible economic policy for a country. The point is constructed interactively by sequentially choosing values of the variables (Fig. 13).
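The sequential construction can be illustrated on a toy region. The sketch below assumes a region defined by non-negative variables sharing one resource constraint (`sum(cost_i * x_i) <= budget`); this region, the costs, and the function name `available_range` are hypothetical and are not the economic model from the presentation. It only shows how each choice narrows the range left for the next variable.

```python
def available_range(costs, budget, chosen):
    """Feasible interval for the next variable, given the values already chosen.

    Assumes the toy region: every x_i >= 0 and sum(costs[i] * x_i) <= budget.
    """
    used = sum(c * v for c, v in zip(costs, chosen))
    remaining = budget - used
    if remaining < 0:
        raise ValueError("chosen values already violate the constraint")
    next_cost = costs[len(chosen)]
    # The later variables can all be set to zero, so the next variable may
    # use up everything that remains of the shared resource.
    return (0.0, remaining / next_cost)


costs = [2.0, 1.0, 4.0]   # hypothetical resource cost per unit of each sector
budget = 100.0            # hypothetical shared resource

chosen = []
range_1 = available_range(costs, budget, chosen)   # before any choice
chosen.append(30.0)                                # pick a value for variable 1
range_2 = available_range(costs, budget, chosen)   # narrowed by the first choice
chosen.append(20.0)                                # pick a value for variable 2
range_3 = available_range(costs, budget, chosen)   # narrowed again
print(range_1, range_2, range_3)
```

Picking a value outside the returned interval would make the construction fail at that variable, which is exactly what the display lets the user see.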

13 Result of Choosing The First Variable Once a value of the first variable (Agriculture output) is chosen, the dimensionality of the region is reduced by one. We can see the relationship between Agriculture and Fishing: their low ranges correspond to each other. So it is not possible to find a policy that favors Agriculture without also favoring Fishing, and vice versa. Mining and Fishing behave differently (see the lower lines at Fishing in Fig. 13): we find competition between them.

14 Neighborhood Fig. 15 shows a point in a 20-dimensional model. The intermediate curves provide useful insights into the neighborhood of the point. Notice the steep strips at X13, X14 and X15: these three are critical variables, where the point is bumping against the boundary.

15 Boundary Point and Exterior Point Boundary point: if the polygonal line is tangent to any one of the intermediate curves, it represents a boundary point. Exterior point: if the polygonal line crosses any of the intermediate curves, it represents an exterior point. An exterior point lets us see the first variable for which the construction failed and what is needed to make corrections. By changing variables interactively, we can discover sensitive regions and other patterns.
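A rough sketch of this classification follows, under a simplifying assumption: the intermediate curves are reduced to one available interval per axis, so “tangent” becomes “touches a limit” and “crosses” becomes “falls outside the interval”. The function name `classify_point`, the tolerance, and the sample ranges are hypothetical.

```python
def classify_point(values, ranges, tol=1e-9):
    """Classify a point as interior, boundary or exterior.

    `ranges[i]` is the (low, high) interval available on axis i (a simplified
    stand-in for the intermediate curves). For an exterior point, the index
    of the first variable at which the construction fails is also returned.
    """
    touches = False
    for index, (value, (low, high)) in enumerate(zip(values, ranges)):
        if value < low - tol or value > high + tol:
            return ("exterior", index)
        if abs(value - low) <= tol or abs(value - high) <= tol:
            touches = True
    return ("boundary", None) if touches else ("interior", None)


# Hypothetical available ranges on three axes.
ranges = [(0.0, 50.0), (0.0, 40.0), (0.0, 5.0)]

inside = classify_point([30.0, 20.0, 2.0], ranges)
on_edge = classify_point([30.0, 40.0, 2.0], ranges)
outside = classify_point([30.0, 45.0, 2.0], ranges)
print(inside, on_edge, outside)
```

The index returned for the exterior case points at the variable whose value needs to be corrected first.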

16 Before We Come To Conclusion Is this model merely a model, or is it used (with its “intuitive” functionality and high interactivity) in any software products? Is this model accurate enough? Is the technique sufficient to reach a conclusion about a problem when the data set is very large? How does one become a skillful detective? Can any software substitute for people?

17 Conclusion Each multivariate data set and problem has its own “personality”, so it requires substantial variations in the discovery scenarios and calls for considerable ingenuity (a characteristic of a detective). An effort to automate the exploration process is under way. It will have a number of new features, such as intelligent agents that learn from gathered experience.