Targeted Projection Pursuit Click here for an introduction.


Targeted Projection Pursuit (TPP) allows you to visualise high-dimensional data. It shows the full picture behind classification errors, unsupervised clusterings, and attribute selections. It works by letting you explore projections of your data onto two dimensions.

For example, this shows a projection of a three-dimensional data set onto the two-dimensional screen

If we look at the data from a different angle then we can see different aspects, such as how the data can be separated
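To make the "different angle" idea concrete, here is a minimal sketch in plain Python. It is not TPP's code: the `project` helper, the rotation about the vertical axis, and the two toy groups are all assumptions made up for illustration. It rotates three-dimensional points and drops the depth coordinate, which is one simple way to get a two-dimensional view.

```python
import math

def project(points, angle_deg):
    """Rotate 3-D points about the vertical (y) axis, then drop depth
    to get 2-D screen coordinates."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    return [(x * cos_a + z * sin_a, y) for x, y, z in points]

# Two hypothetical groups that differ only along the z (depth) axis.
group_a = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.5, 0.2, 0.1)]
group_b = [(0.0, 0.0, 5.0), (1.0, 1.0, 5.0), (0.5, 0.2, 5.1)]

def x_gap(angle_deg):
    """Horizontal gap between the groups on screen (negative = overlap)."""
    xa = [p[0] for p in project(group_a, angle_deg)]
    xb = [p[0] for p in project(group_b, angle_deg)]
    return min(xb) - max(xa)

front_gap = x_gap(0)   # looking straight down the z axis: groups overlap
side_gap = x_gap(90)   # looking from the side: groups are clearly apart
print(front_gap, side_gap)
```

From the front the two groups sit on top of each other; from the side they separate. Nothing about the data changed, only the view.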

And the same principle applies to higher dimensional data

The problem is then how to ‘steer’ your way through higher dimensional space to find useful views. This is the problem that TPP solves...

Load up some data and it is initially shown using the first two principal components (X=PC1, Y=PC2).
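For readers who want to see what "the first two principal components" means in practice, the following is a rough sketch in plain Python. It assumes power iteration with deflation as the way to find the components, which is a choice made here for brevity and not necessarily what TPP or Weka does. The function names and the five toy samples are hypothetical.

```python
import math

def mean_center(rows):
    """Subtract each attribute's mean so the data is centered on the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def leading_eigenvector(matrix, iterations=200):
    """Power iteration: repeatedly multiply and renormalise a start vector."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * matrix[i][j] * v[j]
                     for i in range(d) for j in range(d))
    return v, eigenvalue

def first_two_components(rows):
    """Return PC1, PC2 and each sample's (X=PC1, Y=PC2) coordinates."""
    centered = mean_center(rows)
    cov = covariance(centered)
    pc1, lam1 = leading_eigenvector(cov)
    d = len(cov)
    # Deflate: remove PC1's variance so the next iteration finds PC2.
    deflated = [[cov[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)]
                for i in range(d)]
    pc2, _ = leading_eigenvector(deflated)
    coords = [(sum(a * b for a, b in zip(row, pc1)),
               sum(a * b for a, b in zip(row, pc2))) for row in centered]
    return pc1, pc2, coords

# Hypothetical toy data: 5 samples, 3 attributes.
samples = [(1.0, 1.0, 0.0), (2.0, 2.1, 0.1), (3.0, 2.9, -0.1),
           (4.0, 4.2, 0.0), (5.0, 4.8, 0.2)]
pc1, pc2, coords = first_two_components(samples)
```

Each entry of `coords` is where a sample would land in the initial view. With real data (100 genes per sample) a linear algebra library would be the practical choice.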

In this case there are 123 points, each representing a sample taken from a cancer tumor. For each sample we have measured the expression level of 100 genes. Each sample is classified into one of four types – indicated by color.

This shows the axes, and this table shows their components (X and Y). The table also shows the overall length of each axis (Significance). Click on a column header to re-order the table.
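As a small illustration of the table, the sketch below assumes that "Significance" is the Euclidean length of an axis's (X, Y) vector. The slides only say "overall length", so that is an assumption, and the gene names and numbers are made up.

```python
import math

# Hypothetical projection: attribute name -> (X component, Y component).
projection = {
    "gene_1": (0.8, 0.1),
    "gene_2": (-0.2, 0.5),
    "gene_3": (0.05, -0.02),
}

def axis_table(projection):
    """One row per axis: name, X, Y, and length, sorted longest first."""
    rows = [(name, x, y, math.hypot(x, y))
            for name, (x, y) in projection.items()]
    return sorted(rows, key=lambda row: row[3], reverse=True)

table = axis_table(projection)
for name, x, y, significance in table:
    print(f"{name:8s} X={x:+.2f} Y={y:+.2f} Significance={significance:.3f}")
```

Sorting by the last column mimics clicking the Significance header: the axes that contribute most to the current view come first.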

Select points by clicking on the class button or by dragging a rectangle round them

The color of the axes then shows their relative values for the selected points (blue=low, red=high)

TPP lets you find other views of the data by dragging selected points

The axes move and the table updates as TPP finds a projection that matches your movements
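The slides do not say how TPP searches for the matching projection, so the following is only a stand-in: a plain least-squares fit by gradient descent, where the dragged points get new target positions and the projection matrix is nudged until the view lands close to them. All names, the learning rate, and the toy data are assumptions.

```python
def project(data, P):
    """Map each row (d attributes) to 2-D with a d x 2 projection matrix P."""
    return [tuple(sum(row[k] * P[k][c] for k in range(len(row)))
                  for c in range(2)) for row in data]

def squared_error(data, P, targets):
    """Total squared distance between the current view and the targets."""
    return sum((px - tx) ** 2 + (py - ty) ** 2
               for (px, py), (tx, ty) in zip(project(data, P), targets))

def fit_projection(data, P, targets, rate=0.05, steps=500):
    """Gradient descent on the squared error (a stand-in, not TPP's method)."""
    P = [list(row) for row in P]  # work on a copy
    for _ in range(steps):
        view = project(data, P)   # view under the current P
        for k in range(len(P)):
            for c in range(2):
                grad = sum(2.0 * (view[i][c] - targets[i][c]) * data[i][k]
                           for i in range(len(data)))
                P[k][c] -= rate * grad
    return P

# Hypothetical toy data: 4 samples, 3 attributes.
data = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0), (1.0, 1.0, 0.0)]
start = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]   # view of attributes 1 and 2
# "Drag" the third sample from (0, 0) to (2, 2); the others stay put.
targets = [(1.0, 0.0), (0.0, 1.0), (2.0, 2.0), (1.0, 1.0)]

before = squared_error(data, start, targets)
fitted = fit_projection(data, start, targets)
after = squared_error(data, fitted, targets)
```

In this toy case the third attribute's axis grows until the dragged point reaches its target. If no linear projection can reach the targets, the error stays large, which is the situation the next slides describe as points that "didn't move".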

In this case the ‘A’ points can be separated from the others – showing there is a consistent difference in the data

We can also separate the ‘D’ points

But this one didn’t move. This shows it couldn’t be separated from the Bs and Cs. We’ve spotted an outlier, or a possible misdiagnosis.

What about the B’s and C’s? Turns out they can’t be separated – showing us the labelled differences don’t correspond to differences in the data

Now we’ve got a clear view of the classes we can then color points by the values of individual attributes

In this case we can see that this gene is low for all of the C’s, but there’s no reliable pattern for the other classes

And this gene is exceptionally high for just this one sample. Could be worth investigating.

We can also create and look at clusters. Here we create three clusters (shown by color) and see that they correspond to the groupings in the data we found.

Now try four clusters. The B-C group gets split up, but the split doesn’t correspond to the original classes. (Clusters shown by color; supervised classes shown by shape.)

It looks like the samples ‘naturally’ divide into three rather than four clusters.
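One way to check the same thing without a picture is to cross-tabulate cluster labels against class labels. The sketch below uses made-up labels (not the tumor data from the slides) arranged to mirror the three-cluster case, and a hypothetical `cross_tab` helper.

```python
from collections import Counter

def cross_tab(clusters, classes):
    """Count how many samples fall in each (cluster, class) pair."""
    return Counter(zip(clusters, classes))

# Hypothetical toy labels: 8 samples, 4 classes, 3 clusters.
classes = ["A", "A", "B", "B", "C", "C", "D", "D"]
clusters = [0, 0, 1, 1, 1, 1, 2, 2]

table = cross_tab(clusters, classes)

# A cluster is "pure" if every sample in it has the same class.
pure = {c for c in set(clusters)
        if len({cls for (clu, cls) in table if clu == c}) == 1}

for (cluster, cls), count in sorted(table.items()):
    print(f"cluster {cluster}  class {cls}  n={count}")
```

Here clusters 0 and 2 each hold a single class, while cluster 1 mixes B and C, which is the pattern the view shows by color and shape.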

Let's see how a classification algorithm would perform on this data. Here we’ve used a KNN classifier from the Weka toolkit, with 10-fold cross-validation. The empty circles show the errors.

All the errors occur in the B-C group as expected – including that possible misdiagnosis we spotted earlier.
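To show what the empty circles represent, here is a plain-Python stand-in for k-nearest-neighbours with 10-fold cross-validation. It is not Weka's implementation: the fold assignment by index striding, `k=3`, and the toy samples (including one deliberately odd label) are all assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority label among the k training samples nearest to `query`.
    `train` is a list of (features, label) pairs."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def cross_validate(samples, folds=10, k=3):
    """Hold out each fold in turn; return indices of misclassified samples."""
    errors = []
    for fold in range(folds):
        held_out = set(range(fold, len(samples), folds))
        train = [s for i, s in enumerate(samples) if i not in held_out]
        for i in sorted(held_out):
            features, label = samples[i]
            if knn_predict(train, features, k) != label:
                errors.append(i)
    return errors

# Hypothetical toy data: two well-separated groups...
samples = [((i * 0.1, 0.0), "A") for i in range(10)]
samples += [((10.0 + i * 0.1, 10.0), "D") for i in range(10)]
# ...plus one sample labelled "A" that sits among the "D" points.
samples.append(((10.05, 10.0), "A"))

errors = cross_validate(samples)
print(errors)
```

The only sample reported as an error is the one whose label disagrees with its neighbours, the same kind of point the projection flagged as a possible misdiagnosis.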

Now let's see which genes are the important ones by selecting attributes. Select all the shortest axes and set them to zero.

There’s still very good separation – we could eliminate some more

And soon we find just five genes that between them distinguish the types of cancer – and see how they act together
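The "set the shortest axes to zero" step can be sketched like this, assuming an axis's length is the Euclidean length of its (X, Y) components. The helper name, gene names, and values are hypothetical.

```python
import math

def zero_shortest_axes(projection, keep):
    """Keep the `keep` longest axes; set every other attribute's
    (X, Y) components to zero so it no longer affects the view."""
    ranked = sorted(projection,
                    key=lambda name: math.hypot(*projection[name]),
                    reverse=True)
    kept = set(ranked[:keep])
    return {name: (xy if name in kept else (0.0, 0.0))
            for name, xy in projection.items()}

# Hypothetical projection: attribute name -> (X component, Y component).
projection = {
    "gene_1": (0.8, 0.1),
    "gene_2": (-0.2, 0.5),
    "gene_3": (0.05, -0.02),
    "gene_4": (0.01, 0.03),
}

reduced = zero_shortest_axes(projection, keep=2)
selected = [name for name, xy in reduced.items() if xy != (0.0, 0.0)]
print(selected)
```

If the classes still separate in the reduced view, the zeroed attributes were not needed for that separation, and the process can be repeated with a smaller `keep`.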

Click on ‘File’ to load a data file