Modern Topics in Multivariate Methods for Data Analysis.


1 Modern Topics in Multivariate Methods for Data Analysis

2 Modern Topics in Multivariate Methods for Data Analysis: Semi-Supervised Learning, Transfer Learning, Active Learning, Summary

3 Semi-Supervised Learning. This is an extension of supervised learning. We have two sets of data: labeled and unlabeled. Motivation: labeled data is sometimes hard to obtain. Figure obtained from X. Zhu, Semi-Supervised Learning Tutorial, ICML 2007.

4 An example from Mars data analysis. Digital elevation map: Martian landscape. Geomorphic map: manually drawn geomorphic map of this landscape. The geomorphic map shows landforms chosen and defined by a domain expert.

5 Segmentation

6 Segmentation: Results. Displayed on an elevation background. 2631 segments homogeneous in slope, curvature and flood.

7 Classification: Labeling. A representative subset of objects is labeled as one of the following six classes: Plain, Crater Floor, Convex Crater Walls, Concave Crater Walls, Convex Ridges, Concave Ridges. Figure: labeled segments.

8 Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007 How do we approach semi-supervised learning?

9 Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007 A Case with No Unlabeled Data

10 Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007 A Case with Unlabeled Data

11 Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007 A Case with Unlabeled Data

12 Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007 A Case with Unlabeled Data

13 Graph-Based Models Figure obtained from X. Zhu. Semi-Supervised Learning Tutorial. ICML 2007
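
One common graph-based approach is label propagation: labeled points keep their labels, and each unlabeled point repeatedly takes the similarity-weighted average of its neighbors' scores. The sketch below is a minimal illustration only; it is not taken from the slides or from Zhu's tutorial. The function name, the Gaussian similarity, the `sigma` value and the toy 1-D data are all assumptions made for the example.

```python
import math

def label_propagation(points, labels, sigma=1.0, iterations=100):
    """Return an estimate of P(class 1) for every point.

    points: list of floats (hypothetical 1-D features).
    labels: dict mapping the index of a labeled point to its class (0 or 1).
    """
    n = len(points)
    # Fully connected graph with Gaussian similarity weights (an assumption).
    w = [[math.exp(-((points[i] - points[j]) ** 2) / (2 * sigma ** 2)) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    # Unlabeled points start undecided at 0.5.
    scores = [float(labels.get(i, 0.5)) for i in range(n)]
    for _ in range(iterations):
        new_scores = []
        for i in range(n):
            if i in labels:
                # Labeled points are clamped to their known class.
                new_scores.append(float(labels[i]))
            else:
                # Unlabeled points take the weighted average of all other scores.
                total = sum(w[i])
                new_scores.append(sum(w[i][j] * scores[j] for j in range(n)) / total)
        scores = new_scores
    return scores

# Two well-separated clusters with one labeled point in each.
points = [0.0, 0.5, 1.0, 5.0, 5.5, 6.0]
labels = {0: 0, 5: 1}
scores = label_propagation(points, labels)
predicted = [1 if s > 0.5 else 0 for s in scores]
```

With only two labels, every unlabeled point ends up with the class of its own cluster. This works because the toy data satisfy the assumption that nearby points share a label; when that assumption fails, propagation can spread the wrong labels.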

14 Assumptions in Semi-Supervised Learning. How can we learn from unlabeled data at all? The answer lies in the set of assumptions made about the unlabeled data distribution. If the assumptions are right, unlabeled data can give an advantage, but a decrease in performance is possible if the assumptions are incorrect.

15 Modern Topics in Multivariate Methods for Data Analysis: Semi-Supervised Learning, Transfer Learning, Active Learning, Summary

16 Transfer Learning. The goal is to transfer knowledge gathered from previous experience. Also called Inductive Transfer or Learning to Learn. Example: invariant transformations across tasks.

17 Motivation for Transfer Learning. Once a predictive model is built, there are reasons to believe the model will cease to be valid at some point in time. The difference is that now source and target domains can be completely different.

18 Traditional Approach to Classification. Diagram labels: DB1, DB2, DBn; Learning System.

19 Transfer Learning. Diagram labels: DB1, DB2, DB new; Learning System; Knowledge; Source domain; Target domain.

20 Transfer Learning Scenarios: 1. Labeling in a new domain is costly. DB1 (labeled): classification of patients G1. DB2 (unlabeled): classification of patients G2.

21 Transfer Learning Scenarios: 2. Data is outdated. Model created with one survey but a new survey is now available. Survey 1 Learning System Survey 2 ?

22 Functional Transfer: Multitask Learning. Diagram labels: input nodes, internal nodes, output nodes (Left, Straight, Right).

23 Train in Parallel with Combined Architecture. Figure obtained from Brazdil et al., Metalearning: Applications to Data Mining, Chapter 7, Springer, 2009.

24 Knowledge of Parameters. Assume a prior distribution over the parameters. Source domain: learn the parameters and adjust the prior distribution. Target domain: learn the parameters using the source prior distribution.

25 Assume Parameter Similarity. P(y|x) = P(x|y) P(y) / P(x). Task A → Parameter A; Task B → Parameter B ~ A. Assume a hyper-distribution with low variance.
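
The source-then-target use of a prior can be illustrated with the simplest conjugate case: estimating a single Gaussian mean with known noise variance. This is a sketch under stated assumptions, not the method from the slides. The function `gaussian_update`, the broad initial prior, the noise variance and the data values are all hypothetical, and the two tasks are assumed to share (nearly) the same parameter.

```python
def gaussian_update(prior_mean, prior_var, data, noise_var):
    """Posterior over a Gaussian mean, given a Gaussian prior and known noise variance."""
    n = len(data)
    if n == 0:
        return prior_mean, prior_var
    post_precision = 1.0 / prior_var + n / noise_var
    post_var = 1.0 / post_precision
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var

# Source domain: plenty of data, so a broad initial prior is sharpened considerably.
source = [2.0, 2.2, 1.8, 2.1, 1.9]
source_mean, source_var = gaussian_update(0.0, 100.0, source, noise_var=1.0)

# Target domain: only two observations, so the source posterior serves as the prior.
target = [3.0, 3.2]
target_mean, target_var = gaussian_update(source_mean, source_var, target, noise_var=1.0)
```

The target estimate lands between the source estimate and the target sample mean: the few target observations move the parameter, but the source prior keeps it from being determined by two points alone. Treating the tasks as less similar would correspond to widening `source_var` before the second update.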

26 Knowledge of Parameters. Find coefficients w_S using SVMs. Find coefficients w_T using SVMs, initializing the search with w_S.
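
A warm start of this kind can be sketched with a linear SVM trained by subgradient descent on the regularized hinge loss. The slides do not say which SVM solver is used, so the trainer, its hyperparameters (`lam`, `lr`, `epochs`) and the toy data below are assumptions made for illustration. The names `w_s` and `w_t` stand for the source and target coefficients.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_linear_svm(data, w_init, lam=0.01, lr=0.1, epochs=50):
    """Subgradient descent on the regularized hinge loss, starting from w_init.

    data: list of (features, label) pairs with label in {-1, +1}.
    """
    w = list(w_init)
    for _ in range(epochs):
        for x, y in data:
            # The hinge term contributes only when the margin is below 1.
            violated = y * dot(w, x) < 1
            for k in range(len(w)):
                grad = lam * w[k] - (y * x[k] if violated else 0.0)
                w[k] -= lr * grad
    return w

# Hypothetical toy data; the last feature is a constant 1 acting as a bias term.
source = [((2.0, 2.0, 1.0), 1), ((3.0, 2.0, 1.0), 1),
          ((-2.0, -2.0, 1.0), -1), ((-3.0, -1.0, 1.0), -1)]
target = [((2.0, 1.0, 1.0), 1), ((3.0, 3.0, 1.0), 1),
          ((-1.0, -2.0, 1.0), -1), ((-2.0, -3.0, 1.0), -1)]

# Source task: start from zero. Target task: start from the source solution.
w_s = train_linear_svm(source, [0.0, 0.0, 0.0])
w_t = train_linear_svm(target, w_s, epochs=5)
predictions = [1 if dot(w_t, x) > 0 else -1 for x, _ in target]
```

Because the target search starts near a solution that already suits a related task, a few epochs are enough here. Whether that holds on real data depends on how similar the two tasks are.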

27 Feature Transfer. Source domain and target domain: shared representation across tasks. Minimize Loss-Function(y, f(x)). The minimization is done over multiple tasks (multiple regions on Mars).
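
One way to write the multi-task minimization explicitly is given below. The notation is an assumption introduced for illustration and does not come from the slides: T tasks, n_t examples in task t, a shared feature map φ with parameters θ, and a task-specific weight vector w_t on top of the shared representation.

```latex
\min_{\theta,\; w_1,\dots,w_T} \;
\sum_{t=1}^{T} \sum_{i=1}^{n_t}
L\bigl(y_{t,i},\, f_t(x_{t,i})\bigr),
\qquad
f_t(x) = w_t^{\top} \phi_{\theta}(x)
```

Every task contributes to the same θ, which is what makes the representation shared; only w_t differs from task to task.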

28 Feature Transfer. Identify features common to all tasks.

29 Instance Transfer. Diagram labels: Source domain; Filter samples; Target domain; Larger target dataset; Learning System. New program called TrAdaBoost.

30 Modern Topics in Multivariate Methods for Data Analysis: Semi-Supervised Learning, Transfer Learning, Active Learning, Summary

31 Active Learning. Active learning is part of the field of supervised learning. We have labeled and unlabeled data. The novel idea is that we can choose which examples to label during learning. It is also called “Query Learning”. Diagram: Labeled Data; Unlabeled Data → Select examples.

32 Active Learning. Types of Active Learning: 1. Query Synthesis. The learner can request an example from anywhere in the instance space. It is only appropriate with small finite domains. Some examples may have no meaning.

33 Active Learning. Types of Active Learning: 2. Stream-Based Selective Sampling. Instances are drawn one at a time from the input space according to a distribution, and the learner decides whether to discard each one or query its label. For example, one can choose only examples from regions of uncertainty.
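
The per-instance decision in stream-based sampling can be sketched as a simple threshold rule. This is an assumed rule for a binary problem, not one prescribed by the slides: the function `should_query`, the `threshold` value and the stream of model probabilities are hypothetical.

```python
def should_query(prob_positive, threshold=0.2):
    """Query the label only when the model's estimate lies within `threshold` of 0.5."""
    return abs(prob_positive - 0.5) < threshold

# Hypothetical model outputs P(y = 1 | x) for instances arriving one at a time.
stream = [0.95, 0.55, 0.10, 0.42, 0.75]
queried = [p for p in stream if should_query(p)]      # sent to the labeler
discarded = [p for p in stream if not should_query(p)]  # confident, so skipped
```

Only the instances the model is unsure about are sent for labeling; confident predictions are discarded without any labeling cost.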

34 Active Learning. Types of Active Learning: 3. Pool-Based Sampling. Assume a small set of labeled examples and a large set of unlabeled examples. Here we evaluate and rank the whole set of unlabeled examples; we then choose one or more examples.
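
Ranking a pool can be sketched with predictive entropy as the uncertainty score. Entropy is one common choice and is an assumption here, as are the function names and the pool of class probabilities.

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution (natural log)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_queries(pool_probs, k=1):
    """Rank the whole pool by uncertainty and return the indices of the top k."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: entropy(pool_probs[i]),
                    reverse=True)
    return ranked[:k]

# Hypothetical predicted class probabilities for four unlabeled examples.
pool_probs = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3], [0.99, 0.01]]
chosen = select_queries(pool_probs, k=2)
```

The two examples whose predictions are closest to uniform (indices 1 and 2) are chosen for labeling. Unlike the stream-based setting, every unlabeled example is scored before any query is made.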

35 Sampling Based on Uncertainty. Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012. Panel labels: 70% accuracy, 90% accuracy.

36 Sampling Based on Uncertainty. Figure labels: Uncertainty: 1.0, 0.5, 1.0.

37 Modern Topics in Multivariate Methods for Data Analysis: Semi-Supervised Learning, Transfer Learning, Active Learning, Summary

38 Summary. Few labeled examples, labeling is expensive, many unlabeled examples → Semi-Supervised Learning. Similar classification tasks, but there is an indication that the distributions have changed → Transfer Learning. Few training examples, labeling is expensive → Active Learning.

