New Trends in Machine Learning and Data Science
Ricardo Vilalta
Dept. of Computer Science, University of Houston
September 2015

New Trends in Machine Learning and Data Science
Outline: Introduction to Machine Learning; Machine Learning in Geology; Transfer Learning; Active Learning; Deep Learning; Summary

Machine Learning

Classification or Supervised Learning
Supervised learning: given a training set X = {x1, x2, …, xN} and a class or target vector y = {y1, y2, …, yN} (one label per training example), find a function f(x) that takes a vector x and outputs a class y. In short, the labeled pairs {(x, y)} are used to learn f(x).
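As a minimal sketch of what "find a function f(x)" can look like in practice, the snippet below learns a nearest-centroid classifier from labeled pairs. The classifier choice, the function name `fit_nearest_centroid`, and the toy data are illustrative assumptions; the slides do not prescribe a particular learning algorithm here.

```python
from collections import defaultdict

def fit_nearest_centroid(X, y):
    """Learn one centroid per class from the labeled pairs {(x, y)} and return f."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}

    def f(x):
        # Predict the class whose centroid is closest (squared Euclidean distance).
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
    return f

# Hypothetical toy training set: two features, two classes.
X = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]]
y = ["a", "a", "b", "b"]
f = fit_nearest_centroid(X, y)
print(f([1.1, 1.0]), f([4.0, 4.0]))
```

The returned `f` plays the role of f(x) on the slide: it takes a feature vector and outputs a class.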

Clustering or Unsupervised Learning
Unsupervised learning: given a training set X = {x1, x2, …, xN} with no class or target vector available, find natural groups or clusters in the data {x}.

An application of supervised learning
Autonomous driving: train a computer-controlled vehicle to steer correctly when driving on a variety of road types. The computer (learning algorithm) outputs one of three classes: class 1, steer to the left; class 2, steer to the right; class 3, continue straight.

DARPA Challenge Competition for driverless vehicles
DARPA: Defense Advanced Research Projects Agency. First prize of $2 million, October 2005.

Other applications of supervised learning
Bio-technology: protein folding prediction, micro-array gene expression.
Computer systems: performance prediction.
Banking applications: credit applications, fraud detection.
Character recognition (US Postal Service).
Web applications: document classification, learning user preferences.

Application on the Surface of Mars: Automated Creation of Geomorphic Maps
A geomorphic map shows landforms chosen and defined by a domain expert. Figure panels: Martian landscape; digital elevation map; manually drawn geomorphic map of this landscape.

Attribute Representation
Represent the surface of Mars as a quantized rectangular space composed of pixels P1,1, P1,2, …, P1,n, …, Pn,1, …, each described by features F1, …, Fn. Pij represents a pixel; Fi represents a feature.

Initial Work: Unsupervised Learning
Each pixel has 6 features. Pixels are clustered using EM (expectation-maximization). The number of clusters is chosen using cross-validation. Landform categories are identified with clusters. Reference: Stepinski & Vilalta, “Digital Topography Models for Martian Surfaces”, IEEE Geoscience and Remote Sensing Letters, 2(3), p. 260, 2005.
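The idea of clustering with EM and choosing the number of clusters by cross-validation can be sketched as follows. This is a simplified illustration on 1-D synthetic data, not the six-feature Mars pixels: the Gaussian mixture model, the quantile-based initialization, the 5-fold split, and all function names are assumptions made for the example.

```python
import math
import random

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_gmm_em(data, k, iters=50):
    """Fit a 1-D Gaussian mixture with k components using EM."""
    srt = sorted(data)
    # Initialize means at evenly spaced quantiles, with a shared narrow variance.
    means = [srt[int((j + 0.5) * len(srt) / k)] for j in range(k)]
    centre = sum(data) / len(data)
    var0 = sum((x - centre) ** 2 for x in data) / len(data)
    variances = [var0 / (k * k)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-3)   # floor avoids collapsed components
    return weights, means, variances

def log_likelihood(data, model):
    weights, means, variances = model
    return sum(math.log(sum(w * gaussian_pdf(x, m, v)
                            for w, m, v in zip(weights, means, variances)) + 1e-300)
               for x in data)

def choose_k_by_cv(data, candidates, folds=5):
    """Score each k by held-out log-likelihood summed over the folds."""
    scores = {}
    for k in candidates:
        total = 0.0
        for f in range(folds):
            train = [x for i, x in enumerate(data) if i % folds != f]
            held_out = [x for i, x in enumerate(data) if i % folds == f]
            total += log_likelihood(held_out, fit_gmm_em(train, k))
        scores[k] = total
    return max(scores, key=scores.get), scores

# Synthetic data: three well-separated groups (hypothetical, for illustration only).
rng = random.Random(0)
data = [rng.gauss(c, 1.0) for c in (0.0, 10.0, 20.0) for _ in range(50)]
best_k, scores = choose_k_by_cv(data, candidates=[1, 2, 3, 4])
print(best_k, scores)
```

Held-out likelihood penalizes too few clusters strongly; it may still tolerate one cluster too many, which is one reason an expert review of the resulting clusters remains useful.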

Initial Work: Results
12 resultant clusters. Each cluster is given a posteriori meaning by a domain expert. After meaning is assigned, the 12 clusters are grouped into 4 super-clusters based on meaning.

Our Approach
Pixel-based topographic data (DEMs) -> Segmentation -> Object-based topographic data -> Supervised Learning -> Geomorphic Map(s)

Segmentation

Segmentation: Results
2631 segments homogeneous in slope, curvature and flood. Displayed on an elevation background.

Segmentation: Results

Landforms of Interest (Classes):
Crater floor. Crater wall (convex or concave). Flat plain. Ridge (convex or concave).

Classification: Labeling
A representative subset of objects is labeled as one of the following six classes: Plain, Crater Floor, Convex Crater Walls, Concave Crater Walls, Convex Ridges, Concave Ridges. 517 labeled segments.

Classification: Results
Plain, Crater Floor, Convex Crater Walls, Concave Crater Walls, Convex Ridges, Concave Ridges

Perspective View

Test Site: EvrovallisW

Classification: Results
Plain, Crater Floor, Convex Crater Walls, Concave Crater Walls, Convex Ridges, Concave Ridges

Application on Seismic Data
Construction and evaluation of relevant attributes: attributes are selected based on their capacity to separate one class from another (e.g., salt deposit from background). Methodology: sample from inside the salt deposit and sample from outside the salt deposit to build a training dataset, then evaluate attributes with statistical and information-theoretic metrics.
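One way to make "capacity to separate one class from another" concrete is to score each attribute with a statistical metric and an information-theoretic one. The sketch below uses a Fisher-style score and the information gain of a threshold split; both metric choices, the function names, and the sample values are assumptions for illustration, since the slides do not name the specific metrics used.

```python
import math

def fisher_score(inside, outside):
    """Squared difference of class means divided by the sum of class variances."""
    def mean(v):
        return sum(v) / len(v)
    def var(v):
        m = mean(v)
        return sum((x - m) ** 2 for x in v) / len(v)
    return (mean(inside) - mean(outside)) ** 2 / (var(inside) + var(outside) + 1e-12)

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))

def information_gain(values, labels, threshold):
    """Reduction in class entropy after splitting the attribute at a threshold."""
    left = [l for v, l in zip(values, labels) if v <= threshold]
    right = [l for v, l in zip(values, labels) if v > threshold]
    n = len(labels)
    remainder = sum(len(part) / n * entropy(part) for part in (left, right) if part)
    return entropy(labels) - remainder

# Hypothetical samples of two attributes, inside and outside a salt deposit.
salt_a, background_a = [0.9, 1.0, 1.1, 1.2], [0.1, 0.2, 0.0, 0.3]   # separates well
salt_b, background_b = [0.4, 0.6, 0.5, 0.7], [0.5, 0.6, 0.4, 0.7]   # does not separate
labels = ["salt"] * 4 + ["background"] * 4

score_a = fisher_score(salt_a, background_a)
score_b = fisher_score(salt_b, background_b)
gain_a = information_gain(salt_a + background_a, labels, threshold=0.6)
gain_b = information_gain(salt_b + background_b, labels, threshold=0.55)
print(score_a, score_b, gain_a, gain_b)
```

Attributes would then be ranked by such scores and the highest-ranked ones kept.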

Unsupervised Learning of Geological Bodies
Methodology: the cube of seismic data is processed (using data filters) into a new training dataset, which is passed to an unsupervised learning algorithm (clustering).

Supervised Learning of Geological Bodies
Methodology: the cube of seismic data is processed (using data filters) into a new training dataset, which is combined with expert labels and passed to a learning algorithm (Support Vector Machines, AdaBoost, Random).

Challenges: The sheer size of the 3D data cube precludes training predictive models with more than 1% of the available training data; 0.5% of the data corresponds to 2 million voxels. Our experiments were performed on a computer with 64 GB of memory and 12 cores, and the entire data processing took days to complete. High Bayes error in classification.
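The figures quoted above imply the overall size of the cube. A short worked calculation, using only the stated 0.5% and 2 million voxels (the variable names are illustrative):

```python
# From the slide: 0.5% of the cube corresponds to 2 million voxels.
voxels_in_half_percent = 2_000_000
total_voxels = voxels_in_half_percent * 200   # 0.5% is 1/200 of the cube
one_percent = total_voxels // 100             # the stated training ceiling

print(total_voxels)   # 400000000
print(one_percent)    # 4000000
```

So the full cube holds roughly 400 million voxels, and the 1% ceiling corresponds to about 4 million training examples.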

Challenges: Single attributes bear incomplete information about the class.

Transfer Learning
The goal is to transfer knowledge gathered from previous experience. Also called Inductive Transfer or Learning to Learn. Example: invariant transformations across tasks.

Motivation for Transfer Learning
Once a predictive model is built, there are reasons to believe the model will cease to be valid at some point in time. The difference in transfer learning is that the source and target domains can be completely different.

Traditional Approach to Classification
Each database (DB1, DB2, …, DBn) is given its own learning system, trained independently of the others.

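Transfer Learning
Source domain: DB1, DB2. Target domain: DB new. Each database still has its own learning system, but knowledge from the source-domain learning systems is passed to the learning system of the target domain.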

Knowledge of Parameters
Assume a prior distribution over the parameters.
Source domain: learn the parameters and adjust the prior distribution.
Target domain: learn the parameters using the source prior distribution.

Knowledge of Parameters
Source domain: find coefficients w_S using SVMs.
Target domain: find coefficients w_T using SVMs, initializing the search with w_S.
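A minimal sketch of this warm start, assuming a linear SVM trained by subgradient descent on the regularized hinge loss. The solver, the function name `train_linear_svm`, the hyperparameters, and the toy data are assumptions; the slides only state that the target search is initialized with the source coefficients.

```python
def train_linear_svm(X, y, w_init=None, epochs=20, lr=0.1, lam=0.01):
    """Subgradient descent on the regularized hinge loss; labels must be -1 or +1."""
    w = list(w_init) if w_init is not None else [0.0] * len(X[0])
    for _ in range(epochs):
        for x, label in zip(X, y):
            margin = label * sum(wi * xi for wi, xi in zip(w, x))
            for i in range(len(w)):
                # The hinge term contributes only when the margin is violated.
                grad = lam * w[i] - (label * x[i] if margin < 1 else 0.0)
                w[i] -= lr * grad
    return w

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1

# Hypothetical source and target tasks; the second feature is a constant bias term.
X_s, y_s = [[2.0, 1.0], [3.0, 1.0], [-2.0, 1.0], [-3.0, 1.0]], [1, 1, -1, -1]
X_t, y_t = [[2.5, 1.0], [1.5, 1.0], [-1.5, 1.0], [-2.5, 1.0]], [1, 1, -1, -1]

w_S = train_linear_svm(X_s, y_s)                        # source: start from zero
w_T = train_linear_svm(X_t, y_t, w_init=w_S, epochs=5)  # target: start from w_S
print(w_S, w_T)
```

When the two tasks are related, the target search starts close to a good solution and typically needs fewer passes over the target data than a cold start.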

Feature Transfer
Shared representation across tasks, from source domain to target domain. Minimize Loss-Function(y, f(x)); the minimization is done over multiple tasks (multiple regions on Mars).
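One common way to write such a multi-task objective is given below as a sketch. The notation is an assumption, not taken from the slides: K tasks (for example, K regions on Mars), N_k labeled examples in task k, a representation \phi_\theta shared by all tasks, and task-specific parameters w_k.

```latex
\min_{\theta,\, w_1, \dots, w_K} \;
\sum_{k=1}^{K} \sum_{i=1}^{N_k}
L\!\left( y_i^{(k)},\; f_{w_k}\!\left( \phi_\theta\!\left( x_i^{(k)} \right) \right) \right)
```

Because \theta appears in every task's loss, the learned representation is pushed toward features that are useful across all regions rather than for a single one.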

Feature Transfer
Identify features common to all tasks.

Classification: Labeling
A representative subset of objects is labeled as one of the following six classes: Plain, Crater Floor, Convex Crater Walls, Concave Crater Walls, Convex Ridges, Concave Ridges. 517 labeled segments.

Active Learning: Pool-Based Sampling
Assume a small set of labeled examples and a large set of unlabeled examples. The learning algorithm evaluates and ranks the whole set of unlabeled examples; we then choose one or more examples to label.
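The pool-based loop can be sketched as follows, ranking the pool by uncertainty (predicted probability closest to 0.5). The tiny 1-D threshold model, the logistic probability, the `oracle` function standing in for the expert, and the toy data are all illustrative assumptions.

```python
import math

def fit_threshold(labeled):
    """Tiny 1-D model: decision boundary midway between the two class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

def prob_positive(x, boundary, scale=1.0):
    return 1.0 / (1.0 + math.exp(-(x - boundary) / scale))

def rank_pool_by_uncertainty(pool, boundary):
    """Most uncertain first: predicted probability closest to 0.5."""
    return sorted(pool, key=lambda x: abs(prob_positive(x, boundary) - 0.5))

def active_learning(labeled, pool, oracle, rounds):
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        boundary = fit_threshold(labeled)
        query = rank_pool_by_uncertainty(pool, boundary)[0]   # evaluate the whole pool
        pool.remove(query)
        labeled.append((query, oracle(query)))                # ask the expert for a label
    return fit_threshold(labeled), labeled

# Hypothetical task: the true class boundary is at 3.0.
oracle = lambda x: 1 if x > 3.0 else 0
labeled = [(0.0, 0), (10.0, 1)]
pool = [1.0, 2.0, 2.8, 3.4, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]

boundary, labeled_after = active_learning(labeled, pool, oracle, rounds=3)
queries = [x for x, _ in labeled_after[2:]]
print(queries, boundary)
```

In this toy run the queries concentrate near the current decision boundary, so three labels are enough to move the estimated boundary from 5.0 to about 2.8.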

Sampling Based on Uncertainty
Figure taken from “Active Learning” by Burr Settles, Morgan & Claypool, 2012 (panel labels: 70% accuracy, …% accuracy).

Deep Learning
The idea is to disentangle factors of variation and to attain high-level representations.
Hierarchy, from high level to low level: Commercial Planes, Military Planes -> Engine, Main Fuselage -> Small Object Parts -> Edges and Contours -> Pixel Information

Deep Learning
We want to capture compact, high-level representations in an efficient and iterative manner. Learning takes place at several levels of representation. Think of a hierarchy of concepts of increasing complexity: low-level concepts are the foundation for high-level concepts.

Deep Learning
Deep learning is important for dealing with the credit-assignment problem in deep neural networks: when the output is wrong, which weights are to blame?

Deep Learning
Deep learning has gained popularity in recent years. Application areas: military, automotive, surveillance, financial, medical.

Deep Learning There are three basic types of deep networks:
Deep networks for unsupervised or generative learning: capture high-order correlations of the data (no class labels).
Deep networks for supervised learning (discriminative deep networks): model the posterior distribution of the target variable for classification purposes.
Hybrid deep networks: combine the methods above.

Deep Learning Deep Networks for Unsupervised Learning
There are no class labels during the learning process. There are many types of generative or unsupervised deep networks. Energy-based deep networks are the most popular. Example: Deep Auto Encoder.

Deep Learning Auto Encoder

Deep Learning Auto Encoder
No. of output features = no. of input features. Intermediate nodes encode the original data.
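A minimal sketch of an auto-encoder, assuming the simplest possible case: linear units, a single intermediate node, and full-batch gradient descent on the squared reconstruction error. The function name, hyperparameters, and toy data are illustrative assumptions; a deep auto-encoder would stack several such layers with non-linear units.

```python
def train_autoencoder(data, init=0.1, lr=0.05, epochs=300):
    """Linear auto-encoder: n inputs -> 1 intermediate node -> n outputs."""
    n = len(data[0])
    enc = [init] * n   # input -> intermediate weights
    dec = [init] * n   # intermediate -> output weights
    history = []       # mean reconstruction error per epoch
    for _ in range(epochs):
        g_enc, g_dec, loss = [0.0] * n, [0.0] * n, 0.0
        for x in data:
            h = sum(e * xi for e, xi in zip(enc, x))         # code in the intermediate node
            err = [d * h - xi for d, xi in zip(dec, x)]      # reconstruction minus input
            loss += sum(v * v for v in err)
            back = sum(v * d for v, d in zip(err, dec))      # error sent back to the code
            for i in range(n):
                g_dec[i] += 2 * err[i] * h
                g_enc[i] += 2 * back * x[i]
        m = len(data)
        history.append(loss / m)
        enc = [e - lr * g / m for e, g in zip(enc, g_enc)]
        dec = [d - lr * g / m for d, g in zip(dec, g_dec)]
    return enc, dec, history

# Hypothetical 2-D data lying on the line x2 = 2 * x1, so one code value suffices.
data = [[1.0, 2.0], [-1.0, -2.0], [0.5, 1.0], [-0.5, -1.0]]
enc, dec, history = train_autoencoder(data)
print(history[0], history[-1])
```

The target of the network is its own input, which is why the number of output features equals the number of input features; the intermediate node is forced to hold a compressed encoding.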

Deep Learning “Deep” Auto Encoder
Key idea: Pre-train each layer as an auto-encoder.

An Example in Deep Learning
Learn a “concept” (sedimentary rocks) from many images until a high-level representation is achieved.

An Example in Deep Learning
Learn a hierarchy of abstract concepts using deep learning, from local properties up to global properties.

Deep Learning There are three basic types of deep networks:
Deep networks for unsupervised or generative learning: capture high-order correlations of the data (no class labels).
Deep networks for supervised learning (discriminative deep networks): model the posterior distribution of the target variable for classification purposes.
Hybrid deep networks: combine the methods above.

Deep Learning Convolutional Neural Networks
Local weight update implies a sparse representation.
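The locality behind convolutional networks can be illustrated with a 1-D convolution: each output depends only on a small window of the input, and the same few weights are reused at every position. Reading the slide's "local" as local connectivity with shared weights is an assumption, as are the function name and the example filter.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution in the cross-correlation form used by CNNs."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

signal = [0, 0, 1, 1, 1, 0, 0]
edge_kernel = [-1, 1]   # hypothetical filter that responds to rising and falling edges
out = conv1d(signal, edge_kernel)
print(out)   # [0, 1, 0, 0, -1, 0]

# Sparse connectivity: 2 shared weights here, versus 7 * 6 = 42 weights
# for a fully connected layer mapping the same 7 inputs to 6 outputs.
```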

Deep Learning
The idea is still to find a minimum of the error function E(W) in the space of weights (the figure plots E(W) over w1 and w2).
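A minimal sketch of searching for a minimum of an error function over two weights with plain gradient descent. The quadratic error surface, the learning rate, and the starting point are assumptions chosen for illustration; a real network's E(W) is not this simple and may have many local minima.

```python
def gradient_descent(grad, w, lr=0.1, steps=100):
    """Repeatedly step against the gradient of the error function."""
    for _ in range(steps):
        g = grad(w)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w

# Hypothetical error surface E(w1, w2) = (w1 - 1)^2 + 2 * (w2 + 0.5)^2
def E(w):
    return (w[0] - 1) ** 2 + 2 * (w[1] + 0.5) ** 2

def grad_E(w):
    return [2 * (w[0] - 1), 4 * (w[1] + 0.5)]

w_min = gradient_descent(grad_E, [3.0, 2.0])
print(w_min, E(w_min))
```

For this surface the minimum is at w1 = 1, w2 = -0.5, and the iterates approach it geometrically.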

Deep Learning
Network diagram labels: output nodes, internal nodes, input nodes.

Deep Learning on Seismic Data
Methodology: the cube of seismic data is transformed by deep learning into a new training dataset, which is combined with expert labels and passed to a learning algorithm.

Challenges: Single attributes bear incomplete information about the class.

Challenges: Deep learning can capture “global” features that detect entire geological bodies as the result of the non-linear combination of many local models.

Decompose the seismic cube into small cubes to create a large number of examples.

Each cube is an example that we can feed into a deep learning architecture.
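The decomposition can be sketched as follows, assuming non-overlapping sub-cubes of a fixed edge length cut from a volume stored as nested lists. Whether the original work used overlapping windows, and what cube size it used, is not stated in the slides; the function name and the tiny 4 x 4 x 4 volume are illustrative.

```python
def split_into_cubes(volume, size):
    """Cut a 3-D volume (nested lists) into non-overlapping size^3 sub-cubes."""
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    cubes = []
    for x0 in range(0, nx - size + 1, size):
        for y0 in range(0, ny - size + 1, size):
            for z0 in range(0, nz - size + 1, size):
                cube = [[[volume[x][y][z] for z in range(z0, z0 + size)]
                         for y in range(y0, y0 + size)]
                        for x in range(x0, x0 + size)]
                cubes.append(((x0, y0, z0), cube))   # keep the origin with each example
    return cubes

# Hypothetical 4 x 4 x 4 volume whose values encode their own coordinates.
volume = [[[x * 100 + y * 10 + z for z in range(4)]
           for y in range(4)]
          for x in range(4)]
cubes = split_into_cubes(volume, 2)
print(len(cubes))   # 8
```

Keeping the origin of each sub-cube makes it possible to map predictions back to their position in the full seismic cube.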

Summary
When we have similar classification tasks but there is indication that the distributions have changed -> Transfer Learning
When we have few training examples and labeling is expensive -> Active Learning
When we need more abstract features -> Deep Learning

Conclusions
Deep learning can provide new high-level global features. Entire global geological structures can be identified by combining low-level feature representations of seismic data.

THANK YOU
