Bayesian networks: classification, segmentation, time series prediction and more. John Sandiford

Contents
– Background
– Bayesian networks
– Classification / regression
– Clustering / segmentation
– Time series prediction

Background
– Mathematics, algorithms, data mining, machine learning, artificial intelligence
– Bayesian networks: research (Imperial College), software
– BAE Systems: future concepts, ground-based diagnostics, technical computing
– GE (General Electric): diagnostics, prognostics, reasoning
– New York Stock Exchange: business intelligence
– Bayes Server: Bayesian network software, technical director

Bayesian networks
– Probabilistic and graphical
– Not a black box
– Handle conflicting evidence, unlike many rule-based systems
– Multivariate
– Data driven and/or expert driven
– Handle missing data

Tasks & Models

Tasks
– Classification
– Regression
– Clustering / mixture models
– Density estimation
– Time series prediction
– Anomaly detection
– Decision support
– Multivariate models
– Learning with missing data
– Probabilistic reasoning
– Text analytics

Models
– Multivariate linear regression
– Mixture models
– Time series models, e.g. AR, vector AR
– Hidden Markov models
– Linear Kalman filters
– Probabilistic PCA
– Factor analysis
– Hybrid models, e.g. mixtures of PPCA

Bayesian networks
– High-dimensional data, which humans find difficult to interpret
– Discrete and continuous variables
– Allow missing data, during both learning and prediction
– Temporal and non-temporal variables in the same model

What are Bayesian networks?
– A network of nodes and links
– Nodes represent one or more variables
– Directed links are used when nodes directly influence each other (N.B. nodes may also influence each other indirectly, via other nodes)
– The structure encodes conditional independence assumptions

Model parameters
Each node requires a probability distribution conditioned on its parents, e.g. a conditional probability table P(B | A):

          B=False  B=True
A=False   …        …
A=True    0.4      0.6
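As a minimal sketch of how node distributions define a model, consider an assumed two-node network A → B (the prior on A and the A=False row of the table are illustrative values, not from the slides):

```python
p_a = {True: 0.3, False: 0.7}          # prior P(A); 0.3 is an assumed value
p_b_given_a = {                         # CPT P(B | A); the A=True row matches the slide
    True:  {False: 0.4, True: 0.6},
    False: {False: 0.9, True: 0.1},     # assumed values for the A=False row
}

def joint(a, b):
    """The network's joint distribution factorises as P(A, B) = P(A) * P(B | A)."""
    return p_a[a] * p_b_given_a[a][b]

# Summing the factored joint over all states must give 1 for a valid model
total = sum(joint(a, b) for a in (True, False) for b in (True, False))
print(round(total, 6))  # → 1.0
```

Each node only stores a distribution over its own states given its parents; the full joint is never stored explicitly.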

What is inference?
– Asking a question given what we know, e.g. P(A | B=True, E=False)
– We could multiply all the node distributions together to obtain the joint distribution over all variables; however, inference can be performed much more efficiently using the factored form.
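For a network small enough to enumerate, a query like P(A | B=True) can be answered directly from the factors. A sketch on the same assumed two-node network A → B (the numeric values are illustrative, not from the slides):

```python
p_a = {True: 0.3, False: 0.7}                       # assumed prior P(A)
p_b_given_a = {True:  {False: 0.4, True: 0.6},
               False: {False: 0.9, True: 0.1}}      # A=False row is assumed

def posterior_a(b_observed):
    """P(A | B=b): enumerate the factored joint, then normalise."""
    unnorm = {a: p_a[a] * p_b_given_a[a][b_observed] for a in (True, False)}
    z = sum(unnorm.values())                        # P(B=b), the evidence probability
    return {a: v / z for a, v in unnorm.items()}

print(posterior_a(True))  # observing B=True raises the belief in A=True
```

Real engines avoid full enumeration with algorithms such as variable elimination or junction trees, but the principle — combine factors, then normalise — is the same.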

Construction
1. Add nodes (variables)
   – Manually (expert opinion)
   – From data (data can be discretized if required)
2. Add links
   – Manually (expert opinion)
   – From data (constraint based, or search & score)
3. Specify the parameters of the model
   – Manually (expert opinion)
   – From data, via EM learning (handles missing data)
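With complete data, step 3 reduces to counting (maximum likelihood); EM is needed only when some cells are missing, in which case each count is replaced by its expected value under the current model. A sketch using hypothetical (A, B) records for a two-node network A → B:

```python
# Hypothetical complete-data records (A, B); invented for illustration only
records = [(True, True), (True, False), (True, True), (False, False),
           (False, False), (False, True), (False, False), (True, True)]

def estimate_cpt(records):
    """Return P(B=True | A=a) for each a, by relative frequency (maximum likelihood)."""
    cpt = {}
    for a in (True, False):
        b_values = [b for ra, b in records if ra == a]
        cpt[a] = sum(b_values) / len(b_values)   # True counts as 1, False as 0
    return cpt

print(estimate_cpt(records))  # → {True: 0.75, False: 0.25}
```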

Demonstration – construction
(sample data table with discrete columns A, B, C, D taking True/False values)

Notes on construction
– Support for discrete & continuous variables
– Missing data
– Time series data

CLASSIFICATION / REGRESSION
In this section we discuss classification and regression with Bayesian networks.

What is classification / regression?
– Predict unknown values (output variables) using a number of known values (input variables); learning is supervised.
– Classification: predicting discrete variables.
– Regression: predicting continuous variables.
– Examples: predict the probability of a disease given symptoms; predict a bull/bear market from market indicators.

Classification structure

Classification outputs
– Multiple outputs
– Mutually exclusive outputs

Naïve Bayes classifier
– Simple and fast
– Inputs are conditionally independent given the class
– Widely used in spam filters
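A hand-rolled sketch of the idea, for binary features with Laplace smoothing (the tiny spam/ham data set is invented for illustration):

```python
def train_naive_bayes(X, y, alpha=1.0):
    """Estimate P(class) and P(feature=1 | class) with Laplace smoothing.
    X: list of binary feature vectors; y: list of class labels."""
    classes = sorted(set(y))
    priors = {c: y.count(c) / len(y) for c in classes}
    likelihoods = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        likelihoods[c] = [
            (sum(x[j] for x in rows) + alpha) / (len(rows) + 2 * alpha)
            for j in range(len(X[0]))
        ]
    return priors, likelihoods

def predict_naive_bayes(priors, likelihoods, x):
    """Return the class maximising P(c) * prod_j P(x_j | c)."""
    scores = {}
    for c, prior in priors.items():
        p = prior
        for j, xj in enumerate(x):
            pj = likelihoods[c][j]
            p *= pj if xj else (1 - pj)
        scores[c] = p
    return max(scores, key=scores.get)

# Toy data: features = [contains "offer", contains "meeting"]
X = [[1, 0], [1, 0], [1, 1], [0, 1], [0, 1], [0, 0]]
y = ["spam", "spam", "spam", "ham", "ham", "ham"]
priors, likelihoods = train_naive_bayes(X, y)
print(predict_naive_bayes(priors, likelihoods, [1, 0]))  # → spam
```

The conditional-independence assumption is what makes the product over features valid, and it is why training and prediction are so fast.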

Regression structure

Demonstration – identification network

Training
– Missing data
– Training data includes ‘Gender’
– Can mix expert opinion and data

Build     Gender  Hair Length  Height  Weight
Average   Male    Short        …       …
Average   Female  Medium       …       …
?         Male    Medium       …       …
Average   Male    Short        …       …
…

Prediction

Model performance & comparison
– Additional variables?
– BIC
– Confusion matrix
– Lift chart
– Over-fitting

Confusion matrix
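A confusion matrix simply cross-tabulates actual against predicted classes; a minimal sketch (the labels and predictions are invented for illustration):

```python
def confusion_matrix(actual, predicted, labels):
    """Rows = actual class, columns = predicted class."""
    index = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        m[index[a]][index[p]] += 1
    return m

actual    = ["spam", "ham", "spam", "spam", "ham"]
predicted = ["spam", "ham", "ham",  "spam", "spam"]
print(confusion_matrix(actual, predicted, ["spam", "ham"]))
# → [[2, 1], [1, 1]]
```

The off-diagonal cells are the misclassifications, which is what makes the matrix more informative than a single accuracy figure.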

Lift chart

CLUSTERING / SEGMENTATION
In this section we discuss clustering / segmentation with Bayesian networks.

What is clustering / segmentation?
– An unsupervised learning approach: no outputs in the data, only inputs
– Finds natural groupings in the data
– Multivariate, handling high-dimensional data
– E.g. targeted marketing

Usage

Data exploration. Mixture models are useful for identifying key characteristics of your data, such as the most common relationships between variables, as well as unusual relationships.

Segmentation. Because clustering detects similar groups, we can identify a group that has certain qualities and then determine segments of our data that have a high probability of belonging to that group.

Anomaly detection. Unseen data can be compared against a model to determine how unusual (anomalous) it is. The log-likelihood statistic is often used as the measure, as it tells you how likely it is that the model could have generated that data point. While humans are very good at interpreting 2D and 3D data, we are not so good in higher-dimensional space; a mixture model, for example, could have tens or even hundreds of dimensions.

Prediction. Although mixture models are an unsupervised learning technique, we can use them for prediction if, during learning, we include the variables we wish to predict (output variables).

Demonstration – mixture model

Mixture model – anomaly detection
– No data mapped to the Cluster variable
– Missing data allowed
– Predict(Cluster)
– Log-likelihood
– Conflict
(sample data table with continuous columns X, Y, Z)
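A sketch of log-likelihood scoring against a mixture model, using a one-dimensional Gaussian mixture with hand-picked parameters (in practice the weights, means and variances would come from EM learning; these values are assumed):

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def mixture_log_likelihood(x, weights, means, variances):
    """log p(x) under a 1-D Gaussian mixture; low values flag anomalies."""
    return math.log(sum(w * gaussian_pdf(x, m, v)
                        for w, m, v in zip(weights, means, variances)))

# Assumed two-cluster model: clusters centred at 0 and 10
weights, means, variances = [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]
print(mixture_log_likelihood(0.1, weights, means, variances))  # near a cluster: high
print(mixture_log_likelihood(5.0, weights, means, variances))  # between clusters: low
```

A threshold on this score (e.g. a low percentile of the training-set scores) turns the density model into an anomaly detector.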

Mixture model – batch prediction

Bi-modal or tri-modal, even though the univariate analysis looks normal

Multivariate prediction (log-likelihood)

Multivariate prediction (conflict)

TIME SERIES PREDICTION
In this section we discuss time series prediction with Bayesian networks.

Time series models
– Known as dynamic Bayesian networks
– Discrete & continuous variables
– Multivariate time series: (partial) auto-correlations, (partial) cross-correlations
– A node can be linked to itself (across time steps)

Time series
– Temporal & non-temporal variables
– Classification, regression, log-likelihood
– Modelling time series data without a time series model

Non-temporal data:

Case  Gender  Age
1     Female  35
2     Male    ?
3     Female  25
…

Temporal data:

Case  Time  Transition  Obs1  Obs2
1     0     Cluster …   …     …
1     1     Cluster …   ?     8.6
2     1     Cluster …   …     …
…

Types of time series model
– Auto-regressive / vector auto-regressive
– N-order Markov models
– Hidden Markov models
– Kalman filters
– Any of these well-known models can be extended
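The simplest member of this family, a zero-mean AR(1) model, can be fitted in closed form by least squares; a sketch on an invented noiseless series (real data would of course be noisy, and a vector AR would use matrix coefficients):

```python
def fit_ar1(series):
    """Least-squares estimate of a in x_t ≈ a * x_(t-1) (zero-mean AR(1))."""
    num = sum(x_prev * x_curr for x_prev, x_curr in zip(series, series[1:]))
    den = sum(x_prev * x_prev for x_prev in series[:-1])
    return num / den

def predict_ar1(a, last_value, steps):
    """Iterate the fitted model forward to forecast future values."""
    forecast = []
    for _ in range(steps):
        last_value = a * last_value
        forecast.append(last_value)
    return forecast

# Invented series generated by x_t = 0.8 * x_(t-1)
x, series = 1.0, [1.0]
for _ in range(4):
    x *= 0.8
    series.append(x)

a = fit_ar1(series)
print(round(a, 3))                    # → 0.8
print(predict_ar1(a, series[-1], 2))  # two-step-ahead forecast
```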

Vector auto-regressive

N-order Markov models

Hidden Markov model
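The standard inference routine for an HMM is the forward algorithm, which computes the probability of an observation sequence from the factored model. A sketch with an assumed two-state, two-symbol model (all parameter values are invented for illustration):

```python
def hmm_forward(obs, init, trans, emit):
    """Forward algorithm: P(observation sequence) for a discrete HMM.
    init[i]     = P(state_0 = i)
    trans[i][j] = P(state_(t+1) = j | state_t = i)
    emit[i][o]  = P(obs = o | state = i)
    """
    n = len(init)
    alpha = [init[i] * emit[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        # Propagate one step through the transition model, then weight by emission
        alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][o]
                 for j in range(n)]
    return sum(alpha)

# Assumed 2-state HMM with observation symbols 0 and 1
init  = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit  = [[0.9, 0.1], [0.2, 0.8]]
print(hmm_forward([0, 1], init, trans, emit))
```

The recursion runs in O(T·n²) time rather than enumerating all nᵀ state sequences, which is exactly the efficiency gain from exploiting the factored form.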

Kalman filter

Mixture of vector auto-regressive models

Types of time series prediction
– Prediction: calculating queries a number of time steps into the future.
– Filtering: calculating queries at the current time.
– Smoothing: calculating queries a number of time steps into the past (calculating historic values).
– Most probable sequence: calculating the most likely sequence of past values (a generalized version of the Viterbi algorithm).
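The last item can be sketched directly: the Viterbi algorithm finds the most probable hidden-state sequence of an HMM by dynamic programming (the model parameters below are assumed, invented for illustration):

```python
def viterbi(obs, init, trans, emit):
    """Most probable hidden-state sequence for a discrete HMM."""
    n = len(init)
    prob = [init[i] * emit[i][obs[0]] for i in range(n)]
    back = []                               # backpointers: best predecessor per state
    for o in obs[1:]:
        step, new_prob = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: prob[i] * trans[i][j])
            step.append(best)
            new_prob.append(prob[best] * trans[best][j] * emit[j][o])
        back.append(step)
        prob = new_prob
    # Trace the best path backwards from the most likely final state
    state = max(range(n), key=lambda i: prob[i])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return list(reversed(path))

# Same assumed 2-state HMM as before
init  = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit  = [[0.9, 0.1], [0.2, 0.8]]
print(viterbi([0, 0, 1, 1], init, trans, emit))  # → [0, 0, 1, 1]
```

Unlike filtering and smoothing, which marginalise over state sequences, Viterbi maximises over them, so the result is a single best explanation of the observations.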

Demonstration
– Parameter learning
– Prediction: data explorer, batch queries
– Sampling: charting
– Structural learning: determine links & orders

Questions