
1 Statistical Data Mining
SHORT LECTURE

2 Virtual Patient Approach in SMITH
Diagram (virtual patient approach): a generic model (mechanistic); adaptation of the generic model to available data (Machine Learning); patient + state parameters: individualisation (Machine Learning). Slide courtesy of Prof. Dr. Andreas Schuppert – RWTH Aachen University Hospital & Bayer AG

3 Probability Theory – Contributions by Bernoulli
Key question: How likely is a future event?
Initial probability theory has its roots in gambling, and research on understanding uncertainty has continued ever since, e.g. understanding odds as the ratio of favorable to unfavorable outcomes, or probabilities connected to dice, cards, coins, marbles, balls, etc.
Simple example: a jar contains an unknown number of X black balls and Y white balls. How can the proportion of black and white balls be determined?
Approach: perform a series of random draws (aka trials) from the jar. The expected ratio of white vs. black draws converges toward the real ratio in the jar as the number of draws increases [1] (see the sketch below).
In short: probability is a measure of how likely a future 'event' is (e.g. the probability of drawing a white or black ball). After many random draws/trials from the jar, the observations (white balls vs. black balls) converge toward the real ratio of elements (white vs. black balls).
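A minimal sketch in C of the jar experiment described above (not from the slides); the true white-ball fraction (0.7) and the fixed seed are assumed values chosen purely for illustration:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const double p_white = 0.7;              /* assumed true fraction of white balls in the jar */
    const int trials[] = {10, 100, 1000, 100000};
    srand(42);                               /* fixed seed for reproducibility */
    for (int t = 0; t < 4; t++) {
        int n = trials[t], white = 0;
        for (int i = 0; i < n; i++) {
            double u = (double)rand() / ((double)RAND_MAX + 1.0);  /* uniform draw in [0,1) */
            if (u < p_white) white++;        /* count a white draw */
        }
        printf("%6d draws: observed white fraction = %.4f (true %.2f)\n",
               n, (double)white / n, p_white);
    }
    return 0;
}

The printed white fraction drifts toward the assumed 0.7 as the number of draws grows, which is exactly the convergence described on the slide.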

4 Probability Theory – Contributions by Markov
Probability theory (Bernoulli): outcomes of previous events do not change the outcomes of future events.
Additions by Markov: the above is not always correct – in many cases events are not independent. In short, Markov added dependent events / dependent variables: situations in which the probability of an event is conditioned by the events that took place in the past (adding an 'events over time' dimension) [1].
From 'Bernoulli' to 'Markov': a draw rule is imposed – the color of the current ball indicates the color of the jar ('a white jar', 'a black jar') from which the next draw will be made (see the sketch below). Observations within one jar: draws are independent from each other (Bernoulli).
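A minimal sketch in C of the dependent draw rule (not from the slides); the two jar compositions are assumed values, and the two printed conditional frequencies come out different, which is what makes the draws dependent:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const double p_white_from_white_jar = 0.8;  /* assumed: the white jar is mostly white */
    const double p_white_from_black_jar = 0.3;  /* assumed: the black jar is mostly black */
    int current = 1;                            /* 1 = white ball, 0 = black ball */
    long n_white = 0, white_then_white = 0, n_black = 0, black_then_white = 0;
    srand(42);
    for (long i = 0; i < 1000000; i++) {
        /* the color of the current ball selects the jar for the next draw */
        double p = current ? p_white_from_white_jar : p_white_from_black_jar;
        int next = ((double)rand() / ((double)RAND_MAX + 1.0)) < p;
        if (current) { n_white++; white_then_white += next; }
        else         { n_black++; black_then_white += next; }
        current = next;
    }
    printf("P(next=white | current=white) ~ %.3f\n", (double)white_then_white / n_white);
    printf("P(next=white | current=black) ~ %.3f\n", (double)black_then_white / n_black);
    return 0;
}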

5 Simple Markov Model – Stochastic Model & Probability
Based on probability theory, a Markov model is used to model randomly changing systems.
A two-state 'Markov chain' is the most basic Markov model and illustrates the Markov process.
A 'Markov matrix' is a stochastic matrix that is used to describe the transitions of a 'Markov chain' (modified from [1]).
Markov diagram: an example of a 'randomly changing system' with two states, W ('a white state') and B ('a black state'). Draw rule imposed: the color of the current ball indicates the color of the jar ('a white jar', 'a black jar') from which the next draw will be made.
Markov matrix (simplified, based on colors): rows 'From W' and 'From B', columns 'To W' and 'To B' (see the sketch below).
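A minimal sketch in C of such a two-state Markov matrix (not from the slides); the transition probabilities are assumed values, since the slide only shows the 'From'/'To' labels. Iterating the state distribution with the matrix shows the chain settling into a stationary distribution:

#include <stdio.h>

int main(void)
{
    /* Rows are the 'from' state (W, B), columns the 'to' state (W, B).
       Entries are assumed for illustration; each row must sum to 1. */
    double P[2][2] = { {0.8, 0.2},    /* From W: to W, to B */
                       {0.3, 0.7} };  /* From B: to W, to B */
    double dist[2] = {1.0, 0.0};      /* start in state W with probability 1 */
    for (int step = 1; step <= 50; step++) {
        /* one transition: multiply the distribution (row vector) by the Markov matrix */
        double next[2] = { dist[0]*P[0][0] + dist[1]*P[1][0],
                           dist[0]*P[0][1] + dist[1]*P[1][1] };
        dist[0] = next[0];
        dist[1] = next[1];
        if (step % 10 == 0)
            printf("step %2d: P(W) = %.4f, P(B) = %.4f\n", step, dist[0], dist[1]);
    }
    return 0;
}

With the assumed entries the distribution converges to P(W) = 0.6, P(B) = 0.4 regardless of the starting state, illustrating how the Markov matrix alone determines the long-run behaviour of the chain.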

6 Monte Carlo Methods – Approach
Monte Carlo (MC) methods rely on repeated random sampling to obtain numerical results.
MC runs simulations many times over to obtain the distribution of an unknown probabilistic entity (the name originates from techniques for playing & recording results in gambling casinos) [2].
Method applicability: useful when it is difficult or impossible to apply a deterministic algorithm.
Approach: define a domain of possible inputs; generate inputs randomly from a probability distribution over the domain; perform a deterministic computation on the inputs and aggregate the results (also known as the MC ensemble method).
Without Monte Carlo: pointwise real data inputs -> model -> output.
With Monte Carlo: pointwise sampling from a probability distribution -> inputs -> model -> output.
Example – approximating the value of pi: count the number of points that fall inside the circle and the total number of points; the ratio of the two counts is an estimate of the ratio of the two areas, ~pi/4, so multiply by 4 to estimate pi. After placing many (30,000) random sampling points, the estimate for pi is within 0.07% of the true value; this happens with an approximate probability of 20% (see the sketch below).
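A minimal sketch in C of the pi example (not the slide's code), following the three-step approach above with the standard C rand() generator and the same 30,000 points:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    const long n = 30000;              /* number of random sampling points, as in the example */
    long inside = 0;
    srand(42);
    for (long i = 0; i < n; i++) {
        /* sample a point uniformly in the unit square */
        double x = (double)rand() / ((double)RAND_MAX + 1.0);
        double y = (double)rand() / ((double)RAND_MAX + 1.0);
        if (x*x + y*y <= 1.0) inside++;   /* point falls inside the quarter circle */
    }
    /* inside/n estimates pi/4, so multiply by 4 */
    printf("pi estimate with %ld points: %f\n", n, 4.0 * inside / n);
    return 0;
}

With this few points the estimate is typically only accurate to about two decimal places, which is consistent with the slide quoting the 0.07% accuracy as occurring with only roughly 20% probability.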

7 Monte Carlo Methods – MPI Example
Include a library for a parallel random number generator (PRNG) – an important part of MC!
'Simplified demo code', modified from [3]. gsl_rng_uniform(r) returns a double-precision floating point number uniformly distributed in the range [0,1) (idea of a probability distribution). Each processor uses its own variates to build a 'local sum'; MPI_Reduce then creates the 'global sum' and the code prints out the estimate of the integral.

#include <stdio.h>
#include <math.h>
#include <mpi.h>
#include <gsl/gsl_rng.h>
#include "gsl-sprng.h"   /* parallel RNG: GSL interface to SPRNG */

int main(int argc, char *argv[])
{
  int i, k, N;
  double u, ksum = 0.0, Nsum;
  gsl_rng *r;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &N);
  MPI_Comm_rank(MPI_COMM_WORLD, &k);
  r = gsl_rng_alloc(gsl_rng_sprng20);           /* initialize RNG with a specific (parallel) type */
  for (i = 0; i < 10000; i++) {
    u = gsl_rng_uniform(r);                     /* uniform variate in [0,1) */
    ksum += exp(-u*u);                          /* local sum on each processor */
  }
  MPI_Reduce(&ksum, &Nsum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);  /* global sum on rank 0 */
  if (k == 0)
    printf("MC estimate is %f\n", Nsum/10000/N);  /* estimate of the integral of exp(-u*u) over [0,1) */
  MPI_Finalize();
  return 0;
}
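Usage note (assuming GSL, SPRNG and an MPI implementation are installed – this is not stated on the slide): such a demo would typically be compiled with mpicc, linked against the GSL and SPRNG libraries, and launched with mpirun across several processes; the integral estimate is printed by rank 0 only, and the exact compiler and linker flags depend on the local installation.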

8 Markov Chains - Monte Carlo (MCMC) – ASIC Use Case
Diagram (Machine – ASIC system): unsupervised patient stratification, dynamic clustering, critical state detection, predictive modelling; VP; patient data (RDR); prognosis for the individual patient; DEA; patient subgroups & classifiers; machine learning, patient association, subgroup-specific prediction. Slide courtesy of Prof. Dr. Andreas Schuppert – RWTH Aachen University Hospital & Bayer AG

9 Lecture Bibliography

10 Lecture Bibliography
[1] P.A. Gagniuc, 'Markov Chains: From Theory to Implementation and Experimentation', John Wiley & Sons, 2017
[2] Wikipedia, 'Monte Carlo method', Online:
[3] 'Getting started with MCMC', Online:

11 Short Lecture – Markov Chains Monte Carlo in SMITH

