# What I am after from gR2002 Peter Green, University of Bristol, UK.


what I am after from gR2002 Peter Green, University of Bristol, UK

Why graphical models in R?

- Statistical modelling and analysis do not respect boundaries of model classes
- Software should encourage and support good practice - and graphical models are good practice!
- Data analysis - model-based
- R for reference implementation of new methodology
- Open software

Questions

- Scope?
  - Digram, MIM, CoCo, TETRAD, Hugin, BUGS?
  - Determined by classes of model, or classes of algorithm?
- Market?
  - Statistics researcher, statistics MSc, arbitrary Excel user?
- Delivery?
  - R package(s), with C code?

Diagram relating graphical models to: Markov chains, contingency tables, spatial statistics, sufficiency, regression, covariance selection, statistical physics, genetics, AI

Contents

- Hierarchical models
- Variable-length parameters
- Models with undirected edges
- Hidden Markov models
- Inference on structure
- Discrete graphical models/PES
- Grappa

Bayesian hierarchical models: properly integrating out all sources of variation

Repeated measures on children's weights: Children $i=1,2,\ldots,k$ have their weights measured on $n_i$ occasions, $t_{ij}$, $j=1,2,\ldots,n_i$, obtaining weights $y_{ij}$. Suppose that, for each child, we have a linear growth equation, with independent normal errors.

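Repeated measures on children's weights, continued: Suppose that the child-specific growth parameters vary across the population according to a common distribution. A Bayesian completes the model by specifying priors on the remaining population-level parameters.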
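
The formulas for this model are not reproduced in the transcript. One conventional way to write a model of this kind is sketched here. The symbols $\alpha_i$, $\beta_i$, $\sigma^2$, $\mu_\alpha$, $\mu_\beta$, $\sigma_\alpha^2$ and $\sigma_\beta^2$, and the choice of normal population distributions, are assumptions made for illustration rather than notation taken from the slides.

```latex
% Level 1: linear growth for child i at occasion j (assumed notation)
y_{ij} = \alpha_i + \beta_i t_{ij} + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim N(0, \sigma^2) \text{ independently}

% Level 2: child-specific intercepts and slopes vary across the population
\alpha_i \sim N(\mu_\alpha, \sigma_\alpha^2),
\qquad \beta_i \sim N(\mu_\beta, \sigma_\beta^2)

% Level 3: priors on the remaining unknowns
p(\mu_\alpha, \mu_\beta, \sigma^2, \sigma_\alpha^2, \sigma_\beta^2)
```

Each level conditions only on the level above it, which is what makes the model a directed acyclic graph.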

Graph for children's weights

Measurement error: explanatory variables X are subject to error; in most cases we only observe U


Mixture modelling: DAG for a mixture model, drawn first with nodes k, w, y and then with z added (k, w, z, y). Here w has length k and z has value set {1,2,…,k}.
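
A minimal sketch of the generative structure in this DAG, assuming normal components: the number of components k fixes the length of the weight vector w, each label z is drawn from {1,…,k} with probabilities w, and y is drawn from the component that z selects. The component means, the common standard deviation and the function name `simulate_mixture` are hypothetical choices for illustration only.

```python
import random


def simulate_mixture(n, w, means, sd=1.0, seed=0):
    """Draw n observations from a k-component normal mixture.

    w     : mixture weights, length k
    means : component means, length k (assumed values, not from the slides)
    Returns (z, y), where z[i] in {1, ..., k} labels the component of y[i].
    """
    if len(w) != len(means):
        raise ValueError("w and means must both have length k")
    rng = random.Random(seed)
    k = len(w)
    labels = list(range(1, k + 1))
    # z | w, k : categorical on {1, ..., k}
    z = rng.choices(labels, weights=w, k=n)
    # y | z : normal with the mean of the selected component
    y = [rng.gauss(means[zi - 1], sd) for zi in z]
    return z, y


w = [0.5, 0.3, 0.2]        # length k = 3
means = [-2.0, 0.0, 3.0]   # one mean per component
z, y = simulate_mixture(200, w, means)
```

When k itself is unknown, the lengths of `w` and `means` change with it, which is the variable-length parameter issue this part of the talk refers to.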

Measurement error using mixture model for population


Modelling with undirected graphs: Directed acyclic graphs are a natural representation of the way we usually specify a statistical model - directionally: disease → symptom, past → future, parameters → data, ... However, sometimes (e.g. spatial models) there is no natural direction.

Scottish lip cancer data: The rates of lip cancer in 56 counties in Scotland have been analysed by Clayton and Kaldor (1987) and Breslow and Clayton (1993). The analysis here is based on the example in the WinBUGS manual.

Scottish lip cancer data (2): The data include the observed and expected cases (expected numbers based on the population and its age and sex distribution in the county), a covariate measuring the percentage of the population engaged in agriculture, fishing, or forestry, and the "position" of each county expressed as a list of adjacent counties.

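Scottish lip cancer data (3)

| County | Obs cases | Exp cases | x (% in agric.) | SMR | Adjacent counties |
|---|---|---|---|---|---|
| 1 | 9 | 1.4 | 16 | 652.2 | 5,9,11,19 |
| 2 | 39 | 8.7 | 16 | 450.3 | 7,10 |
| ... | ... | ... | ... | ... | ... |
| 56 | 0 | 1.8 | 10 | 0.0 | 18,24,30,33,45,55 |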

Model for lip cancer data (1) Graph. Nodes: observed counts, expected counts, covariate, regression coefficient, random spatial effects

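Model for lip cancer data (2) Distributions, written out to match the BUGS code for this model:

- Data: $O_i \sim \mathrm{Poisson}(\mu_i)$
- Link function: $\log \mu_i = \log E_i + \alpha_0 + \alpha_1 x_i / 10 + b_i$
- Random spatial effects: $b \sim \mathrm{CAR\ normal}$ over the adjacency structure, with precision $\tau$
- Priors: $\alpha_0 \sim$ flat, $\alpha_1 \sim N(0, \text{precision } 10^{-5})$, $\tau \sim \mathrm{Gamma}(r, d)$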

BUGS code for lip cancer data:

    model {
        b[1:regions] ~ car.normal(adj[], weights[], num[], tau)
        b.mean <- mean(b[])
        for (i in 1 : regions) {
            O[i] ~ dpois(mu[i])
            log(mu[i]) <- log(E[i]) + alpha0 + alpha1 * x[i] / 10 + b[i]
            SMRhat[i] <- 100 * mu[i] / E[i]
        }
        alpha1 ~ dnorm(0.0, 1.0E-5)
        alpha0 ~ dflat()
        tau ~ dgamma(r, d)
        sigma <- 1 / sqrt(tau)
    }

Note: BUGS is a declarative, rather than procedural, language.
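
To make the deterministic lines of the BUGS model concrete, here is a small Python sketch of the log-linear mean and the fitted SMR, together with the raw SMR, 100 × observed / expected, for the three counties listed on the data slide. The helper names and the parameter values used in the check are hypothetical; nothing here fits the model. The raw SMRs computed from the rounded expected counts (about 643 and 448 for counties 1 and 2) differ slightly from the tabulated 652.2 and 450.3, presumably because the table used unrounded expected counts.

```python
import math

# (county, observed O, expected E, covariate x) for the three rows shown on the slide
rows = [(1, 9, 1.4, 16), (2, 39, 8.7, 16), (56, 0, 1.8, 10)]


def fitted_mean(E, x, b, alpha0, alpha1):
    """mu from log(mu) = log(E) + alpha0 + alpha1 * x / 10 + b."""
    return E * math.exp(alpha0 + alpha1 * x / 10 + b)


def smr_hat(E, x, b, alpha0, alpha1):
    """Fitted SMR as in the BUGS code: 100 * mu / E."""
    return 100 * fitted_mean(E, x, b, alpha0, alpha1) / E


# Raw (unsmoothed) SMR for each listed county
raw_smr = {county: 100 * O / E for county, O, E, x in rows}

# With all effects set to zero (assumed values) the fitted SMR is 100,
# i.e. observed risk equal to expected risk.
baseline = smr_hat(E=1.4, x=16, b=0.0, alpha0=0.0, alpha1=0.0)
```

Because E cancels in `smr_hat`, the fitted SMR depends only on the covariate and the spatial effect, which is why the model smooths the noisy raw values for counties with small expected counts.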


WinBugs for lip cancer data Dynamic traces for some parameters:

WinBugs for lip cancer data Posterior densities for some parameters:


Hidden Markov models, e.g. hidden Markov chain (DLM, state space model): hidden states z0, z1, z2, z3, z4; observed y1, y2, y3, y4
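
For a discrete hidden Markov chain with this structure (an unobserved initial state z0, then one observation per later state), the likelihood of the observations can be computed by the standard forward recursion. The sketch below assumes two hidden states and binary observations; the initial, transition and emission probabilities are made-up values for illustration.

```python
def forward(init, trans, emit, obs):
    """Return P(y_1, ..., y_T) for a discrete hidden Markov chain.

    init[i]     = P(z_0 = i)
    trans[i][j] = P(z_t = j | z_{t-1} = i)
    emit[j][y]  = P(y_t = y | z_t = j)
    """
    n = len(init)
    alpha = list(init)  # alpha[i] = P(z_0 = i); no observation is attached to z_0
    for y in obs:
        # Propagate one step through the chain, then weight by the emission
        alpha = [
            sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][y]
            for j in range(n)
        ]
    return sum(alpha)


# Assumed example parameters (not from the slides)
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
emit = [[0.8, 0.2], [0.3, 0.7]]

likelihood = forward(init, trans, emit, [0, 0, 1, 1])
```

The recursion costs time linear in the number of observations, compared with the exponential cost of summing over every hidden path.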

Hidden Markov models: Richardson & Green (2000) used a hidden Markov random field model for disease mapping. Diagram labels: observed incidence, expected incidence, relative risk parameters, hidden MRF

DAG for Potts-based Hidden Markov random field spatial fields length=k

Distributions for Potts-based Hidden Markov random field

Larynx cancer in females in France SMRs

Ion channel signal restoration Hodgson, JRSS(B), 1999

DAG for alternating renewal process model for ion channel data Binary signal Data Sojourn time parameters


Ion channel model choice Hodgson and Green, Proc Roy Soc Lond A, 1999

Example: hidden continuous time models. Two state diagrams, one on states O1, O2, C1, C2 and one on states O1, O2, C1, C2, C3

DAG for hidden CTMC model for ion channel data Binary signal Data Model indicator Transition rates

Ion channel model DAG. Nodes: levels & variances, model indicator, transition rates, hidden state, binary signal, data. The hidden state takes values in a state diagram such as O1, O2, C1, C2, C3.

Posterior model probabilities: four candidate state diagrams are shown, two on states O1, C1, O2 and two on states O1, C1, C2, with posterior probabilities .41, .12, .36 and .10

Simultaneous inference on parameters and structure of the conditional independence (CI) graph. Bayesian approach: place a prior on all graphs and a conjugate prior on the parameters (hyper-Markov laws, Dawid & Lauritzen), then use MCMC to update both graphs and parameters to simulate the posterior distribution.

Graph moves: Giudici & Green (Biometrika, 1999) develop a Bayesian methodology for model selection in Gaussian models, assuming decomposability (= graph triangulated = no chordless cycles of length four or more). The moves are illustrated on a graph with vertices 1 to 7.

Graph moves: We can traverse graph space by adding and deleting single edges. Some moves are OK, but others make the graph non-decomposable.

Graph moves: Frydenberg & Lauritzen (1989) showed that all decomposable graphs are connected by single-edge moves. Can we test for maintaining decomposability before committing to making the change?

Deleting edges? Deleting an edge maintains decomposability if and only if it is contained in exactly one clique of the current graph (Frydenberg & Lauritzen).

Adding edges? (Giudici & Green) Adding an edge (a,b) maintains decomposability if and only if either:

- a and b are in different connected components, or
- there exist sets R and T such that $a \cup R$ and $b \cup T$ are cliques and $R \cap T$ is a separator on the path in the junction tree between them.

Once the test is complete, actually committing to adding or deleting the edge is little work: it makes only a (relatively) local change to the junction tree. In the example on vertices 1 to 7, the junction tree has cliques 12, 267, 236, 3456 with separators 2, 26, 36; after adding the edge (1,7) it has cliques 127, 267, 236, 3456 with separators 27, 26, 36.
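
The deletion criterion can be checked on the seven-vertex example, whose cliques are 12, 267, 236 and 3456. The Python sketch below applies the "exactly one clique" rule to every edge and compares it with a brute-force decomposability (chordality) test. For edge addition it only runs the brute-force test on the modified graph; it does not implement the junction-tree criterion. All function names are hypothetical, and the clique search is exhaustive, so it is only suitable for small graphs.

```python
from itertools import combinations


def graph_from_cliques(cliques):
    """Adjacency sets for the graph whose edges are all pairs within a clique."""
    adj = {}
    for c in cliques:
        for v in c:
            adj.setdefault(v, set())
        for a, b in combinations(c, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def toggled(adj, a, b):
    """Copy of the graph with edge (a, b) deleted if present, added if absent."""
    g = {v: set(n) for v, n in adj.items()}
    if b in g[a]:
        g[a].discard(b)
        g[b].discard(a)
    else:
        g[a].add(b)
        g[b].add(a)
    return g


def is_decomposable(adj):
    """Chordality test: repeatedly eliminate a simplicial vertex,
    i.e. one whose remaining neighbours are all mutually adjacent."""
    g = {v: set(n) for v, n in adj.items()}
    while g:
        for v, nbrs in g.items():
            if all(y in g[x] for x, y in combinations(nbrs, 2)):
                break
        else:
            return False  # no simplicial vertex: a chordless cycle remains
        for u in nbrs:
            g[u].discard(v)
        del g[v]
    return True


def maximal_cliques(adj):
    """Exhaustive search, largest subsets first (small graphs only)."""
    verts = sorted(adj)
    found = []
    for r in range(len(verts), 0, -1):
        for s in combinations(verts, r):
            if all(y in adj[x] for x, y in combinations(s, 2)):
                if not any(set(s) <= c for c in found):
                    found.append(set(s))
    return found


def deletion_keeps_decomposable(adj, a, b):
    """Criterion from the slide: edge (a, b) lies in exactly one clique."""
    return sum(1 for c in maximal_cliques(adj) if a in c and b in c) == 1


g = graph_from_cliques([{1, 2}, {2, 6, 7}, {2, 3, 6}, {3, 4, 5, 6}])
edges = sorted((a, b) for a in g for b in g[a] if a < b)
deletable = [e for e in edges if deletion_keeps_decomposable(g, *e)]
g_added = toggled(g, 1, 7)  # the addition shown in the junction-tree example
```

In this graph only the edges (2,6) and (3,6) lie in two cliques, so they are the only edges whose deletion breaks decomposability. Adding (1,7) keeps the graph decomposable, whereas adding (1,4) would create the chordless cycle 1-2-3-4.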


DNA forensics example (thanks to Julia Mortera)

- A blood stain is found at a crime scene
- A body is found somewhere else!
- There is a suspect
- DNA profiles on all three - crime scene sample is a mixed trace: is it a mix of the victim and the suspect?

DNA forensics in Hugin

GRAPPA

Grappa code for the mixed-trace forensic problem:

    vs('alleles',c('8','10','11','x'))
    gene.freq<<-c(.184884,.134884,.233721,.446511)
    founder('vmg'); founder('vpg')
    genotype('vgt','vmg','vpg')
    founder('smg'); founder('spg')
    genotype('sgt','smg','spg')
    query('T2eqv'); query('T1eqs')
    by('target','T2eqv','T1eqs')
    vs('target',c('SV','SU','UV','UU'))

    select('T2mg','vmg','T2eqv')
    select('T2pg','vpg','T2eqv')
    select('T1mg','smg','T1eqs')
    select('T1pg','spg','T1eqs')
    genotype('T2gt','T2mg','T2pg')
    genotype('T1gt','T1mg','T1pg')
    mix('mix','T2gt','T1gt')

    compile()
    initcliqs()
    trav()
    prop.evid('vgt','8-10')
    prop.evid('sgt','8-11')
    prop.evid('mix','8-10-11')
    pnmarg('target')

    ==>>
     target=SV  target=SU  target=UV  target=UU
     0.7278388 0.09543417  0.1485508 0.02817623
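
The posterior over `target` can be cross-checked by direct enumeration. The Python sketch below is not Grappa and makes three assumptions about the model: the four hypotheses (SV, SU, UV, UU) have equal prior weight, an unknown contributor's two alleles are drawn independently from `gene.freq`, and the `mix` node records only the set of alleles present in the two contributors. Under these assumptions the result agrees with the `pnmarg` output above to about four decimal places.

```python
from itertools import product

# Allele frequencies as given in gene.freq
freq = {"8": 0.184884, "10": 0.134884, "11": 0.233721, "x": 0.446511}

victim = ("8", "10")         # evidence on vgt
suspect = ("8", "11")        # evidence on sgt
mixture = {"8", "10", "11"}  # evidence on mix


def genotype_dist(known=None):
    """Distribution over ordered allele pairs: a point mass for a known
    genotype, otherwise two independent draws from the allele frequencies."""
    if known is not None:
        return {known: 1.0}
    return {(a, b): freq[a] * freq[b] for a, b in product(freq, repeat=2)}


def likelihood(contributor1, contributor2):
    """P(mixture shows exactly the observed alleles | the two contributors)."""
    return sum(
        p1 * p2
        for g1, p1 in genotype_dist(contributor1).items()
        for g2, p2 in genotype_dist(contributor2).items()
        if set(g1) | set(g2) == mixture
    )


# S = suspect, V = victim, U = unknown person (None)
hypotheses = {
    "SV": (suspect, victim),
    "SU": (suspect, None),
    "UV": (None, victim),
    "UU": (None, None),
}
lik = {h: likelihood(c1, c2) for h, (c1, c2) in hypotheses.items()}
total = sum(lik.values())
posterior = {h: value / total for h, value in lik.items()}  # equal priors assumed
```

Enumeration is feasible here because there is a single marker with four allele classes; the junction-tree propagation used by Grappa and Hugin is what keeps the same calculation tractable on larger networks.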

HSSS Highly Structured Stochastic Systems (HSSS) is the name given to a modern strategy for building statistical models for challenging real-world problems, for computing with them, and for interpreting the resulting inferences. Complexity is handled by working up from simple local assumptions in a coherent way, and that is the key to modelling, computation, inference and interpretation.

HSSS, contd HSSS emphasises common ideas and structures, such as graphical, hierarchical and spatial models, and techniques, such as Markov chain Monte Carlo methods and local exact computation.

HSSS: new challenges for research include developing diagnostic and analytic tools for model criticism; understanding sensitivity of models to local specifications; designing new MCMC algorithms; identifying limits of causal interpretation in networks representing observational studies; introducing nonparametric elements into graphical models; extending the theory and methodology to systems that develop over time.

Highly Structured Stochastic Systems book

- Graphical models and causality: T Richardson/P Spirtes, S Lauritzen, P Dawid, R Dahlhaus/M Eichler
- Spatial statistics: S Richardson, A Penttinen, H Rue/M Hurn/O Husby
- MCMC: G Roberts, P Green, C Berzuini/W Gilks
- Biological applications: N Becker, S Heath, R Griffiths
- Beyond parametrics: N Hjort, A O'Hagan

... with 30 discussants. Editors: N Hjort, S Richardson & P Green. OUP (2003), to appear.
