An algorithm to optimise the sampling scheme within the clusters of a stepped-wedge design
Alan Girling, University of Birmingham, UK (A.J.Girling@bham.ac.uk)
Funding support (AG) from the NIHR through: the NIHR Collaborations for Leadership in Applied Health Research and Care for West Midlands (CLAHRC WM), and the HiSLAC study (NIHR ref 12/128/07).
London, November 2018

Scope
- Cross-sectional cluster designs: with (possibly) time-varying cluster-level effects, and uni-directional switching between treatment regimes (as in Stepped-Wedge)
- Equal numbers of observations in each cluster
- Freedom to choose the timing of observations within each cluster
*Other constraints are available!*

Treatment Effect Estimate
SW4 design: 20 observations per cluster at each time-point; ICC = 0.0099; fixed time effects (Hussey & Hughes model).
Cell-mean treatment effect estimate: $\hat\theta = \sum_{k,t} a_{kt}\,\bar y_{kt}$.
Design layout (clusters $k$ = 4 rows, time $t$ = 5 columns): numbers of observations $m_{kt} = 20$ in every cell, total $M = 100$ per cluster.
Coefficients $a_{kt}$ (×100, magnitudes; row sums $\sum_t |a_{kt}|$ = 67.5, 52.5, 52.5, 67.5):

  7.5   30     17.5   5     7.5
  2.5   15     22.5   10    2.5
  2.5   10     22.5   15    2.5
  7.5   5      17.5   30    7.5

Precision = $\operatorname{var}(\hat\theta)^{-1} \approx 0.400$.

Proposal: modify the $m_{kt}$s
$\hat\theta = \sum_{k,t} a_{kt}\,\bar y_{kt}$: some observations have greater influence on the estimate than others (unlike in many classical designs). The layout might therefore be improved by moving observations from low-influence to high-influence cells within the same cluster. (For equal influence we need $|a_{kt}|/m_{kt}$ to be the same in each cell.)
Proposal: modify the $m_{kt}$s to make $m^*_{kt} \propto |a_{kt}|$ within each row (coefficients and row sums $\sum_t |a_{kt}|$ as on the previous slide):
$$m^*_{kt} = M\,\frac{|a_{kt}|}{\sum_{s=1}^{T} |a_{ks}|}$$
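As a concrete check of the update rule, using the values from the table above: in cluster 1 the row sum is $\sum_s |a_{1s}| = 67.5$ (×100), so the cell with $|a| = 30$ receives $m^* = 100 \times 30/67.5 \approx 44.4$ observations and the cell with $|a| = 5$ receives $100 \times 5/67.5 \approx 7.4$, matching the revised layout on the next slide.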

Revised Layout: $m^*_{kt} = M\,|a_{kt}| \big/ \sum_{s=1}^{T} |a_{ks}|$. Also update the treatment estimate.
Numbers of observations $m^*_{kt}$ (total $M = 100$ per cluster; rows 3 and 4 mirror rows 2 and 1):

  11.1   44.4   25.9   7.4    11.1
  4.8    28.6   42.9   19.0   4.8
  4.8    19.0   42.9   28.6   4.8
  11.1   7.4    25.9   44.4   11.1

New coefficients $a^*_{kt}$ (×100; row sums $\sum_t |a^*_{kt}|$ = 51.1, 58.0, 58.0, 51.1):

  3.9   30.1   11.6   1.6    3.9
  0.7   20.4   28.1   8.1    0.7
  0.7   8.1    28.1   20.4   0.7
  3.9   1.6    11.6   30.1   3.9

Precision ≈ 0.624: precision has improved from 0.400 to 0.624. But $|a^*_{kt}|/m^*_{kt}$ is still not constant within each row.

After repeated iteration the process converges, to a 'Staircase' design: $\hat\theta = \sum_{k,t} a^{(\infty)}_{kt}\,\bar y_{kt}$.
Numbers of observations $m^{(\infty)}_{kt}$ (total 100 per cluster): the allocation concentrates on two cells in each cluster row, with 50 observations in each; the non-zero coefficients $a^{(\infty)}_{kt}$ (×100) all have magnitude 33.3.
Precision ≈ 0.750.
Now $|a^{(\infty)}_{kt}|/m^{(\infty)}_{kt}$ = constant within rows, at least for occupied cells.

The Algorithm
1. For the current allocation $m^{(n)}_{kt}$, compute the coefficients $a^{(n)}_{kt}$ of the best estimate of $\theta$: $\hat\theta^{(n)} = \sum_{k,t} a^{(n)}_{kt}\,\bar y_{kt}$.
2. Update the allocation to make $m^{(n+1)}_{kt} \propto |a^{(n)}_{kt}|$ within each cluster, using $m^{(n+1)}_{kt} = M\,|a^{(n)}_{kt}| \big/ \sum_{s=1}^{T} |a^{(n)}_{ks}|$. ($M$ is the total number in each cluster, assumed fixed.)
3. Repeat ad lib.
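A minimal sketch of the iteration in Python/NumPy — a hypothetical implementation, not the author's code. It assumes the cell-means covariance implied by the model on the next slide (default Γ ≡ 1, i.e. Hussey & Hughes); the function names, the variance decomposition τ² = 0.01, σ² = 1 (chosen to give ICC = 0.0099), and the small floor that keeps emptied cells from breaking the linear solve are my own choices:

```python
import numpy as np

def blue_coefficients(m, X, tau2, sigma2, Gamma=None):
    """BLUE weights a_kt on the cell means, and var(theta_hat), for cell sizes
    m (K x T), treatment indicators X (K x T), and the covariance model
    var(gamma) = tau2, var(eps) = sigma2, corr(gamma_ks, gamma_kt) = Gamma."""
    K, T = m.shape
    if Gamma is None:
        Gamma = np.ones((T, T))          # Hussey & Hughes: Gamma_st = 1
    p = T + 1                            # T time effects + theta
    XtVX = np.zeros((p, p))
    blocks = []
    for k in range(K):
        D = np.hstack([np.eye(T), X[k][:, None]])   # (beta_1..beta_T, theta)
        V = tau2 * Gamma + np.diag(sigma2 / m[k])   # cov of cluster-k cell means
        VinvD = np.linalg.solve(V, D)
        XtVX += D.T @ VinvD
        blocks.append(VinvD)
    c = np.linalg.solve(XtVX, np.eye(p)[-1])        # (D'V^-1 D)^-1 e_theta
    a = np.array([blk @ c for blk in blocks])       # K x T weight matrix
    return a, c[-1]                                 # c[-1] = var(theta_hat)

def optimise_allocation(m0, X, tau2, sigma2, Gamma=None, n_iter=100):
    """Reallocation step from the slide: m_kt <- M * |a_kt| / sum_s |a_ks|."""
    m = np.asarray(m0, float).copy()
    M = m.sum(axis=1, keepdims=True)                # fixed total per cluster
    for _ in range(n_iter):
        a, _ = blue_coefficients(np.maximum(m, 1e-9), X, tau2, sigma2, Gamma)
        m = M * np.abs(a) / np.abs(a).sum(axis=1, keepdims=True)
    return m

# SW4 example from the earlier slide: 4 clusters, 5 periods, 20 obs per cell.
K, T = 4, 5
X = (np.arange(1, T + 1)[None, :] > np.arange(1, K + 1)[:, None]).astype(float)
m = optimise_allocation(np.full((K, T), 20.0), X, tau2=0.01, sigma2=1.0)
print(np.round(m, 1))
```

Run on the SW4 layout, this should follow the progression described above, moving the equal allocation toward the 50/50 'staircase' cells.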

Model
$$y_{kti} = \beta_t + \theta X_{kt} + \gamma_{kt} + \varepsilon_{kti}$$
for observation $i$ in cluster $k$ at time $t$: $\beta_t$ = time (fixed effect), $\theta X_{kt}$ = treatment (fixed effect), $\gamma_{kt}$ = cluster × time (random effect), $\varepsilon_{kti}$ = residual (random).
$\operatorname{var}(\gamma_{kt}) = \tau^2$, $\operatorname{var}(\varepsilon_{kti}) = \sigma^2$, $\operatorname{corr}(\gamma_{ks}, \gamma_{kt}) = \Gamma_{st}$:
- (Hussey & Hughes) $\Gamma_{st} \equiv 1$
- (Exchangeable) $\Gamma_{st} = \pi + (1-\pi)\,\delta_{st}$
- (Exponential) $\Gamma_{st} = r^{|s-t|}$
$\hat\theta = \sum_{k,t} a_{kt}\,\bar y_{kt}$ is the weighted least squares estimator (BLUE).
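For reference, the three correlation structures above are easy to construct as the `Gamma` argument of the sketch just shown (a small helper under the parameterisations as written on this slide; the function names are mine):

```python
import numpy as np

def gamma_hh(T):
    """Hussey & Hughes: Gamma_st = 1 everywhere."""
    return np.ones((T, T))

def gamma_exch(T, pi):
    """Exchangeable: Gamma_st = pi + (1 - pi) * delta_st."""
    return pi * np.ones((T, T)) + (1 - pi) * np.eye(T)

def gamma_exp(T, r):
    """Exponential decay: Gamma_st = r ** |s - t| (e.g. r = 0.9 below)."""
    idx = np.arange(T)
    return r ** np.abs(idx[:, None] - idx[None, :])
```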

Properties of the Algorithm
- Improvement happens at every step: $\operatorname{var}(\hat\theta^{(n+1)}) \le \operatorname{var}(\hat\theta^{(n)})$, with equality only if $m^{(n)}_{kt} \propto |a^{(n)}_{kt}|$ within each cluster.
- Convergence to a stable point is guaranteed. This is usually the optimal allocation: any stable point is a 'best' allocation among all allocations with that support (i.e. collection of non-zero cells).
- But if an empty cell appears at any step (i.e. $a^{(n)}_{kt} = 0$), that cell remains empty at every subsequent step. In principle the best allocation could be missed. On the other hand, this property allows us to obtain improved/optimal designs in situations where sampling in some cells is prohibited.
- Behaviour depends on $\sigma^2$, $\tau^2$ and $M$ only through $R = \dfrac{M\tau^2}{M\tau^2 + \sigma^2}$. $R$ is related to the Cluster-Mean Correlation (CMC): $\dfrac{CMC}{1-CMC} = \dfrac{\mathbf{1}'\Gamma\mathbf{1}}{T^2}\cdot\dfrac{R}{1-R}$. (A worked value of $R$ follows below.)
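As a worked value: in the SW4 example, $M = 100$ observations per cluster, and ICC = 0.0099 is consistent with, for instance, $\tau^2 = 0.01$ and $\sigma^2 = 1$ (an assumed decomposition, since only the ratio is given); then $R = (100 \times 0.01)/(100 \times 0.01 + 1) = \tfrac12$.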

Examples: 1) Hussey & Hughes model
Initial allocation: equal % of observations at each time-point, 14% per cell (row totals = 100%).
- $R = 0.25$: when $R < \tfrac12$ the solution is NOT unique. Efficiency improves from 0.50 (initially) to 0.92.
- $R = 0.75$: the solution is unique when $R \ge \tfrac12$, apart from trades between the end columns. Efficiency improves from 0.38 (initially) to 0.76.
[Optimised layout tables of cell percentages: 79, 18, 3, 50, 39, 9, 2, 8, 42, 40, 1 ($R = 0.25$) and 17, 67, 47, 53, 49, 51 ($R = 0.75$).]
(Efficiency computed relative to a Cluster Cross-Over design with the same number of observations.)

Efficiency of Optimised Allocation: Hussey & Hughes Model

Examples: 2) Exponential model: $r = 0.9$
Initial allocation: equal % of observations at each time-point, 14% per cell (row totals = 100%). Exact general behaviour unknown.
- $R = 0.25$: efficiency improves from 0.50 (initially) to 0.91.
- $R = 0.75$: efficiency improves from 0.35 (initially) to 0.66.
[Optimised layout tables of cell percentages for the two cases: 72, 3, 4, 8, 14, 47, 53, 49, 51, 14, 73, 46, 54, 49, 51.]
(Efficiency relative to an "Ideal" Cluster Cross-Over design with the same number of observations.)

Efficiency of Optimised Allocation: Exponential Model with r = 0.9

Example with prohibited cells: 'Transition'/'Washout' periods (under the H&H model)
Initial allocation: equal % of observations at each permissible time-point (row totals = 100%).
- $R = 0.25$: efficiency improves from 0.34 (initially) to 0.83.
- $R = 0.75$: efficiency improves from 0.26 (initially) to 0.53.
[Optimised layout tables of cell percentages for the two cases: 78, 16, 5, 72, 25, 2, 50, 44, 6, 12, 67, 9, 8, 13, 4, 50.]

Why it works
For any linear estimate $\hat\theta = \sum a_{kt}\,\bar y_{kt}$,
$$\operatorname{var}\hat\theta = V(a,m) = \tau^2\sum_{k=1}^{K} a_k'\Gamma a_k + \sigma^2\sum_{k=1}^{K}\sum_{t=1}^{T}\frac{a_{kt}^2}{m_{kt}}.$$
It is always true that
$$\sum_{t=1}^{T}\frac{a_{kt}^2}{m^*_{kt}} \;\le\; \sum_{t=1}^{T}\frac{a_{kt}^2}{m_{kt}}, \qquad\text{where } m^*_{kt} = M\,\frac{|a_{kt}|}{\sum_{s=1}^{T}|a_{ks}|}.$$
So the variance of the estimate is reduced by the reallocation of observations.
Now apply this argument to the BLUE of $\theta$ under allocation $(m_{kt})$. It follows that the BLUE of $\theta$ under allocation $(m^*_{kt})$ has smaller variance than the BLUE under $(m_{kt})$.
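The displayed inequality is a one-line consequence of Cauchy–Schwarz, a step the slide leaves implicit. Substituting $m^*_{kt}$ gives $\sum_t a_{kt}^2/m^*_{kt} = \big(\sum_t |a_{kt}|\big)^2/M$, and
$$\Big(\sum_t |a_{kt}|\Big)^2 = \Big(\sum_t \frac{|a_{kt}|}{\sqrt{m_{kt}}}\,\sqrt{m_{kt}}\Big)^2 \le \Big(\sum_t \frac{a_{kt}^2}{m_{kt}}\Big)\Big(\sum_t m_{kt}\Big) = M\sum_t \frac{a_{kt}^2}{m_{kt}},$$
with equality iff $m_{kt} \propto |a_{kt}|$, matching the stability condition stated earlier.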

Spin-off: An Objective Function
The best allocation corresponds to a stable point of the algorithm. At any stable point (i.e. where $m_{kt} \propto |a_{kt}|$ within each cluster):
$$V(a,m) \propto \Psi(a) = R\sum_{k=1}^{K} a_k'\Gamma a_k + (1-R)\sum_{k=1}^{K}\Big(\sum_{t=1}^{T}|a_{kt}|\Big)^2.$$
Any optimal design corresponds to a (constrained) minimum value of $\Psi$:
$$\Psi(\hat a) = \min_a \Psi(a)$$
(subject to unbiasedness constraints on the $a_{kt}$s, and $a_{kt} = 0$ in any prohibited cells), with cell numbers given by
$$m_{kt} = M\,\frac{|a_{kt}|}{\sum_{s=1}^{T}|a_{ks}|}.$$
$\Psi(a)$ is not a smooth function, but it is convex.
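Since Ψ is convex, off-the-shelf convex solvers apply. A sketch using cvxpy (an assumed dependency; the function name and my reading of the unbiasedness constraints — $\sum_k a_{kt} = 0$ for each $t$ and $\sum_{k,t} a_{kt}X_{kt} = 1$, which follow from the fixed time effects in the model — are not from the slides):

```python
import cvxpy as cp
import numpy as np

def optimal_coefficients(X, Gamma, R, prohibited=()):
    """Minimise Psi(a) = R * sum_k a_k' Gamma a_k + (1-R) * sum_k (sum_t |a_kt|)^2."""
    K, T = X.shape
    a = cp.Variable((K, T))
    psi = R * sum(cp.quad_form(a[k], Gamma) for k in range(K)) \
        + (1 - R) * sum(cp.square(cp.norm(a[k], 1)) for k in range(K))
    cons = [cp.sum(a, axis=0) == 0,            # coefficients kill the time effects
            cp.sum(cp.multiply(a, X)) == 1]    # unit coefficient on theta
    cons += [a[k, t] == 0 for (k, t) in prohibited]   # e.g. washout cells
    cp.Problem(cp.Minimize(psi), cons).solve()
    return a.value   # cell numbers then follow from m_kt = M |a_kt| / sum_s |a_ks|
```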

Potential for Exact results using $\Psi(a)$
E.g. for the Hussey and Hughes model ($\Gamma_{st} \equiv 1$, so $a_k'\Gamma a_k = (\sum_t a_{kt})^2$):
$$\Psi(a) = R\sum_{k=1}^{K}\Big(\sum_{t=1}^{T} a_{kt}\Big)^2 + (1-R)\sum_{k=1}^{K}\Big(\sum_{t=1}^{T}|a_{kt}|\Big)^2$$

(Exact) Optimal Design under HH: $R \ge \tfrac12$
The matrix of $a_{kt}$s has an "Anchored Staircase" form: the $k$-th cluster row has entries $q_{k-1}$ and $q_k$ in adjacent columns, stepping down the diagonal, with the end values $q_0$ and $q_K$ halved (for $K = 6$: $q_0/2,\, q_1,\, q_1,\, q_2,\, \ldots,\, q_5,\, q_6/2$), where
$$q_k \propto \coth\tfrac{\phi}{2}\cdot\sinh\tfrac{K\phi}{2} - \cosh\big(k - \tfrac{K}{2}\big)\phi, \qquad \cosh\phi = (2R-1)^{-1}.$$
$$\text{Efficiency} = 1 - \frac{1}{K}\left(2 - \frac{\tanh(\phi/2)}{\tanh(K\phi/2)}\right) = 1 - \frac{1}{K}\left(2 - \sqrt{\frac{1-R}{R}}\,\right) + O\!\left(\frac{1}{K^2}\right)$$
E.g. $R = 0.75$: Efficiency ≈ 0.76 (rel. to CXO), with the optimised cell percentages 17, 67, 47, 53, 49, 51 shown in the earlier H&H example.
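The second expression for the efficiency follows from the half-angle identity, a step the slide leaves implicit: with $\cosh\phi = (2R-1)^{-1}$,
$$\tanh\frac{\phi}{2} = \sqrt{\frac{\cosh\phi - 1}{\cosh\phi + 1}} = \sqrt{\frac{(2R-1)^{-1} - 1}{(2R-1)^{-1} + 1}} = \sqrt{\frac{2-2R}{2R}} = \sqrt{\frac{1-R}{R}},$$
while $\tanh(K\phi/2) \to 1$ rapidly as $K$ grows. As a check: $R = 0.75$, $K = 6$ gives $\cosh\phi = 2$, $\phi \approx 1.317$, and Efficiency $\approx 1 - \tfrac16\,(2 - 0.577/0.999) \approx 0.76$, matching the figure quoted.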

Optimal Design under HH: $R < \tfrac12$
One possible matrix of $a_{kt}$s is built from entries $x+y$, $y$ and $x$, where
$$x = (K-2R)^{-1}, \qquad y = \big(\tfrac12 - R\big)(K-2R)^{-1}.$$
$$\text{Efficiency} = 1 - \frac{2R}{K}$$
E.g. $R = 0.25$: Efficiency $= \tfrac{11}{12} \approx 0.92$. [Layout table: cell percentages 83, 17, 50.]
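A quick check of the formula against this example, using the $K = 6$ clusters of the earlier H&H examples (the value of $K$ is inferred, not stated on this slide): $1 - 2R/K = 1 - 2(0.25)/6 = 1 - \tfrac{1}{12} = \tfrac{11}{12} \approx 0.92$.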

An alternative solution was given earlier: the $R = 0.25$, Efficiency = 0.92 layout from the Hussey & Hughes example (cell percentages 79, 18, 3, 50, 39, 9, 2, 8, 42, 40, 1) … and there are many others.

Summary
- Flexible approach to improving design, often leading to substantial improvements in precision
- Works for sparse layouts and designs with prohibited cells
- Where the solution is a staircase-type design, the experiment may take longer. Partly this is a consequence of improved precision: a fair comparison is between designs with the same precision (i.e. SW vs an optimised design with fewer total observations).
- The objective function Ψ provides an alternative approach via convex optimisation methods, and a tool for finding exact results

Further developments
- Optimal allocation of clusters to (optimised) sequences: readily accomplished by adding an extra computation to the algorithm. Little advantage for precision, it seems, but there may be scope for alternative near-optimal designs.
- Alternative constraints: fixed total size of study; constraints over specific time-periods; unequal clusters.
- Explore optimal designs with prohibited cells, e.g. the Washout example, or to seek more compact designs.