An algorithm to optimise the sampling scheme within the clusters of a stepped-wedge design

Alan Girling, University of Birmingham, UK. A.J.Girling@bham.ac.uk

Funding support (AG) from the NIHR through: the NIHR Collaborations for Leadership in Applied Health Research and Care for West Midlands (CLAHRC WM); the HiSLAC study (NIHR ref 12/128/07).

London, November 2018
Scope

- Cross-sectional cluster designs: with (possibly) time-varying cluster-level effects, and uni-directional switching between treatment regimes (as in Stepped-Wedge)
- Equal numbers of observations in each cluster
- Freedom to choose the timing of observations within each cluster

*Other constraints are available!*
Treatment effect estimate (from the cell means ȳ_kt): θ̂ = Σ_{k,t} a_kt ȳ_kt

Design layout SW4: K = 4 clusters, T = 5 time-points; numbers of observations m_kt = 20 per cluster at each time-point (total M = 100 per cluster); ICC = 0.0099; fixed time effects (Hussey & Hughes model).

Coefficients a_kt (×100), by cluster:
7.5, 30, 17.5, 5
2.5, 15, 22.5, 10
2.5, 10, 22.5, 15
7.5, 5, 17.5, 30

Precision = var(θ̂)^{-1} = 0.400
Some observations have greater influence on the estimate θ̂ = Σ_{k,t} a_kt ȳ_kt than others (unlike many classical designs). The layout might be improved by moving observations from low-influence to high-influence cells within the same cluster. (For equal influence we need |a_kt|/m_kt to be the same in each cell.)

Proposal: modify the m_kt's to make m*_kt ∝ |a_kt| within each row:

m*_kt = M |a_kt| / Σ_{s=1}^T |a_ks|

Row sums Σ_t |a_kt| (×100): 67.5 (row 1), 52.5 (row 2), and symmetrically for rows 3 and 4.
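As a quick sketch of this update rule (plain Python; the coefficient values and the helper name `reallocate_row` are illustrative, not taken from the layout above):

```python
# Reallocation step m*_kt = M * |a_kt| / sum_s |a_ks|, applied within one
# cluster (one row of the layout). Coefficients here are hypothetical.

def reallocate_row(a_row, M):
    """Return cell sizes proportional to |a_kt|, summing to M."""
    total = sum(abs(a) for a in a_row)
    return [M * abs(a) / total for a in a_row]

a_row = [-0.075, 0.30, 0.175, -0.05]   # hypothetical BLUE coefficients
m_star = reallocate_row(a_row, M=100)
print([round(m, 1) for m in m_star])   # -> [12.5, 50.0, 29.2, 8.3]
```

After the update, every occupied cell has the same per-observation influence |a_kt|/m*_kt = (Σ_s |a_ks|)/M.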
Revised layout: m*_kt = M|a_kt|/Σ_{s=1}^T |a_ks|. Also update the treatment estimate.

Numbers of observations m*_kt, by cluster (total M = 100 each):
11.1, 44.4, 25.9, 7.4
4.8, 28.6, 42.9, 19.0

New coefficients a*_kt (×100), by cluster, with row sums Σ|a*|:
3.9, 30.1, 11.6, 1.6 (Σ = 51.1)
0.7, 20.4, 28.1, 8.1 (Σ = 58.0)
0.7, 8.1, 28.1, 20.4
3.9, 1.6, 11.6, 30.1

Precision = 0.624: improved from 0.400. But |a*_kt|/m*_kt is still not constant within each row.
After repeated iteration the process converges (to a 'Staircase' design): θ̂ = Σ_{k,t} a_kt^(∞) ȳ_kt

[Tables: numbers of observations m_kt^(∞) by cluster and time (occupied cells contain 100 or 50 observations), and coefficients a_kt^(∞) (×100; occupied cells contain 33.3).]

Precision = 0.750. Now |a_kt^(∞)|/m_kt^(∞) = constant within rows, at least for occupied cells.
The Algorithm

1. For the current allocation (m_kt^(n)), compute the coefficients a_kt^(n) of the best estimate of θ:
   θ̂^(n) = Σ_{k,t} a_kt^(n) ȳ_kt
2. Update the allocation to make m_kt^(n+1) ∝ |a_kt^(n)| within each cluster, using
   m_kt^(n+1) = M |a_kt^(n)| / Σ_{s=1}^T |a_ks^(n)|
   (M is the total number in each cluster, assumed fixed.)
3. Repeat ad lib.
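The two steps above can be sketched in runnable form. This is a minimal plain-Python realisation for the Hussey & Hughes case (Γ_st ≡ 1, so the cluster effect is constant over time); the helper names (`solve`, `blue_coefficients`, `step`) and the GLS bookkeeping are my own illustrative choices, not the talk's implementation:

```python
# BLUE coefficients come from weighted least squares on the cell means:
#   ybar_kt = beta_t + theta * X_kt + gamma_k + noise_kt,
# var(gamma_k) = tau2, var(noise_kt) = sigma2 / m_kt.

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [aug[r][j] - f * aug[col][j] for j in range(n + 1)]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def blue_coefficients(X, m, tau2, sigma2):
    """Coefficients a_kt of the BLUE of theta, and var(theta-hat)."""
    K, T = len(X), len(X[0])
    p = T + 1                               # parameters beta_1..beta_T, theta
    info = [[0.0] * p for _ in range(p)]    # Z' V^{-1} Z
    ZVi_all = []
    for k in range(K):
        d = [m[k][t] / sigma2 for t in range(T)]     # diag of D^{-1}
        c = tau2 / (1.0 + tau2 * sum(d))
        # Sherman-Morrison: (tau2*J + D)^{-1} = D^{-1} - c * d d'
        Vi = [[(d[s] if s == t else 0.0) - c * d[s] * d[t]
               for t in range(T)] for s in range(T)]
        # Rows of Z_k' V_k^{-1}: one per time effect, then the treatment row
        ZVi = [Vi[t][:] for t in range(T)]
        ZVi.append([sum(X[k][s] * Vi[s][t] for s in range(T))
                    for t in range(T)])
        ZVi_all.append(ZVi)
        for r in range(p):
            for q in range(p):
                zq = [(1.0 if t == q else 0.0) if q < T else X[k][t]
                      for t in range(T)]
                info[r][q] += sum(ZVi[r][t] * zq[t] for t in range(T))
    e = [0.0] * p
    e[T] = 1.0
    w = solve(info, e)                      # last column of (Z'V^{-1}Z)^{-1}
    a = [[sum(w[r] * ZVi_all[k][r][t] for r in range(p)) for t in range(T)]
         for k in range(K)]
    return a, w[T]                          # w[T] = var(theta-hat)

def step(X, m, tau2, sigma2, M):
    """One iteration: compute BLUE coefficients, then reallocate."""
    a, var = blue_coefficients(X, m, tau2, sigma2)
    m_new = [[M * abs(a[k][t]) / sum(abs(x) for x in a[k])
              for t in range(len(a[k]))] for k in range(len(a))]
    return m_new, var

# SW4: K = 4 clusters, T = 5 periods; cluster k switches after period k
X = [[1 if t > k else 0 for t in range(5)] for k in range(4)]
m = [[20.0] * 5 for _ in range(4)]          # 20 observations per cell
variances = []
for _ in range(10):
    m, var = step(X, m, tau2=0.01, sigma2=1.0, M=100.0)
    variances.append(var)
print([round(1 / v, 3) for v in variances])  # precision rises at every step
```

The values tau2 = 0.01, sigma2 = 1 match the earlier ICC of 0.0099. The empty-cell behaviour noted later falls out automatically: once a_kt = 0, that cell gets m_kt = 0 and stays empty.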
Model

y_kti = β_t + θ X_kt + γ_kt + ε_kti, for observation i in cluster k at time t,

with fixed effects β_t (time) and θ (treatment), and random effects γ_kt (cluster × time) and ε_kti (residual):

var(γ_kt) = τ², var(ε_kti) = σ², corr(γ_ks, γ_kt) = Γ_st

- (Hussey & Hughes) Γ_st ≡ 1
- (Exchangeable) Γ_st = π + (1 − π)δ_st
- (Exponential) Γ_st = r^|s−t|

θ̂ = Σ_{k,t} a_kt ȳ_kt is the weighted least squares estimator (BLUE).
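The three correlation structures can be written down directly as T × T matrices. A small plain-Python sketch (the helper names are mine):

```python
# Within-cluster correlation structures Gamma_st for the model above.

def gamma_hh(T):
    """Hussey & Hughes: Gamma_st = 1 everywhere."""
    return [[1.0] * T for _ in range(T)]

def gamma_exchangeable(T, pi):
    """Gamma_st = pi + (1 - pi) * delta_st."""
    return [[pi + (1.0 - pi) * (1.0 if s == t else 0.0) for t in range(T)]
            for s in range(T)]

def gamma_exponential(T, r):
    """Gamma_st = r ** |s - t| (autoregressive-style decay)."""
    return [[r ** abs(s - t) for t in range(T)] for s in range(T)]

G = gamma_exponential(5, 0.9)
print([round(v, 4) for v in G[0]])   # -> [1.0, 0.9, 0.81, 0.729, 0.6561]
```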
Properties of the Algorithm

- Improvement happens at every step: var(θ̂^(n+1)) ≤ var(θ̂^(n)), with equality only if m_kt^(n) ∝ |a_kt^(n)| within each cluster.
- Convergence to a stable point is guaranteed. This is usually the optimal allocation: any stable point is a 'best' allocation among all allocations with that support (i.e. collection of non-zero cells).
- But if an empty cell appears at any step (i.e. a_kt^(n) = 0), that cell remains empty at every subsequent step. In principle the best allocation could be missed. On the other hand, this property allows us to obtain improved/optimal designs in situations where sampling in some cells is prohibited.
- Behaviour depends on σ², τ² and M only through R = Mτ²/(Mτ² + σ²). R is related to the Cluster-Mean Correlation (CMC): CMC/(1 − CMC) = (1'Γ1/T²) · R/(1 − R).
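The quantity R is cheap to compute. A one-function sketch (the name `cluster_level_R` is mine; the values match the earlier SW4 example with ICC = 0.0099 and M = 100):

```python
# R = M*tau2 / (M*tau2 + sigma2): the only channel through which sigma2,
# tau2 and M affect the behaviour of the algorithm.

def cluster_level_R(M, tau2, sigma2):
    return M * tau2 / (M * tau2 + sigma2)

print(round(cluster_level_R(100, 0.01, 1.0), 6))   # -> 0.5
# Rescaling tau2 and sigma2 by a common factor leaves R unchanged:
print(round(cluster_level_R(100, 1.0, 100.0), 6))  # -> 0.5
```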
Examples: 1) Hussey & Hughes model

Initial allocation: equal % of observations at each time-point (row totals = 100%).

R = 0.25: the solution is NOT unique when R < 1/2. Efficiency improves from 0.50 (initially) to 0.92.

R = 0.75: the solution is unique when R ≥ 1/2, apart from trades between end columns. Efficiency improves from 0.38 (initially) to 0.76.

[Tables of optimised % allocations; entries as displayed: 79, 18, 3; 50, 39, 9, 2; 8, 42, 40, 1 (R = 0.25) and 17, 67; 47, 53; 49, 51 (R = 0.75).]

(Efficiency computed relative to a Cluster Cross-Over design with the same number of observations.)
Efficiency of Optimised Allocation: Hussey & Hughes Model
Examples: 2) Exponential model, r = 0.9

Initial allocation: equal % of observations at each time-point (row totals = 100%). Exact general behaviour unknown.

R = 0.25: efficiency improves from 0.50 (initially) to 0.91.
R = 0.75: efficiency improves from 0.35 (initially) to 0.66.

[Tables of optimised % allocations; entries as displayed: 72, 3, 4, 8, 14; 47, 53; 49, 51 (R = 0.25) and 14, 73; 46, 54; 49, 51 (R = 0.75).]

(Efficiency relative to an "Ideal" Cluster Cross-Over design with the same number of observations.)
Efficiency of Optimised Allocation: Exponential Model with r = 0.9
Example with prohibited cells: 'Transition'/'Washout' periods (under the H&H model)

Initial allocation: equal % of observations at each permissible time-point (row totals = 100%).

R = 0.25: efficiency improves from 0.34 (initially) to 0.83.
R = 0.75: efficiency improves from 0.26 (initially) to 0.53.

[Tables of optimised % allocations; entries as displayed: 78, 16, 5; 72, 25, 2; 50, 44, 6 (R = 0.25) and 12, 67, 9; 8, 13, 4; 50 (R = 0.75).]
Why it works

For any linear estimate θ̂ = Σ a_kt ȳ_kt,

var(θ̂) = V(a, m) = τ² Σ_{k=1}^K a_k'Γa_k + σ² Σ_{k=1}^K Σ_{t=1}^T a_kt²/m_kt

By the Cauchy–Schwarz inequality it is always true that

Σ_{t=1}^T a_kt²/m*_kt ≤ Σ_{t=1}^T a_kt²/m_kt, where m*_kt = M|a_kt|/Σ_{s=1}^T |a_ks|.

So the variance of the estimate is reduced by the reallocation of observations. Now apply this argument to the BLUE of θ under allocation (m_kt). It follows that the BLUE of θ under allocation (m*_kt) has smaller variance than the BLUE under (m_kt).
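The key inequality is easy to check numerically. A plain-Python sketch over random coefficient vectors and allocations (the helper name `weighted_ss` is mine):

```python
# Check: for any coefficients a_t and positive cell sizes m_t summing to M,
# the choice m*_t = M*|a_t|/sum_s|a_s| minimises sum_t a_t^2 / m_t.

import random

random.seed(1)

def weighted_ss(a, m):
    return sum(x * x / w for x, w in zip(a, m))

M = 100.0
for _ in range(1000):
    T = random.randint(2, 8)
    a = [random.uniform(-1, 1) for _ in range(T)]
    raw = [random.uniform(0.1, 1.0) for _ in range(T)]
    m = [M * w / sum(raw) for w in raw]            # arbitrary allocation
    s = sum(abs(x) for x in a)
    m_star = [M * abs(x) / s for x in a]           # proposed allocation
    assert weighted_ss(a, m_star) <= weighted_ss(a, m) + 1e-9
print("inequality held in all trials")
```

At the minimising allocation the sum collapses to (Σ_t |a_t|)²/M, which is what makes the objective function on the next slides tractable.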
Spin-off: An Objective Function

The best allocation corresponds to a stable point of the algorithm. At any stable point (i.e. where m_kt ∝ |a_kt| within each cluster):

V(a, m) ∝ Ψ(a) = R Σ_{k=1}^K a_k'Γa_k + (1 − R) Σ_{k=1}^K (Σ_{t=1}^T |a_kt|)²

Any optimal design corresponds to a (constrained) minimum value of Ψ:

Ψ(â) = min_a Ψ(a) (subject to unbiasedness constraints on the a_kt's, and a_kt = 0 in any prohibited cells)

…with cell numbers given by m_kt = M|â_kt|/Σ_{s=1}^T |â_ks|.

Ψ(a) is not a smooth function, but it is convex.
Potential for Exact results using Ψ(a)

E.g. for the Hussey and Hughes model (Γ_st ≡ 1):

Ψ(a) = R Σ_{k=1}^K (Σ_{t=1}^T a_kt)² + (1 − R) Σ_{k=1}^K (Σ_{t=1}^T |a_kt|)²
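The proportionality V(a, m) ∝ Ψ(a) at a stable point can be verified directly in the H&H case: the constant works out as (Mτ² + σ²)/M. A plain-Python check with arbitrary coefficients (function names mine):

```python
# At a stable point m_kt = M|a_kt|/sum_s|a_ks|, check that
# V(a, m) = ((M*tau2 + sigma2)/M) * Psi(a) for the H&H case (Gamma = J).

def V_hh(a, m, tau2, sigma2):
    tot = tau2 * sum(sum(row) ** 2 for row in a)      # a_k' Gamma a_k with Gamma = J
    tot += sigma2 * sum(x * x / w for row, mrow in zip(a, m)
                        for x, w in zip(row, mrow))
    return tot

def psi_hh(a, R):
    return (R * sum(sum(row) ** 2 for row in a)
            + (1 - R) * sum(sum(abs(x) for x in row) ** 2 for row in a))

tau2, sigma2, M = 0.01, 1.0, 100.0
R = M * tau2 / (M * tau2 + sigma2)
a = [[0.2, -0.3, 0.15], [-0.25, 0.05, 0.2]]           # arbitrary, nonzero
m = [[M * abs(x) / sum(abs(y) for y in row) for x in row] for row in a]
lhs = V_hh(a, m, tau2, sigma2)
rhs = (M * tau2 + sigma2) / M * psi_hh(a, R)
print(abs(lhs - rhs) < 1e-12)   # -> True
```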
(Exact) Optimal Design under HH: R ≥ 1/2

The matrix of a_kt's has an "Anchored Staircase" form: successive clusters contribute adjacent pairs of entries q_0/2, q_1; q_1, q_2; q_2, q_3; q_3, q_4; q_4, q_5 + q_6/2, with

q_k ∝ coth(φ/2) · sinh(Kφ/2) − cosh((k − K/2)φ), where cosh(φ) = (2R − 1)^{-1}.

Efficiency = 1 − (1/K)[2 − tanh(φ/2)/tanh(Kφ/2)] = 1 − (1/K)[2 − √((1 − R)/R)] + O(1/K²)

E.g. R = 0.75: Efficiency ≈ 0.76. [Allocation entries as displayed: 17, 67; 47, 53; 49, 51.] (Relative to CXO.)
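These closed forms can be evaluated directly. A small sketch (Python; the function names are mine, and K = 6 is an assumption on my part, chosen because it reproduces the quoted efficiency):

```python
import math

def efficiency_hh_highR(K, R):
    """Exact efficiency for R >= 1/2, with cosh(phi) = 1/(2R - 1)."""
    phi = math.acosh(1.0 / (2.0 * R - 1.0))
    return 1.0 - (2.0 - math.tanh(phi / 2.0) / math.tanh(K * phi / 2.0)) / K

def efficiency_hh_highR_approx(K, R):
    """Large-K form, using tanh(phi/2) = sqrt((1 - R)/R)."""
    return 1.0 - (2.0 - math.sqrt((1.0 - R) / R)) / K

print(round(efficiency_hh_highR(6, 0.75), 2))   # -> 0.76
```

The identity tanh(φ/2) = √((1 − R)/R) follows from cosh(φ) = (2R − 1)^{-1}, which is why the two forms agree to O(1/K²).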
Optimal Design under HH: R < 1/2

One possible matrix of a_kt's places entries x and y in a staircase pattern (with one entry x + y), where

x = (K − 2R)^{-1}, y = (1/2 − R) · (K − 2R)^{-1}.

Efficiency = 1 − 2R/K. E.g. R = 0.25: Efficiency = 11/12 ≈ 0.92. [Allocation entries as displayed: 83, 17; 50.]
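A quick check of the quoted value (K = 6 is again my assumption, chosen because it reproduces 11/12):

```python
# Efficiency formula for the R < 1/2 case of the H&H model.

def efficiency_hh_lowR(K, R):
    return 1.0 - 2.0 * R / K

print(round(efficiency_hh_lowR(6, 0.25), 4))   # -> 0.9167
```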
An alternative solution was given earlier (R = 0.25; Efficiency = 0.92):

[Allocation entries as displayed: 79, 18, 3; 50, 39, 9, 2; 8, 42, 40, 1.]

…and there are many others.
Summary

- Flexible approach to improving design, often leading to substantial improvements in precision
- Works for sparse layouts and designs with prohibited cells
- Where the solution is a staircase-type design the experiment may take longer. Partly this is a consequence of improved precision: a fair comparison is between designs with the same precision (i.e. SW vs an optimised design with fewer total observations).
- The objective function Ψ provides an alternative approach via convex optimisation methods, and a tool for finding exact results
Further developments

- Optimal allocation of clusters to (optimised) sequences: readily accomplished by adding an extra computation to the algorithm. Little advantage for precision, it seems, but there may be scope for alternative near-optimal designs.
- Alternative constraints: fixed total size of study; constraints over specific time-periods; unequal clusters.
- Explore optimal designs with prohibited cells, e.g. the Washout example, or to seek more compact designs.