Slide 6: Linear model (4 concentrations)
- Y = a + β log(C)
- Parallel when the slopes β are equal
- NB the range: which concentrations?
- Do we care about the asymptotes?
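To make the parallel-line idea concrete, here is a minimal sketch (not from the slides; all data invented for illustration) that fits the linear model by least squares with a common slope and sample-specific intercepts. The horizontal shift between the two fitted lines, (a_test - a_ref)/β, estimates log relative potency.

```python
import numpy as np

def fit_parallel_lines(log_c, y_ref, y_test):
    """Fit Y = a_s + beta*log(C) with a common slope beta and
    sample-specific intercepts a_ref, a_test (least squares)."""
    n = len(log_c)
    X = np.column_stack([
        np.r_[np.ones(n), np.zeros(n)],      # intercept for reference
        np.r_[np.zeros(n), np.ones(n)],      # intercept for test
        np.r_[log_c, log_c],                 # shared slope column
    ])
    y = np.r_[y_ref, y_test]
    (a_ref, a_test, beta), *_ = np.linalg.lstsq(X, y, rcond=None)
    log_rp = (a_test - a_ref) / beta         # horizontal shift = log RP
    return a_ref, a_test, beta, log_rp

# Hypothetical 4-concentration dilution series (noise-free for clarity)
log_c = np.log([1.0, 2.0, 4.0, 8.0])
y_ref = 1.0 + 2.6 * log_c                        # reference, slope 2.6
y_test = 1.0 + 2.6 * (log_c + np.log(0.5))       # test at half potency
a_ref, a_test, beta, log_rp = fit_parallel_lines(log_c, y_ref, y_test)
print(beta, np.exp(log_rp))                      # slope ~2.6, RP ~0.5
```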
Slide 7: Four-parameter logistic model (4PL)
- Y = γ + (δ − γ) / [1 + exp(β log(C) − α)]
- Parallel when the asymptotes γ, δ and the slope β are equal
- Note the symmetry of the curve about its inflection point
- Example A looks the same; example B does not
Slide 8: Five-parameter logistic model (5PL)
- Y = γ + (δ − γ) / [1 + exp(β log(C) − α)]^φ
- Parallel when the asymptotes γ, δ, the slope β and the asymmetry φ are equal
- Example A: the same; example B: not the same (the slope differs)
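The two model equations can be written directly as code. This sketch (parameter values invented) also checks the relationship between the models: with asymmetry φ = 1 the 5PL collapses exactly to the 4PL.

```python
import numpy as np

def four_pl(log_c, alpha, beta, gamma, delta):
    """4PL from the slides: Y = gamma + (delta - gamma) / (1 + exp(beta*log(C) - alpha))."""
    return gamma + (delta - gamma) / (1.0 + np.exp(beta * log_c - alpha))

def five_pl(log_c, alpha, beta, gamma, delta, phi):
    """5PL: the same curve with an asymmetry exponent phi on the denominator."""
    return gamma + (delta - gamma) / (1.0 + np.exp(beta * log_c - alpha)) ** phi

log_c = np.linspace(-3, 3, 7)
y4 = four_pl(log_c, alpha=0.0, beta=1.5, gamma=0.1, delta=2.0)
y5 = five_pl(log_c, alpha=0.0, beta=1.5, gamma=0.1, delta=2.0, phi=1.0)
# With phi = 1 the 5PL reduces exactly to the 4PL
print(np.allclose(y4, y5))  # True
```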
Slide 9: Tests for parallelism (approach 1)
- Is there evidence that the reference and test curves are NOT parallel?
- Compare unrestricted vs restricted models: test the loss of fit when the model is restricted to parallel curves
- 'p-value' approaches:
  - the traditional F-test approach, as preferred by the European Pharmacopoeia
  - the chi-squared test approach, as recommended by Gottschalk & Dunn (2005)
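Both 'significance' approaches compare the fit of the restricted (parallel) model with the unrestricted one. A minimal sketch of the two p-value calculations (degrees of freedom supplied by the caller; the chi-squared version assumes the loss-of-fit statistic is already variance-weighted, in the spirit of Gottschalk & Dunn). The worked numbers are those shown later on slide 19.

```python
from scipy import stats

def f_test_pvalue(rss_parallel, rss_nonparallel, df_extra, df_residual):
    """Extra-sum-of-squares F test: loss of fit from restricting the
    model to parallel curves (the traditional, EP-preferred approach)."""
    f = ((rss_parallel - rss_nonparallel) / df_extra) / (rss_nonparallel / df_residual)
    return stats.f.sf(f, df_extra, df_residual)

def chi2_pvalue(statistic, df):
    """Refer a variance-weighted loss-of-fit statistic to a chi-squared
    distribution, as in the Gottschalk & Dunn (2005) approach."""
    return stats.chi2.sf(statistic, df)

# Slide 19's chi-squared computation: RSS_p - RSS_np = 159 - 112 = 47,
# referred to chi-squared with 23 degrees of freedom
p_chi2 = chi2_pvalue(159 - 112, 23)
print(p_chi2 < 0.01)  # True, matching Pr(chi2_23 > 47) < 0.01

# Identical restricted and unrestricted fits give F = 0, p = 1
p_f = f_test_pvalue(10.0, 10.0, 3, 20)
```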
Slide 10: Tests for parallelism (approach 2)
- Is there evidence that the reference and test curves ARE parallel?
- Equivalence test approach, as recommended in the draft USP guidance (Hauck et al 2005):
  - fit a model allowing non-parallel curves
  - construct confidence intervals on the differences between parameters
- Pharmacopoeial disharmony exists!! (existed?)
Slide 11: In practice... four example data sets
- Data set 1: 60 RP assays (96-well plates, OD: continuous)
- Data set 2: 15 RP assays (96-well plates, OD: continuous)
- Data set 3: 12 RP assays (96-well plates, OD: continuous)
- Data set 4: 60 RP assays (in vivo, survival at day x: binary*)
* treated as binary for this purpose; wasteful of data
Slide 12: In practice...
We have applied the proposed methods in the context of individual-assay 'pass/fail' (suitability):
- Data set 1: compare the two 'significance' approaches; compare 'equivalence' with 'significance'
- Data sets 2 and 3
- Data set 4: compare the 'F test' (EP) with 'equivalence' (USP)
Slide 13: Data set 1
- 60 RP assays; 8 dilutions; 2 independent wells per dilution
- 4PL a good fit (vs 5PL); NB the precision
- Model: log_e OD vs log_e concentration
- Average slope = 1/0.384 ≈ 2.6
[Figure: weighted 4PL regression fit]
Slide 14: Data set 1, F test and chi-squared test
- F test: straightforward
- Chi-squared test: need to establish the mean-variance relationship first; this is a data-driven method and very arbitrary
- Establishing equivalence limits: the Hauck paper suggests provisional capability-based limits can be set using reference-vs-reference assays, but these are not available in our data set...
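A sketch of the data-driven step the chi-squared approach requires: estimate a mean-variance relationship from replicate wells and use its reciprocal as weights. Here a power law var = k * mean^p is fitted on the log scale; the replicate summaries are synthetic and constructed to follow p = 2 exactly, purely to illustrate the mechanics.

```python
import numpy as np

def fit_mean_variance_power(means, variances):
    """Fit log(var) = log(k) + p*log(mean) by least squares,
    returning (k, p) for the power law var = k * mean**p."""
    slope, intercept = np.polyfit(np.log(means), np.log(variances), 1)
    return np.exp(intercept), slope

# Synthetic replicate summaries constructed so that var = 0.01 * mean**2
means = np.array([0.2, 0.5, 1.0, 1.5, 2.0])
variances = 0.01 * means ** 2
k, p = fit_mean_variance_power(means, variances)
weights = 1.0 / (k * means ** p)   # weights for the weighted RSS
print(k, p)                        # ~0.01, ~2.0
```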
Slide 15: Data set 1, F test and chi-squared test
- F test: 12/60 = 20% of assays have p < 0.05. Evidence of dissimilarity, or simply a precise assay?
- Chi-squared test: 58/60 = 97% of assays have p < 0.05!
- Intra-assay variability is low compared with the quality of the fit, so differences between the parallel and non-parallel models are exaggerated: a poor choice of statistic
[Figure: histograms of F-test and G&D p-values, followed by an example graph illustrating why the G&D test behaves so poorly]
Slide 16: Data set 1, comparison of approaches to parallelism
Slide 17: Data set 1, comparison of approaches to parallelism
- Some evidence of a 'hook' in the model; the residual SS is inflated
[Figure: note the hook]
Slide 18: Data set 1, comparison of approaches to parallelism
- Excluding the top 2 points because of the hook: approx. 20/60 pass
- Remodelled: the quadratic relationship was refitted
Slide 19: Data set 1, F test and chi-squared test
- RSS_parallel = 159; RSS_non-parallel = 112; RSS_p - RSS_np = 47
- Chi-squared test: Pr(χ²₂₃ > 47) < 0.01
- F test: p = 0.03
- An example where the assay fails both tests
Slide 20: Data set 1, F test and chi-squared test
- RSS_parallel = 100.2; RSS_non-parallel = …; RSS_p - RSS_np = …
- Chi-squared test: Pr(χ²₂₃ > 1.2) = 0.75
- An example where the assay passes both tests
Slide 21: Data set 1, USP methodology (prove parallel)
- Lower asymptote: [figure]
Slide 22: Data set 1, USP methodology
- Upper asymptote: [figure]
- This is interesting: it demonstrates that it is not enough simply to order the data and take the 2nd value from the end as your limit. You need to examine the data and check for bias!
Slide 23: Data set 1, USP methodology
- Scale: [figure]
- Scale for the reference: range up to 0.416
- NB scale = 1/slope
Slide 24: Data set 1, USP methodology
- Criteria for the 90% CI on the difference between parameter values:
  - lower asymptotes: (-0.235, 0.235)
  - upper asymptotes: (-0.213, 0.213)
  - scales: (-0.187, 0.187)
- Applying the criteria: 3/60 = 5% of assays fail the parallelism criteria; no assay fails more than one criterion
- The 'scale' parameter from the R parameterisation allows log RP to be estimated as a1 - a2 (easy variance)
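A sketch of how log RP falls out of a parallel 4PL fit (simulated data and SciPy's curve_fit, not the authors' software): when the two samples share γ, δ and β, the difference in the location parameters α, divided by β, is the log relative potency. In R's parameterisation the scale is 1/β, so log RP is estimated directly as a difference of location parameters, which makes its variance easy to obtain.

```python
import numpy as np
from scipy.optimize import curve_fit

def parallel_4pl(X, alpha_ref, alpha_test, beta, gamma, delta):
    """Joint parallel 4PL: shared gamma, delta, beta; sample-specific alpha.
    X = (log_c, is_test), with is_test an indicator for the test sample."""
    log_c, is_test = X
    alpha = np.where(is_test == 1, alpha_test, alpha_ref)
    return gamma + (delta - gamma) / (1.0 + np.exp(beta * log_c - alpha))

rng = np.random.default_rng(0)
log_c = np.tile(np.linspace(-2, 2, 8), 2)          # 8 dilutions per sample
is_test = np.r_[np.zeros(8), np.ones(8)]
true = dict(alpha_ref=0.0, alpha_test=-0.9, beta=1.3, gamma=0.1, delta=2.0)
y = parallel_4pl((log_c, is_test), **true) + rng.normal(0, 0.01, 16)

popt, _ = curve_fit(parallel_4pl, (log_c, is_test), y,
                    p0=[0.0, 0.0, 1.0, 0.0, 2.0])
alpha_ref, alpha_test, beta, gamma, delta = popt
log_rp = (alpha_ref - alpha_test) / beta           # horizontal shift
print(log_rp)                                      # close to 0.9/1.3
```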
Slide 25: Data set 1, comparison of approaches to parallelism
Slide 26: Data set 1, comparison of approaches to parallelism
- This plate 'fails' all 3 tests (USP: lower asymptote)
- It fails all tests whether or not the hook is included
Slide 27: Data set 1, comparison of approaches to parallelism
- Equivalence test: scales not equivalent
- F-test p-value = 0.60; chi-squared test p-value < 0.001
- The F test passes because of high variability
Slide 28: Data set 2, comparison of approaches to parallelism
- Constant variance
Slide 29: Data set 3, comparison of approaches to parallelism
- Linear fit for the mean-variance relationship
- Again the G&D test suggests that more assays "fail"
Slide 30: In practice... data set 4
- Compare the 'F test' with 'equivalence'
- The methodology for the chi-squared test has not been developed for binary data
Slide 31: Data set 4
- 60 RP assays; 4 dilutions; 15 animals per dilution
- The actual model is a GLM (response 0/1 depending on survival); % survival is shown for illustrative purposes only
- Slopes: average = …; range (-14.71, -1.03)
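For the binary data set the model is a GLM. A minimal sketch (invented counts; maximum likelihood via SciPy rather than the authors' software) fitting a logistic dose-response to survival counts:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def fit_survival_glm(log_c, survivors, n):
    """Maximum-likelihood fit of P(survive) = expit(a + b*log(C))
    to binomial counts (a logistic-regression GLM)."""
    def nll(params):
        a, b = params
        p = np.clip(expit(a + b * log_c), 1e-10, 1 - 1e-10)  # guard the log
        return -np.sum(survivors * np.log(p) + (n - survivors) * np.log(1 - p))
    return minimize(nll, x0=[0.0, 0.0], method="BFGS").x

# Hypothetical assay: 4 dilutions, 15 animals per dilution
log_c = np.log([1.0, 2.0, 4.0, 8.0])
survivors = np.array([14, 13, 8, 4])   # invented counts, falling with dose
a_hat, b_hat = fit_survival_glm(log_c, survivors, n=15)
print(b_hat)  # negative slope: survival falls as concentration rises
```

Parallelism testing then proceeds as for the continuous case, but on the deviance scale rather than residual sums of squares.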
Slide 32: Data set 4, comparison of approaches to parallelism
- F test: 5/60 = 8% fail
- Equivalence: 3/60 = 5% fail
- Equivalence: the limit could be chosen to match
Slide 33: Data set 4, comparison of approaches to parallelism
- The F-test approach and the equivalence approach could be brought into agreement, depending on how the limits are set.
Slide 34: Broadly...
- F test:
  - fails (wrongly?) when the assay is very precise; passes (wrongly?) when it is noisy
  - in the linear case the p-value can be adjusted to match equivalence
- Chi-squared:
  - fails when the assay is very precise, even if the difference is small
  - if the model fits badly, the weighting inflates the RSS (e.g. the hook)
  - two further data sets supported this
- USP:
  - limits are set such that the extreme 5% will fail, and they do, regardless of precision, model fit, etc.
Slide 35: Stepping back... what are we trying to do?
- Produce a biologic to a controlled standard that can be used in clinical practice
- For a batch we need to know its potency, with appropriate precision, in order to calculate the clinical dose
(Note: perhaps add more information about precision to this)
Slide 36: Some thoughts
1. Establish a valid assay
- Use all development assay results unless a physical reason exists to exclude them
- Statistical methodology can be used to flag possible outliers for investigation (USP <111> applies this to individual data points)
- Parallelism / similarity: are the parameter differences fundamentally zero, or is there, for example, a consistent slope difference?
- Equivalence approach plus judgment for an acceptable margin
Slide 37: Some thoughts (continued)
2. Set the number of replicates to provide the required precision
- Combine RP values plus confidence intervals for the reportable value
3. Per assay, use all results unless there is a physical reason not to
- They are part of the continuum of assays
- Flag for investigation using statistical techniques: reference behaviour, parallelism
4. Monitor performance over time (SPC)
- Reference stability
Slide 38: Which parallelism test? Our view:
- The chi-squared test requires too many complex decisions and is very sensitive to the model
- The F test is not generally applicable at the assay validation stage: it does not allow examination of the individual parameters, and does not lend itself to judgment about 'how parallel is parallel?'
- The equivalence test approach fits all three contexts, with adjustment of the tolerance limits as appropriate
Slide 39: Thank you
- USP, for the invitation
- Clients, for the use of their data: BioOutsource and other clients who prefer to remain anonymous
- Quantics staff, for analysis and graphics: Kelly Fleetwood (R), Catriona Keerie (SAS)