Slide 6: Linear model (4 concentrations)
- Y = a + β log(C)
- Parallel when the slopes β are equal
- NB the range: which concentrations?
- Do we care about the asymptotes?
Slide 7: Four-parameter logistic model (4PL)
- Y = γ + (δ − γ) / [1 + exp(β log(C) − α)]
- Parallel when the asymptotes γ, δ and the slope β are equal
- Mention symmetry
- A looks the same; B does not look the same
Slide 8: Five-parameter logistic model (5PL)
- Y = γ + (δ − γ) / [1 + exp(β log(C) − α)]^φ
- Parallel when the asymptotes γ, δ, the slope β and the asymmetry φ are equal
- A: the same; B: not the same (slope not the same)
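The two dose-response models above can be sketched directly from the slide formulas. This is an illustrative implementation, not the authors' code; the parameter names (α location, β slope, γ and δ asymptotes, φ asymmetry) follow the slides:

```python
import numpy as np

def four_pl(log_c, alpha, beta, gamma, delta):
    # 4PL: gamma and delta are the asymptotes, beta the slope,
    # alpha the location parameter on the log-concentration scale
    return gamma + (delta - gamma) / (1.0 + np.exp(beta * log_c - alpha))

def five_pl(log_c, alpha, beta, gamma, delta, phi):
    # 5PL adds the asymmetry parameter phi; phi = 1 recovers the 4PL
    return gamma + (delta - gamma) / (1.0 + np.exp(beta * log_c - alpha)) ** phi
```

Two 4PL curves are parallel when they share γ, δ and β and differ only in α; the horizontal shift in α is then the log relative potency.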
Slide 9: Tests for parallelism (Approach 1)
- Is there evidence that the reference and test curves ARE NOT parallel?
- Compare unrestricted vs restricted models: test the loss of fit when the model is restricted to parallel
- 'p value' approaches:
  - Traditional F test, as preferred by the European Pharmacopoeia
  - Chi-squared test, as recommended by Gottschalk & Dunn (2005)
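The restricted-vs-unrestricted comparison behind the F test can be sketched as below. This is an illustrative helper, not the EP procedure verbatim; the choice of 3 constraints (both asymptotes plus the slope of a 4PL) and the residual degrees of freedom in the usage example are assumptions:

```python
from scipy.stats import f

def parallelism_f_test(rss_parallel, rss_nonparallel, n_constraints, df_resid):
    # Loss of fit when the model is restricted to parallel curves,
    # scaled by the residual mean square of the unrestricted fit
    f_stat = ((rss_parallel - rss_nonparallel) / n_constraints) / (rss_nonparallel / df_resid)
    return f_stat, f.sf(f_stat, n_constraints, df_resid)
```

For instance, with the residual sums of squares quoted later in the deck (159 parallel vs 112 non-parallel) and an assumed 24 residual degrees of freedom, the statistic is about 3.36 and the p-value falls below 0.05, consistent with the 'fail' verdict reported there.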
Slide 10: Tests for parallelism (Approach 2)
- Is there evidence that the reference and test curves ARE parallel?
- Equivalence test approach, as recommended in the draft USP guidance (Hauck et al 2005)
- Fit a model allowing non-parallel curves; put confidence intervals on the differences between parameters
- Pharmacopoeial disharmony exists!! (existed?)
Slide 11: In practice... four example data sets
- Data set 1: 60 RP assays (96-well plates, OD: continuous)
- Data set 2: 15 RP assays (96-well plates, OD: continuous)
- Data set 3: 12 RP assays (96-well plates, OD: continuous)
- Data set 4: 60 RP assays (in vivo, survival at day x: binary*)
* treated as such for this purpose; wasteful of data
Slide 12: In practice...
We have applied the proposed methods in the context of individual assay 'pass/fail' (suitability):
- Data set 1: compare the 2 'significance' approaches; compare 'equivalence' with 'significance'
- Data sets 2, 3
- Data set 4: compare the 'F test' (EP) with 'equivalence' (USP)
Slide 13: Data set 1
- 60 RP assays; 8 dilutions; 2 independent wells per dilution
- 4PL a good fit (vs 5PL); NB precision
- Model: log_e OD vs log_e conc
- Average slope = 1/0.384 = 2.6
[Figure: weighted 4PL regression graph]
Slide 14: Data set 1: F test and chi-squared test
- F test: straightforward
- Chi-squared test: need to establish the mean-variance relationship
  - This is a data-driven method!!! Very arbitrary
- Establishing equivalence limits:
  - Hauck paper: provisional capability-based limits can be set using reference-vs-reference assays
  - Not available in our dataset...
Slide 15: Data set 1: F test and chi-squared test
- F test: 12/60 = 20% of assays have p < 0.05. Evidence of dissimilarity? OR a precise assay?
- Chi-squared test: 58/60 = 97% of assays have p < 0.05!
- Intra-assay variability is low, so differences between the parallel and non-parallel models are exaggerated
[Histograms of F-test p-values and G&D p-values, followed by an example graph to illustrate why G&D behaves so poorly]
- Intra-assay variability is low compared to the quality of the fit, so differences between curves are exaggerated: a poor choice of statistic
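Assuming the Gottschalk & Dunn statistic is the increase in weighted residual SS under the parallel restriction, referred to a chi-squared distribution with one degree of freedom per constrained parameter (3 for a 4PL), it can be sketched as:

```python
from scipy.stats import chi2

def gd_chi_squared_test(wrss_parallel, wrss_nonparallel, n_constraints):
    # With weights taken from the fitted mean-variance relationship,
    # the weighted RSS difference is treated as approximately chi-squared
    stat = wrss_parallel - wrss_nonparallel
    return stat, chi2.sf(stat, n_constraints)
```

With the worked figures shown later in the deck (weighted RSS differences of 47 and 1.2, three constrained parameters), this reproduces Pr < 0.01 and Pr ≈ 0.75 respectively.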
Slide 16: Data set 1: Comparison of approaches to parallelism
Slide 17: Data set 1: Comparison of approaches to parallelism
- Some evidence of a 'hook' in the model; residual SS inflated
[Note the hook in the graph]
Slide 18: Data set 1: Comparison of approaches to parallelism
- Excluding the top 2 points because of the hook: approx 20/60 pass
- Remodelled: quadratic relationship refitted
Slide 19: Data set 1: F test and chi-squared test (example where both fail)
- RSS_parallel = 159; RSS_non-parallel = 112; RSS_p − RSS_np = 47
- Chi-squared test: Pr(χ²₃ > 47) < 0.01
- F test: p = 0.03
Slide 20: Data set 1: F test and chi-squared test (example where both PASS)
- RSS_parallel = 100.2; RSS_non-parallel = 99.0; RSS_p − RSS_np = 1.2
- Chi-squared test: Pr(χ²₃ > 1.2) = 0.75
Slide 21: Data set 1: USP methodology (prove parallel)
- Lower asymptote:
Slide 22: Data set 1: USP methodology
- Upper asymptote:
- This is interesting: it demonstrates that it's not enough just to order the data and take the 2nd value from the end as your limit. You need to examine it. Check for bias!
Slide 23: Data set 1: USP methodology
- Scale:
- Scale for reference: (range to 0.416)
- NB scale = 1/slope
Slide 24: Data set 1: USP methodology
- Criteria for the 90% CI on the difference between parameter values:
  - Lower asymptotes: (−0.235, 0.235)
  - Upper asymptotes: (−0.213, 0.213)
  - Scales: (−0.187, 0.187)
- Applying the criteria: 3/60 = 5% of assays fail the parallelism criteria; no assay fails more than one criterion
- The 'scale' parameter from the R parameterisation allows log RP to be estimated as a1 − a2 (easy variance)
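The three criteria above can be applied mechanically: for each parameter, form the 90% CI on the reference/test difference and check it lies inside the pre-set limits. The helper below is a hypothetical sketch (a normal approximation for the CI is an assumption):

```python
from scipy.stats import norm

# Equivalence limits from the slide (lower asymptote, upper asymptote, scale)
LIMITS = {"lower_asymptote": 0.235, "upper_asymptote": 0.213, "scale": 0.187}

def usp_parallelism_check(diffs, std_errors, level=0.90):
    # A criterion passes only if the whole 90% CI on the parameter
    # difference lies strictly inside its equivalence limits
    z = norm.ppf(1.0 - (1.0 - level) / 2.0)
    results = {}
    for name, limit in LIMITS.items():
        d, se = diffs[name], std_errors[name]
        results[name] = (-limit < d - z * se) and (d + z * se < limit)
    return results
```

Note that this is an equivalence formulation: imprecise (noisy) assays widen the CI and so become more likely to fail, the opposite behaviour to the F test.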
Slide 25: Data set 1: Comparison of approaches to parallelism
Slide 26: Data set 1: Comparison of approaches to parallelism
- This plate 'fails' all 3 tests
- USP: lower asymptote
- Fails all tests whether or not the hook is included
Slide 27: Data set 1: Comparison of approaches to parallelism
- Equivalence test: scales not equivalent
- F test p-value = 0.60 (the F test passes: high variability)
- Chi-squared test p-value < 0.001
Slide 28: Data set 2: Comparison of approaches to parallelism
- Constant variance
Slide 29: Data set 3: Comparison of approaches to parallelism
- Linear fit for the mean-variance relationship
- Again the G&D test suggests more assays 'fail'
Slide 30: In practice... Data set 4: compare 'F test' with 'equivalence'
- The methodology for the chi-squared test has not been developed for binary data
Slide 31: Data set 4
- 60 RP assays; 4 dilutions; 15 animals per dilution
- The actual model is a GLM (i.e. a 0/1 response dependent on survival); % survival is shown for illustrative purposes only
- Slopes: average = …, range (−14.71, −1.03)
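The binary survival model on this slide is a GLM with a logit link. As an illustrative sketch only (the helper name, the maximum-likelihood fit via scipy, and the simulated data in the usage example are assumptions, not the original SAS/R analysis):

```python
import numpy as np
from scipy.optimize import minimize

def fit_logistic_slope(log_conc, survived):
    # Logistic regression of the 0/1 survival response on log concentration,
    # fitted by maximising the Bernoulli log-likelihood
    X = np.column_stack([np.ones_like(log_conc), log_conc])

    def neg_log_lik(beta):
        eta = X @ beta
        # negative Bernoulli log-likelihood with logit link
        return np.sum(np.logaddexp(0.0, eta) - survived * eta)

    fit = minimize(neg_log_lik, x0=np.zeros(2), method="BFGS")
    return fit.x  # (intercept, slope) on the logit scale
```

On the slide's design (4 dilutions, 15 animals per dilution, 60 animals per curve) the slope returned here corresponds to the GLM slopes whose range is quoted above; the parallel restriction forces the reference and test curves to share it.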
Slide 32: Data set 4: Comparison of approaches to parallelism
- F test: 5/60 = 8% fail
- Equivalence: 3/60 = 5% fail
- Equivalence: the limits could be chosen to match
Slide 33: Data set 4: Comparison of approaches to parallelism
- The F-test approach and the equivalence approach could be in agreement, depending on how the limits are set.
Slide 34: Broadly...
- F test:
  - Fails (wrongly?) when the assay is very precise; passes (wrongly?) when noisy
  - Linear case: the p value can be adjusted to match equivalence
- Chi-squared:
  - Fails when the assay is very precise (even if the difference is small)
  - If the model fits badly, the weighting inflates the RSS (e.g. the hook)
  - 2 further data sets supported this
- USP:
  - Limits are set such that the extreme 5% will fail, and they do, regardless of precision, model fit, etc.
Slide 35: Stepping back... what are we trying to do?
- Produce a biologic to a controlled standard that can be used in clinical practice
- For a batch we need to know its potency, with appropriate precision, in order to calculate the clinical dose
[Note: perhaps add more information about precision to this]
Slide 36: Some thoughts
1. Establish a valid assay
- Use all development assay results unless a physical reason exists to exclude them
- Statistical methodology can be used to flag possible outliers for investigation; USP <111> applies this to individual data points
- Parallelism/similarity: are the parameter differences fundamentally zero, or is there a consistent slope difference (e.g.)?
- Equivalence approach plus judgment for an acceptable margin
Slide 37: Some thoughts (continued)
2. Set the number of replicates to provide the required precision
- Combine RP values plus confidence intervals for the reportable value
3. Per assay, use all results unless there is a physical reason not to (they are part of the continuum of assays)
- Flag for investigation using statistical techniques: reference behaviour, parallelism
4. Monitor performance over time (SPC)
- Reference stability
Slide 38: Which parallelism test? Our view:
- The chi-squared test requires too many complex decisions and is very sensitive to the model
- The F test is not generally applicable at the assay validation stage: it does not allow examination of the individual parameters, and it does not lend itself to judgment about 'how parallel is parallel?'
- The equivalence test approach fits in all three contexts, with adjustment of the tolerance limits as appropriate
Slide 39: Thank you
- USP: the invitation
- Clients: use of data (BioOutsource, and other clients who prefer to remain anonymous)
- Quantics staff: analysis and graphics (Kelly Fleetwood (R), Catriona Keerie (SAS))