Safety Data Mining: Background and Current Issues. Ramin Arani, PhD, Safety Data Mining, Global Biometric Science, Bristol-Myers Squibb Company. SAMSI, July 2006.
Outline: Rationale for Pharmacovigilance; AERS Database; Database Issues; Methodologies: BCPNN (WHO), MGPS (FDA); Summary; Challenges and Opportunities.
Pharmacovigilance - Rationale: Information obtained before first marketing is inadequate to cover all aspects of drug safety: tests in animals are insufficiently predictive of human safety; in clinical trials, patients are selected and limited in number; conditions of use in trials differ from those in clinical practice; the duration of trials is limited; and information about rare but serious adverse reactions, chronic toxicity, use in special groups, or drug interactions is often not available.
Pharmacovigilance - Rationale (continued): Pre-approval data (subjects for approval): controlled; limited number of patients; safety data not mature. Post-approval data (population): real life, uncontrolled; off-label use; generic; solicited safety data; unsolicited safety data.
Spontaneous AE Reports: Safety information from clinical trials is incomplete: there are few patients, so rare events are likely to be missed, and trial conditions are not necessarily the real world. Information is needed from post-marketing surveillance and spontaneous reports. Pharmacovigilance is carried out by regulatory agencies and manufacturers. There is a long history of research on the issue: Finney (MIMed 1974, SM 1982), Royall (Bcs 1971), Inman (BMedBull 1970), Napke (CanPhJ 1970).
Issues: reports are incomplete and describe events, not necessarily reactions; how to compute effect magnitude; many events and many drugs are reported; bias and noise in the system; incidence is difficult to estimate because the number of patients at risk and the duration of exposure are seldom reliable; appropriate use of computerized methods, e.g., supplementing standard pharmacovigilance to identify possible signals sooner (an early warning signal).
Pharmacovigilance - Definitions: Pharmacovigilance: a set of methods that aim to identify and quantitatively assess the risks related to the use of drugs in the entire population or in specific population subgroups. Safety signal: reported information on a possible causal relationship between an adverse event and a drug. Adverse drug reaction: a response to a drug that is harmful and unintended and that occurs at doses normally used.
AERS Database: database origin 1969; called SRS until 11/1/97, then changed to AERS; 3.0 million reports in the database; all SRS data migrated into AERS; contains drug and "therapeutic" biologic reports; exception: vaccines (VAERS).
Sources of AERS Reports: health professionals and consumers/patients report voluntarily, directly to FDA and/or to the manufacturer; manufacturers report under the regulations for postmarketing reporting.
AERS Limitations: different populations, co-morbidities, co-prescribing, off-label use, rare events. Report volume for a drug is affected by volume of use, publicity, type and severity of the event, and other factors; therefore the reporting rate is not a true measure of the event rate or the risk. An observed event may be due to the indication for therapy rather than the therapy itself; therefore observed associations should be viewed as signals, and causal conclusions should be drawn with caution.
Examples: Claritin and arrhythmias (channeling, and the need for detailed data not in the database): an increased number of reports due to pre-existing conditions, because high-risk patients were selected for the drug deemed safest for them. Prozac and suicide (confounding by indication): a large increase in reports following publicity and stimulated reporting.
The Pharmacovigilance Process: detect signals (traditional methods, data mining); generate hypotheses; refute/verify (Type A, mechanism-based; Type B, idiosyncratic; insight from outliers); estimate incidence; public health impact, benefit/risk; act (inform, change label, restrict use/withdraw).
Finding Interestingly Large Cell Counts in a Massive Frequency Table: rows and columns may have thousands of categories. Most cells are empty, even though $N_{++}$ is very large: only 386K out of 1331K cells have $N_{ij} > 0$, and 174 drug-event combinations have $N_{ij} > 1000$.
No. reports | AE 1 | … | AE n | Total
Drug 1 | $N_{11}$ | … | $N_{1n}$ | $N_{1+}$
⋮ | | $N_{ij}$ | | ⋮
Drug m | $N_{m1}$ | … | $N_{mn}$ | $N_{m+}$
Total | $N_{+1}$ | … | $N_{+n}$ | $N_{++}$
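Because most cells are empty, the table is naturally stored sparsely, keeping only cells with $N_{ij} > 0$. The following Python sketch assumes each report is given as a pair (set of drugs, set of events) and counts each drug-event pair once per report; the report format and the function names are hypothetical, chosen for illustration.

```python
from collections import Counter


def build_sparse_table(reports):
    """Count drug-event co-mentions, storing only non-empty cells.

    `reports` is assumed to be an iterable of (drugs, events) pairs,
    each a collection of string labels (hypothetical format).
    """
    cells = Counter()
    for drugs, events in reports:
        for drug in set(drugs):
            for event in set(events):
                cells[(drug, event)] += 1

    # Margins are row/column sums of the table (N_{i+}, N_{+j}, N_{++}).
    row_totals = Counter()
    col_totals = Counter()
    for (drug, event), n in cells.items():
        row_totals[drug] += n
        col_totals[event] += n
    grand_total = sum(cells.values())
    return cells, row_totals, col_totals, grand_total


def large_cells(cells, threshold):
    """Return the drug-event combinations whose count exceeds `threshold`."""
    return {key: n for key, n in cells.items() if n > threshold}


reports = [
    ({"A"}, {"x", "y"}),
    ({"A", "B"}, {"x"}),
    ({"B"}, {"z"}),
]
cells, rows, cols, total = build_sparse_table(reports)
print(len(cells), "non-empty cells out of", len(rows) * len(cols))
print(large_cells(cells, 1))
```

Memory then grows with the number of non-empty cells rather than with (drugs × events), which is what makes a table with over a million potential cells manageable.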
Method - Basics: Endpoint: number of AE reports. Most methods use variations of two-way (2×2) table statistics:
No. reports | Target AE | Other AE | Total
Target drug | $a$ | $b$ | $a+b$
Other drug | $c$ | $d$ | $c+d$
Total | $a+c$ | $b+d$ | $n$
Some possibilities for the expected count $E(a)$:
Reporting ratio (RR): $E(a) = (a+b)(a+c)/n$
Proportional reporting ratio (PRR): $E(a) = (a+b)\,c/(c+d)$
Odds ratio (OR): $E(a) = bc/d$
When $a > E(a)$ (for all three definitions this is equivalent to $ad > bc$), OR > PRR > RR. Basic idea: flag the combination when $R = a/E(a)$ is large.
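The three ratios differ only in how the expected count $E(a)$ is formed from the 2×2 table. A minimal Python sketch (the counts in the usage lines are made up for illustration, and the function name is hypothetical):

```python
def disproportionality_ratios(a, b, c, d):
    """Observed-to-expected ratios R = a / E(a) for a 2x2 report table.

    a: target drug & target AE     b: target drug & other AEs
    c: other drugs & target AE     d: other drugs & other AEs
    Assumes c > 0 and d > 0 so that every E(a) is defined.
    """
    n = a + b + c + d
    expected = {
        "RR": (a + b) * (a + c) / n,   # reporting ratio
        "PRR": (a + b) * c / (c + d),  # proportional reporting ratio
        "OR": b * c / d,               # odds ratio
    }
    return {name: a / e for name, e in expected.items()}


ratios = disproportionality_ratios(a=20, b=980, c=200, d=98800)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Here $ad > bc$, so the printed values follow the ordering OR > PRR > RR.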
Bayesian Approaches: there are two current approaches, DuMouchel's and the WHO's. Both use the ratio $n_{ij}/E_{ij}$, where $n_{ij}$ = no. of reports mentioning both drug $i$ and event $j$, and $E_{ij}$ = expected no. of reports of drug $i$ and event $j$. Both report features of the posterior distribution of the information component $IC_{ij} = \log_2(n_{ij}/E_{ij})$. $E_{ij}$ is usually computed assuming that drug $i$ and event $j$ are mentioned independently, in which case $n_{ij}/E_{ij}$ is the reporting ratio of the 2×2 table for that combination. A ratio > 1 (IC > 0) means the combination is mentioned more often than expected under independence.
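Under independence, $E_{ij} = n_{i+} n_{+j} / N$, where $N$ is the total number of reports, so the unshrunk (non-Bayesian) IC of every observed cell can be computed from the margins. The Python sketch below uses a hypothetical dictionary layout and made-up counts. Part of the motivation for the Bayesian versions is that this raw ratio is unstable when $n_{ij}$ and $E_{ij}$ are small.

```python
import math


def raw_information_components(cells, row_totals, col_totals, total):
    """Return {(drug, event): (n_ij, E_ij, IC_ij)} for each non-empty cell.

    E_ij = n_{i+} * n_{+j} / N assumes drug and event are mentioned
    independently; IC_ij = log2(n_ij / E_ij) is the unshrunk estimate.
    """
    result = {}
    for (drug, event), n in cells.items():
        expected = row_totals[drug] * col_totals[event] / total
        result[(drug, event)] = (n, expected, math.log2(n / expected))
    return result


# Hypothetical counts; margins are consistent with the cells.
cells = {("D1", "E1"): 40, ("D1", "E2"): 1960, ("D2", "E1"): 460, ("D2", "E2"): 97540}
row_totals = {"D1": 2000, "D2": 98000}
col_totals = {"E1": 500, "E2": 99500}
total = 100000

ics = raw_information_components(cells, row_totals, col_totals, total)
for key, (n, e, ic) in ics.items():
    print(key, n, round(e, 1), round(ic, 3))
```

The (D1, E1) cell has 40 reports against 10 expected, giving IC = 2, while the very common (D2, E2) cell also has IC slightly above 0, which is why a threshold or lower bound is used rather than the sign alone.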
WHO (Bate et al., EurJClPhrm 1998): Bayesian Confidence Propagation Neural Network (BCPNN) model. $n_{ij}$ = no. of reports mentioning both drug $i$ and event $j$; $n_{i+}$ = no. of reports mentioning drug $i$; $n_{+j}$ = no. of reports mentioning event $j$. Usual Bayesian inferential setup: binomial likelihoods for $n_{ij}$, $n_{i+}$, $n_{+j}$, and beta priors for the rate parameters $(r_{ij}, p_i, q_j)$.
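The BCPNN setup can be illustrated by simulation: draw the three rates from their beta posteriors and look at the resulting distribution of $IC_{ij} = \log_2[r_{ij}/(p_i q_j)]$. This Python sketch assumes uniform Beta(1, 1) priors and treats the three posteriors as independent; both are simplifying assumptions for illustration, not necessarily the WHO's exact prior choices, and the function name is hypothetical.

```python
import math
import random
import statistics


def simulate_ic(n_ij, n_i, n_j, n_total, draws=20000, seed=1):
    """Monte Carlo draws of IC_ij under assumed Beta(1, 1) priors.

    Each count is treated as binomial out of n_total reports, so the
    posterior of each rate is Beta(1 + count, 1 + n_total - count).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        r = rng.betavariate(1 + n_ij, 1 + n_total - n_ij)
        p = rng.betavariate(1 + n_i, 1 + n_total - n_i)
        q = rng.betavariate(1 + n_j, 1 + n_total - n_j)
        samples.append(math.log2(r / (p * q)))
    return samples


samples = simulate_ic(n_ij=40, n_i=2000, n_j=500, n_total=100000)
mean_ic = statistics.fmean(samples)
sd_ic = statistics.stdev(samples)
print(f"E(IC) ~ {mean_ic:.2f}, SD(IC) ~ {sd_ic:.2f}")
```

With these made-up counts the raw ratio is 4 (IC = 2), and the posterior mean of IC lands close to that value because the counts are large; with small counts the spread of the samples grows noticeably.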
WHO, contd.: The delta method is used to approximate the variance of $Q_{ij} = \ln[r_{ij}/(p_i q_j)] = (\ln 2)\,IC_{ij}$; however, the exact mean and variance of $Q_{ij}$ can be calculated. WHO measure of importance: $E(IC_{ij}) - 2\,SD(IC_{ij})$. The predictive value of signal detection was tested by analyzing signals from 1993-2000 (Drug Safety 2000; 23:533-542): 84% negative predictive value, 44% positive predictive value. A good filtering strategy for clinical assessment.
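For $X \sim \mathrm{Beta}(\alpha, \beta)$, $\ln X$ has mean $\psi(\alpha) - \psi(\alpha+\beta)$ and variance $\psi'(\alpha) - \psi'(\alpha+\beta)$ (digamma and trigamma functions), so the exact mean and variance of $Q_{ij}$ follow directly once the three posteriors are treated as independent. The Python sketch below assumes uniform Beta(1, 1) priors (a simplification, not necessarily the WHO's choice), implements digamma and trigamma with the standard recurrence plus asymptotic series so that only the standard library is needed, and reports $E(IC_{ij}) - 2\,SD(IC_{ij})$. Function names are hypothetical.

```python
import math


def digamma(x):
    """psi(x) for x > 0: recurrence up to x >= 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    f = 1.0 / (x * x)
    series = f * (1 / 12 - f * (1 / 120 - f * (1 / 252 - f * (1 / 240 - f / 132))))
    return result + math.log(x) - 0.5 / x - series


def trigamma(x):
    """psi'(x) for x > 0: recurrence up to x >= 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    f = 1.0 / (x * x)
    series = (f / x) * (1 / 6 - f * (1 / 30 - f * (1 / 42 - f / 30)))
    return result + 1.0 / x + 0.5 * f + series


def log_beta_moments(alpha, beta):
    """Mean and variance of ln X for X ~ Beta(alpha, beta)."""
    mean = digamma(alpha) - digamma(alpha + beta)
    var = trigamma(alpha) - trigamma(alpha + beta)
    return mean, var


def ic_summary(n_ij, n_i, n_j, n_total):
    """Exact E(IC), SD(IC) and E(IC) - 2 SD(IC), assuming Beta(1, 1) priors
    and independent posteriors for r_ij, p_i and q_j."""
    m_r, v_r = log_beta_moments(1 + n_ij, 1 + n_total - n_ij)
    m_p, v_p = log_beta_moments(1 + n_i, 1 + n_total - n_i)
    m_q, v_q = log_beta_moments(1 + n_j, 1 + n_total - n_j)
    # Q = ln r - ln p - ln q; variances add under independence.
    mean_q = m_r - m_p - m_q
    sd_q = math.sqrt(v_r + v_p + v_q)
    # IC = Q / ln 2
    mean_ic = mean_q / math.log(2)
    sd_ic = sd_q / math.log(2)
    return mean_ic, sd_ic, mean_ic - 2 * sd_ic


mean_ic, sd_ic, lower = ic_summary(n_ij=40, n_i=2000, n_j=500, n_total=100000)
print(f"E(IC) = {mean_ic:.3f}, SD(IC) = {sd_ic:.3f}, E - 2 SD = {lower:.3f}")
```

Compared with the delta method, this closed form needs no linearization; the independence of the three posteriors is the remaining approximation.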
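WHO, contd.: Let $A$ denote the adverse event and $D$ the drug. The mutual information $I(A,D) = \sum_{a,d} P(a,d) \log_2 \frac{P(a,d)}{P(a)P(d)}$ is a measure of association; $IC_{ij}$ is the log term $\log_2 \frac{P(A,D)}{P(A)P(D)}$ for the cell in which both the specific drug and the specific event occur.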
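For a single drug $D$ and event $A$, reports can be cross-classified into a 2×2 table (drug present/absent by event present/absent), and $I(A,D) = \sum_{a,d} P(a,d)\log_2[P(a,d)/(P(a)P(d))]$ can be estimated with observed proportions. The Python sketch below (function names are illustrative) computes that estimate in bits, alongside the pointwise term for the drug-and-event cell, which is the unshrunk IC.

```python
import math


def mutual_information_2x2(a, b, c, d):
    """Mutual information I(A, D) in bits, estimated from a 2x2 table.

    a: drug & event      b: drug & no event
    c: no drug & event   d: no drug & no event
    """
    n = a + b + c + d
    # (cell count, row total for that drug status, column total for that event status)
    cells = [
        (a, a + b, a + c),
        (b, a + b, b + d),
        (c, c + d, a + c),
        (d, c + d, b + d),
    ]
    mi = 0.0
    for count, row, col in cells:
        if count > 0:  # 0 * log 0 is taken as 0
            mi += (count / n) * math.log2(count * n / (row * col))
    return mi


def pointwise_ic(a, b, c, d):
    """log2 P(A, D) / (P(A) P(D)) for the drug-and-event cell (unshrunk IC)."""
    n = a + b + c + d
    return math.log2(a * n / ((a + b) * (a + c)))


print(mutual_information_2x2(10, 90, 90, 810))  # independent table
print(mutual_information_2x2(20, 980, 200, 98800), pointwise_ic(20, 980, 200, 98800))
```

The mutual information summarizes association over the whole table, whereas the IC isolates the single cell that matters for a drug-event signal.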
DuMouchel (AmStat 1999): $E_{ij}$ is treated as known and computed by stratifying the database. With $n_{i+}^{(k)}$ = no. of reports of drug $i$ in stratum $k$, $n_{+j}^{(k)}$ = no. of reports of event $j$ in stratum $k$, and $N^{(k)}$ = total reports in stratum $k$, $E_{ij} = \sum_k n_{i+}^{(k)} n_{+j}^{(k)} / N^{(k)}$, the expected value of $n_{ij}$ under independence. The model is $n_{ij} \sim \mathrm{Poisson}(\mu_{ij})$, and interest is in $\lambda_{ij} = \mu_{ij}/E_{ij}$. The prior distribution for $\lambda$ is a mixture of gamma distributions: $f(\lambda; a_1, b_1, a_2, b_2, \pi) = \pi\, g(\lambda; a_1, b_1) + (1 - \pi)\, g(\lambda; a_2, b_2)$, where $g(\lambda; a, b) = b\,(b\lambda)^{a-1} e^{-b\lambda} / \Gamma(a)$.
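The stratified expected count and the gamma-mixture prior density translate directly into code. In this Python sketch, strata are passed as hypothetical $(n_{i+}^{(k)}, n_{+j}^{(k)}, N^{(k)})$ tuples, the function names are illustrative, and the two-strata numbers are made up to show that stratification can change $E_{ij}$ substantially compared with pooling all reports.

```python
import math


def stratified_expected(strata):
    """E_ij = sum_k n_{i+}^(k) * n_{+j}^(k) / N^(k), over strata with N^(k) > 0."""
    return sum(n_i * n_j / n_k for n_i, n_j, n_k in strata if n_k > 0)


def gamma_density(lam, a, b):
    """g(lambda; a, b) = b (b lambda)^(a-1) exp(-b lambda) / Gamma(a), for lambda > 0."""
    log_g = math.log(b) + (a - 1) * math.log(b * lam) - b * lam - math.lgamma(a)
    return math.exp(log_g)


def mixture_prior(lam, a1, b1, a2, b2, pi):
    """Two-component gamma mixture prior for lambda."""
    return pi * gamma_density(lam, a1, b1) + (1 - pi) * gamma_density(lam, a2, b2)


# Hypothetical strata (e.g. two age groups): (n_i+, n_+j, N) per stratum.
strata = [(100, 50, 1000), (10, 200, 1000)]
pooled = stratified_expected([(110, 250, 2000)])  # same reports, one stratum
print(stratified_expected(strata), pooled)
```

Here the drug is concentrated in the stratum where the event is rare, so pooling nearly doubles the expected count and would understate the ratio $n_{ij}/E_{ij}$.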
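DuMouchel, contd.: Estimate $\pi, a_1, b_1, a_2, b_2$ by empirical Bayes; the marginal distribution of $n_{ij}$ is a mixture of negative binomials. The posterior density of $\lambda_{ij}$ is also a mixture of gammas, and $\log_2 \lambda_{ij}$ plays the role of $IC_{ij}$. A lower bound is easy to get, e.g., the 5th percentile of the posterior of $\lambda_{ij}$, which serves the same purpose as the WHO's $E(IC_{ij}) - 2\,SD(IC_{ij})$.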
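Given hyperparameters, the posterior for one cell is available in closed form: component $k$ becomes $\mathrm{Gamma}(a_k + n, b_k + E)$ (shape, rate), with mixture weight proportional to $\pi_k$ times that component's negative binomial probability of $n$. The Python sketch below computes the posterior weights and mean and approximates the 5th posterior percentile by simulation. The hyperparameter values are hypothetical placeholders, not fitted to any database, and the function names are illustrative.

```python
import math
import random


def log_negbin(n, a, b, e):
    """log P(n) when n ~ Poisson(lambda * e) and lambda ~ Gamma(shape a, rate b)."""
    return (math.lgamma(a + n) - math.lgamma(a) - math.lgamma(n + 1)
            + a * math.log(b / (b + e)) + n * math.log(e / (b + e)))


def gps_posterior(n, e, a1, b1, a2, b2, pi):
    """Posterior gamma mixture for lambda given count n and expected count e.

    Returns a list of (weight, shape, rate) for the two components.
    """
    log_w1 = math.log(pi) + log_negbin(n, a1, b1, e)
    log_w2 = math.log(1 - pi) + log_negbin(n, a2, b2, e)
    top = max(log_w1, log_w2)  # subtract the max for numerical stability
    w1, w2 = math.exp(log_w1 - top), math.exp(log_w2 - top)
    total = w1 + w2
    return [(w1 / total, a1 + n, b1 + e), (w2 / total, a2 + n, b2 + e)]


def posterior_mean(components):
    """Mean of a gamma mixture: sum of weight * shape / rate."""
    return sum(w * shape / rate for w, shape, rate in components)


def posterior_quantile_mc(components, q=0.05, draws=20000, seed=1):
    """Monte Carlo approximation of the q-th posterior quantile of lambda."""
    rng = random.Random(seed)
    weights = [w for w, _, _ in components]
    samples = []
    for _ in range(draws):
        _, shape, rate = rng.choices(components, weights=weights)[0]
        samples.append(rng.gammavariate(shape, 1.0 / rate))  # scale = 1 / rate
    samples.sort()
    return samples[int(q * draws)]


# Hypothetical hyperparameters, for illustration only.
post = gps_posterior(n=40, e=10.0, a1=0.2, b1=0.1, a2=2.0, b2=4.0, pi=1 / 3)
print(posterior_mean(post), posterior_quantile_mc(post))
```

The posterior mean falls slightly below the raw ratio $n/E = 4$, which is the shrinkage the empirical Bayes prior is designed to provide; for small counts the pull toward the prior is much stronger.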
The control group and the issue of "compared to what?": signal strategies compare a drug with itself from prior time periods, with other drugs and events, or with external data sources of relative drug usage and exposure. The total frequency count for a drug is used as a relative surrogate for the external denominator of exposure, because it is easy to use, quick, and efficient. This is analogous to a case-control design in which cases are reports of a specific AE term, controls are reports of other terms, and the exposure is the presence or absence of a specific drug.
Other useful metrics and methods: chi-square statistics; p-value-type metrics (overly influenced by sample size); modeling association directly through a multivariate Poisson distribution; incorporation of a prior distribution for some drugs and/or events for which previous information is available, e.g., liver events or pre-market signals.
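The sample-size sensitivity of chi-square-type metrics is easy to see: multiplying every cell of a 2×2 table by the same factor leaves the PRR unchanged but multiplies the Pearson chi-square statistic by that factor. A short Python sketch with made-up counts (function names are illustrative):

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def prr(a, b, c, d):
    """Proportional reporting ratio: [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))


small = (20, 980, 200, 98800)
large = tuple(10 * x for x in small)  # same proportions, ten times the reports
print(prr(*small), prr(*large))
print(chi_square_2x2(*small), chi_square_2x2(*large))
```

The strength of association is identical in both tables, yet the chi-square value, and hence any p-value derived from it, changes tenfold.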
Interpreting the Signal Through the Role of Visual Graphics: four examples of spatial maps that reduce the scores to patterns and user-friendly graphs and help to interpret many signals collectively.
Example 1: A spatial map showing the signal scores for the most frequently reported events (rows) and drugs (columns) in the database by the intensity of the empirical Bayes signal score (blue indicates a stronger signal than purple).
Example 2: A spatial map showing fingerprints of signal scores, allowing one to visually compare the complexity of patterns for different drugs and events and to identify positive or negative co-occurrences.
Example 3: Cumulative scores and numbers of reports, according to the year in which the signal was first detected, for selected drugs.
Example 4: Differences in paired male-female signal scores for a specific adverse event across drugs with events reported (red means females greater, green means males greater).
Summary
1. There is NO gold-standard method for signal detection.
2. Signals become more stable over time; however, there is a limited window of opportunity for signal detection.
3. Use the time-slice evolution of signals: fluctuation might reveal external risk factors, and robustness can be assessed.
4. Consider other endpoints, such as time to onset and duration of event.
5. For spontaneous case reports, the means to improve content is to standardize and improve intake.
6. Data mining will likely generate many false positives and affirmations of what was previously known.
7. Causality assessments should largely be reserved for refining important signals.
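One way to follow the time-slice evolution of a signal, assumed here for illustration, is to recompute the observed-to-expected ratio on the cumulative database at the end of each period. The Python sketch below uses the independence expectation $E_{ij} = n_{i+} n_{+j} / N$ and hypothetical per-period counts; in practice a shrinkage score such as the WHO or DuMouchel measure would be tracked instead of the raw ratio.

```python
def cumulative_ratios(periods):
    """Observed/expected ratio for one drug-event pair after each period.

    `periods` is a list of (n_ij, n_i+, n_+j, N) counts reported in that
    period (hypothetical layout); counts are accumulated over time, and
    each cumulative margin is assumed to be positive.
    """
    ratios = []
    totals = [0, 0, 0, 0]
    for period in periods:
        totals = [t + x for t, x in zip(totals, period)]
        n_ij, n_i, n_j, n = totals
        expected = n_i * n_j / n  # E_ij under independence
        ratios.append(n_ij / expected)
    return ratios


# Hypothetical quarterly counts for one drug-event pair.
periods = [(10, 100, 50, 1000), (20, 100, 50, 1000), (15, 100, 50, 1000)]
print(cumulative_ratios(periods))
```

Plotting such a series per drug-event pair shows whether a score is still fluctuating or has stabilized, which is the robustness check the summary refers to.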
Challenges in the future: more real-time data analysis; more interactivity (visual data mining, e.g., ggobi); linkage with other databases to control the bias inherent in the database; quality-control strategies (e.g., identifying duplicates); methods to reduce false positives and false negatives.