Diego Saldana Miranda Applied Technology Innovation Novartis

1 Automated Detection of Adverse Drug Reactions in the Biomedical Literature Using Convolutional Neural Networks and Biomedical Word Embeddings
Diego Saldana Miranda, Applied Technology Innovation, Novartis

2 Presenter Introduction
Presenter: Diego Saldana Miranda
Affiliation: Applied Technology Innovation, Novartis
Topics of Interest: Application of Machine Learning to Randomized Controlled Trial (RCT) data and Real World Data (RWD), as well as biomedical Natural Language Processing (NLP).

3 Background What is Pharmacovigilance?
Detection, assessment, understanding, and prevention of adverse events and drug-related problems.
Who is involved? (including, but not limited to): Healthcare Professionals; Manufacturers; Regulatory Authorities; Policy Makers, etc.; Public; Patients

4 Background Reporting & Aggregation
Structured Databases: FAERS (FDA); EudraVigilance (EMA); Adverse Reaction Online Database (MedEffect)
Reporting flow: Patients; Healthcare Professionals; Manufacturers (Mandatory Reporting) → Regulatory Authority → Structured Database

5 Background Sources of Safety Signals
Safety Signal Sources: Clinical Data; Biomedical Literature (Case Reports); Social Media; Other (telephone, fax, mail, ...); EMRs/EHRs; Omics

6 Related Work Early Approaches for ADRs
Neural Network Approaches: Huynh et al.; Gupta et al.; Tafti et al.; Li et al.; Ramamoorthy et al.
Simple Machine Learning: Xu et al.; Cañada et al.; Sarker et al.
Lexicons & Gazetteers: Xu et al. [3]; Gurulingappa et al.

7 Dataset The ADE Corpus (Gurulingappa et al. 2012) [1]
PubMed query: "adverse effects"[sh] AND (hasabstract[text] AND Case Reports[ptyp]) AND "drug therapy"[sh] AND English[lang] AND (Case Reports[ptyp] AND ("1"[PDAT]: "2010/10/07"[PDAT]))
Retrieved: ~30000 case reports
Annotated: 2972 case reports (Annotator 1, Annotator 2, Annotator 3; Inter-Annotator Agreement (IAA))

8 ADR Detection Sentence Classification
Acute psychosis developed in an elderly patient with Parkinson disease and she was admitted and treated with quetiapine (Seroquel). [Non-ADR]
One day later, high fever unexplained by infection appeared associated with restlessness, confusion, convulsion, leukocytosis, and extreme serum creatine kinase levels. [Non-ADR]
She died of neuroleptic malignant syndrome (NMS) despite intensive treatment. [Non-ADR]
Quetiapine is an atypical neuroleptic agent, rarely associated with NMS in the absence of other contributing drugs. [ADR]
Our case strongly establishes quetiapine-induced NMS (Naranjo scale 6) and is also unique in the abrupt onset and severe refractory course. [ADR]

9 Dataset The ADE Corpus (Gurulingappa et al. 2012) [1]
Columns: PubMed ID, Sentence, ADR, Drug
Sentence: Unaccountable severe hypercalcemia in a patient treated for hypoparathyroidism with dihydrotachysterol. ADR: hypercalcemia. Drug: dihydrotachysterol.
Sentence: METHODS: We report two cases of pseudoporphyria caused by naproxen and oxaprozin. ADR: pseudoporphyria. Drugs: naproxen, oxaprozin (duplicate).
...
2620 observations belong to duplicates
1432 duplicated sentences
4272 unique ADR sentences
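The de-duplication step implied by this slide can be sketched as follows. This is a minimal illustration, not the presenter's code: the row layout, the placeholder IDs, and the choice to normalize case and whitespace before comparing sentences are all assumptions.

```python
from collections import Counter

# Hypothetical rows shaped like the table above: (pubmed_id, sentence, adr, drug).
# The IDs are placeholders, not real PubMed IDs.
rows = [
    ("id-1", "Unaccountable severe hypercalcemia in a patient treated for "
             "hypoparathyroidism with dihydrotachysterol",
     "hypercalcemia", "dihydrotachysterol"),
    ("id-2", "METHODS: We report two cases of pseudoporphyria caused by "
             "naproxen and oxaprozin.", "pseudoporphyria", "naproxen"),
    ("id-2", "METHODS: We report two cases of pseudoporphyria caused by "
             "naproxen and oxaprozin.", "pseudoporphyria", "oxaprozin"),
]


def normalize(sentence):
    """Collapse case and whitespace so trivially different copies match (an assumption)."""
    return " ".join(sentence.lower().split())


def deduplicate(rows):
    """Keep the first row seen for each distinct normalized sentence."""
    seen = set()
    unique = []
    for row in rows:
        key = normalize(row[1])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Count how many rows share their sentence with at least one other row.
counts = Counter(normalize(r[1]) for r in rows)
rows_in_duplicates = sum(c for c in counts.values() if c > 1)
unique_rows = deduplicate(rows)

print(len(rows), rows_in_duplicates, len(unique_rows))
```

A sentence that mentions two drugs yields one row per drug, so the same text can land in both the training and the test fold unless it is collapsed first.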

10 Related Work Word Vectors for Biomedical Applications (Pyysalo et al. 2013) [11]
word2vec trained on PubMed, PubMed Central, and English Wikipedia
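Word vectors are usually compared with cosine similarity, which is one way to make "similar words are closer" concrete. The sketch below uses invented three-dimensional toy vectors; real embeddings such as those of Pyysalo et al. have hundreds of dimensions, and none of these numbers come from them.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented toy vectors, for illustration only.
toy_vectors = {
    "quetiapine": [0.9, 0.1, 0.0],
    "clozapine": [0.8, 0.2, 0.1],
    "wikipedia": [0.0, 0.1, 0.9],
}

similar = cosine_similarity(toy_vectors["quetiapine"], toy_vectors["clozapine"])
dissimilar = cosine_similarity(toy_vectors["quetiapine"], toy_vectors["wikipedia"])
print(similar > dissimilar)
```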

11 Related Work Architectures for Text Classification (Kim 2014 [13], Hughes et al. 2017 [12])
Kim: 1 convolution layer; sentence classification
Hughes et al.: 4 convolution layers; general text classification

12 Key Questions
What is the effect of duplication on the performance? What is the true performance?
Can biomedical word embeddings improve the performance of the algorithm?
How does the performance of Huynh’s architecture compare to that of more complex architectures?

13 Methods Huynh’s CNN Architecture (Huynh 2016)
Input: Sentence Length (L)
Token embeddings (M=300): Word Vectors. Similar words are closer to each other; dissimilar words are far apart. May work better when domain specific.
1D convolution + ReLU (P=300): Convolution with non-linearities. Learns useful word combination patterns; 5-word windows; 300 features.
Max Pooling (L), Dropout: Vector Representation (P=300).
Sigmoid, Cross Entropy: Coefficients; Decision Function. Softmax: analogous to logistic regression.

14 Methods Huynh’s CNN Architecture (Huynh 2016)
Same architecture as on the previous slide (Token embeddings, M=300; 1D convolution + ReLU, P=300; Max Pooling (L); Dropout; Vector Representation, P=300; Coefficients; Sigmoid, Cross Entropy), annotated with the design choices examined:
Pre-processing: Sentence De-duplication
Choice of Embeddings
Window Size
Activation Function
Number of Features
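The forward pass of the architecture described on these two slides can be sketched in plain Python. This is an illustrative reading of the diagram, not the presenter's implementation: the function names, the random toy weights, and the reduced sizes are assumptions, and dropout is left out because it only acts during training.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def conv1d_relu(embedded, filters, biases, window):
    """Slide each filter over every `window`-token span, then apply ReLU.

    embedded: list of L token vectors, each of length M
    filters:  list of P filters, each a flat list of window * M weights
    returns:  list of (L - window + 1) positions, each holding P features
    """
    outputs = []
    for start in range(len(embedded) - window + 1):
        # Concatenate the token vectors inside the current window.
        span = [x for token in embedded[start:start + window] for x in token]
        outputs.append([
            relu(sum(w * x for w, x in zip(f, span)) + b)
            for f, b in zip(filters, biases)
        ])
    return outputs


def max_pool(features):
    """Maximum over sentence positions, giving one value per filter."""
    return [max(column) for column in zip(*features)]


def predict(token_ids, embeddings, filters, biases, coef, intercept, window):
    """Probability that the sentence is an ADR sentence."""
    embedded = [embeddings[t] for t in token_ids]
    pooled = max_pool(conv1d_relu(embedded, filters, biases, window))
    # Final layer: logistic regression on the pooled vector representation.
    return sigmoid(sum(c * p for c, p in zip(coef, pooled)) + intercept)


# Toy sizes for readability; the slides use M=300, P=300 and 5-word windows.
rng = random.Random(0)
VOCAB, M, P, WINDOW = 20, 4, 3, 5
embeddings = [[rng.uniform(-1, 1) for _ in range(M)] for _ in range(VOCAB)]
filters = [[rng.uniform(-1, 1) for _ in range(WINDOW * M)] for _ in range(P)]
biases = [0.0] * P
coef = [rng.uniform(-1, 1) for _ in range(P)]

sentence = [3, 7, 1, 12, 5, 9, 0]  # hypothetical token ids, L = 7
probability = predict(sentence, embeddings, filters, biases, coef, 0.0, WINDOW)
print(round(probability, 3))
```

Sentences shorter than the window would need padding before the convolution; the sketch assumes every input has at least `WINDOW` tokens.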

15 Results Impact of De-duplication
Metric        De-duplication: No   De-duplication: Yes
              [Avg (±SE)]          [Avg (±SE)]
Accuracy      0.919 (±0.001)       0.914 (±0.002)
Precision     0.858 (±0.006)       0.784 (±0.017)
Recall        0.860 (±0.005)       0.798 (±0.018)
F1-score      0.859 (±0.002)       0.790 (±0.004)
Specificity   0.942 (±0.003)       0.943 (±0.006)
AUROC         0.966 (±0.001)       0.954 (±0.002)

Model With Duplicates   Duplicated ADR Sentences   Unique ADR Sentences
Avg. CV Loss            0.159                      0.627
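For reference, the threshold-based metrics in the table and the "Avg (±SE)" summary can be computed as below. This is a sketch under assumptions: the confusion counts are invented, and the standard error is taken as the sample standard deviation across folds divided by the square root of the number of folds, which the slides do not state explicitly. AUROC is omitted because it needs ranked scores rather than counts.

```python
import math
from statistics import mean, stdev


def classification_metrics(tp, fp, tn, fn):
    """Metrics derived from a binary confusion matrix."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": tn / (tn + fp),
    }


def mean_and_se(values):
    """Mean and standard error of the mean across folds (assumed definition)."""
    return mean(values), stdev(values) / math.sqrt(len(values))


# Hypothetical (tp, fp, tn, fn) counts for three cross-validation folds.
folds = [(80, 20, 380, 20), (78, 22, 378, 22), (82, 18, 382, 18)]
per_fold = [classification_metrics(*counts) for counts in folds]

for name in ("accuracy", "precision", "recall", "f1", "specificity"):
    avg, se = mean_and_se([m[name] for m in per_fold])
    print(f"{name}: {avg:.3f} (±{se:.3f})")
```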

16 Results Impact of Embeddings
De-duplicated data [Avg (±SE)]
Metric        GloVe            Pyysalo
Accuracy      0.914 (±0.002)   0.918 (±0.001)
Precision     0.784 (±0.017)   0.800 (±0.010)
Recall        0.798 (±0.018)   0.797 (±0.013)
F1-score      0.790 (±0.004)   0.798 (±0.004)
Specificity   0.943 (±0.006)   0.949 (±0.003)
AUROC         0.954 (±0.002)   0.958 (±0.001)

Top Gains
Within 6 months of pranlukast withdrawal, anemia resolved and urinary sediment and renal function normalized.
Combining methylephedrine and Chinese herbal drugs might carry a risk of stroke.
Therefore, although garenoxacin reportedly causes fewer adverse reactions for cardiac rhythms than third-generation quinolone antibiotics, one must be cautious of the interference of other drugs during hypokalemia in order to prevent TdP.
Diagnosis: practolol induced sclerosing peritonitis.
It is concluded that SIADH is an important side effect of lorcainide therapy.

Bottom Gains
Withdrawal of Depakote resulted in resolution of the effusion.
After several unrevealing medical work-ups, he was found to have a high blood lead level (122 microg/dL); he has a history of scraping and sanding lead paint without adequate protective measures.
The ulcer did not respond to antibiotic treatment and healed shortly after withholding ATRA.
Therefore, although garenoxacin reportedly causes fewer adverse reactions for cardiac rhythms than third-generation quinolone antibiotics, one must be cautious of the interference of other drugs during hypokalemia in order to prevent TdP.
In one patient, treatment with DCA was associated with a decrease in blood lactate levels from 11.2 mM before treatment to 0.8 mM 16 h later.
This case report describes a 38-year-old male in whom SIADH was strongly suspected secondary to Tegretol therapy to control a seizure disorder.

17 Results Huynh vs Hughes-like Architecture
De-duplicated data [Avg (±SE)]
Metric        Huynh            Hughes-like
Accuracy      0.918 (±0.001)   0.905 (±0.003)
Precision     0.800 (±0.010)   0.765 (±0.017)
Recall        0.797 (±0.013)   0.771 (±0.021)
F1-score      0.798 (±0.004)   0.767 (±0.005)
Specificity   0.949 (±0.003)   0.939 (±0.008)
AUROC         0.958 (±0.001)   0.940 (±0.002)

Top Gains
Animals treated with HAL showed a highly significant 32%-46% loss of tyrosine hydroxylase (TH) immunoreactive neurons in the substantia nigra, and 20% contraction of the TH stained dendritic arbour.
INTERVENTIONS AND RESULTS: Cardiac complications were observed in five pediatric patients who received between 4.6 and 40.8 mg/kg/d of amphotericin B.
The psychotic behavior resolved completely soon after the discontinuation of levetiracetam.
Favorable outcome of de novo hepatitis B infection after liver transplantation with lamivudine and adefovir therapy.
Cessation of D-Pen and the start of corticosteroid therapy were followed by recovery from bicytopenia.

Bottom Gains
In this case, discontinuing piroxicam, a nonsteroidal anti-inflammatory drug, and starting a palliative treatment plan helped resolve a patient's ulcers.
CONCLUSIONS: The fundus picture shown in these cases may be typical of ASPPC after IVTA injection.
Although this G-CSF-driven leucocytosis was alarming it did not appear to have adversely affected the patient's prognosis.
OBJECTIVE: Clozapine causes few extrapyramidal symptoms and is recommended as a treatment drug for severe tardive dyskinesia (TD).
Of the four patients who responded to HU with an increase in total Hb, all reported symptomatic improvement and three have not required further transfusions.

18 Conclusion
The presence of duplicates in the corpus leads to overestimation of performance.
Duplicated sentences have a much lower contribution to the loss in cross validation.
The use of biomedical word embeddings in place of general purpose GloVe embeddings resulted in better performance for ADR sentence classification with this dataset.
Huynh’s CNN architecture performed better than the deeper CNN architecture based on that of Hughes et al.
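The second conclusion compares average cross-validation loss for duplicated and unique ADR sentences. Assuming the loss is the binary cross entropy named on the architecture slide, a per-group average could be computed as in this sketch; the records and their probabilities are invented, and the group labels are hypothetical.

```python
import math


def binary_cross_entropy(label, probability, eps=1e-12):
    """Loss of one prediction; the probability is clipped to avoid log(0)."""
    p = min(max(probability, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def average_loss_by_group(records):
    """records: iterable of (group, label, predicted_probability)."""
    totals = {}
    for group, label, prob in records:
        loss_sum, count = totals.get(group, (0.0, 0))
        totals[group] = (loss_sum + binary_cross_entropy(label, prob), count + 1)
    return {group: s / n for group, (s, n) in totals.items()}


# Invented held-out predictions for ADR sentences (label 1).
records = [
    ("duplicated", 1, 0.95),
    ("duplicated", 1, 0.90),
    ("unique", 1, 0.60),
    ("unique", 1, 0.45),
]
losses = average_loss_by_group(records)
print(losses["duplicated"] < losses["unique"])
```

A sentence whose copy was seen during training tends to be predicted with high confidence, which is one plausible reason its loss contribution is small.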

19 Future Exploration
ADR detection: Automated Architecture Search; Distant Supervision; Visualization / Attentional Mechanisms; Transfer Learning; Combinations of Multiple Corpora; Multi-language Support

20 References
[1] Gurulingappa et al. Journal of Biomedical Informatics 45(5):885–892.
[2] Xu et al. Journal of the American Medical Informatics Association 21(1):90–96.
[3] Xu et al. Journal of Biomedical Informatics 53:128–
[4] Sarker et al. Journal of Biomedical Informatics 53:196–
[5] Huynh et al. Proceedings of COLING
[6] Li et al. BMC Bioinformatics 18(1)
[7] Gupta et al. arXiv: [cs.IR].
[8] Tafti et al. JMIR Medical Informatics 5(4):e51.
[9] Ramamoorthy et al. arXiv: v1 [cs.CL].
[10] Cañada et al. Nucleic Acids Research, 2017, Vol. 45, Web Server issue.
[11] Pyysalo et al. In Proceedings of LBM, pages 39–44.
[12] Hughes et al. arXiv: [cs.CL].
[13] Kim. arXiv: v2 [cs.CL], 2014.

21 Business Use Only

