1
Analysis of Drug-Gene Interaction Data
Florian Ganglberger, Sebastian Nijman Lab
2
Nijman Lab
3
Working on specialised, target-oriented cancer therapies
Cancer = cell mutation
(Figure labels: Drug, Mutation, Drug)
4
Motivation
Testing various drugs on various mutated cells
100 drugs × 100 mutations = 10,000 interactions
Analyse the generated data to find new treatments
5
Overview
Background
–Biological Background
–Technical Procedure
–Initial State
–Special Aspects
–Previous Approach
Analysis
–Explorative Data Analysis
–Drug Noisiness
Data generation
6
Overview
Hit detection
–Statistical Methods
–Filtering Methods
–The Algorithm
–Evaluation of the result
7
Biological Background
Idea behind cancer treatment
–Kill cancer cells while leaving normal cells alive
Common chemotherapies
–Kill cells with a higher division rate
–Problem: mouth, throat and bowel mucosa and hair cells also divide quickly
–Patients feel sick, lose hair, etc.
8
Biological Background
Synthetic lethality approach
–Some biochemical processes that are necessary for cell growth are redundant
–e.g. DNA repair
–Biochemical processes are chained = “protein pathway”
9
Protein pathways
(Figure labels: Protein A, Protein B, Protein C, Cell growth, Drug, Gene)
10
Synthetic lethality
Choose a cancer that has a mutated gene in one of these pathways
Find a drug that inhibits the other pathway
11
Synthetic lethality
Produce cells with mutations that are normally present in cancer
Find a drug
This may also work in real cancer
–Tumours have more than one mutation, and these mutations can influence each other
12
Technical Procedure
Standard dataset consists of 38,400 interactions
–96 drugs × 100 mutations × 4 replicates
Testing each interaction separately would be inefficient
13
Technical Procedure
Idea: test different cell lines together in one well
–384 wells
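A minimal sketch of the pooled layout, assuming each of the 384 wells holds one replicate of one drug (96 × 4 = 384) together with all 100 barcoded cell lines, and ignoring control wells for simplicity. `build_layout` and the constants are hypothetical names, not part of the original analysis.

```python
from itertools import product

N_DRUGS = 96        # drugs per screen (from the slides)
N_REPLICATES = 4    # replicate wells per drug
N_MUTATIONS = 100   # barcoded cell lines pooled in every well

def build_layout(n_drugs=N_DRUGS, n_reps=N_REPLICATES):
    """Assign each (drug, replicate) pair to its own well index (hypothetical layout)."""
    pairs = product(range(n_drugs), range(n_reps))
    return {well: pair for well, pair in enumerate(pairs)}

layout = build_layout()
n_wells = len(layout)
n_measurements = n_wells * N_MUTATIONS  # every well reports all pooled barcodes

print(n_wells)         # 384
print(n_measurements)  # 38400
```

Pooling therefore needs 384 wells for 38,400 measurements, instead of one well per drug, cell line and replicate.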
14
Before the experiment
16
After the experiment
Copy the barcodes of the cells by polymerase chain reaction (PCR), which amplifies the signal
Add a vitamin to the barcode that can bind to a dye-containing protein
The amount of barcode correlates with the amount of remaining cells
17
After the experiment
18
Allocation
Red and infrared emitted light → barcode → mutation
Green reflected light → cell amount
–Arbitrary unit that correlates with the cell amount
–Called “Reporter”
Drug → identified by the well used
19
Initial state
Because drugs are dissolved in a dilution, wells without drugs can be used as controls
20
Back to statistics...
21
Special Aspects
Biological and technical factors make the data noisy and not directly usable
Inter- and intraindividual variability
22
Interindividual Variability
Variability between observation units
Cells with the same mutation = one observation unit = “one virtual cancer patient”
Variation among different mutated cells
Reasons
–Mutations can be toxic themselves
–Characteristics of the technical process
23
Interindividual Variability
(Figure: average amount of remaining mutations)
24
Variability of Technical Procedure
Limited precision
–Precision of drug dosing
–Precision of cell amount
–Quality of the measurement equipment
Decreased sensitivity to a lower signal
–Detection limit
–Killed cells don’t produce a zero signal → background noise with differing variability
25
Variability of Technical Procedure
Amplification problems
–Copying the barcodes by PCR consumes material
–If some cell lines are completely killed → more material for the other cell lines → higher amplification of the surviving cells
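A toy model of this effect, assuming (hypothetically) that a fixed amount of PCR product is shared among the surviving barcodes in proportion to their cell counts. Real PCR saturation is non-linear; `amplify` and the numbers only illustrate the direction of the bias.

```python
def amplify(cell_counts, total_material=1000.0):
    """Toy model: split a fixed amount of PCR product among barcodes
    in proportion to their remaining cell counts (hypothetical assumption)."""
    total_cells = sum(cell_counts.values())
    return {line: total_material * count / total_cells
            for line, count in cell_counts.items()}

# Cell line "A" has the same number of cells in both wells ...
untreated = {"A": 100, "B": 100, "C": 100, "D": 100}
# ... but in the treated well the drug has killed lines "C" and "D".
treated = {"A": 100, "B": 100, "C": 0, "D": 0}

print(amplify(untreated)["A"])  # 250.0
print(amplify(treated)["A"])    # 500.0
```

Line "A" appears twice as abundant in the treated well although its cell count did not change.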
26
Amplification Problems
27
Previous Approach
Visual method, based on scatter plots
Identify outliers visually
28
Previous Approach
1. Calculating the effect
   1. Median normalization of drugs
   2. Calculating a relative ratio
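The slides do not give the exact formulas, so this is a sketch under two assumptions: each drug well is median-normalized by dividing its reporter values by the well's median across mutations, and the relative ratio divides the normalized drug value by the normalized control value of the same mutation. Function names and data are hypothetical.

```python
from statistics import median

def median_normalize(well):
    """Divide every reporter value in a well by the well's median (assumed scheme)."""
    m = median(well.values())
    return {mutation: value / m for mutation, value in well.items()}

def relative_ratio(drug_well, control_well):
    """Normalized drug signal relative to the normalized control signal, per mutation."""
    drug = median_normalize(drug_well)
    control = median_normalize(control_well)
    return {mutation: drug[mutation] / control[mutation] for mutation in drug}

# Hypothetical reporter values for three mutations
control = {"m1": 400.0, "m2": 800.0, "m3": 1200.0}
drug = {"m1": 200.0, "m2": 400.0, "m3": 150.0}

print(relative_ratio(drug, control))  # {'m1': 2.0, 'm2': 2.0, 'm3': 0.5}
```

The drug lowers every signal, but after median normalization only "m3" stands out as relatively sensitive; these ratios are what the previous approach plotted against each mutation's median.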
29
Previous Approach
Plotting the ratio against the median of a mutation
33
There are some problems...
If two lines overlap, hits can be obscured
No comparable value that estimates the significance of outliers
Intraindividual variability between replicates is ignored
Human error: outlier detection is subjective
Slow method that cannot be automated
34
Overview
Background
–Biological Background
–Technical Procedure
–Initial State
–Special Aspects
–Previous Approach
Analysis
–Explorative Data Analysis
–Drug Noisiness
35
Explorative Data Analysis
Necessary for hit detection
Analysis of the behaviour of the data
Closer look at
–Distribution of mutations
–Variability of mutations and replicates
–Skewness of mutations
–Noisiness of drugs
36
Distribution of Mutations
Needed to choose the right statistical test
The test will be applied to mutations to see which drug works best
The effect is the point of interest → matrix of relative ratios
38
Variability of Mutations
Decreased sensitivity to lower signals
Possibly a detection limit
Spread-vs-level plot
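The points of a spread-vs-level plot can be computed per mutation as (level, spread). In this sketch, level is the median and spread is the median absolute deviation (MAD), chosen here as a robust stand-in for the interquartile range; the data and names are hypothetical.

```python
from statistics import median

def spread_vs_level(values_by_mutation):
    """Return {mutation: (level, spread)} with level = median and
    spread = median absolute deviation (MAD, used here instead of the IQR)."""
    points = {}
    for mutation, values in values_by_mutation.items():
        level = median(values)
        spread = median([abs(v - level) for v in values])
        points[mutation] = (level, spread)
    return points

# Hypothetical reporter values of two mutations across five drugs
data = {
    "strong_signal": [1000, 1100, 950, 1050, 1020],
    "weak_signal": [120, 300, 90, 260, 150],
}
for mutation, (level, spread) in spread_vs_level(data).items():
    print(mutation, level, spread, round(spread / level, 2))
```

The low-signal mutation has a much larger spread relative to its level, the pattern a detection limit would produce.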
41
Replicate Variability
Important factor: the same cells are tested several times with the same drug (replicates)
Indicator of the accuracy and reproducibility of the technical procedure
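One possible measure of replicate variability is the coefficient of variation (sample standard deviation divided by the mean) of the 4 replicates of an interaction. The slides do not state which measure was used; `replicate_cv` and the values are hypothetical.

```python
from statistics import mean, stdev

def replicate_cv(replicates):
    """Coefficient of variation (sample SD / mean) of replicate measurements."""
    return stdev(replicates) / mean(replicates)

# Hypothetical reporter values: 4 replicates of one drug-mutation interaction each
consistent = [500.0, 520.0, 480.0, 500.0]
scattered = [500.0, 900.0, 150.0, 450.0]

print(round(replicate_cv(consistent), 3))
print(round(replicate_cv(scattered), 3))
```

Both interactions have the same mean, but the second is far less reproducible.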
44
Skewness of Mutations
Another indicator of different behaviour below the threshold
Right-skewed distributions because of background noise at lower signals
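A sketch of the skewness computation, using the simple moment estimator g1 = m3 / m2^1.5 (an assumption; other skewness estimators exist). A positive value means a longer right tail. The data are hypothetical.

```python
from statistics import fmean

def skewness(values):
    """Sample skewness g1 = m3 / m2**1.5 (biased moment estimator)."""
    m = fmean(values)
    m2 = fmean([(v - m) ** 2 for v in values])
    m3 = fmean([(v - m) ** 3 for v in values])
    return m3 / m2 ** 1.5

# Hypothetical relative ratios of one mutation across drugs
symmetric = [0.8, 0.9, 1.0, 1.1, 1.2]
right_skewed = [0.9, 0.95, 1.0, 1.0, 1.05, 1.1, 2.5]  # long upper tail

print(round(skewness(symmetric), 3))
print(round(skewness(right_skewed), 3))
```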
46
Drug Noisiness
Nothing to do with background noise
Caused by the technical procedure
–Overdosing of cells or drugs
–Toxicity (“Dosis facit venenum”)
Different effects
–Strong resistance
–Strong sensitivity
47
Amplification Problems
49
Strong Noisiness
Easy to identify
Pronounced outliers
High number of false-positive hits
Idea: noisiness causes a weak correlation with the control
52
Weak Noisiness
Also numerous differences in sensitivity or resistance
The contrast to normal drugs is not well defined
Visual methods failed
Also many false-positive hits
54
Strong Noisiness vs Weak Noisiness
55
Overview
Hit detection
–Statistical Methods
–Filtering Methods
–The Algorithm
–Evaluation of the result
56
Hit detection
Definition of a hit
–Indicates synthetic lethality
–Resistance is also interesting from a biological point of view
–Not noisy
2 stages:
1. Finding potential hits
2. Filtering false-positive hits and incomparable data
57
Statistical Test
Mutations are not normally distributed
Compare the 4 replicates with their mutation
Mann-Whitney U test
–Compares two medians
–Requires the random variables X and Y to have approximately identically shaped distributions
–No symmetry or normal distribution needed
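A self-contained sketch of the two-sided Mann-Whitney U test, comparing the 4 replicates of one interaction with the values of the same mutation under other drugs (whether the reference should exclude the tested drug is an assumption here). It uses mid-ranks for ties and the normal approximation with tie and continuity correction; with only 4 replicates this approximation is rough. In practice R's `wilcox.test` or SciPy's `scipy.stats.mannwhitneyu` (which also offers an exact method) would be used instead.

```python
import math
from collections import Counter

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test for samples x and y (lists).
    Mid-ranks for ties, normal approximation with tie and continuity
    correction. Returns (U for sample x, p-value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted(x + y)
    counts = Counter(pooled)
    # Mid-ranks: tied values share the average of their 1-based positions.
    first_pos = {}
    for pos, value in enumerate(pooled, start=1):
        first_pos.setdefault(value, pos)
    rank = {v: first_pos[v] + (counts[v] - 1) / 2 for v in counts}
    u1 = sum(rank[v] for v in x) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in counts.values()) / (n * (n - 1))
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term))
    if sigma == 0:
        return u1, 1.0  # all values identical: no evidence either way
    z = max(abs(u1 - mu) - 0.5, 0.0) / sigma
    p = math.erfc(z / math.sqrt(2))  # equals 2 * (1 - Phi(z))
    return u1, min(p, 1.0)

replicates = [0.20, 0.25, 0.22, 0.30]  # hypothetical ratios of one interaction
mutation = [0.90, 1.00, 1.10, 0.95, 1.05, 0.85,
            1.20, 0.98, 1.02, 0.92, 1.08, 0.97]  # same mutation, other drugs
u, p = mann_whitney_u(replicates, mutation)
print(u, round(p, 4))
```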
58
Statistical Test
Disadvantages
–Rank-sum tests are based on the order, not on the magnitudes
–Weak outlying interactions get the same p-values as strong outliers
–P-values are not comparable between individuals, but their significance is an indicator
–Strongly noisy drugs are usually extreme outliers → they reduce the significance
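The first two disadvantages can be shown directly: U only counts how many reference values each replicate exceeds, so a weak and a strong outlier that both lie above every reference value get the same U and therefore the same p-value. `u_statistic` and the data are hypothetical.

```python
def u_statistic(x, y):
    """Mann-Whitney U for sample x: count of pairs (a, b) with a > b; ties count 1/2."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)

mutation = [0.9, 1.0, 1.1, 0.95, 1.05, 0.85, 1.2, 0.98]  # hypothetical reference ratios
weak_outlier = [1.25, 1.3, 1.28, 1.27]   # just above the largest reference value
strong_outlier = [5.0, 6.2, 5.5, 7.1]    # far above all reference values

print(u_statistic(weak_outlier, mutation))    # 32.0
print(u_statistic(strong_outlier, mutation))  # 32.0
```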
59
Multiple testing
Multiple testing of interactions against their mutations
Increases the error rate
100 different interactions =
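The formula for 100 interactions is missing from this transcript. As an illustration (not necessarily the figure on the original slide), the standard family-wise error rate for m independent tests at level α is 1 − (1 − α)^m; α = 0.05 is an assumption here.

```python
def family_wise_error(alpha, m):
    """Probability of at least one false positive among m independent
    tests at level alpha when every null hypothesis is true."""
    return 1 - (1 - alpha) ** m

print(round(family_wise_error(0.05, 1), 3))    # 0.05
print(round(family_wise_error(0.05, 100), 3))  # 0.994
```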
60
Multiple testing
Bonferroni correction needed
How to achieve significant results?
–Calculate the median of replicates
–Test just the upper and lower 10% of the data
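A sketch of why the pre-selection helps, assuming one test per drug-mutation pair (96 × 100 = 9,600) and α = 0.05: Bonferroni divides α by the number of tests, so testing only the upper and lower 10% of interaction medians relaxes the per-test threshold fivefold. `extreme_interactions` and the toy data are hypothetical.

```python
from statistics import median

ALPHA = 0.05  # assumed family-wise significance level

def bonferroni_threshold(m, alpha=ALPHA):
    """Per-test significance level after Bonferroni correction for m tests."""
    return alpha / m

def extreme_interactions(replicates_by_interaction, fraction=0.10):
    """Interactions whose replicate median lies in the lower or upper
    `fraction` of all interaction medians; only these are tested."""
    medians = {key: median(reps) for key, reps in replicates_by_interaction.items()}
    ordered = sorted(medians, key=medians.get)
    k = max(1, int(len(ordered) * fraction))
    return ordered[:k] + ordered[-k:]

all_tests = 96 * 100                         # one median per interaction
selected_tests = 2 * int(all_tests * 0.10)   # upper and lower 10%
print(bonferroni_threshold(all_tests))       # about 5.2e-06
print(bonferroni_threshold(selected_tests))  # about 2.6e-05

# Hypothetical toy data: 20 interactions with 4 replicates each
toy = {f"drug{i}": [1.0 + 0.01 * i] * 4 for i in range(20)}
print(extreme_interactions(toy))  # ['drug0', 'drug1', 'drug18', 'drug19']
```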
61
Filtering Drugs
Filtering strongly noisy drugs by correlation coefficient
Filter before the test to increase the significance
Note: drugs shouldn’t be filtered automatically, just identified. Whether a drug is toxic is the decision of a biologist
62
Filtering strong noisy drugs
63
Filtering weakly noisy drugs
Much harder to identify
Idea: weakly noisy drugs produce many false-positive hits with high significance
–Calculate p-values
–Order by significance
–The frequency of drugs in the top hits is an indicator of weak noisiness
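The three steps can be sketched as follows, assuming hits are available as (drug, mutation, p-value) tuples; `top_hit_frequency`, `top_n` and the data are hypothetical.

```python
from collections import Counter

def top_hit_frequency(hits, top_n=50):
    """Count how often each drug appears among the `top_n` most significant
    hits. `hits` is a list of (drug, mutation, p_value) tuples."""
    ranked = sorted(hits, key=lambda hit: hit[2])  # most significant first
    return Counter(drug for drug, _, _ in ranked[:top_n])

hits = [
    ("drug_x", "m1", 1e-6), ("drug_x", "m2", 2e-6), ("drug_x", "m3", 5e-6),
    ("drug_y", "m4", 3e-6), ("drug_z", "m5", 4e-4), ("drug_x", "m6", 8e-6),
]
print(top_hit_frequency(hits, top_n=5).most_common())  # [('drug_x', 4), ('drug_y', 1)]
```

A drug that dominates the top hits, like `drug_x` here, is a candidate for weak noisiness and should be checked by a biologist.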
64
Top Drugs
65
Filter Mutations
Filter data below a detection limit
Ideas
–Filter by threshold (about 250): 30% of the data; based on just one dataset, so the threshold is not universally valid
–Filter by skewness: 17% of the data
–Filter by coefficient of variation: 12% of the data
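A sketch of the threshold filter and of how the fraction of removed data can be reported. The slides do not say whether the threshold applies to single data points or to whole mutations; here it is assumed to apply to each mutation's median reporter signal. Names and data are hypothetical.

```python
from statistics import median

def below_threshold(signal_by_mutation, threshold=250):
    """Mutations whose median reporter signal lies below `threshold`
    (about 250 in the analysed dataset; not a universal value)."""
    return [m for m, values in signal_by_mutation.items() if median(values) < threshold]

signals = {  # hypothetical reporter values per mutation
    "m1": [1200, 1100, 1300],
    "m2": [180, 240, 200],
    "m3": [600, 650, 580],
    "m4": [90, 300, 110],
}
low = below_threshold(signals)
print(low, len(low) / len(signals))  # ['m2', 'm4'] 0.5
```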
67
Threshold Estimation
Idea: modification of the skewness filter method
Skewness outliers lie below the threshold
Non-outliers above the last skewness outlier are normal data
The threshold should lie approximately midway between the last skewness outlier and the first non-outlier above it
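A sketch of this estimation, assuming each mutation is summarized as (signal level, skewness) and that a mutation counts as a skewness outlier when its skewness exceeds a fixed cutoff (`skew_cutoff = 1.0`, a hypothetical rule of thumb; the original outlier rule is not given). The threshold is the midpoint between the highest-level outlier and the next non-outlier above it.

```python
def estimate_threshold(level_and_skew, skew_cutoff=1.0):
    """Estimate a detection threshold from (level, skewness) pairs, one per
    mutation. Returns None if no outlier/non-outlier pair exists."""
    outlier_levels = [lvl for lvl, s in level_and_skew if s > skew_cutoff]
    if not outlier_levels:
        return None
    highest_outlier = max(outlier_levels)
    normal_above = [lvl for lvl, s in level_and_skew
                    if s <= skew_cutoff and lvl > highest_outlier]
    if not normal_above:
        return None
    return (highest_outlier + min(normal_above)) / 2

# Hypothetical (median signal, skewness) per mutation
mutations = [(80, 3.1), (120, 2.7), (150, 0.4), (210, 2.9),
             (330, 0.2), (420, 0.1), (650, 0.3)]
print(estimate_threshold(mutations))  # 270.0
```

The non-outlier at level 150 is ignored because it lies below the highest skewness outlier.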
69
The Algorithm
R demo
70
Results