Analysis of Drug-Gene Interaction Data Florian Ganglberger Sebastian Nijman Lab.

Nijman Lab

Working on specialised target-oriented cancer therapies Cancer = cell mutation (diagram labels: Drug, Mutation, Drug)

Motivation Testing various drugs on various mutated cells 100 drugs vs 100 mutations = 10,000 interactions Analyse the generated data to find new treatments

Overview Background –Biological Background –Technical Procedure –Initial State –Special Aspects –Previous Approach Analysis –Explorative Data Analysis –Drug Noisiness Data generation

Overview Hit detection –Statistical Methods –Filtering Methods –The Algorithm –Evaluation of the result

Biological Background Idea behind cancer treatment –Kill cancer cells while leaving normal cells alive Common chemotherapies –Kill cells with a higher division rate –Problem: mouth, throat and bowel mucosa and hair cells are affected as well –Patients feel sick, lose their hair, etc.

Biological Background Synthetic lethality approach –Some biochemical processes that are necessary for cell growth are redundant –e.g. DNA repair –Biochemical processes are chained = “protein pathway”

Protein pathways Protein A Protein B Protein C Cell growth Drug Gene

Synthetic lethality Choose a cancer which has a mutation of a gene in one of those pathways Find a drug which inhibits the other pathway

Synthetic lethality Produce cells with mutations which are normally present in cancer Find a drug It is possible that this will also work in real cancer –Tumours have more than one mutation → the mutations can influence each other

Technical Procedure Standard dataset consists of 38,400 interactions 96 drugs x 100 mutations x 4 replicates Testing each interaction separately would be inefficient

Technical Procedure Idea: Testing different cell lines in one well → 384 wells

Before the experiment

After the experiment Copy the barcodes of the cells by a polymerase chain reaction (PCR) → amplifies the signal Add a vitamin to the barcode which can bind to a dye-containing protein Amount of barcode correlates with the amount of remaining cells

After the experiment

Allocation Red and infrared emitted light → barcode → mutation Green reflected light → cell amount –Arbitrary unit which correlates with the cell amount –Called “Reporter” Drug → known from the well that was used

Initial state Because the drugs are dissolved in a diluent, wells without drugs are available → use them as controls

Back to statistics....

Special Aspects Biological and technical factors cause noisy and not directly usable data → Inter- and intraindividual variability

Interindividual Variability Variability between observation units Cells with the same mutation = one observation unit = “one virtual cancer patient” Variation among differently mutated cells Reasons –Mutations can be toxic themselves –Characteristics of the technical process

Interindividual Variability Average amount of remaining mutations

Variability of Technical Procedure Limited precision –Precision of drug dosing –Precision of cell amount –Quality of the measurement equipment Decreased sensitivity to a lower signal –Detection limit –Killed cells don’t get a zero signal → background noise with different variability

Variability of Technical Procedure Amplification problems –Copying the barcodes by PCR needs material –If some cell lines are completely killed → more material for the other cell lines → higher amplification of the surviving cells

Amplification Problems

Previous Approach Visual method, based on scatter plots Identify outliers visually

Previous Approach 1. Calculating the effect: (1) median normalization of drugs, (2) calculation of a relative ratio
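
The two steps of the effect calculation can be sketched in a few lines of Python. The transcript does not preserve the exact definition of the relative ratio, so this sketch assumes that each drug (well) is divided by its own median and that the result is then divided by the median of the same mutation across all drugs. The function name `relative_ratios` and the sample values are hypothetical.

```python
from statistics import median

def relative_ratios(signal):
    """signal[drug][mutation] -> ratio[drug][mutation] (assumed definition)."""
    # Step 1: median-normalise every drug (well) so that wells are comparable.
    normalised = {}
    for drug, row in signal.items():
        well_median = median(row.values())
        normalised[drug] = {mut: value / well_median for mut, value in row.items()}

    # Step 2: relate each value to the typical level of its mutation,
    # here taken as the median of that mutation over all drugs (assumption).
    mutations = list(next(iter(normalised.values())))
    mutation_median = {
        mut: median(normalised[drug][mut] for drug in normalised)
        for mut in mutations
    }
    return {
        drug: {mut: normalised[drug][mut] / mutation_median[mut] for mut in mutations}
        for drug in normalised
    }

# Invented reporter values: drugC strongly reduces mutation m3.
example_signal = {
    "drugA": {"m1": 100, "m2": 200, "m3": 300},
    "drugB": {"m1": 50, "m2": 100, "m3": 150},
    "drugC": {"m1": 100, "m2": 200, "m3": 30},
}
example_ratios = relative_ratios(example_signal)
```

A ratio near 1 means the interaction behaves like the rest of its mutation; values well below 1 point to sensitivity and values well above 1 to resistance.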

Previous Approach Plotting the ratio against the median of a mutation

There are some problems... If two lines overlap, hits can be obscured No comparable value that estimates the significance of outliers Intraindividual variability between replicates is ignored Human error → outlier detection is subjective Slow method that cannot be automated

Overview Background –Biological Background –Technical Procedure –Initial State –Special Aspects –Previous Approach Analysis –Explorative Data Analysis –Drug Noisiness

Explorative Data Analysis Necessary for hit detection Analysis of the behaviour of the data Closer look at –Distribution of mutations –Variability of mutations and replicates –Skewness of mutations –Noisiness of Drugs

Distribution of Mutations Choosing the right statistical test The test will be applied to mutations to see which drug works best The effect is the point of interest → Matrix of relative ratios

Variability of Mutations Decreased sensitivity to lower signal Maybe a detection limit Spread vs Level plot

Replicate Variability An important factor is the repeated testing of cells with the same drug. It is an indicator for the accuracy and reproducibility of the technical procedure.

Skewness of Mutations Another indicator for different behaviour below the threshold Right skewed distributions because of background noise in lower signal

Drug Noisiness Nothing to do with background noise Caused by technical procedure –Overdosing of cells or drugs –Toxicity (“Dosis facit venenum”, the dose makes the poison) Different effect –Strong resistance –Strong sensitivity

Amplification Problems

Strong Noisiness Easy to identify Distinct outliers High number of false-positive hits Idea: Noisiness causes weak correlation to the control

Weak Noisiness Also numerous differences in sensitivity or resistance Contrast to normal drugs is not well defined Visual methods failed Also a lot of false positive hits

Strong Noisiness vs Weak Noisiness

Overview Hit detection –Statistical Methods –Filtering Methods –The Algorithm –Evaluation of the result

Hit detection Definition of a Hit –Indicates synthetic lethality –Resistance is also interesting from a biological point of view –Not noisy 2 Stages: 1. Finding potential hits 2. Filtering false-positive hits and incomparable data

Statistical Test Mutations are not normally distributed Compare the 4 replicates of an interaction to their mutation Mann-Whitney U test –Compares two medians –Needs approximately identical distribution shapes of the random variables X and Y –No symmetry or normal distribution needed
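
A minimal, dependency-free sketch of the Mann-Whitney U test follows. It uses the normal approximation without tie or continuity correction, which is only rough for 4 replicates; a statistics package would normally be preferred. The function name `mann_whitney_u` and the sample values are invented for illustration, and comparing the replicates against the remaining values of the mutation is an assumption about how the test is set up.

```python
import math

def mann_whitney_u(x, y):
    """Return (U of x, two-sided p-value) via the normal approximation."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])

    # Assign 1-based ranks, giving tied values the average of their ranks.
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1

    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Under the null hypothesis U has mean n1*n2/2 and this variance (no ties).
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_two_sided

# Four replicates of one drug-mutation interaction vs. the rest of the mutation.
replicates = [0.20, 0.25, 0.30, 0.22]
rest_of_mutation = [0.9, 1.0, 1.1, 0.95, 1.05, 1.2, 0.8, 1.0, 0.97, 1.03]
u_value, p_value = mann_whitney_u(replicates, rest_of_mutation)
```

Because only the ranks enter the calculation, replicates at 0.2 and replicates at 0.7 would give the same U here as long as all of them lie below the rest of the mutation.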

Statistical Test Disadvantages –Rank-sum tests are based on the order, not on the magnitudes –Weakly outlying interactions get the same p-values as strong outliers –P-values are not comparable between individuals, but the significance is an indicator for it –Strongly noisy drugs are usually extreme outliers → reduce the significance

Multiple testing Multiple testing of interactions against their mutations Increases the error 100 different interactions = (equation not preserved in the transcript)

Multiple testing Bonferroni correction needed How to achieve significant results? –Calculate the median of replicates –Testing just the upper and lower 10% of the data
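
How quickly the error grows with the number of tests, and what the Bonferroni correction does to the significance level, can be illustrated with a short sketch. The significance level of 0.05 and the assumption of independent tests are not taken from the slides; the function names are hypothetical.

```python
def family_wise_error(alpha, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall error at alpha."""
    return alpha / n_tests

alpha = 0.05  # assumed significance level

# Testing all 100 interactions of a mutation.
error_all = family_wise_error(alpha, 100)
threshold_all = bonferroni_threshold(alpha, 100)

# Testing only the upper and lower 10% of the data: 20 interactions.
threshold_reduced = bonferroni_threshold(alpha, 20)
```

With 100 tests the per-test level drops to 0.0005, while restricting the tests to 20 interactions relaxes it to 0.0025, which is why reducing the number of tests helps to reach significant results.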

Filtering Drugs Filtering strongly noisy drugs by correlation coefficient Filter before the test to increase the significance Note: Drugs shouldn’t be filtered automatically, just identified. Whether a drug is toxic or not is the decision of a biologist
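
Identifying strongly noisy drugs by their correlation to the control could look like the following sketch. It assumes the Pearson correlation coefficient and a cutoff of 0.8; neither the type of coefficient nor a cutoff value is stated in the slides. In line with the note above, the function only flags drugs and does not remove them.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # undefined for constant profiles

def flag_noisy_drugs(profiles, control, min_r=0.8):
    """Return drugs whose profile correlates weakly with the control.

    min_r is a hypothetical cutoff; the final decision stays with a biologist.
    """
    return sorted(
        drug for drug, profile in profiles.items()
        if pearson(profile, control) < min_r
    )

# Invented reporter values per mutation for the control and two drugs.
control_profile = [100, 200, 300, 400, 500]
drug_profiles = {
    "drugA": [110, 190, 310, 390, 520],  # follows the control closely
    "drugX": [500, 120, 480, 90, 300],   # barely related to the control
}
flagged = flag_noisy_drugs(drug_profiles, control_profile)
```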

Filtering strong noisy drugs

Filtering weakly noisy drugs Much harder to identify Idea: Weakly noisy drugs produce many false-positive hits with high significance –Calculating p-value –Order by significance –Frequency of drugs in the top hits is an indicator for weak noisiness
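
The three steps above amount to a sort followed by a frequency count, as in this sketch. The hit tuples, the function name and the choice of how many top hits to inspect are invented for illustration.

```python
from collections import Counter

def drug_frequency_in_top_hits(hits, top_n):
    """Count how often each drug appears among the top_n most significant hits.

    hits: list of (drug, mutation, p_value) tuples.
    """
    top = sorted(hits, key=lambda hit: hit[2])[:top_n]
    return Counter(drug for drug, _, _ in top).most_common()

# Invented p-values: drugB dominates the most significant hits.
example_hits = [
    ("drugA", "m1", 0.001),
    ("drugB", "m2", 0.0002),
    ("drugB", "m5", 0.0004),
    ("drugB", "m7", 0.0009),
    ("drugC", "m3", 0.02),
    ("drugA", "m4", 0.3),
]
top_drugs = drug_frequency_in_top_hits(example_hits, top_n=4)
```

A drug that appears far more often than others in this list is a candidate for weak noisiness and worth a closer look before its hits are trusted.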

Top Drugs

Filter Mutations Filter data below a detection limit Ideas –Filter by threshold: 30% of the data → based on just one dataset → no universal validity of the threshold (about 250) –Filter by skewness: 17% of the data –Filter by coefficient of variation: 12% of the data
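
The two data-driven filter criteria can be computed per mutation as sketched below. The sketch uses the population moment definition of skewness and the population standard deviation for the coefficient of variation; the slides do not say which estimators were used, and the sample values are invented.

```python
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: positive for right-skewed data."""
    m = mean(values)
    s = pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

def coefficient_of_variation(values):
    """Standard deviation relative to the mean."""
    return pstdev(values) / mean(values)

# Invented reporter values: one mutation near the background, one well above it.
low_signal_mutation = [100, 105, 110, 120, 400]
normal_mutation = [900, 950, 1000, 1050, 1100]

skew_low = skewness(low_signal_mutation)
skew_normal = skewness(normal_mutation)
cv_low = coefficient_of_variation(low_signal_mutation)
cv_normal = coefficient_of_variation(normal_mutation)
```

Mutations whose skewness or coefficient of variation stands out from the rest would then be candidates for filtering; where exactly to draw the line is a separate decision.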

Threshold Estimation Idea: Modification of the skewness filter method Mutations that are outliers in skewness lie below the threshold The non-outlier directly above the skewness outliers is normal data The threshold should be approximately in the middle between these points
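
One possible reading of this idea is sketched below; it is not the algorithm from the R demo. It assumes that every mutation is described by its median signal and its skewness, that skewness outliers are found with Tukey's upper fence (third quartile plus 1.5 times the interquartile range), and that the threshold is the midpoint between the highest outlier and the next non-outlier above it. All names and values are hypothetical.

```python
from statistics import quantiles

def estimate_threshold(mutations):
    """Estimate a detection threshold from (median_signal, skewness) pairs.

    Returns None when no skewness outlier or no normal mutation above it exists.
    """
    skews = [skew for _, skew in mutations]
    q1, _, q3 = quantiles(skews, n=4)
    upper_fence = q3 + 1.5 * (q3 - q1)  # assumed outlier rule

    outlier_levels = [level for level, skew in mutations if skew > upper_fence]
    if not outlier_levels:
        return None
    highest_outlier = max(outlier_levels)

    normal_above = [
        level for level, skew in mutations
        if skew <= upper_fence and level > highest_outlier
    ]
    if not normal_above:
        return None
    return (highest_outlier + min(normal_above)) / 2

# Invented data: two low-signal mutations with strongly right-skewed values.
example_mutations = [
    (120, 2.1), (220, 2.4), (300, 0.3), (400, 0.1), (600, 0.2), (700, 0.1),
    (800, 0.0), (900, -0.1), (1000, 0.25), (1200, 0.15), (1500, 0.05), (1100, 0.2),
]
estimated_threshold = estimate_threshold(example_mutations)
```

Deriving the threshold from the skewness of each dataset avoids relying on a fixed value taken from a single dataset.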

The Algorithm R-Demo

Results
