Choosing where to look next in a mutation sequence space: Active Learning of informative p53 cancer rescue mutants


1 Choosing where to look next in a mutation sequence space: Active Learning of informative p53 cancer rescue mutants. Sam Danziger (Institute for Genomics and Bioinformatics, Department of Biomedical Engineering), Rainer Brachmann (Department of Medicine), Richard Lathrop (Department of Computer Science), Jue Zeng (Department of Medicine). University of California, Irvine.

2 Outline. Overview: Computer Guided Discovery. Problem: Cancer and p53. Results: Best Active Learning. Next: Future Experiments.

3 Computer Guided Discovery Of Active Mutant Proteins. Starting Point: A biomedically important protein with some known mutants. Problem: Find novel mutant proteins with an Active phenotype. Naive Solution: Make and test all other possible mutants in the wet lab. (Figure labels: Known Mutants, Other Possible Mutants.)

4 Why Use Computers? Known Mutants: ~10^2. How Many Mutants are There?: ~10^11, assuming up to 5 mutations in 200 residues. For scale: Spiral Galaxy M101, ~10^9 stars.

5 A Better Solution: Active Learning. Pick the best unknown mutants to know. Known (the Training Set): Example 1, Example 2, Example 3, … Example N. Unknown: Example N+1, Example N+2, Example N+3, Example N+4, … Example M. Loop: Train the Classifier; Choose an Example to Label; Add the New Example To Training Set.
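The loop on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the one-feature threshold classifier, the "closest to the threshold" selection rule, and the `wet_lab` oracle are all hypothetical stand-ins chosen to keep the example short.

```python
from typing import Callable, List, Tuple

Labeled = Tuple[float, int]  # (feature value, class: 1 = active, 0 = inactive)

def train_threshold(training: List[Labeled]) -> float:
    """Toy classifier (an assumption): midpoint between the two class means."""
    active = [x for x, y in training if y == 1]
    inactive = [x for x, y in training if y == 0]
    return (sum(active) / len(active) + sum(inactive) / len(inactive)) / 2.0

def active_learning(training: List[Labeled], pool: List[float],
                    oracle: Callable[[float], int], rounds: int):
    training, pool = list(training), list(pool)
    for _ in range(min(rounds, len(pool))):
        threshold = train_threshold(training)                  # train the classifier
        chosen = min(pool, key=lambda x: abs(x - threshold))   # choose an example to label
        pool.remove(chosen)
        training.append((chosen, oracle(chosen)))              # add it to the training set
    return training, pool

def wet_lab(x: float) -> int:
    """Hypothetical oracle standing in for the wet-lab assay."""
    return 1 if x >= 0.6 else 0

known = [(0.0, 0), (1.0, 1)]
unknown = [0.1, 0.3, 0.4, 0.55, 0.7, 0.9]
training, remaining = active_learning(known, unknown, wet_lab, rounds=3)
print([x for x, _ in training[2:]], remaining)
```

Only the selection line changes between the active learning methods compared in this talk; the surrounding loop stays the same.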

6 An Example of Active Learning: Minimum Marginal Hyperplane. Should unknown Mutant 1 or Mutant 2 be added to the training set? Select Mutant 2. (Figure labels: ACTIVE, INACTIVE, Known Active, Known Inactive, Unknown Mutant 1, Unknown Mutant 2.)
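As a rough sketch of the idea, assuming a linear decision function w·x + b (as in a linear SVM), the minimum-margin rule picks the unlabeled example whose decision value is closest to zero. The weights and feature vectors below are made up for illustration and are not taken from the slides.

```python
from typing import Dict, List

def decision_value(w: List[float], b: float, x: List[float]) -> float:
    """Signed score of x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def select_minimum_margin(w: List[float], b: float,
                          candidates: Dict[str, List[float]]) -> str:
    """Return the candidate closest to the hyperplane (smallest |score|)."""
    return min(candidates, key=lambda name: abs(decision_value(w, b, candidates[name])))

# Hypothetical hyperplane and unknown mutants.
w, b = [1.0, -1.0], 0.0
candidates = {"Mutant 1": [3.0, 0.5], "Mutant 2": [1.0, 0.8]}
selected = select_minimum_margin(w, b, candidates)
print(selected)
```

Dividing each score by the norm of w would give true geometric distances, but it does not change which candidate is closest, so the sketch omits it.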

7 Another Example: Maximum Curiosity. Should Mutant 1 or Mutant 2 be added to the training set? For each mutant, the Cross-validator scores Training Set + Mutant (Active) and Training Set + Mutant (Inactive) against the Training Set alone, giving the Change in correlation coefficient. Select Mutant 1.

8 A Third Example: Entropic Tradeoff. (Figure labels: ACTIVE, INACTIVE, Known Active, Known Inactive, Unclassified, Selected Unclassified, OK.)

9 Which is the Best Active Learning Method?
TYPE I: Select mutants that most improve the classifier if correctly predicted. Maximum Curiosity; Composite Classifier; Improved Composite Classifier.
TYPE II: Select mutants that most improve the classifier. Additive Curiosity; Additive Bayesian Surprise.
TYPE III: Common methods taken from the literature. Minimum Marginal Hyperplane; Maximum Entropy.
TYPE IV: Variations on methods from the literature. Maximum Marginal Hyperplane; Minimum Entropy; Entropic Tradeoff.
TYPE C: Controls. Non-iterated Prediction; Predict All Inactive; Random (30 trials).

10 Outline. Overview: Computer Guided Discovery. Problem: Cancer and p53. Results: Best Active Learning. Next: Future Experiments.

11 The Problem: p53 and Cancer. p53 mutations occur in ~50% of human cancers. Tumor Suppressor Protein. Receives upstream signals indicating cellular stress. Acts as a transcription factor in the cancer suppression pathway. Figure: p53 core domain bound to DNA. Image Generated with UCSF Chimera. Cho, Y., Gorina, S., Jeffrey, P.D., Pavletich, N.P. Crystal structure of a p53 tumor suppressor-DNA complex: understanding tumorigenic mutations. Science v265, 1994.

12 The p53 Cancer Pathway David W. Meek:

13 The Concept of Cancer Rescue: Second-site Suppressor Mutations. (Figure: p53 domain map from N to C: Transactivation 1-42, Core domain for DNA binding, Tetramerization.) Cancer mutation prevalence data from the IARC p53 database.

14 Ultimate Goal: Inactive p53 Cancer Mutant + Engineered Small Molecule Drug = Functionally Active Rescued p53. Intermediate Goal: Advance medical practice by revealing p53 mutant functional properties across p53's mutation sequence space. Immediate Goal: Find novel p53 Cancer Rescue Mutants.

15 Evaluating Cancer Rescue Mutants in the Wet Lab. INACTIVE: Yeast containing an inactive p53 cancer mutant will not grow. ACTIVE: Yeast containing an active p53 cancer rescue mutant will grow. Baroni, T.E., Wang, T., Qian, H., Dearth, L.R., Truong, L.N., Zeng, J., Denes, A.E., Chen, S.W. and Brachmann, R.K. (2004) A global suppressor motif for p53 cancer mutants. Proc Natl Acad Sci U S A, 101.

16 In Vitro Phenotype

17 In a Nutshell: Cancer Rescue Mutants. Build classifiers of putative p53 cancer rescue mutants. Use Active Learning to select the p53 mutants that will be the most informative. Test the predictions in-vitro. (Cycle diagram: Knowledge, Model, Experiment; goal: Find all p53 cancer rescue mutants.)

18 Outline. Overview: Computer Guided Discovery. Problem: Cancer and p53. Results: Best Active Learning. Next: Future Experiments.

19 The Active Learning Tradeoff: How Fast Does It Learn?

20 The Active Learning Tradeoff: How Accurate On The Chosen? (204 Predicts 57)
Type  Method                          Accuracy           Correlation Coefficient  Student-T
I     Maximum Curiosity               77.19% +/- 5.61%   %
I     Composite Classifier            70.18% +/- 6.11%   %
I     Improved Composite Classifier   71.93% +/- 6.00%   %
II    Additive Curiosity              73.68% +/- 5.88%   %
II    Additive Bayesian Surprise      73.68% +/- 5.88%   %
III   Minimum Marginal Hyperplane     64.91% +/- 6.38%   %
III   Maximum Entropy                 64.91% +/- 6.38%   %
IV    Maximum Marginal Hyperplane     78.95% +/- 5.45%   %
IV    Minimum Entropy                 77.19% +/- 5.61%   %
IV    Entropic Tradeoff               % +/- 5.27%        %
C     Non-iterated Prediction         56.14% +/- 6.63%   %
C     Predict All Inactive            80.70% +/- 5.27%   %
C     Random (30 trials)              74.39% +/- 3.87%   / % +/- 2.89%
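The +/- values in this table are consistent with a binomial standard error on 57 predictions, computed with n - 1 in the denominator. That is an inference from the numbers, not something the slides state, and the counts of correct predictions used below (44, 37, 46) are back-calculated from the percentages rather than reported in the talk.

```python
import math

def accuracy_with_error(correct: int, total: int):
    """Accuracy and its standard error, both in percent.

    Assumes se = sqrt(p * (1 - p) / (total - 1)); the n - 1 denominator is
    an inference that happens to reproduce the table's +/- values.
    """
    p = correct / total
    se = math.sqrt(p * (1.0 - p) / (total - 1))
    return 100.0 * p, 100.0 * se

for correct in (44, 37, 46):  # inferred counts out of 57 predictions
    acc, err = accuracy_with_error(correct, 57)
    print(f"{correct}/57 correct: {acc:.2f}% +/- {err:.2f}%")
# 44/57 correct: 77.19% +/- 5.61%
# 37/57 correct: 64.91% +/- 6.38%
# 46/57 correct: 80.70% +/- 5.27%
```

The error bar depends only on the accuracy and the sample size, which is why rows with equal accuracy share the same +/- value.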

21 The Tradeoff: How Fast Does It Learn? vs. How Accurate on the Chosen? Candidate combined metrics: Sum (Length + Width)? Geometric Distance? Area (Length * Width)? Solution: Average Score of All Three Metrics. (Plotted: Maximum Curiosity, Entropic Tradeoff, Minimum Marginal Hyperplane.)
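A minimal sketch of combining the two axes, under stated assumptions: "Geometric Distance" is taken to mean the Euclidean distance from the origin, and each metric is divided by its best value across methods before averaging. The slides give neither the normalization nor the underlying numbers, so the method names and (length, width) pairs below are invented placeholders.

```python
import math
from typing import Dict, Tuple

METRICS = ("sum", "distance", "area")

def combined_metrics(length: float, width: float) -> Dict[str, float]:
    """The three candidate ways of merging the two scores."""
    return {
        "sum": length + width,
        "distance": math.hypot(length, width),  # assumed meaning of "Geometric Distance"
        "area": length * width,
    }

def average_scores(methods: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    raw = {name: combined_metrics(l, w) for name, (l, w) in methods.items()}
    # Normalizing by the best value per metric is an assumption of this sketch.
    best = {m: max(scores[m] for scores in raw.values()) for m in METRICS}
    return {name: sum(scores[m] / best[m] for m in METRICS) / len(METRICS)
            for name, scores in raw.items()}

# Hypothetical (learning speed, accuracy on chosen) pairs.
methods = {"method_a": (0.8, 0.6), "method_b": (0.5, 0.9), "method_c": (0.4, 0.4)}
ranking = sorted(average_scores(methods).items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Averaging several combinations avoids committing to one view of the tradeoff: the sum rewards either axis equally, while the area penalizes methods that are strong on only one.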

22 The Overall Best
Rank  Method                        Average Score
1     Maximum Curiosity
      Entropic Tradeoff
      Random (30 trials)
      Minimum Entropy
      Maximum Marginal Hyperplane
      Maximum Entropy
      Additive Bayesian Surprise
      Minimum Marginal Hyperplane
      Additive Curiosity            1.89

23 How Fast Does It Learn? The Three Previous Examples

24 How Accurate On The Chosen? The Three Previous Examples (204 Predicts 57)
Type  Method                        Accuracy           Correlation Coefficient  Student-T
I     Maximum Curiosity             77.19% +/- 5.61%   %
III   Minimum Marginal Hyperplane   64.91% +/- 6.38%   %
IV    Entropic Tradeoff             % +/- 5.27%        %
C     Non-iterated Prediction       56.14% +/- 6.63%   %
C     Predict All Inactive          80.70% +/- 5.27%   %
C     Random (30 trials)            74.39% +/- 3.87%   / % +/- 2.89%

25 Why Does Random Do So Well? Very Few Examples. Tong, S. and D. Koller (2002). "Support vector machine active learning with applications to text classification." The Journal of Machine Learning Research 2.

26 Outline. Overview: Computer Guided Discovery. Problem: Cancer and p53. Results: Best Active Learning. Next: Future Experiments.

27 Exploring New p53 Regions. Each new p53 region potentially introduces new rescue mechanisms. New pools of mutants restart the Active Learning problem. (Figure: p53 Core Domain, N to C.)

28 Most Interesting or Most Interesting Active? Which Finds More Active Cancer Rescue Mutants? Starting from the Known Mutants, compare two strategies over Iteration 1, Iteration 2, Iteration 3: Select The Most Interesting vs. Select The Most Interesting Active.

29 Conclusion. (Cycle diagram: Theory, Knowledge, Experiment; goal: Find Cancer Rescue Mutants.)

30 Acknowledgments. Baldi Lab: Pierre Baldi, Jonathan Chen, Hiroto Saigo, S. Joshua Swamidass. Brachmann Lab: Rainer Brachmann, Jue Zeng. Lathrop Lab: Richard Lathrop, Gabe Moothart. Leuke Lab: Ying Wang. Luo Lab: Ray Luo, Qiang Lu. Funding: National Institutes of Health ( p53: CA ), UCI Office of Research and Graduate Studies, UCI Institute for Genomics and Bioinformatics ( BIT: LM ), US Department of Energy (DOE).

31 Questions? (Cycle diagram: Theory, Knowledge, Experiment; goal: Find Cancer Rescue Mutants.)

32 Most Interesting Region. Scan the p53 core domain to find the most interesting region.

33 Create All Single Point Mutations in a Region in-vitro? CODA*: Assemble p53 using thermodynamically optimized oligonucleotides. Allow all possible mutations within a region. Assemble mutated region with cancer mutants to look for rescue mutants. *http://www.codagenomics.com/

34 Knowledge Representation: Homology Modeling. 1. Take a wild type crystal structure of the protein in question. 2. Substitute one or more amino acids to mutate the protein. 3. Apply simulated physical laws to determine an energy function. 4. Minimize the energy of the new mutant protein. Modeling done using Amber with zinc ion characteristics tuned by Dr. Qiang Lu working in Dr. Ray Luo's lab.

35 Knowledge Representation: Features. Simulated Structure -> String of Numbers. 1d: Sequence Mutation Features. s1d: Sequence Similarity Features. 2d: Surface Map Features. 3d: Atomic Position Features. 4d: Time Dependent Stability Information.

36 What is Machine Learning? Training: Set the parameters (W) with n features. Testing: Use the parameters (W) to predict unclassified examples. Training data: Example 1, Example 2, … Example m, where Example i has features F_{i1}, F_{i2}, …, F_{in} and label Class i; the parameters are W_1, W_2, …, W_n. Testing: an Unknown example with features F_{11}, F_{12}, …, F_{1n} is combined with W_1, W_2, …, W_n to give the Prediction.
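To make the training and testing split concrete, here is a minimal sketch using a perceptron. The slide does not name a model, so the linear form, the learning rate, and the feature values are all assumptions made for illustration.

```python
from typing import List

def predict(weights: List[float], features: List[float]) -> int:
    """Testing: use the parameters W to classify one example."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score > 0 else 0

def train_perceptron(examples: List[List[float]], classes: List[int],
                     epochs: int = 20, rate: float = 0.1) -> List[float]:
    """Training: set the n parameters W from m labeled examples."""
    weights = [0.0] * len(examples[0])
    for _ in range(epochs):
        for features, target in zip(examples, classes):
            error = target - predict(weights, features)  # 0 when already correct
            weights = [w + rate * error * f for w, f in zip(weights, features)]
    return weights

# Hypothetical feature matrix F (m = 4 examples, n = 3 features).
# The first feature is a constant 1.0 that acts as a bias term.
examples = [[1.0, 2.0, 0.5], [1.0, 1.5, 1.0], [1.0, -1.0, -0.5], [1.0, -2.0, 0.0]]
classes = [1, 1, 0, 0]

weights = train_perceptron(examples, classes)
prediction = predict(weights, [1.0, 1.8, 0.2])  # an unknown example
print(weights, prediction)
```

The same two-phase shape holds for any classifier that could fill this role: training fixes W once from the labeled rows, and prediction reuses W unchanged on the unknown row.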

37 Modeling: How To Use It. Biology: Make a protein and test it in-vitro. PRO: Real. CON: Slow. Computer Generated Structure: Predict a protein structure in-silico. PRO: Fast. CON: Inaccurate; what does it tell us? Machine Learning: Use Homology Modeling to guide biological research.

38 Maximum Curiosity: Find the Mutants that Most Improve the Training Set. 1. Start with a training set of examples with known classes and an unclassed test set. 2. Choose a mutant from the test set that has not been considered yet. 3. Assume the chosen mutant is Active or Inactive. 4. Cross-validate the training set with the chosen mutant and record the correlation coefficient. (Cycle diagram: Model, Knowledge, Experiment.)
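The steps on this slide can be sketched as follows. Several choices here are assumptions rather than the authors' method: the classifier is a one-feature nearest-neighbour rule, cross-validation is leave-one-out, the correlation coefficient is taken to be the Matthews correlation coefficient, and the assumed class for each candidate is the current classifier's own prediction (one reading of "if correctly predicted").

```python
import math
from typing import List, Tuple

Labeled = Tuple[float, int]  # (feature value, class: 1 = active, 0 = inactive)

def nearest_label(training: List[Labeled], x: float) -> int:
    """Toy classifier: class of the nearest training example."""
    return min(training, key=lambda ex: abs(ex[0] - x))[1]

def matthews(pairs: List[Tuple[int, int]]) -> float:
    """Matthews correlation coefficient over (true, predicted) pairs."""
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def cross_validated_cc(examples: List[Labeled]) -> float:
    """Leave-one-out cross-validation scored by the correlation coefficient."""
    pairs = []
    for i, (x, y) in enumerate(examples):
        rest = examples[:i] + examples[i + 1:]
        pairs.append((y, nearest_label(rest, x)))
    return matthews(pairs)

def maximum_curiosity(training: List[Labeled], test_set: List[float]) -> float:
    """Pick the candidate whose assumed label most improves cross-validation."""
    baseline = cross_validated_cc(training)

    def gain(x: float) -> float:
        assumed = nearest_label(training, x)  # assumed class (sketch's choice)
        return cross_validated_cc(training + [(x, assumed)]) - baseline

    return max(test_set, key=gain)

# Hypothetical data: known mutants and two unclassified candidates.
training = [(0.0, 0), (0.3, 0), (0.45, 1), (1.0, 1)]
chosen = maximum_curiosity(training, [0.05, 0.5])
print(chosen)
```

In this toy data the candidate at 0.5 sits in the region where the classifier is least reliable, so adding it raises the cross-validated correlation more than adding the candidate at 0.05.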

39 Exploring New p53 Regions. Each new p53 region potentially introduces new rescue mechanisms. New pools of mutants restart the Active Learning problem. (Figure: p53 Core Domain.)

40 Primary Collaborators. Dr. Rainer Brachmann, School of Medicine. Dr. Richard Lathrop, School of Information and Computer Science. Jue Zeng, School of Medicine.

