Bio- and Medical- Informatics Presenter: Russell Greiner
2 Vision Statement: Helping the world understand bio- and medical-informatics* data … and make informed decisions. * Potential beneficiaries: biological and medical researchers, practicing clinicians, and the people they serve.
3 Motivation: High impact on bio-science and society. Local bioinformatics expertise. ML has a key role: actual patterns (predictors, …) are not known; lots of data. Challenging ML problems: data is high dimensional, noisy, …; often structured data; need to obtain training data, labels, …
4 Personnel: PI synergy: R. Greiner, R. Goebel, C. Szepesvari; 18 software developers; 4 postdocs (3 AICML); 14 UGrad / IIP students; 17 grad students (11 MSc, 6 PhD)
5 Partners/Collaborators: 6 UofA CS profs; 5 UofA bioscientists; non-UofA collaborators: Cross Cancer Institute (Alberta Cancer Board), University of Alberta Hospital, Boston University, Miami University, Dept of Homeland Security
6 Additional Resources Grants $440K PENCE (Proteome Analyst) $600K ACB (Brain Tumour) Part of $3.6M GenomeCanada (Human Metabolome Project) $5.5M GenomeCanada (Alberta Transplant Institute) $1.7M ACB (misc PolyomX grants) In Kind: Data from CCI, ATI 1970+ MRI scans (260 patients); 270 labeled 300 (30K – 50K) Microarray chips 80 (250K) SNP Chips
7 Highlights: The Human Metabolome is ~completed and annotated (described in Science, Nature, …). Human Metabolome DataBase used by 78,673 visitors (438,481 pageviews). Proteome Analyst is the world’s best predictor of subcellular location; it has analyzed >1,000,000 proteins for >1,000 users. Patent filed for Brain Tumor Software. Effective new approach for learning to classify microarrays. Virus classifier obtained 98.5% accuracy!
8 [Figure labels: 30,000; SNP Analysis; Microarray; Proteomics; Metabolomics]
11 How to Treat Brain Tumours? Irradiate ONLY the visible tumour? No! Must also kill “(radiographically) occult” cancer cells surrounding the tumour! Irradiate everything within a 2 cm margin around the tumour? This is standard practice! But that … also includes normal cells, and still misses other occult cells.
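The fixed-margin rule on this slide can be sketched in a few lines. This is only an illustration, assuming a 2-D binary tumour mask on a regular grid; the function name `margin_mask`, the 7x7 grid, and the 1 cm voxel size are hypothetical and are not taken from the presenters' software.

```python
from math import hypot


def margin_mask(tumour, margin_cm, voxel_cm):
    """Mark every grid cell within margin_cm (Euclidean) of any tumour cell.

    tumour: 2-D list of booleans (True = visible tumour).
    The tumour cells themselves are included in the returned mask.
    """
    rows, cols = len(tumour), len(tumour[0])
    tumour_cells = [(r, c) for r in range(rows) for c in range(cols) if tumour[r][c]]
    mask = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Brute-force distance check; fine for a toy grid.
            mask[r][c] = any(
                hypot(r - tr, c - tc) * voxel_cm <= margin_cm
                for tr, tc in tumour_cells
            )
    return mask


# Toy example (hypothetical): one tumour voxel in the middle of a 7x7 grid,
# 1 cm voxels, 2 cm margin.
tumour = [[False] * 7 for _ in range(7)]
tumour[3][3] = True
mask = margin_mask(tumour, margin_cm=2.0, voxel_cm=1.0)
irradiated = sum(cell for row in mask for cell in row)
normal_cells_hit = irradiated - 1  # everything in the margin except the tumour voxel
```

Even in this toy case most of the irradiated cells are not visible tumour, which is the cost the slide points out.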
12 How to Treat Brain Tumours? BETTER: Predict (from earlier data) the location of occult cells, and irradiate just that region! Minimize the number of normal cells zapped, to minimize loss of brain function. Meaningful, as conformal radiotherapy can zap arbitrary shapes!
13 How to Predict? Assumption: occult cells occupy the region where tumour cells will grow next. Use prior data (260 patients): observe each patient over time to see how tumours have grown. Predict patterns, based on properties of tumour, patient, region, …
Technology … Using Discriminative Random Fields for: Segmentation; Growth Prediction. Extensions: increase accuracy: Support Vector Random Field (SVRF); increase computational efficiency: Decoupled SVRF; exploit unlabeled regions: Semi-Supervised (D)SVRF
15 Brain Tumour: Future Work: Incorporate other modalities: Diffusion Tensor Imaging, PET, … Compute other features: textures (BGLAM); using alignment. Improve learning algorithms. Use Active Learning techniques to determine which regions/slices/studies/patients to label, using which human labeler.
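One common active-learning heuristic is uncertainty sampling: ask the human labeler about the items the current model is least sure of. The sketch below assumes a binary tumour/non-tumour model that outputs a probability per unlabeled item; the slide does not say which active-learning technique is intended, and the names `most_uncertain`, `slice_a`, and so on are hypothetical.

```python
def most_uncertain(probabilities, batch_size=1):
    """Return the batch_size item ids whose predicted probability is closest to 0.5.

    probabilities: dict mapping an unlabeled item id to P(tumour) from the current model.
    """
    ranked = sorted(probabilities, key=lambda item: abs(probabilities[item] - 0.5))
    return ranked[:batch_size]


# Toy example (made-up probabilities, not real model output).
predicted = {"slice_a": 0.95, "slice_b": 0.52, "slice_c": 0.10, "slice_d": 0.40}
to_label = most_uncertain(predicted, batch_size=2)
```

Choosing which human labeler to ask, as the slide also mentions, would need an additional cost or reliability model that this sketch leaves out.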
19 HMP Overview: Goal: identify & quantify the entire human “metabolome”: all small endogenous and exogenous chemicals that appear in a non-trivial quantity in people … Genomics: 30,000 genes; Proteomics: 3200 enzymes; Metabolomics: 2300 chemicals. ``HMDB: The Human Metabolome Database'', Nucleic Acids Research, January 2007.
20 HMP #1: Fast Profiling: Given an NMR spectrum (blood, urine, CSF), autonomously find & quantify >100 compounds in < 2 minutes. If we know the “NMR signature” of each metabolite … then linear least squares. Except … the “signature” is not stable: it shifts with unobservable ions. Think EM … ML challenge: acquire a “conditional NMR signature”; Active Learning.
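The "known signature, then linear least squares" step can be illustrated as follows. This is a minimal sketch assuming stable signatures, i.e. the idealized case the slide says does not hold in practice: the observed spectrum y is modelled as S c, and the concentrations c are found by solving the normal equations SᵀS c = Sᵀy. The signatures and concentrations are made-up toy numbers, and `fit_concentrations` is a hypothetical name.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b] so row operations update both sides.
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("signatures are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def fit_concentrations(signatures, spectrum):
    """Least-squares concentrations: minimise ||S c - y||^2 via S^T S c = S^T y.

    signatures: one intensity vector per metabolite, each as long as the spectrum.
    spectrum:   observed intensities.
    """
    n = len(signatures)
    gram = [[sum(p * q for p, q in zip(signatures[i], signatures[j]))
             for j in range(n)] for i in range(n)]
    rhs = [sum(s * y for s, y in zip(signatures[i], spectrum)) for i in range(n)]
    return solve_linear_system(gram, rhs)


# Toy data: three hypothetical metabolites over five spectral bins.
signatures = [
    [1.0, 0.5, 0.0, 0.0, 0.0],
    [0.0, 0.2, 1.0, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.4, 1.0],
]
true_conc = [2.0, 0.5, 3.0]
spectrum = [sum(c * s[i] for c, s in zip(true_conc, signatures)) for i in range(5)]
estimated = fit_concentrations(signatures, spectrum)
```

With noise-free toy data the fit recovers the concentrations used to build the spectrum. Handling peak shifts caused by unobserved ions, the ML challenge named on the slide, is outside this sketch.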
21 HMP #2: Classify Patients: Given: metabolic profile of patient (NMR/Mass spec of patient’s urine, blood, CSF). Predict: patient’s disease state (reaction to Rx; cachexia; cancer). The role of ML … Learn Profile → Dx classifier. [Figure: collect patient urine → obtain NMR spectrum → compute metabolic profile → classify profile with the “Cachexia?” classifier → Cachexia = Yes!] Example metabolic profile: Glucose 414.2; Hippurate 599.3; Histidine 2.7; Isoleucine 310.4; Isopropanol 416.0; Lactate 140.8; Lactose 390.3; …; Leucine 5.6
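A profile-to-diagnosis classifier of the kind this slide describes can be sketched with a nearest-centroid rule. This is only a stand-in: the slide does not say which learner is used, the two-feature profiles are synthetic toy values rather than patient data, and `train_centroids` and `predict` are hypothetical names.

```python
from math import dist  # Euclidean distance, Python 3.8+


def train_centroids(profiles, labels):
    """Average the training profiles of each class into one centroid per label."""
    sums, counts = {}, {}
    for profile, label in zip(profiles, labels):
        acc = sums.setdefault(label, [0.0] * len(profile))
        for i, value in enumerate(profile):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, profile):
    """Return the label whose centroid is closest to the given profile."""
    return min(centroids, key=lambda label: dist(centroids[label], profile))


# Synthetic toy profiles: two hypothetical metabolite concentrations per patient.
train_profiles = [[1.0, 5.0], [1.2, 4.8], [4.0, 1.0], [4.2, 0.8]]
train_labels = ["no_cachexia", "no_cachexia", "cachexia", "cachexia"]
centroids = train_centroids(train_profiles, train_labels)
prediction = predict(centroids, [3.9, 1.1])
```

Real profiles span more than 100 compounds on very different scales, so some form of feature scaling would normally precede a distance-based rule like this one.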
22 HMP #3: Chemical Property: Given: specific metabolite (chemical). Predict: chemical properties of the metabolite (solubility, melting point, …); biological properties of the metabolite (which reactions consume it, …). The role of ML … Learn Metabolite → Property classifier
24 PolyomX: Given: description of a patient (SNP, Microarray, Metabolomic Profile, …). Predict: Dx: Breast Cancer, Ovarian Cancer, …; Rx: Prostate Cancer Toxicity, Cachexia, … The role of ML … Learn Patient → Dx classifier, … ``Predictive Models for Breast Cancer Susceptibility from Multiple, Single Nucleotide Polymorphisms'', Clinical Cancer Research, April 2004. ``Association of DNA Repair and Steroid Metabolism Gene Polymorphisms with Clinical Late Toxicity in Patients Treated with Conformal Radiotherapy for Prostate Cancer'', Clinical Cancer Research, April 2006.
PolyomX: Future Work: Better tools for analyzing microarrays: Rank-One Bicluster Classifier (RoBiC). Scaling up to 250K SNP chips. Incorporating >1 modality. Many other tasks: Ovarian Cancer (microarray); use pathways to understand microarrays; microtubule docking; …
27 Proteome Analysis: Given: protein (FASTA format). Predict: properties of the protein: general function; subcellular localization. The role of ML … Learn Protein → Location classifier
28 Results so far: Proteome Analyst classifiers: General Function: 80–90%; Subcellular Location: ~90%. Best known, by any system! (BioInformatics, 2004). “Explain” facility has already helped users to identify problems in dataset … ``The Path-A metabolic pathway prediction web server'', Nucleic Acids Research, July 2006. ``PA-GOSUB: A Searchable Database of Model Organism Protein Sequences With Their Predicted GO Molecular Function and Subcellular Localization'', Nucleic Acids Research, Dec 2005. ``Proteome Analyst: Custom Predictions with Explanations in a Web-based Tool for High-Throughput Proteome Annotations'', Nucleic Acids Research, July 2004. ``Visual Explanation and Auditing of Evidence with Additive Classifiers'', IAAI06, July 2006.
29 Current Proteome Analyst Tasks: Analyze metabolic pathways. Incorporate hierarchy (GO). Use other information: motifs in protein, … Other applications: relate to microarray data; use GLOBAL properties of the complete proteome … phylogenetic hierarchy …
Whole Genome Analysis: Heuristic selection of whole-genome substrings, to increase efficiency and accuracy of subtype identification in the HIV genome. Construct a Complete Composition Vector (CCV) nucleotide representation, as an approximate signature of the viral genome. 100% recognition of subtypes in 867 whole-genome examples.
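The idea of a composition vector as a genome signature can be illustrated with plain k-mer frequencies. This is a simplified stand-in, not the presenters' CCV: it omits their heuristic substring selection and any normalization they apply, and the sequences, labels, and function names below are hypothetical toys.

```python
from itertools import product
from math import sqrt

ALPHABET = "ACGT"


def composition_vector(sequence, k):
    """Relative frequency of every length-k nucleotide string, in a fixed order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_label(query, references, k):
    """Label of the reference sequence whose composition vector is most similar."""
    query_vec = composition_vector(query, k)
    return max(
        references,
        key=lambda label: cosine_similarity(
            query_vec, composition_vector(references[label], k)
        ),
    )


# Toy sequences (not real viral genomes).
references = {
    "toy_subtype_1": "AAAACCCCAAAACCCC",
    "toy_subtype_2": "GGGGTTTTGGGGTTTT",
}
label = nearest_label("AAACCCAAACCC", references, k=2)
```

A fixed-length vector like this lets whole genomes of different lengths be compared without alignment, which is presumably part of the efficiency gain the slide mentions.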
32 Other Bioinformatics Tasks: Predict Bull’s Expected Breeding Value from SNPs. Bovine Haplotype. Predict Tumour Rejection from Microarray. Other challenges from colleagues at Univ Hospital, Cross Cancer Inst. …