1. Introduction to translational and clinical bioinformatics: Connecting complex molecular information to clinically relevant decisions using molecular profiles
Constantin F. Aliferis, M.D., Ph.D., FACMI
• Director, NYU Center for Health Informatics and Bioinformatics
• Informatics Director, NYU Clinical and Translational Science Institute
• Director, Molecular Signatures Laboratory
• Associate Professor, Department of Pathology
• Adjunct Associate Professor in Biostatistics and Biomedical Informatics, Vanderbilt University
Alexander Statnikov, Ph.D.
• Director, Computational Causal Discovery Laboratory
• Assistant Professor, NYU Center for Health Informatics and Bioinformatics, General Internal Medicine
2. Goals
• Understand the spectrum of Bioinformatics and Medical Informatics activities
• Understand basic concepts of clinical/translational Bioinformatics
• Understand basic concepts of molecular profiling
• Introduction to high-throughput assays enabling molecular profiling
• Introduction to computational data analytics/bioinformatics enabling molecular profiling
• Understand analytic challenges and pitfalls/interpretation issues
• Discuss a case study of profiles used to diagnose/treat patients
• Perform hands-on development of a molecular profile, finding novel biomarkers and testing profile/marker accuracy
Discussion supported by the general literature and heavily grounded in:
• NYUMC informatics experts/research projects/grants/papers/entities/software systems
• Commercially available modalities & assays
3. Overview
• Session #1: Basic Concepts
• Session #2: High-throughput assay technologies
• Session #3: Computational data analytics
• Session #4: Case study / practical applications
• Session #5: Hands-on computer lab exercise
4. Session #1: Basic Concepts
• Understand the spectrum of Bioinformatics and Medical Informatics activities - NYUMC informatics
• Understand basic concepts of clinical/translational Bioinformatics
• Understand basic concepts of molecular profiling
ALSO:
- s/names/interests
- adjustments to plan
5. NYU Center for Health Informatics & Bioinformatics: Broad Plan
Health Informatics
• Educational Informatics
• Evidence-based medicine and Information Retrieval Informatics
• Library Collaborative
• Data Integration & Mining: data warehouse & interfacing with EMR; Omics LIMS; Genomic EMR; biospecimen management; research protocol database systems and management team; data mining service; data mining software
Infrastructure & Integrative Methods/Activities
• CTSI
• High Performance Computing Facility
• Research labs: Kluger; Molecular Signatures; EBM, IR & Scientometrics; Computational Causal Discovery
• MS/PhD (& Post-doc Fellowship) Program
• Continuing Education: workshops & tutorials; paper digest; Research Colloquium; invited speakers
• Integrate/Focus Existing Informatics and Increase Collaborations
Bioinformatics
• BPIC (Best Practices Integrative Consultation Core/Service): literature synthesis & benchmarking studies; method-problem “matchmaking”; design and execution of studies
• Microarray Informatics: upstream; differential expression; pathway inference; molecular profiles
• Next-gen sequencing informatics, upstream analyses: (i) ChIP-seq, (ii) RNA-seq, (iii) epigenetics, (iv) microbiomics, (v) micro RNA studies, (vi) CNV & splice variation studies, (vii) digital RNA, (viii) de novo sequencing & re-sequencing
• Next-gen sequencing informatics, downstream analyses
• Proteomics Informatics
• Genetics-Genomics
• Cancer Center
• COEs
• Multi-modal & integrative studies
6. Current Capabilities: Areas
[Figure: the organizational chart of the previous slide, with three top-level areas: Health Informatics; Infrastructure & Integrative Methods/Activities; Bioinformatics]
7. Current & Future capabilities: Health Informatics
Educational Informatics
• Content management, medical simulations
Evidence-based medicine and Information Retrieval Informatics
• Filter Medline according to content and quality
• Filter Web for health advice quality
• Predict future citations of articles
• Classify individual citations as instrumental or not
• Identify special types of articles
• Construct citation histories & analyze impact of articles
• Integrate and manage queries and related content
• Combine and optimize knowledge source searches
• New “find a researcher”
• “Find a collaborator”
Library Collaborative
• Apply, evaluate, refine next-gen IR methods
Data Integration & Mining
• Data warehouse & interfacing with EMR: needs; software acquisition; implementation
• Omics LIMS: needs capture; vendor product assessment; funds; software purchase and implementation; integration with billing and EMR
• Genomic EMR
• Biospecimen management
• Research protocol database system (eVelos)
• Database management team
• Data mining service
• Data mining engine: faculty; funds; prototype; implementation; evaluation
8. Current & Future capabilities: Infrastructure & Integrative Methods/Activities
CTSI
• (supported by rest of objectives)
High Performance Computing Facility
• Sequencing server; hectar1; hectar2
• Funds; needs; grants; personnel post; specs; room/networking/access
• Personnel hires; hw install; licenses; BP; launch
Research labs
• Kluger: TF/regulation studies; high-throughput outcome prediction; specialized clustering methods
• Molecular Signatures: development of molecular signatures for diagnosis, outcome prediction and personalized medicine; discovery of diagnostic/imaging biomarkers and putative drug targets; deployment of signatures; automated software; new methods
• EBM, IR & Scientometrics: development and evaluation of next-gen IR and scientometric models and studies
• Computational Causal Discovery: discovery of pathways; studies of causal validity of bioinformatics discovery methods; multiplicity studies; automated software; active learning/experiment number minimization
MS/PhD (& Post-doc Fellowship) Program
• Formal training in Biomedical Informatics at pre- and post-doctoral levels
Continuing Education
• Workshops & tutorials
• Paper digest
• Research Colloquium
• Invited speakers
Integrate/Focus Existing Informatics and Increase Collaborations
• Faculty and staff career development; Informatics Affiliates; working collaborations with Courant, Polytechnic, NYC Informatics and other non-NYUMC entities
9. Current & Future capabilities: Bioinformatics
BPIC (Best Practices Integrative Consultation Core/Service)
• Literature synthesis & benchmarking studies
• Method-problem “matchmaking”
• Design and execution of studies
• Study publication assistance
Area-specific (disease, assay) informatics (Genetics-Genomics, COEs, Cancer Center)
• Microarray Informatics: experiment design, assay execution, differential expression, pathway mapping, pathway-specific testing (GSEA/GSA), de novo pathway discovery, phylogeny, clustering, hybrid experimental/observational designs; SNP arrays; ChIP-on-chip analyses, aCGH, tiled arrays, etc.
• Sequencing Informatics: ChIP-seq analysis, digital gene expression, de novo sequence assembly & reassembly, CNV analysis, epigenomic studies, microbiomics
• Proteomics Informatics: platform-specific pre-processing, differential abundance, peptide-protein mapping, protein identification, de novo protein interaction network inference, protein modification and structure studies, …
Multi-modal Integrative and Higher-level Informatics
• Molecular signatures & linking high-dimensional data to phenotype: development of molecular signatures for diagnosis, outcome prediction and personalized medicine; in silico signature scanning; in silico signature equivalence; discovery of diagnostic/imaging biomarkers and putative drug targets; deployment of signatures; automated software; novel methods
• Mechanistic/causative studies: discovery of pathways; multiplicity studies; TS/DBN designs; automated software; active learning/experiment number minimization
• Integrating clinical lab, text, imaging and high-throughput data in CTs/prospective studies or exploratory retrospective ones
10. Summary Contacts (Until Centralized Consultation Service is Launched)
• Management of clinical and protocol data: James Robinson
• Educational Informatics: Mark Triola
• Next-Gen Information Retrieval: Lawrence Fu, Constantin Aliferis, TBD
• Informatics for Data Mining: Alexander Statnikov, Constantin Aliferis
• Data Integration & Warehousing: John Chelico, Ross Smith, Constantin Aliferis
• High Performance Computing: Constantin Aliferis, Ross Smith
• Best Practices in Bioinformatics: Constantin Aliferis, Alexander Statnikov
• Sequencing Informatics, upstream: Stuart Brown, Alexander Alekseyenko, Yuval Kluger, Jinhua Wang, TBD, TBD
• Sequencing Informatics, downstream: Alexander Alekseyenko, Yuval Kluger, Jinhua Wang, Alexander Statnikov, Constantin Aliferis
• Microarray Informatics: Jiri Zafadil, Yuval Kluger, Jinhua Wang, Constantin Aliferis, Alexander Statnikov
• Cancer Informatics: Yuval Kluger, Jinhua Wang, Stuart Brown, Jiri Zafadil, Constantin Aliferis
• Proteomics Informatics: Stuart Brown, Jinhua Wang, Constantin Aliferis, Alexander Statnikov, TBD
• General tools: Stuart Brown
• Specialized applications (Genetics, Regulation, Pathways…): Stuart Brown, Yuval Kluger, Alexander Statnikov, Constantin Aliferis
• Molecular signatures development, biomarker discovery, multi-modal and integrative studies: Constantin Aliferis, Alexander Statnikov, Yuval Kluger
11. Molecular Signatures
Definition: computational or mathematical models that link high-dimensional molecular information to a phenotype of interest
12. Molecular Signatures
[Figure labels: molecular signatures; gene markers; new drug targets]
13. Molecular Signatures: Main Uses
Direct benefits: models of disease phenotype/clinical outcome & estimation of the model performance
• Diagnosis
• Prognosis, long-term disease management
• Personalized treatment (drug selection, titration) (“predictive” models)
Ancillary benefits 1: biomarkers for diagnosis or outcome prediction
• Make the above tasks resource-efficient and easy to use in clinical practice
• Help next-generation molecular imaging
• Provide leads for potential new drug candidates
Ancillary benefits 2: discovery of structure & mechanisms (regulatory/interaction networks, pathways, sub-types)
14. Molecular Signatures
The FDA calls them “in vitro diagnostic multivariate index assays” (IVDMIA).
1. “Class II Special Controls Guidance Document: Gene Expression Profiling Test System for Breast Cancer Prognosis”:
• addresses device classification
2. “The Critical Path to New Medical Products”:
• identifies pharmacogenomics as crucial to advancing medical product development and personalized medicine
3. “Draft Guidance on Pharmacogenetic Tests and Genetic Tests for Heritable Markers” & “Guidance for Industry: Pharmacogenomic Data Submissions”:
• identifies 3 main goals (dose, ADEs, responders)
• defines IVDMIA
• encourages “fault-free” sharing of pharmacogenomic data
• separates “probable” from “valid” biomarkers
• focuses on genomics (and not other omics)
15. Less Conventional Uses of Molecular Signatures
• Increased clinical trial sample efficiency, decreased costs, or both, using placebo responder signatures
• In silico signature-based candidate drug screening
• Drug “resurrection”
• Establishing the existence of biological signal in very small sample situations where univariate signals are too weak
• Assessing the importance of markers and of the mechanisms involving them
• Choosing the right animal model…?
16. Recent molecular signatures available for patient care
[Company logos: Agendia; Clarient; Prediction Sciences; LabCorp (OvaSure); University Genomics; Genomic Health; Veridex; BioTheranostics; Applied Genomics; Power3; Correlogic Systems]
17. Molecular signatures in the market (examples)
Company | Product | Disease | Purpose
Agendia | MammaPrint | Breast cancer | Risk assessment for the recurrence of distant metastasis in a breast cancer patient.
Agendia | TargetPrint | Breast cancer | Quantitative determination of the expression level of estrogen receptor, progesterone receptor and HER2 genes. This product is supplemental to MammaPrint.
Agendia | CupPrint | Cancer | Determination of the origin of the primary tumor.
University Genomics | Breast Bioclassifier | Breast cancer | Classification of ER-positive and ER-negative breast cancers into expression-based subtypes that more accurately predict patient outcome.
Clarient | Insight Dx Breast Cancer Profile | Breast cancer | Prediction of disease recurrence risk.
Clarient | Prostate Gene Expression Profile | Prostate cancer | Diagnosis of grade 3 or higher prostate cancer.
Prediction Sciences | RapidResponse c-Fn Test | Stroke | Identification of the patients that are safe to receive tPA and those at high risk for HT, to help guide the physician’s treatment decision.
Genomic Health | OncotypeDx | Breast cancer | Individualized prediction of chemotherapy benefit and 10-year distant recurrence to inform adjuvant treatment decisions in certain women with early-stage breast cancer.
bioTheranostics | CancerTYPE ID | Cancer | Classification of 39 types of cancer.
bioTheranostics | Breast Cancer Index | Breast cancer | Risk assessment and identification of patients likely to benefit from endocrine therapy, and whose tumors are likely to be sensitive or resistant to chemotherapy.
Applied Genomics | MammoStrat | Breast cancer | Risk assessment of cancer recurrence.
Applied Genomics | PulmoType | Lung cancer | Classification of non-small cell lung cancer into adenocarcinoma versus squamous cell carcinoma subtypes.
Applied Genomics | PulmoStrat | Lung cancer | Assessment of an individual's risk of lung cancer recurrence following surgery, to help with adjuvant therapy decisions.
Correlogic | OvaCheck | Ovarian cancer | Early detection of epithelial ovarian cancer.
LabCorp | OvaSure | Ovarian cancer | Assessment of the presence of early-stage ovarian cancer in high-risk women.
Veridex | GeneSearch BLN Assay | Breast cancer | Determination of whether breast cancer has spread to the lymph nodes.
Power3 | BC-SeraPro | Breast cancer | Differentiation between breast cancer patients and control subjects.
18. MammaPrint
• Developed by Agendia (www.agendia.com)
• 70-gene signature to stratify women with breast cancer that hasn’t spread into “low risk” and “high risk” for recurrence of the disease
• Independently validated in >1,000 patients
• 12,000 tests performed so far
• Cost of the test is $3,200
• In February 2007 the FDA cleared the MammaPrint test for marketing in the U.S. for node-negative women under 61 years of age with tumors of less than 5 cm
• TIME Magazine’s 2007 “medical invention of the year”
19. CupPrint
• Developed by Agendia (www.agendia.com)
• ~500-gene (~1900 probes) signature to identify the primary site of 49 different types of carcinomas as well as other types of cancer such as sarcoma and melanoma
• Several independent validation studies
20. ColoPrint
• In development & validation by Agendia (www.agendia.com)
• Multi-gene expression signature to determine the risk of recurrence in colorectal cancer patients
• Planning to seek FDA approval
References:
21. Oncotype DX: development synopsis
Main reference: Paik S, Shak S, Tang G, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004; 351(27):
• Developed by Genomic Health (www.genomichealth.com)
• 21-gene signature to predict whether a woman with localized, ER+ breast cancer is at risk of relapse
• Independently validated in >1,000 patients
• 55,000 tests performed so far
• Cost of the test is $3,650
• Reimbursement. Information about reimbursement for molecular signatures from Aetna:
• Oncotype DX did not undergo FDA review. Here is an article that mentions FDA review of Oncotype DX (slightly outdated):
• The following paper shows the health benefits and cost-effectiveness benefits of using Oncotype DX:
22. CancerType ID
• Developed by AviaraDX (www.aviaradx.com)
• 92-gene signature to classify 39 tumor types
• Signature developed by GA/KNN
• “Compressed version” of CupPrint
23. Breast Cancer Index
• Developed by AviaraDX (www.aviaradx.com)
• Uses 7 genes (combines the 5-gene MGI signature and the 2-gene H/I signature)
• Stratifies breast cancer patients into groups with low or high risk of cancer recurrence and good or poor response to endocrine therapy
• Validated in thousands of patients (treated & untreated)
24. GeneSearch Breast Lymph Node (BLN) Assay
• Developed by Veridex (www.veridex.com), a Johnson & Johnson company
• Test to detect whether breast cancer has spread to the lymph nodes
• Uses real-time reverse transcriptase-polymerase chain reaction (RT-PCR) to detect mammaglobin (MG) and cytokeratin 19 (CK 19) in lymph nodes
• FDA approved
• Featured in TIME’s 2007 Top 10 Medical Breakthroughs list
25. MammoStrat
• Developed by Applied Genomics (http://www.applied-genomics.com)
• The test is based on 5 biomarkers
• The test is used to classify individual patients as having an AGI-defined high, moderate, or low risk of breast cancer recurrence following surgical removal of their primary tumor and treatment with tamoxifen alone
• Independently validated in >1,000 patients
26. NuroPro
• Developed by Power3 (http://www.power3medical.com/)
• Early detection of neurodegenerative diseases: Alzheimer’s disease, ALS (Lou Gehrig’s disease), and Parkinson’s disease
• Validation study in progress
• Based on 59 proteins
27. BC-SeraPro
• Developed by Power3 (http://www.power3medical.com/)
• Test for diagnosis of breast cancer (breast cancer case vs. control)
• Validation study in progress
• Based on 22 proteins
• Uses linear discriminant analysis; outputs a probability score
28. Key ingredients for developing a molecular signature
• Well-defined clinical problem & access to patients
• High-throughput assays
• Computational & biostatistical analysis
[Figure: the three ingredients above feed into the molecular signature]
29. Challenges in computational analysis of omics data for development of molecular signatures
• It is relatively easy to develop a predictive model, and even easier to believe that a model is good when it is not → false sense of security
• Several problems exist: some theoretical and some practical
• Omics data has many special characteristics and is tricky to analyze!
30. OvaCheck
• Developed by Correlogic (www.correlogic.com)
• Blood test for the early detection of epithelial ovarian cancer
• Failed to obtain FDA approval
• Looks for subtle changes in patterns among the tens of thousands of proteins, protein fragments and metabolites in the blood
• Signature developed by genetic algorithm
• Significant artifacts in data collection & analysis called the validity of the signature into question: results are not reproducible; data were collected differently for different groups of patients
31. Problem with OvaCheck
[Figure: panels A-F, from Baggerly et al. (Bioinformatics, 2004)]
32. Molecular Signatures
[Figure labels: molecular signatures; gene markers; new drug targets]
33. Brief history of the main “omics” technology: gene expression microarrays
• 1988: Edwin Southern files UK patent applications for in situ synthesized, oligo-nucleotide microarrays
• 1991: Stephen Fodor and colleagues publish photolithographic array fabrication method
• 1992: Undeterred by NIH naysayers, Patrick Brown develops spotted arrays
• 1993: Affymax begets Affymetrix
• 1995: Mark Schena publishes first use of microarrays for gene expression analysis; Edwin Southern founds Oxford Gene Technologies
• 1996: First human gene expression microarray study published; Affymetrix releases its first catalog GeneChip microarray, for HIV, in April
• 1997: Stanford researchers publish the first whole-genome microarray study, of yeast
34. Brief history of the main “omics” technology: gene expression microarrays (The Scientist, 2005)
• 1998: Brown's lab develops CLUSTER, a statistical tool for microarray data analysis; red and green "thermal plots" start popping up everywhere
• 1999: Todd Golub and colleagues use microarrays to classify cancers, sparking widespread interest in clinical applications
• 2000: Affymetrix spins off Perlegen, to sequence multiple human genomes and identify genetic variation using arrays
• 2001: The Microarray Gene Expression Data Society develops the MIAME standard for the collection and reporting of microarray data
• 2003: Joseph DeRisi uses a microarray to identify the SARS virus; Affymetrix, Applied Biosystems, and Agilent Technologies individually array the human genome on a single chip
• 2004: Roche releases Amplichip CYP450, the first FDA-approved microarray for diagnostic purposes
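35. An early kind of analysis: learning disease sub-types by clustering patient profiles
[Figure: patient profiles; recoverable axis label: Rb]
36. Clustering: seeking ‘natural’ groupings & hoping that they will be useful…
[Figure: patient profiles; recoverable axis label: Rb]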
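37. E.g., for treatment
[Figure: patient clusters on p53 and Rb axes, labeled “Respond to treatment Tx1” and “Do not”]
38. E.g., for diagnosis
[Figure: patient clusters on p53 and Rb axes, labeled “Adenocarcinoma” and “Squamous carcinoma”]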
39. Another use of clustering
Cluster genes (instead of patients):
• Genes that cluster together may belong to the same pathways
• Genes that cluster apart may be unrelated
40. Unfortunately, clustering is a non-specific method and falls into the ‘one-solution fits all’ trap when used for prediction
[Figure: patients on p53 and Rb axes, with groups labeled “Respond to treatment Tx2” and “Do not”]
41. Clustering is also non-specific when used to discover pathway membership, regulatory control, or other causation-oriented relationships
In this simple illustrative counter-example it is entirely possible for G3 (a gene causally unrelated to the phenotype) to be more strongly associated with the phenotype (or its surrogate genes), and thus to cluster with it more strongly, than the true oncogenic genes G1 and G2.
[Figure: graph over G1, G2, Ph and G3]
42. Two improved classes of methods
• Supervised learning → predictive signatures and markers
• Regulatory network reverse engineering → pathways
43. Supervised learning: use the known phenotypes (a.k.a. “labels”) in training data to build signatures or find markers highly specific for that phenotype
45. Supervised learning: a geometrical interpretation
46. In 2-D it looks good, but what happens in:
• 10,000-50,000 (regular gene expression microarrays, aCGH, and early SNP arrays)
• >500,000 (tiled microarrays, new SNP arrays)
• 10, ,000 (regular MS proteomics)
• >10,000,000 (LC-MS proteomics)
This is the ‘curse of dimensionality’ problem
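One way to see why so many dimensions are a problem is to check how strongly pure-noise variables can appear to track a phenotype when samples are few. The following is a minimal sketch, not taken from the slides: the sample size, feature counts and function names are illustrative assumptions, and the "features" are random numbers with no relation to the labels.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def max_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a binary label and pure-noise features."""
    rng = random.Random(seed)
    # Balanced binary phenotype; the features below carry no information about it.
    labels = [i % 2 for i in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, labels)))
    return best


few = max_spurious_correlation(n_samples=20, n_features=5)
many = max_spurious_correlation(n_samples=20, n_features=2000)
print(f"5 noise features:    max |r| = {few:.2f}")
print(f"2000 noise features: max |r| = {many:.2f}")
```

With only 20 samples, the best of 2000 noise features looks like a respectable "marker" even though none of them is related to the label, which is why high-dimensional analyses need explicit protection against chance findings.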
47. High dimensionality (especially with small samples) causes:
• Some methods do not run at all (classical regression)
• Some methods give bad results (KNN, decision trees)
• Very slow analysis
• Very expensive/cumbersome clinical application
• A tendency to “overfit”
48. Two (very real and very unpleasant) problems: over-fitting & under-fitting
• Over-fitting (a model to your data) = building a model that is good in the original data but fails to generalize well to fresh data
• Under-fitting (a model to your data) = building a model that is poor in both the original data and fresh data
49. Intuitive explanation of overfitting & underfitting
Play the game: find a rule to predict who the instructors are in any given class (use today’s class to find a general rule)
50. Over/under-fitting are directly related to the complexity of the decision surface and how well the training data is fit
[Figure: outcome of interest Y vs. predictor X, showing training data and future data; one line is labeled “This line is good!” and another “This line overfits!”]
51. Over/under-fitting are directly related to the complexity of the decision surface and how well the training data is fit
[Figure: outcome of interest Y vs. predictor X, showing training data and future data; one line is labeled “This line is good!” and another “This line underfits!”]
52. Very Important Concept: Successful data analysis methods balance training data fit with complexity.
• Too complex a signature (to fit the training data well) → overfitting (i.e., the signature does not generalize)
• Too simplistic a signature (to avoid overfitting) → underfitting (it will generalize, but the fit to both the training and future data will be poor and predictive performance low)
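The trade-off between fit and complexity can be sketched with a deliberately simple model. The example below is an illustration under stated assumptions, not the method used in the course: it predicts an outcome Y from a single predictor X with k-nearest-neighbor averaging, where the toy data (a sine curve plus noise) and the choices of k are hypothetical. A small k gives a very flexible model, and k equal to the training set size gives the most rigid one (it always predicts the overall mean).

```python
import math
import random


def knn_predict(train, x, k):
    """Average the outcomes of the k training points nearest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mean_squared_error(train, data, k):
    """Mean squared prediction error on `data` for a model built from `train`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


def make_data(n, rng):
    """Outcome Y = sin(X) plus noise; a toy stand-in for real measurements."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 2.0 * math.pi)
        data.append((x, math.sin(x) + rng.gauss(0.0, 0.3)))
    return data


rng = random.Random(1)
train = make_data(40, rng)     # "training data"
future = make_data(400, rng)   # "future data" from the same process

errors = {}
for k in (1, 5, 40):
    errors[k] = (mean_squared_error(train, train, k),
                 mean_squared_error(train, future, k))
    print(f"k={k:2d}  training MSE={errors[k][0]:.3f}  future MSE={errors[k][1]:.3f}")
```

With k=1 the model reproduces the training data exactly (training error of zero) yet still errs on future data, which is the overfitting pattern. With k=40 the error is large on both training and future data, which is the underfitting pattern. An intermediate k typically sits between the two extremes.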
53. Part of the solution: feature selection
[Figure: graph over nodes A, B, C, D, E, H, I, J, K, L, M, N, O, P, Q and T]
55. Datasets
• Bhattacharjee2 - Lung cancer vs normals [GE/DX]
• Bhattacharjee2_I - Lung cancer vs normals on common genes between Bhattacharjee2 and Beer [GE/DX]
• Bhattacharjee Adenocarcinoma vs Squamous [GE/DX]
• Bhattacharjee3_I - Adenocarcinoma vs Squamous on common genes between Bhattacharjee3 and Su [GE/DX]
• Savage Mediastinal large B-cell lymphoma vs diffuse large B-cell lymphoma [GE/DX]
• Rosenwald year lymphoma survival [GE/CO]
• Rosenwald year lymphoma survival [GE/CO]
• Rosenwald year lymphoma survival [GE/CO]
• Adam Prostate cancer vs benign prostate hyperplasia and normals [MS/DX]
• Yeoh Classification between 6 types of leukemia [GE/DX-MC]
• Conrads Ovarian cancer vs normals [MS/DX]
• Beer_I Lung cancer vs normals (common genes with Bhattacharjee2) [GE/DX]
• Su_I Adenocarcinoma vs squamous (common genes with Bhattacharjee3) [GE/DX]
• Banez Prostate cancer vs normals [MS/DX]
56. Methods: Gene and Peak Selection Algorithms
• ALL: no feature selection
• LARS: LARS
• HITON_PC
• HITON_PC_W: HITON_PC + wrapping phase
• HITON_MB
• HITON_MB_W: HITON_MB + wrapping phase
• GA_KNN: GA/KNN
• RFE: RFE with validation of feature subset with optimized polynomial kernel
• RFE_Guyon: RFE with validation of feature subset with linear kernel (as in Guyon)
• RFE_POLY: RFE (with polynomial kernel) with validation of feature subset with optimized polynomial kernel
• RFE_POLY_Guyon: RFE (with polynomial kernel) with validation of feature subset with linear kernel (as in Guyon)
• SIMCA: SIMCA (Soft Independent Modeling of Class Analogy), a PCA-based method
• SIMCA_SVM: SIMCA, a PCA-based method, with validation of feature subset by SVM
• WFCCM_CCR: Weighted Flexible Compound Covariate Method (WFCCM) applied as in the Clinical Cancer Research paper by Yamagata (analysis of microarray data)
• WFCCM_Lancet: WFCCM applied as in the Lancet paper by Yanagisawa (analysis of mass-spectrometry data)
• UAF_KW: univariate with Kruskal-Wallis statistic
• UAF_BW: univariate with ratio of between-group to within-group sum of squares
• UAF_S2N: univariate with signal-to-noise statistic
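As an illustration of the simplest family in this list, univariate filtering, the sketch below ranks features by a signal-to-noise statistic. The slides do not give the formula, so the definition used here, (mean in cases minus mean in controls) divided by the sum of the two group standard deviations, is an assumption based on common usage in the cancer classification literature; the gene names and expression values are hypothetical.

```python
import math


def mean(values):
    return sum(values) / len(values)


def std(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def signal_to_noise(feature, labels):
    """Assumed definition: S2N = (mean_1 - mean_0) / (sd_1 + sd_0)."""
    group1 = [v for v, y in zip(feature, labels) if y == 1]
    group0 = [v for v, y in zip(feature, labels) if y == 0]
    return (mean(group1) - mean(group0)) / (std(group1) + std(group0))


def rank_features(data, labels):
    """Return (feature names sorted by decreasing |S2N|, dict of S2N scores)."""
    scores = {name: signal_to_noise(values, labels) for name, values in data.items()}
    ranking = sorted(scores, key=lambda name: abs(scores[name]), reverse=True)
    return ranking, scores


# Hypothetical toy expression values for six patients (3 controls, 3 cases).
labels = [0, 0, 0, 1, 1, 1]
data = {
    "geneA": [1.0, 1.2, 0.8, 3.0, 3.2, 2.8],  # clearly higher in cases
    "geneB": [2.0, 1.0, 3.0, 2.1, 0.9, 3.1],  # essentially no group difference
    "geneC": [5.0, 5.5, 4.5, 4.0, 3.5, 4.5],  # moderately lower in cases
}
ranking, scores = rank_features(data, labels)
print(ranking)  # ['geneA', 'geneC', 'geneB']
```

A univariate filter like this scores each feature in isolation, so it cannot account for redundancy or interactions among features; the multivariate methods in the list (for example HITON or RFE) are designed to address that.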
57. Classification Performance (average over all tasks/datasets)
59. Number of Selected Features (average over all tasks/datasets)
60. Number of Selected Features (zoom on most powerful methods)
61. Number of Selected Features (average over all tasks/datasets)
62. Conclusions so far
• Special classifiers (with inherent complexity control), combined with feature selection & careful parameterization protocols, overcome over-fitting & estimate future performance accurately.
• Caveats: analysis is typically complex and error-prone. Need: (a) an experienced analyst on the team, or (b) a validated software system designed for non-experts.
64. Causal Explorer
• Matlab library of computational causal discovery and variable selection algorithms
• Introductory-level library to our causal algorithms (~3% of our algorithms)
• Discover the direct causal or probabilistic relations around a response variable of interest (e.g., disease is directly caused by and directly causes a set of variables/observed quantities)
• Discover the set of all direct causal or probabilistic relations among the variables
• Discover the Markov blanket of a response variable of interest, i.e., the minimal subset of variables that contains all necessary information to optimally predict the response variable
• Code emphasizes efficiency, scalability, and quality of discovery
• Requires relatively deep understanding of the underlying theory and how the algorithms operate
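To make the Markov blanket concept concrete, the sketch below reads the blanket off a causal graph that is already known. This is not Causal Explorer's API and not how its algorithms work (they infer such sets from data); it only illustrates the definition, assuming a faithful Bayesian network in which the blanket consists of the target's parents, its children, and the other parents of those children. The network and variable names are hypothetical.

```python
def markov_blanket(parents, target):
    """Markov blanket of `target` in a DAG given as {node: set of parents}.

    Assumes a faithful Bayesian network, where the blanket is the target's
    parents, its children, and the other parents of those children (spouses).
    """
    direct_causes = set(parents.get(target, ()))
    children = {node for node, pa in parents.items() if target in pa}
    spouses = set()
    for child in children:
        spouses |= set(parents[child])
    spouses.discard(target)
    return direct_causes | children | spouses


# Hypothetical network around a disease variable "D".
parents = {
    "G1": set(),
    "G2": set(),
    "G3": set(),
    "G4": {"G1"},        # caused by G1 only; not adjacent to D
    "D": {"G1", "G2"},   # the disease has two direct causes
    "M": {"D", "G3"},    # marker caused by the disease and by G3
    "S": {"M"},          # downstream of M; outside the blanket
}
print(sorted(markov_blanket(parents, "D")))  # ['G1', 'G2', 'G3', 'M']
```

In this toy network G4 and S are both statistically associated with D, yet neither belongs to the blanket: once G1, G2, G3 and M are known, they add no further information for predicting D.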
65. Statistics of Registered Users
• 739 registered users in >50 countries
• 402 (54%) users are affiliated with educational, governmental, and non-profit organizations
• 337 (46%) users are from the private or commercial sectors
Major commercial organizations that have registered users of Causal Explorer include: IBM, Intel, SAS Institute, Texas Instruments, Siemens, GlaxoSmithKline, Merck, Microsoft
66. Statistics of Registered Users
Major U.S. institutions that have registered users of Causal Explorer: Boston University, Brandeis University, Carnegie Mellon University, Case Western Reserve University, Central Washington University, College of William and Mary, Cornell University, Duke University, Harvard University, Illinois Institute of Technology, Indiana University-Purdue University Indianapolis, Johns Hopkins University, Louisiana State University, M. D. Anderson Cancer Center, Massachusetts Institute of Technology, Medical College of Wisconsin, Michigan State University, Naval Postgraduate School, New York University, Northeastern University, Northwestern University, Oregon State University, Pennsylvania State University, Princeton University, Rutgers University, Stanford University, State University of New York, Tufts University, University of Arkansas, University of California Berkeley, University of California Los Angeles, University of California San Diego, University of California Santa Cruz, University of Cincinnati, University of Colorado Denver, University of Delaware, University of Houston-Clear Lake, University of Idaho, University of Illinois at Chicago, University of Illinois at Urbana-Champaign, University of Kansas, University of Maryland Baltimore County, University of Massachusetts Amherst, University of Michigan, University of New Mexico, University of Pennsylvania, University of Pittsburgh, University of Rochester, University of Tennessee Chattanooga, University of Texas at Austin, University of Utah, University of Virginia, University of Washington, University of Wisconsin-Madison, University of Wisconsin-Milwaukee, Vanderbilt University, Virginia Tech, Yale University
67. Other systems for supervised analysis of microarray data
There exist many good software packages for supervised analysis of microarray data, but:
1. No system provides a protocol for data analysis that precludes overfitting.
2. A typical software package offers either an overabundance of algorithms or algorithms with unknown performance. Thus it is not clear to the user how to choose an optimal algorithm for a given data analysis task.
3. The software packages address the needs of experienced analysts only. However, users who know little about data analysis (e.g., biologists and clinicians) also need to use this software and still achieve good results.
We built a system that addresses all of the above problems.
68. Purpose of GEMS (model generation & performance estimation mode)
• Input to GEMS: gene expression data and outcome variable; optional: gene names & IDs
• Output: cross-validation performance estimate; classification model; reduced set of genes; links to literature
69. Purpose of GEMS (model application mode)
• Input to GEMS: gene expression data and unknown outcome variable; classification model
• Output: model predictions; performance estimate
71. Software Architecture of GEMS
Wizard-like user interface, offering four tasks:
• Generate a classification model
• Estimate classification performance
• Generate a classification model and estimate its performance
• Apply existing model to a new set of patients
Computational engine:
• Normalization
• Gene selection: S2N One-Versus-Rest; S2N One-Versus-One; non-parametric ANOVA; BW ratio; HITON_PC; HITON_MB
• Classification by MC-SVM: One-Versus-Rest; One-Versus-One; DAGSVM; method by WW; method by CS
• Cross-validation loops (one for performance estimation, one for model selection): N-fold CV; LOOCV
• Performance computation: accuracy; RCI; AUC ROC
• Report generator
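The two cross-validation loops in this architecture can be read as nested cross-validation: an inner loop chooses model settings, and an outer loop estimates performance on samples that played no part in that choice. The sketch below illustrates the idea under stated assumptions; it is not GEMS code. It swaps in a k-nearest-neighbor classifier for the multi-category SVMs named on the slide, and the data, fold counts and candidate settings are hypothetical.

```python
import random


def knn_classify(train, x, k):
    """Majority vote among the k nearest training samples (squared Euclidean)."""
    def dist(sample):
        return sum((a - b) ** 2 for a, b in zip(sample[0], x))
    nearest = sorted(train, key=dist)[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0


def accuracy(train, test, k):
    hits = sum(knn_classify(train, x, k) == y for x, y in test)
    return hits / len(test)


def folds(data, n_folds):
    """Yield (train, test) splits; sample i goes to fold i % n_folds."""
    for f in range(n_folds):
        test = [s for i, s in enumerate(data) if i % n_folds == f]
        train = [s for i, s in enumerate(data) if i % n_folds != f]
        yield train, test


def select_k(train, candidates, n_folds):
    """Inner loop: pick the k with the best cross-validated accuracy."""
    def cv_accuracy(k):
        scores = [accuracy(tr, te, k) for tr, te in folds(train, n_folds)]
        return sum(scores) / len(scores)
    return max(candidates, key=cv_accuracy)


def nested_cv(data, candidates=(1, 3, 5), outer_folds=5, inner_folds=4):
    """Outer loop: estimate performance on samples never used for selection."""
    scores = []
    for train, test in folds(data, outer_folds):
        best_k = select_k(train, candidates, inner_folds)
        scores.append(accuracy(train, test, best_k))
    return sum(scores) / len(scores)


# Hypothetical data: 60 samples, 2 informative "genes" plus 8 noise "genes".
rng = random.Random(0)
data = []
for i in range(60):
    label = i % 2
    informative = [rng.gauss(2.0 * label, 1.0) for _ in range(2)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(8)]
    data.append((informative + noise, label))
rng.shuffle(data)

estimate = nested_cv(data)
print(f"Nested cross-validation accuracy estimate: {estimate:.2f}")
```

Choosing k on the same samples that are later used to report accuracy would make the estimate optimistic; keeping selection inside the inner loop is what guards the outer estimate against that bias. In a fuller protocol, gene selection would also be repeated inside each training fold.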
73. GEMS 2.0: Wizard-Like Interface
[Screenshot labels: input microarray gene expression dataset; file with gene names; file with gene accession numbers; output model]
74. Statistics of registered users
• 800 users in >50 countries
• 350 academic & non-profit users
• 450 private & commercial users
• 205 scientific citations of the major paper that introduced GEMS
Major commercial organizations that have registered users of Causal Explorer include: Eli Lilly, Novartis, IBM, GE, Genedata, Nuvera Biosciences, GenomicTree, Cogenetics, Pronota
75FAST-AIMS FAST-AIMS is a system to support automatic development of high-quality classification models and biomarker discovery in mass spectrometry proteomics data. It incorporates the automated data analysis protocols of GEMS and deals with the additional challenges of MS data analysis.
77Evaluation in multiple user study FAST-AIMS: Expert: 0.811 Higher than PSA: [Thompson JAMA 2005]
78Projects 1
Project: Development of placebo responder signatures for Irritable Bowel Syndrome. Goal & data types/design: Re-analyze banked samples from clinical trials to create signature of placebo responders; selected proteomic markers and clinical data from humans. Stage: In progress. Funding: NIH. Other involved entities: Harvard, NYU.
Project: Lung cancer signatures and AKT1 pathway in lung cancer. Goal & data types/design: Find signatures and markers for lung cancer; focus on local pathway around AKT1; human biopsy and cell line array gene expression data. Other involved entities: NYU, Vanderbilt.
79Projects 2
Project: Molecular signature and biomarkers for atherosclerosis progression and regression. Goal & data types/design: Find signatures, causative markers, predictive markers and imaging markers; in aortic transplantation and reversa mouse models; microarray and shotgun proteomic data. Stage: Transplantation model signature and markers complete; reversa model in development. Funding: Industry, NIH (requested). Other involved entities: NYU, Merck.
Project: Predicting death from community acquired pneumonia. Goal & data types/design: Find models that predict dire outcomes in patients with CAP; clinical and lab/imaging data. Stage: Completed. Funding: NSF. Other involved entities: University of Pittsburgh, Carnegie Mellon University.
Project: Signature for treatment response in colorectal cancer. Goal & data types/design: Develop signature for treatment response in colorectal cancer patients; clinical and gene expression data. Stage: In development. Funding: NIH. Other involved entities: NYU.
Project: Prostate cancer risk and treatment signatures. Goal & data types/design: Develop signature for disease risk and optimal management of prostate cancer patients; clinical and GWAS data. Funding: Donor funds, DoD (requested).
Project: Pathways, markers and signatures for pneumonia development in HIV+ subjects. Goal & data types/design: Discover pathways/markers and develop signature for pneumonia risk; clinical and next-gen sequencing microbiomic data.
80Projects 3
Project: Predicting physician judgment in diagnosis of melanomas & guideline compliance. Goal & data types/design: Predict physicians' diagnoses and compare to best practice guidelines for compliance; clinical data and automated imaging data. Stage: Completed. Funding: EU. Other involved entities: University of Trento, Vanderbilt.
Project: Predicting risk for sepsis in neonatal intensive unit. Goal & data types/design: Discover optimal treatment strategies; clinical and array gene expression data. Stage: In development. Funding: NIH (requested). Other involved entities: Vanderbilt, NYU.
Project: Proteomic based diagnosis of stroke and stroke-like syndromes. Goal & data types/design: Develop signatures for disease diagnosis; selected proteomic data. Funding: Industry.
Project: Outcome prediction models for ARDS. Goal & data types/design: Find signatures and markers for ARDS detection and progression; human clinical data from 3 big clinical trials. Funding: NIH.
Project: Molecular signatures for treatment response in keloids. Goal & data types/design: Discover pathways/markers and develop signature for treatment response; clinical and microarray data.
81Publications - new methods development 1 Novel local causal network/pathway and biomarker discovery algorithms software and protocols“Algorithms for Large Scale Markov Blanket Discovery". I. Tsamardinos, C.F. Aliferis, A. Statnikov. In Proceedings of the 16th International Florida Artificial Intelligence Research Society (FLAIRS) Conference, St. Augustine, Florida, USA; AAAI Press, pages , May 12-14, 2003."Time and Sample Efficient Discovery of Markov Blankets and Direct Causal Relations". I. Tsamardinos, C.F. Aliferis, A. Statnikov. In Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA; ACM Press, pages , August 24-27, 2003."HITON, A Novel Markov Blanket Algorithm for Optimal Variable Selection”. C. F. Aliferis, I. Tsamardinos, A. Statnikov. In Proceedings of the 2003 American Medical Informatics Association (AMIA) Annual Symposium, pages 21-25, 2003.“Identifying Markov Blankets with Decision Tree Induction.” L. Frey, D. Fisher, I. Tsamardinos, C.F. Aliferis, A. Statnikov. In Proceedings of the Third IEEE International Conference on Data Mining (ICDM), Melbourne, Florida, USA, IEEE Computer Society Press; pages 59-66, November 19-22, 2003."Gene Expression Model Selector (GEMS): a system for decision support and discovery from array gene expression data". A. Statnikov, I. Tsamardinos, Y. Dosbayev, C.F. Aliferis. Int J Med Inform., Aug;74(7-8): , 2005.“Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data”. Aliferis CF, Statnikov A, Tsamardinos I, Schildcrout JS, Shepherd BE, Harrell FE. PLoS ONE 2009, 4: e4922."Formative Evaluation of a Prototype System for Automated Analysis of Mass Spectrometry Data". N. Fananapazir, M. Li, D. Spentzos, C.F. Aliferis. Proc AMIA Symposium, 2005.“Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification. 
Part I: Algorithms and Empirical Evaluation” Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, and Xenofon D. Koutsoukos (to appear in Journal of Machine Learning Research).“Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification. Part II: Analysis and Extensions” Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, and Xenofon D. Koutsoukos (to appear in Journal of Machine Learning Research).“Design and Analysis of the Causation and Prediction Challenge” I. Guyon, C.F. Aliferis, G.F. Cooper, A. Elisseeff, JP. Pellet, P. Spirtes, A. Statnikov (to appear in Journal of Machine Learning Research).“GEMS: A System for Cancer Diagnosis and Biomarker Discovery from Microarray Gene Expression Data”. Statnikov A, Tsamardinos I, Aliferis CF. AMIA Annual Symposium, 2005.“Using the GEMS System for Cancer Diagnosis and Biomarker Discovery from Microarray Gene Expression Data”. Statnikov A, Tsamardinos I, Aliferis CF. Twelfth National Conference on Artificial Intelligence (AAAI), 2005.“Using GEMS for Cancer Diagnosis and Biomarker Discovery from Microarray Gene Expression Data”. Statnikov A, Tsamardinos I, Aliferis CF. Thirteenth Annual International Conference on Intelligent Systems for Molecular Biology (ISMB), 2005.
82Publications -new methods development 2 Network reverse engineering/global causal discovery “A Novel Algorithm for Scalable and Accurate Bayesian Network Learning”. L.E. Brown, I. Tsamardinos, C.F. Aliferis. In Proceedings of the 11th World Congress on Medical Informatics (MEDINFO), San Francisco, California, USA; September 7-11, 2004.“A Comparison of Novel and State-of-the-Art Polynomial Bayesian Network Learning Algorithms” Laura E. Brown, Ioannis Tsamardinos, C. F. Aliferis. Proc AAAI Conference, 2005.“The Max-Min Hill Climbing Bayesian Network Structure Learning Algorithm”. I. Tsamardinos, L.E. Brown, C.F. Aliferis. Machine Learning, 65:31-78, 2006.“A Causal Modeling Framework for Generating Clinical Practice Guidelines from Data”. S. Mani and C. Aliferis. ”AI in medicine Europe (AIME) Conference, Amsterdam, July 2007.“Learning Causal and Predictive Clinical Practice Guidelines from Data”. S. Mani, C. F. Aliferis, S. Krishnaswami, T. Kotchen. In International Medical Informatics Congress, MEDINFO, 2007.“Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification. Part I: Algorithms and Empirical Evaluation” Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, and Xenofon D. Koutsoukos (to appear in Journal of Machine Learning Research).
83Publications - new methods development 3 Theoretical properties of discovery methods "Towards Principled Feature Selection: Relevance, Filters, and Wrappers". I. Tsamardinos and C.F. Aliferis. In Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, Florida, USA, January 3-6, 2003."Why Classification Models Using Array Gene Expression Data Perform So Well: A Preliminary Investigation Of Explanatory Factors". C.F. Aliferis, I. Tsamardinos, P. Massion, A. Statnikov, D. Hardin. In Proceedings of the 2003 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences (METMBS), Las Vegas, Nevada, USA; CSREA Press, June 23-26, 2003."A Theoretical Characterization of Linear SVM-Based Feature Selection". D. Hardin, I. Tsamardinos, C.F. Aliferis. In Twenty-First International Conference on Machine Learning (ICML), 2004.“Are Random Forests Better than Support Vector Machines for Microarray-Based Cancer Classification?” A. Statnikov, C.F. Aliferis. Proc AMIA Fall Symposium 2007.“Using SVM Weight-Based Methods to Identify Causally Relevant and Non-Causally Relevant Variables”. Statnikov A., Hardin D., Aliferis CF. Workshop on: Feature Selection and Causality, NIPS, 2006.“Challenges in the Analysis of Mass-Throughput Data: A Technical Commentary from the Statistical Machine Learning Perspective”. Aliferis CF, Statnikov A, Tsamardinos I. Cancer Informatics, 2: 133–162, 2006.“ Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification. Part II: Analysis and Extensions” Constantin F. Aliferis, Alexander Statnikov, Ioannis Tsamardinos, Subramani Mani, and Xenofon D. Koutsoukos (to appear in Journal of Machine Learning Research).“The Problem of Statistical Gene Instability in Microarray Studies: External Reproducibility and Biological Importance of Unstable Genes and their Molecular Signatures” C.F. Aliferis, A. Statnikov, S. Pratap, E. Kokkotou. 
(In preparation).“Application and Comparative Evaluation of Causal and Non-Causal Feature Selection Algorithms for Biomarker Discovery in High-Throughput Biomedical Datasets”. Aliferis CF, Statnikov A., Tsamardinos I, Kokkotou E, Massion PP. Workshop on Feature Selection and Causality, NIPS 2006.“Pathway induction and high-fidelity simulation for molecular signature and biomarker discovery in lung cancer using microarray gene expression data”. Aliferis CF, Statnikov A, Massion P. In Proc 2006 American Physiological Society Conference: Physiological Genomics and Proteomics of Lung Disease. November 2-5, 2006.
84Publications - new methods development 4 Other methodological studies with relevance for predictive modeling“Temporal Representation Design Principles: An Assessment in the Domain of Liver Transplantation”. C.F. Aliferis, and G. F. Cooper. Proc AMIA Symp., 170-4, 1998.“Machine Learning Models For Lung Cancer Classification Using Array Comparative Genomic Hybridization”. C. F. Aliferis, D. Hardin, P.Massion. Proc AMIA Symp., 7-11, 2002."Machine Learning Models For Classification Of Lung Cancer and Selection of Genomic Markers Using Array Gene Expression Data". C.F. Aliferis, I. Tsamardinos, P. Massion, A. Statnikov, N. Fananapazir, D. Hardin. In Proceedings of the 16th International Florida Artificial Intelligence Research Society (FLAIRS) Conference, St. Augustine, Florida, USA; AAAI Press, pages 67-71, May 12-14, 2003."Text Categorization Models for Retrieval of High Quality Articles in Internal Medicine." Y. Aphinyanaphongs, C.F. Aliferis. In Proceedings of the 2003 American Medical Informatics Association (AMIA) Annual Symposium, Washington, DC, USA; pages 31-35, 2003.“Text Categorization Models for Retrieval of High Quality Articles in Internal Medicine”. Y. Aphinyanaphongs, I. Tsamardinos, A. Statnikov, D. Hardin, C.F. Aliferis. J Am Med Inform Assoc., Mar-Apr;12(2):207-16, 2005.“A Semantic Model for Organizing Molecular Medicine "Omics" Modalities and Evidence” Firas Wehbe, Pierre Massion, Cindy Gadd, Daniel Masys and C.F. Aliferis. (to appear in Cancer Informatics).
85Benchmarking Studies 1 Evaluation of classifier algorithms Statnikov A, Aliferis CF, Tsamardinos I, Hardin D, Levy S: A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics 2005, 21:Statnikov A, Aliferis CF, Tsamardinos I: Methods for multi-category cancer diagnosis from gene expression data: a comprehensive evaluation to inform decision support system development. Medinfo , 11:Statnikov A, Wang L, Aliferis CF: A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification. BMC Bioinformatics 2008, 9: 319.Evaluation of biomarker/variable selection algorithmsAliferis CF, Tsamardinos I, Statnikov A: HITON: a novel Markov blanket algorithm for optimal variable selection. AMIA 2003 Annual Symposium Proceedings 2003,Aliferis CF, Statnikov A, Massion PP: Pathway induction and high-fidelity simulation for molecular signature and biomarker discovery in lung cancer using microarray gene expression data. Proceedings of the American Physiological Society Conference "Physiological Genomics and Proteomics of Lung Disease" 2006.Aliferis CF, Statnikov A, Tsamardinos I, Kokkotou E, Massion PP: Application and comparative evaluation of causal and non-causal feature selection algorithms for biomarker discovery in high-throughput biomedical datasets. Proceedings of the NIPS 2006 Workshop on Causality and Feature Selection 2006.Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD: Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification. Part II: Analysis and Extensions. Journal of Machine Learning Research 2009.Aliferis CF, Statnikov A, Tsamardinos I, Mani S, Koutsoukos XD: Local Causal and Markov Blanket Induction for Causal Discovery and Feature Selection for Classification. Part I: Algorithms and Empirical Evaluation. Journal of Machine Learning Research 2009.
86Benchmarking Studies 2 Comparison of algorithms for extraction of all maximally predictive and non-redundant molecular signatures: Statnikov A: Algorithms for Discovery of Multiple Markov Boundaries: Application to the Molecular Signature Multiplicity Problem. PhD Thesis, Department of Biomedical Informatics, Vanderbilt University 2008. Comparison of protocols to detect predictive signal of prognostic molecular signatures: Aliferis CF, Statnikov A, Tsamardinos I, Schildcrout JS, Shepherd BE, Harrell FE: Factors Influencing the Statistical Power of Complex Data Analysis Protocols for Molecular Signature Development from Microarray Data. PLoS ONE 2009, 4: e4922.
87Patents Aliferis CF, Tsamardinos I. Method, system, and apparatus for casual discovery and variable selection for classification. U.S. patent (US 7,117,185 B1). Aliferis CF. Local Causal and Markov Blanket Induction Methods for Causal Discovery and Feature Selection from Data. U.S. provisional patent application ( ). Statnikov A, Aliferis CF. Methods for Discovery of Markov Boundaries from Datasets with Hidden Variables. U.S. provisional patent application ( ). Statnikov A, Aliferis CF. A Method for Determining All Markov Boundaries and Its Application for Discovering Multiple Optimally Predictive and Non-Redundant Molecular Signatures. U.S. provisional patent application. Statnikov A, Aliferis CF, Tsamardinos I, Fananapazir N. Method and System for Automated Supervised Data Analysis. U.S. patent application ( ).
88Grants 1
NIH/NLM 2, R56 LM A1, Aliferis, PI, Causal Discovery Algorithms for Translational Research with High-Throughput Data, 07/15/2008 – 07/14/2009, $333,863.00
NSF, NSF , Guyon, PI, Causal Discovery Workbench and Challenge Program, 09/01/ /30/2009
NIH/NCRR, 1 U54 RR A2, Cronstein, PI, Institutional Clinical and Translational Science Award, 07/01/2009 – 06/30/2014, $32,411,416.00
NIH/NCCAM, 1 R01 AT A1, Kokkotou, PI, Omics and Variable Responses to Placebo and Acupuncture in Irritable Bowel Syndrome, 07/01/2009 – 06/30/2014, $376,663.00
NIH/NHLBI, 1 R01 HL A1, Schuening, PI, Genomics and Genetics of Acute Graft-Versus Host Disease, 07/01/2009 – 06/30/2014, $2,489,743.00
NSF, Guyon, PI, Resource Allocation Strategies in Causal Modeling
DOD, PC093319P1, Aliferis, PI, Enhanced prediction of prostate cancer risk and progression and causative gene identification, Award Mechanism: Synergistic Idea Development Award, 07/01/2010 – 06/30/2013, $
NIH, 1 S10 RR , Smith, PI, High Performance Computing Equipment to Support Biomedical Research at NYU, 12/02/2009 – 11/30/2010, $650,433.00
NIH, 1R01OD , Fisher, PI, Key Factors for the Regression and Imaging of Atherosclerosis, 09/01/ /31/2014
NIH, RO1 AR , Cronstein, PI, The Pharmacology of Dermal Fibrosis, /01/ /30/2011, $295,242.00
NIH, Blumenberg, PI, Skinomics, 09/01/ /31/2011, $582,596.00
NIH, 1U01HL , Weiden, PI, Bacterial, Fungal and Viral Microbiome in the Lung, 9/30/2009 – 9/29/2014
NIH, RFA-OD , Aliferis, PI, Recovery Act Limited Competition: Supporting New Faculty Recruitment to Enhance Research Resources through Biomedical Research Core Centers (P30), 09/30/2009 – 06/20/2010
NIH, Jiyoung, PI, Integrative Analysis of Genome-wide Gene Expression for Prostate Cancer Prognosis
NIH, Cllelland, PI, First Episode Psychiatric Illness: A Clinical, BioData and Biomaterials Resource
89Grants 2NIH/NHLBI 1 U01 HL (Ware), “Biomarker Profiles in the Diagnosis/Prognosis of ARDS”, 08/12/2005 – 06/30/2009 Total Award: $6,059,257.NIH, NCI 1 U24 CA (Liebler), “Clinical Proteomic Technology Assessment for Cancer”, 09/28/2006 – 08/31/2011, Total Award: $7,388,990.NSF (Guyon), “Causal Discovery Workbench and Challenge Program”, 08/15/2007 – 07/31/2009, Total Award: $107,721.NIH, NLM 2 T15 LM (Gadd) “Vanderbilt Biomedical Informatics Training Program”, 07/01/2007 – 06/30/2012, Total Award: $3,969,225.NIH/NLM, 1 R01 LM “Principled methods for very-large-scale causal discovery.” 07/01/2003 – 06/30/ Total Award: $631,180.NIH/NLM BISTI Planning Grant, 1 P20 LM (Stead) “Pilot Project Computational Models of Lung Cancer: Connecting Classification, Gene Selection, and Molecular Sub-typing”, 09/01/2002 – 08/30/2004, Total Award: $226,500.NIH/NLM 1 T15 LM (Miller), “Biomedical Informatics Training Grant.” 07/01/2002 – 06/30/2007, Total Award: $3,966,644.BMS training contract, Fall-Spring 2002, Total Award: $150,000.“Vanderbilt Academic Venture Capital Fund support for the Discovery Systems Laboratory”, 07/01/2003 – 06/30/2006, Total Award: $846,3471.“Vanderbilt University Discovery Grant to study Complex Modeling of Clinical Trial Data with Gene Expression Covariates & Development of Optimal Re-analysis Policies” 07/01/2001 – 06/30/2002, Total Award: $50,000.“Causal Discovery Challenge” from PASCAL (Pattern Analysis, Statistical Modeling and Computational Learning). PASCAL is the European Commission's IST-funded Network of Excellence for Multimodal Interfaces, 03/01/2006 – 11/30/2007, Total Award: 18,000 Euros.NIH/NCI 1 P50 CA (Coffey) “SPORE in GI Cancer” 09/24/2002 – 04/30/2007, Total Award: $11,851,282.NIH/NHLBI 1 U01 HL A1 (Roden) “Pharmacogenics of Arrhythmia Therapy”, 04/01/2001 – 03/31/2005, Total Award: $11,189,918.NIH/NCI 1 P50 CA (Arteaga) “SPORE in Breast Cancer” 08/01/2003 – 05/31/2008, Total Award: $12,804,130.
90Main points, session #1 Molecular signatures are an important tool for both basic and translational research; they are finding their way into clinical practice. Data analytics of molecular signatures are very important. We introduced the importance of bioinformatics for the analysis of high-dimensional data and the creation of molecular signatures.
91For session #2 Review the slides in today's presentation and bring written questions (if any) to discuss in subsequent sessions. Read the materials regarding assay technologies (to be distributed electronically). Note: the basic principle underlying each technology; advantages over older technologies; limitations & technical difficulties; how it may support your research interests now or in the future.