PRESENTATION - people: ISG (Intelligent Systems Group), research group in Donostia-San Sebastián, Computer Science Faculty, University of the Basque Country.


PRESENTATION - people
ISG (Intelligent Systems Group), research group in Donostia-San Sebastián
Computer Science Faculty, University of the Basque Country
  Group leader: Pedro Larrañaga
  Ph.D.: Jose Lozano, Endika Bengoetxea, Iñaki Inza
  Ph.D. students: Rosa Blanco, Jose L. Flores, Cristina González, Aritz Pérez, Ramón Sagarna, Guzmán Santafé
  Collaborators: Jose M. Peña (Ph.D., Aalborg University), Rubén Armañanzas

RESEARCH TOPICS
Machine learning and data mining:
  Learning of Bayesian networks (learning the joint probability distribution)
  Bayesian networks for supervised and unsupervised classification
  Preprocessing tasks: feature subset selection, discretization, imputation of missing values...
Optimization:
  Genetic algorithms
  Estimation of Distribution Algorithms (EDAs) → Bayesian networks for optimization in NP-hard problems
Applications:
  Medical applications (brain images, cirrhotic patients, breast cancer, skin melanoma, etc.)
  Bioinformatics: classification in DNA microarrays
  Software testing
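As a rough illustration of the EDA idea listed above, the sketch below runs the simplest member of the family (UMDA, which models every variable independently) on a toy bit-counting problem. It is not the group's software. The slide refers to EDAs whose probability model is a Bayesian network; here that model is swapped for independent marginals to keep the example short. The function name, the toy objective and all parameter values are assumptions made for this example.

```python
import random

def umda_onemax(n_bits=20, pop_size=100, n_select=50, generations=30, seed=0):
    """Univariate EDA on the toy OneMax problem (maximize the number of 1s)."""
    rng = random.Random(seed)
    probs = [0.5] * n_bits              # initial model: every bit is 1 with p = 0.5
    best = None
    for _ in range(generations):
        # 1. Sample a population from the current probability model.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        # 2. Rank by fitness (number of 1s) and keep the best individuals.
        population.sort(key=sum, reverse=True)
        if best is None or sum(population[0]) > sum(best):
            best = population[0]
        selected = population[:n_select]
        # 3. Re-estimate the model from the selected individuals.
        probs = [sum(ind[i] for ind in selected) / n_select for i in range(n_bits)]
        # Keep probabilities away from 0 and 1 so sampling never freezes.
        probs = [min(max(p, 0.05), 0.95) for p in probs]
    return best

best = umda_onemax()
print("best fitness found:", sum(best), "of", len(best))
```

The loop is the same in richer EDAs: sample, select, re-estimate. Only step 3 changes, where a Bayesian network would be learned from the selected individuals instead of independent bit frequencies.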

SEVERAL RESEARCH PROJECTS
Data mining in bioinformatics
Software testing
ELVIRA project:
  Open-source code for building and managing Bayesian networks (building, inference, propagation, abduction, classification, explanation...)
  Written in Java
  Developed jointly by five Spanish universities

DATA MINING IN BIOINFORMATICS
DNA microarrays
Human Genome Project (U.C. Santa Cruz)

A DNA microarray sample
One of the developments within the Human Genome Project
From the tissue → to the scanned image
Tissue → microarray chip
DNA → mRNA → hybridization on a microarray → fluorescent image → scanning → reflects the expression level of thousands of genes at a time

A DNA MICROARRAY COLLECTION
Rows → genes; columns → cases, samples, biopsies, tissues, ‘cell lines’...
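A minimal sketch of that layout, assuming a plain list-of-lists representation. Gene names, sample names and expression values are invented for illustration. Classifiers usually expect one feature vector per sample, so the matrix is transposed before learning.

```python
# Toy expression matrix in the slide's layout: one row per gene, one column per sample.
# All names and values are hypothetical.
genes = ["gene_A", "gene_B", "gene_C"]
samples = ["tissue_1", "tissue_2", "tissue_3", "tissue_4"]
expression = [
    [2.1, 1.9, 0.2, 0.3],   # gene_A across the four tissues
    [0.5, 0.4, 0.6, 0.5],   # gene_B
    [0.1, 0.3, 1.8, 2.2],   # gene_C
]
labels = ["tumor", "tumor", "normal", "normal"]  # one class label per column

def to_samples_by_genes(matrix):
    """Transpose genes x samples into samples x genes (one feature vector per tissue)."""
    return [list(column) for column in zip(*matrix)]

X = to_samples_by_genes(expression)
print(len(X), "samples,", len(X[0]), "genes per sample")
```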

SEVERAL MICROARRAY DATASETS
DATASET        GENES      Characteristic of each tissue
Colon          2,000      Biopsy: ‘tumor’ vs. ‘normal’
Leukemia       7,129      Leukemia type: AML, ALL
NCI-60         1,376      9 types of tumor
Alizadeh’00    2,984      2 types of lymphoma: ‘germinal center B-like’, ‘activated B-like’
Chen’02        17,400     Hepatocellular carcinoma vs. no liver cancer
Garber’01      >24,000    Subtypes of lung cancer

PROBLEM GOAL-TASK
The usual approach for biologists:
  Hierarchical clustering of genes
  Hierarchical clustering of tissues
Focusing on the specific nature of each tissue:
  Building a supervised model that accurately predicts the specific nature (characteristic) of future and doubtful tissues:
    cancer vs. normal
    benign vs. malignant tumor
    specific type of cancer, ...

Our work: selection of relevant genes in DNA microarray SUPERVISED tasks
Small area within bioinformatics.
Huge dimensionality (> 1,000 genes) → a model cannot be learned directly → selection of genes is a crucial task
Application goals:
  Development of drugs that act on the relevant genes
  Therapy development
  Diagnostic purposes
Supervised tasks (e.g., benign vs. malignant tumor)
Literature: Golub et al.’99, Brazma’00, Friedman’00, Xing & Jordan’01...
For a specific disease → genes seem relevant
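One way to cut the dimensionality is a filter: score each gene on its own and keep the top-ranked ones. The sketch below uses a signal-to-noise score in the style of Golub et al.’99 (difference of class means divided by the sum of class standard deviations). The slides do not say which score the group used, so this choice, the function names and the toy data are assumptions.

```python
from statistics import mean, stdev

def signal_to_noise(values, labels, positive):
    """Score one gene: (mean_pos - mean_neg) / (sd_pos + sd_neg)."""
    pos = [v for v, y in zip(values, labels) if y == positive]
    neg = [v for v, y in zip(values, labels) if y != positive]
    spread = stdev(pos) + stdev(neg)
    if spread == 0:
        return 0.0  # no within-class variation: score undefined, treated as 0 in this sketch
    return (mean(pos) - mean(neg)) / spread

def rank_genes(expression, labels, positive, top_k):
    """Return the indices of the top_k genes by absolute score (rows are genes)."""
    scored = [(abs(signal_to_noise(row, labels, positive)), g)
              for g, row in enumerate(expression)]
    scored.sort(reverse=True)
    return [g for _, g in scored[:top_k]]

# Hypothetical data: 3 genes (rows) x 6 samples (columns).
toy_expression = [
    [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],   # gene 0: high in tumor
    [2.0, 3.0, 2.5, 2.4, 2.6, 2.9],   # gene 1: uninformative
    [1.0, 1.5, 0.5, 3.0, 3.5, 2.5],   # gene 2: high in normal
]
toy_labels = ["tumor"] * 3 + ["normal"] * 3

print(rank_genes(toy_expression, toy_labels, "tumor", top_k=2))
```

A filter like this never consults the classifier, which makes it cheap on thousands of genes but blind to interactions between genes.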

OUR APPROACH TO GENE SELECTION
Search algorithms: sequential (forward), EDAs...
Wrapper and filter evaluation functions
Classification algorithms: naive Bayes and Bayesian networks, k-NN, IF-THEN rules...
In-house software and freeware (ELVIRA, WEKA, MLC++...)
Our Achilles' heel (weak point): biological interpretation of the induced models and selected genes, validity of the obtained recognition accuracy...
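The wrapper idea combined with sequential forward search can be sketched as follows: start from an empty gene set, and at each step add the gene that most improves the estimated accuracy of the classifier itself. This is only an illustration under stated assumptions, not the group's implementation. It uses a 1-nearest-neighbour classifier and leave-one-out accuracy as the evaluation function, and the function names and toy data are invented.

```python
def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-NN using only the gene indices in `features`."""
    correct = 0
    for i in range(len(X)):
        best_j, best_d = None, float("inf")
        for j in range(len(X)):
            if j == i:
                continue  # the held-out sample cannot be its own neighbour
            d = sum((X[i][f] - X[j][f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        if y[best_j] == y[i]:
            correct += 1
    return correct / len(X)

def forward_selection(X, y, max_features):
    """Greedy forward search; stops when no gene strictly improves the accuracy."""
    selected, best_score = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining and len(selected) < max_features:
        candidate, candidate_score = None, best_score
        for g in sorted(remaining):
            score = loo_accuracy(X, y, selected + [g])
            if score > candidate_score:
                candidate, candidate_score = g, score
        if candidate is None:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score

# Hypothetical data: 6 samples (rows) x 3 genes (columns); only gene 1 separates the classes.
X = [
    [0.9, 5.0, 1.0], [0.1, 5.2, 3.0], [0.5, 4.8, 2.0],
    [0.8, 1.0, 1.1], [0.2, 1.1, 2.9], [0.6, 0.9, 2.1],
]
y = ["tumor"] * 3 + ["normal"] * 3

selected, score = forward_selection(X, y, max_features=2)
print("selected genes:", selected, "leave-one-out accuracy:", score)
```

Each candidate gene costs a full accuracy estimate, which is why wrappers are far more expensive than filters on microarray data. An accuracy estimated on so few samples is also optimistic when it is reused to choose the genes, which relates to the weak point about accuracy validity noted on the slide.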

PUBLICATIONS IN BIOINFORMATICS
R. Blanco, P. Larrañaga, I. Inza, B. Sierra (2004). “Gene selection for cancer classification using wrapper approaches”. International Journal of Pattern Recognition and Artificial Intelligence.
I. Inza, P. Larrañaga, R. Blanco, A. J. Cerrolaza (2003). “Filter versus wrapper gene selection approaches in DNA microarray domains”. Artificial Intelligence in Medicine Journal. Special issue on “Data mining in Genomics and Proteomics”.
I. Inza, B. Sierra, R. Blanco, P. Larrañaga (2002). “Gene selection by sequential search wrapper approaches in microarray cancer class prediction”. Journal of Intelligent and Fuzzy Systems. Special issue on Bioinformatics.

INTERESTING REFERENCES
Conferences:
  ISMB: Intelligent Systems for Molecular Biology
  ECCB: European Conference on Computational Biology
  CAMDA: Critical Assessment of Microarray Data Analysis
  WABI: Workshop on Algorithms in Bioinformatics
Reference journal: “Bioinformatics”, and special issues of machine learning journals on the topic
Web sites:
  Stanford Genomic Resources → Stanford Microarray Database
  Hebrew University (N. Friedman, D. Pe’er, I. Nachman...)
  Tel Aviv University (R. Shamir)
  Human Genome Working Draft: