Nuria Lopez-Bigas Methods and tools in functional genomics (microarrays) BCO17
What are microarrays?
Microarray data analysis
Microarray data analysis is the step that allows us to extract biological meaning from the high-throughput data generated by the experiment.
[Diagram: microarray data -> data preprocessing and normalization -> normalized data]
Normalization and Noise:
Normalization
- Some kind of normalization is usually required when comparing more than one microarray experiment.
- Adjust to account for differences in overall brightness of slides.
- Normalize relative to housekeeping genes.
Noise
- Refers to the variability and reproducibility of microarray experiments.
- Intra- and inter-microarray variations can significantly skew interpretation of data.
- Sample collection is very important. If comparing two conditions, you must control for all variables other than the one you are trying to measure.
- Technical noise can result from imperfections in the chip.
- Both biological and technical replicates are required to measure and control these sources of noise.
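The two normalization ideas above (correcting for overall slide brightness, and normalizing relative to housekeeping genes) can be sketched roughly as follows. This is a minimal illustration, not the method used in the course: the function names, the choice of the median as the brightness measure, and the toy intensities are all assumptions made for the example.

```python
from statistics import median


def scale_to_common_median(arrays):
    """Rescale each array so that its median intensity matches the
    mean of all array medians (one simple brightness correction)."""
    medians = [median(a) for a in arrays]
    target = sum(medians) / len(medians)
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]


def normalize_to_housekeeping(array, housekeeping_idx):
    """Divide every intensity by the mean intensity of the genes
    assumed to be housekeeping genes (given by their positions)."""
    ref = sum(array[i] for i in housekeeping_idx) / len(housekeeping_idx)
    return [x / ref for x in array]


# Hypothetical toy data: two slides measuring the same four genes,
# where slide_b is uniformly twice as bright as slide_a.
slide_a = [100.0, 200.0, 400.0, 800.0]
slide_b = [200.0, 400.0, 800.0, 1600.0]

scaled_a, scaled_b = scale_to_common_median([slide_a, slide_b])

# Assumption for the example: genes 0 and 1 are the housekeeping genes.
hk_a = normalize_to_housekeeping(slide_a, housekeeping_idx=[0, 1])
hk_b = normalize_to_housekeeping(slide_b, housekeeping_idx=[0, 1])
```

In this toy case the only difference between the slides is a global brightness factor, so both approaches bring the two slides onto the same scale. Real data usually call for more careful methods, and neither function says anything about the noise that replicates are needed to estimate.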
[Diagram: microarray data -> data preprocessing and normalization -> normalized data -> data analysis (differential expression)]
[Diagram: microarray data -> data preprocessing and normalization -> normalized data -> data analysis (differential expression; GO, KEGG… analysis)]
Gene Ontology
The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism.
The ontologies:
- Cellular component
- Biological process
- Molecular function
Browser::AmiGO
Tools
Gene Ontology::Tools FUNC-EXPRESSION
KEGG
[Diagram: microarray data -> data preprocessing and normalization -> normalized data -> data analysis (differential expression; GO, KEGG… analysis; classification)]
Classification
- Support vector machines
- Decision trees
[Diagram: microarray data -> data preprocessing and normalization -> normalized data -> data analysis (differential expression; GO, KEGG… analysis; classification; clustering)]
Clustering & Classification
Supervised versus Unsupervised:
Supervised
- Analysis to determine genes that fit a predetermined pattern.
- Usually used to find genes whose expression levels differ significantly between groups of samples, or to find genes that accurately predict a characteristic of the sample.
- Two popular supervised techniques are nearest-neighbour analysis and support vector machines.
Unsupervised
- Analysis to characterize the components of a data set without a priori input or knowledge of a training signal.
- Tries to find internal structure or relationships in the data without trying to predict some ‘correct answer’.
- Three classes:
  1. Feature determination: look for genes with interesting patterns, e.g. principal-components analysis.
  2. Cluster determination: determine groups of genes with similar expression patterns, e.g. nearest-neighbour clustering, self-organizing maps, k-means clustering, 2D hierarchical clustering.
  3. Network determination: determine graphs representing gene-gene or gene-phenotype interactions, e.g. Boolean networks, Bayesian networks, relevance networks.
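To make the supervised/unsupervised contrast concrete, here is a small sketch using two of the techniques named above: nearest-neighbour analysis (which needs labels) and k-means clustering (which does not). The function names, the Euclidean distance, the deterministic seeding of centroids, and the toy profiles with their "normal"/"tumour" labels are all assumptions made for illustration, not taken from the slides.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbour_predict(train_profiles, train_labels, query):
    """Supervised: return the label of the training profile closest to the query."""
    distances = [euclidean(p, query) for p in train_profiles]
    return train_labels[distances.index(min(distances))]


def k_means(profiles, k, n_iter=20):
    """Unsupervised: group profiles into k clusters without using any labels.

    Assumption: the first k profiles seed the centroids, which keeps this
    sketch deterministic; practical analyses usually use random restarts.
    """
    centroids = [list(p) for p in profiles[:k]]
    assignment = [0] * len(profiles)
    for _ in range(n_iter):
        # Assignment step: attach each profile to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: euclidean(p, centroids[c]))
            for p in profiles
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical toy data: four samples, each with expression values for three genes.
samples = [[1.0, 1.2, 0.9], [5.0, 5.1, 4.8], [0.8, 1.0, 1.1], [5.2, 4.9, 5.0]]
labels = ["normal", "tumour", "normal", "tumour"]  # used only by the supervised method

predicted = nearest_neighbour_predict(samples, labels, [4.7, 5.0, 5.1])
assignment, centroids = k_means(samples, k=2)
```

The supervised call returns one of the known labels for the new sample, whereas k-means only returns cluster indices; deciding what each cluster means biologically is left to the analyst.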
Cooper Breast Cancer Res :158
[Diagram: microarray data -> data preprocessing and normalization -> normalized data -> data analysis (differential expression; GO, KEGG… analysis; clustering; classification; promoter analysis)]
Promoter analysis::TFBS TRANSFAC
Promoter analysis::Tools
[Diagram: microarray data -> data preprocessing and normalization -> normalized data -> data analysis (differential expression; GO, KEGG… analysis; clustering; classification; promoter analysis; reverse engineering)]
Reverse engineering