Microarrays & Gene Expression Analysis


Contents: DNA microarray technique. Why measure gene expression. Clustering algorithms. Relation to cancer. SAGE. SBH (Sequencing By Hybridization).

DNA Microarrays Developed around 1987. Employ methods previously exploited in the immunoassay context: specific binding and marking techniques. Two types of probes (see http://www.gene-chips.com/): Format I: probe cDNA (500~5,000 bases long) is immobilized on a solid surface such as glass; widely considered to have been developed at Stanford University; traditionally called DNA microarrays. Format II: an array of oligonucleotide probes (20~80-mer oligos) is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization; developed at Affymetrix, Inc. Many companies manufacture oligonucleotide-based chips using alternative in situ synthesis or deposition technologies. Historically called DNA chips.

DNA Microarray Technique The microarray is made of a small piece of glass (1x1 or 2x2 cm). Thousands to millions of pixels are placed on it, each holding many (n) copies of a DNA probe: a short (8-30 bases), single-stranded oligonucleotide (oligo). A probe on the array will bind its complementary target if the target is present in the solution washing the chip. When the array surface is scanned with a laser, fluorescent labels attached to the targets reveal which probes are bound.

Use of DNA Microarrays Identify a query sequence: the sequence is hybridized to an array containing suitable probes. Detect point mutations (SNPs) or other mutations: the array contains probes that match segments of the normal and mutated sequences. Determine an unknown sequence (SBH): the array contains all possible k-mers (e.g., all 4^6 = 4,096 6-mers). Gene expression analysis: which genes are expressed, and under what conditions?

DNA Microarray Methodology - Flash Animation http://www.bio.davidson.edu/biology/courses/genomics/chip/chip.html

Why Measure Gene Expression Determines which genes are induced/repressed in response to a developmental phase or to an environmental change. Sets of genes whose expression rises and falls under the same conditions are likely to have a related function. Features such as a common regulatory motif can be detected within co-expressed genes. A pattern of gene expression may be used as an indicator of abnormal cellular regulation, which makes it a useful tool for cancer diagnosis.

Clustering Co-expressed Genes Find genes whose expression rises and falls under the same conditions. Methods include: hierarchical clustering, self-organizing maps, and support vector machines (SVMs).

Hierarchical Clustering Source: Michael B. Eisen, Paul T. Spellman, Patrick O. Brown, and David Botstein, "Cluster analysis and display of genome-wide expression patterns", 1998, http://www.pnas.org/cgi/content/full/95/25/14863 Relationships among objects (genes) are represented by a tree whose branch lengths reflect the degree of similarity between the objects, as assessed by a pairwise similarity function. The computed trees can be used to order genes in the original data table, so that genes or groups of genes with similar expression patterns are adjacent.

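Similarity Metric The gene similarity metric is a form of correlation coefficient. Let G_i equal the (log-transformed) primary data for gene G in condition i. For any two genes x and y observed over a series of N conditions, a similarity score can be computed as follows:

$$S(x, y) = \frac{1}{N} \sum_{i=1}^{N} \frac{(x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \, \sigma_y}$$

where \bar{x} and \bar{y} are the means of the observations on genes x and y, and \sigma_x and \sigma_y are the corresponding standard deviations (computed with a divisor of N, so that S is the correlation coefficient and lies between -1 and 1). A pairwise average-linkage joining method is used to build the corresponding tree.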
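As a minimal sketch of the similarity score, assuming it is normalized by N so that it equals the Pearson correlation coefficient, the function below scores two expression profiles. The function name and the error handling for flat profiles are illustrative choices, not part of the original slides.

```python
import math


def similarity(x, y):
    """Correlation-style similarity between two expression profiles.

    x and y are equal-length sequences of (log-transformed) expression
    values, one value per condition.
    """
    if len(x) != len(y) or not x:
        raise ValueError("profiles must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Population standard deviations (divisor N, not N - 1).
    std_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    std_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    if std_x == 0 or std_y == 0:
        # Assumption: a gene with a flat profile has no defined correlation.
        raise ValueError("a flat profile has no defined correlation")
    covariance = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    return covariance / (std_x * std_y)
```

Two genes that rise and fall together score close to 1, and two genes with opposite behavior score close to -1.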

Tree Creation For any set of n genes, a similarity matrix is computed by using the metric described above. The matrix is scanned to identify the highest value (representing the most similar pair of genes). A node is created joining these two genes, and a gene expression profile is computed for the node by averaging the observations of the joined elements (missing values are omitted and the two joined elements are weighted by the number of genes they contain). The similarity matrix is updated with this new node replacing the two joined elements, and the process is repeated n-1 times until only a single element remains.
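The joining procedure can be sketched as follows. This is a simplified illustration, not the authors' implementation: it recomputes pairwise similarities in every round instead of updating a stored matrix, it does not handle missing values, and it treats flat profiles as uncorrelated. The names `pearson` and `build_tree` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Correlation coefficient; the 1/N factors cancel, so plain sums are used."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: flat profiles count as uncorrelated
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (norm_x * norm_y)


def build_tree(profiles):
    """Join the most similar pair repeatedly until one node remains.

    profiles maps a gene name to its list of expression values.
    Returns the tree as nested 2-tuples of gene names.
    """
    # Each cluster is (subtree, averaged profile, number of genes inside).
    clusters = [(name, list(values), 1) for name, values in profiles.items()]
    while len(clusters) > 1:
        # Scan all pairs for the highest similarity.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: pearson(clusters[pair[0]][1], clusters[pair[1]][1]),
        )
        (tree_a, prof_a, size_a), (tree_b, prof_b, size_b) = clusters[i], clusters[j]
        # New node profile: average weighted by the number of genes joined.
        merged_profile = [
            (a * size_a + b * size_b) / (size_a + size_b)
            for a, b in zip(prof_a, prof_b)
        ]
        merged = ((tree_a, tree_b), merged_profile, size_a + size_b)
        # Replace the two joined elements with the new node.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]
```

With n genes the loop runs n-1 times, matching the description above.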

Five separate clusters are indicated by colored bars and by identical coloring of the corresponding region of the dendrogram. The sequence-verified named genes in these clusters contain multiple genes involved in (A) cholesterol biosynthesis, (B) the cell cycle, (C) the immediate-early response, (D) signaling and angiogenesis, and (E) wound healing and tissue remodeling. These clusters also contain named genes not involved in these processes and numerous uncharacterized genes.

Self Organizing Maps K-means method: the number of clusters (k) is fixed. Each gene gi (i = 1, .., n) is represented by its expression in d experiments, i.e. as a point in d dimensions. Randomly choose k centers c1, .., ck, where each ci is a point in d dimensions. The protocol: (1) Join each gi to the closest center. (2) Compute new centers: the new center ci' is the center of mass of all points joined to ci. (3) Repeat steps 1 and 2 until convergence or until you're pleased with the results.
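A minimal sketch of the k-means protocol in plain Python. Two details are assumptions that the slide does not specify: the initial centers are drawn at random from the data points, and a center with no assigned points keeps its previous position. The `seed` and `max_iter` parameters are illustrative.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster d-dimensional points into k groups.

    points is a list of equal-length tuples (one per gene).
    Returns (assignment, centers), where assignment[i] is the index
    of the center that point i was joined to.
    """
    rng = random.Random(seed)
    # Assumption: pick the initial centers among the data points.
    centers = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Step 1: join every point to its closest center (squared distance).
        new_assignment = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Step 2: move each center to the center of mass of its points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its old center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers
```

The result depends on the random starting centers, so in practice the procedure is often run several times.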

Relation to Cancer Tumors result from disruptions of growth regulation. Although most tumors are treated with general anti-proliferative drugs, they exhibit remarkable clinical heterogeneity, which remains a major challenge in the successful management of cancer. Clinical heterogeneity in tumors likely reflects unrecognized molecular heterogeneity. Because of the logical connection between gene expression patterns and phenotype, it is likely that there is a direct connection between the gene expression patterns of tumors and their clinical phenotype.

Towards a clinically relevant taxonomy of Cancer Access archived clinical tumor samples taken at or near diagnosis from patients with well-characterized subsequent clinical histories. Use DNA arrays to measure gene expression in these samples. Look for new molecularly defined groups within or between previously recognized groups of tumors, especially groups with increased clinical homogeneity. Look for direct associations between molecular and clinical properties of tumors.

Cancer Gene Expression The suggested procedure has been used to classify several types of cancer, or cancerous versus normal cells. Breast cancer. AML and ALL. Melanoma. Lymphoma. …

Example - Melanoma "Molecular classification of cutaneous malignant melanoma by gene expression profiling." Nature 2000 Aug 3;406(6795):536-40. The study discovered a subset of melanomas identified by mathematical analysis of gene expression in a series of samples.

Example - Melanoma Remarkably, many genes underlying the classification of this subset are differentially regulated in invasive melanomas that form primitive tubular networks in vitro, a feature of some highly aggressive metastatic melanomas. Global transcript analysis can identify unrecognized subtypes of cutaneous melanoma and predict experimentally verifiable phenotypic characteristics that may be of importance to disease progression.

Detection of Regulatory Motifs A group of co-expressed genes is likely to be co-regulated during transcription. Transcription initiation is mediated by regulatory proteins that usually bind upstream of the transcription start site. The regulatory proteins bind to conserved regulatory motifs, which are short DNA sequences. The upstream regions of co-expressed genes can therefore be searched for a common regulatory motif.
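One very simple way to search upstream regions for a common motif is to look for exact k-mers shared by all (or most) of the sequences. The sketch below does only that; real motif finders tolerate mismatches and score statistical significance, which this illustration does not attempt. The function name, the `min_fraction` parameter and the example sequences are hypothetical.

```python
from collections import Counter


def shared_kmers(upstream_seqs, k, min_fraction=1.0):
    """Return the k-mers present in at least min_fraction of the sequences."""
    counts = Counter()
    for seq in upstream_seqs:
        seq = seq.upper()
        # Use a set so that each k-mer is counted once per sequence.
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    threshold = min_fraction * len(upstream_seqs)
    return sorted(kmer for kmer, n in counts.items() if n >= threshold)


# Made-up upstream regions of three co-expressed genes.
regions = ["AATGACTCAT", "GGTGACTCCC", "TGACTCGGGA"]
print(shared_kmers(regions, 6))  # the only 6-mer common to all three
```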

Other Applications: Predictive Tools There is a correlation between co-expression and related gene function. “Inferring subnetworks from perturbed expression profiles.” Bioinformatics. 2001 Jun;17 Suppl 1:S215-S224. There is a correlation between co-expression and protein-protein interaction. “Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae.” Nat Genet. 2001 Dec;29(4):482-6. There is a poor correlation between gene expression and protein expression.

Correlation between gene and protein expression Ideker et al., Science 2001

Design & Probe Selection Sensitivity: probes need to hybridize to their targets. For example, they need to avoid highly structured regions of the target molecule. Specificity: probes must not hybridize to the wrong targets (cross-hybridization). To this end: design probes long enough for statistical protection; search databases to explicitly avoid cross-hybridization to known foreign mRNA; use mismatch controls.

Other Challenges Analyze image to infer expression levels from red to green ratios, clean background, check for outliers, etc. Infer causal relations between genes – regulatory networks.
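As a small illustration of turning a red/green spot measurement into an expression value, the sketch below subtracts a background estimate from each channel and takes the log2 ratio. The parameter names and the `floor` guard against non-positive intensities are assumptions made for this example; actual image-analysis pipelines also handle normalization and outlier detection, which are not shown.

```python
import math


def log_ratio(red, green, red_bg=0.0, green_bg=0.0, floor=1.0):
    """Background-corrected log2(red / green) for one spot.

    floor is a hypothetical lower bound applied after background
    subtraction so that the logarithm is always defined.
    """
    r = max(red - red_bg, floor)
    g = max(green - green_bg, floor)
    return math.log2(r / g)
```

A value of 0 means equal signal in both channels; positive values mean more red signal than green.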

SAGE (Serial Analysis of Gene Expression) An experimental technique designed to gain a quantitative measure of gene expression. Tags of ~10-20 bases are produced (immediately adjacent to the 3’ end of the 3’-most NlaIII restriction site). The SAGE technique does not measure the expression level of a gene directly; it quantifies a "tag" which represents the transcription product of a gene. http://www.ncbi.nlm.nih.gov/SAGE/

SAGE Technique (1) Extract unique tagging sequences from mRNA molecules (tags are ~10-20 bases long). (2) Concatenate the tags into a long sequence. (3) Sequence the result and infer expression levels from tag frequencies. Advantage: an unbiased and inclusive analysis of the transcriptome. Sequencing errors are especially problematic when tags are used, because of the short length of tags. Of roughly 1.5 million transcript sequences stored in GenBank, only about 180,000 are well characterized, and tags could represent them.
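The tag-counting idea can be sketched in a few lines. This is an in-silico simplification that works directly on transcript strings rather than modeling the laboratory protocol: it takes the bases immediately following the 3'-most NlaIII site (recognition sequence CATG) as the tag and reports relative tag frequencies. The function names and the default tag length of 10 are illustrative assumptions.

```python
from collections import Counter

NLAIII_SITE = "CATG"  # recognition sequence of the NlaIII restriction enzyme


def extract_tag(transcript, tag_length=10):
    """Return the tag next to the 3'-most NlaIII site, or None if unavailable."""
    transcript = transcript.upper()
    pos = transcript.rfind(NLAIII_SITE)  # last (3'-most) occurrence
    if pos == -1:
        return None  # no restriction site: no tag can be produced
    start = pos + len(NLAIII_SITE)
    tag = transcript[start:start + tag_length]
    # Discard tags that run off the end of the transcript.
    return tag if len(tag) == tag_length else None


def tag_frequencies(transcripts, tag_length=10):
    """Relative frequency of each tag over all transcripts that yield one."""
    tags = (extract_tag(t, tag_length) for t in transcripts)
    counts = Counter(t for t in tags if t is not None)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}
```

A tag seen twice as often as another suggests roughly twice the transcript abundance, subject to the sequencing-error caveat noted above.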

http://www.sagenet.org/

Colon cancer vs normal colon (A: colon cancer; B: normal colon) http://www.ncbi.nlm.nih.gov/SAGE/index.cgi

SBH: Sequencing by Hybridization A sequencing method, and in fact the original motivation for DNA microarrays. A chip containing all k-mers is produced. The query sequence is hybridized to the chip. Example (using chips for sequencing): a chip of all 3-mers is produced, containing 4^3 = 64 probes. For the query CATAGTA, 5 probes will be highlighted: CAT, ATA, TAG, AGT, GTA.
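The 3-mer example can be reproduced with a short sketch that enumerates every probe on the chip and lists those matching the query. It is a simplification: probes are compared to the query sequence directly, ignoring the fact that hybridization actually pairs complementary strands. The function names are hypothetical.

```python
from itertools import product


def all_kmers(k, alphabet="ACGT"):
    """Every possible k-mer: the full probe set of the chip (4**k probes)."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


def highlighted_probes(query, k):
    """The set of k-mers occurring in the query, i.e. the probes that light up."""
    return {query[i:i + k] for i in range(len(query) - k + 1)}


chip = all_kmers(3)
spectrum = highlighted_probes("CATAGTA", 3)
print(len(chip), sorted(spectrum))
```

Because the chip only reports whether a probe is bound, a k-mer that occurs several times in the query is still highlighted only once.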

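SBH Protocol Knowing the start and end of the query sequence, and the set of highlighted k-mers, the query sequence is reconstructed by repeatedly looking for a highlighted k-mer that overlaps the end of the sequence built so far. Example: start = CAT, end = GTA, highlighted group = {CAT, ATA, TAG, AGT, GTA}. CAT is followed by AT? = ATA (sequence so far: CATA); ATA is followed by TA? = TAG (CATAG); TAG is followed by AG? = AGT (CATAGT); AGT is followed by GT? = GTA (CATAGTA). Problems: Reconstruction is not always unique, because the same k-mer may be followed by several k-mers (e.g., CAT may be followed by ATA or ATG). Hybridization results contain errors.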
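The reconstruction can be sketched as a small backtracking search that returns every sequence consistent with the highlighted k-mers, which also makes the non-uniqueness problem visible. The sketch assumes error-free hybridization and that every highlighted k-mer occurs exactly once in the query; both are simplifying assumptions, and the function name is hypothetical.

```python
def reconstruct(kmers, start, end):
    """All sequences that begin with start, finish with end,
    and use every highlighted k-mer exactly once."""
    k = len(start)
    results = []

    def extend(seq, remaining):
        if not remaining:
            if seq.endswith(end):
                results.append(seq)
            return
        suffix = seq[-(k - 1):]  # the next k-mer must overlap these k-1 bases
        for candidate in sorted(remaining):
            if candidate.startswith(suffix):
                extend(seq + candidate[-1], remaining - {candidate})

    extend(start, frozenset(kmers) - {start})
    return results


# The example from the slide: a unique reconstruction.
print(reconstruct({"CAT", "ATA", "TAG", "AGT", "GTA"}, "CAT", "GTA"))

# A k-mer set with two valid reconstructions, showing the ambiguity.
print(reconstruct(
    {"ATG", "TGC", "GCG", "CGT", "GTG", "TGG", "GGC", "GCA"}, "ATG", "GCA"))
```

The first call yields a single sequence, while the second yields two different sequences that share exactly the same set of 3-mers.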