Microarrays and Gene Expression

Presentation transcript: "Microarrays and Gene Expression"

1 Microarrays and Gene Expression
DTC Bioinformatics Course, 9th February 2010, Helen Lockstone

2 Overview
Background
Array design
Applications of array technology
Steps in data analysis
Finding differentially expressed genes
Biological interpretation

3 Schedule
Introduction to microarray technology and applications
Break
Microarray data analysis
Practical 1
Lunch
Biological interpretation
Practical 2

4 Microarrays in the Literature

5 The Central Dogma Transcriptome measured by microarrays

6 Premise of Microarrays
Compare gene expression between groups
Differentially expressed genes may provide some biological insight
But not magical solutions!

7 Typical Microarray Designs
Disease vs control
Good prognosis vs poor prognosis
Different tumour types
Effect of treatment
Effect of stimulus
Time course
Different tissues/stages of development

8 Criticism of Microarrays
Non-hypothesis driven “fishing expeditions”
Because microarray experiments are expensive and time-consuming to interpret, they are often published as stand-alone experiments
They produce large amounts of data, and interpretations can be very different (but equally valid)
Further experimental work, following up hypotheses suggested by array data, can produce elegant studies
Perception that the data are unreliable, hence the need for validation

9 Microarray Repositories
GEO
ArrayExpress
Excellent resources of microarray data
MIAME guidelines

10 What is a Microarray?
Glass slide carrying hundreds of thousands of probes arranged in a grid layout
Each probe detects a particular RNA species (transcript)
Hybridisation occurs by complementary base-pairing
Make quantitative measurements: signal from each probe is proportional to the amount of hybridised RNA
Interrogate entire genome in a single experiment

11 Microarray Technology
Probes: cDNA, oligonucleotides, PCR products
Design: targeted to genes; tiling (chromosomes, promoters)
Fabrication method: spotted (robotic printing); photolithography (synthesised in situ)
Type: one-colour (log intensities); two-colour (log ratios)
Labelling molecules: Cy3 (green), Cy5 (red), biotin
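The difference between one-colour and two-colour measurements can be illustrated with a minimal Python sketch. The function names and intensity values are hypothetical, and which channel goes in the numerator of the ratio depends on the experimental design; Cy5 over Cy3 is assumed here.

```python
import math

def log_intensity(signal):
    """One-colour array: log2 of a single positive intensity."""
    if signal <= 0:
        raise ValueError("signal must be positive before taking logs")
    return math.log2(signal)

def log_ratio(cy5, cy3):
    """Two-colour array: log2(Cy5 / Cy3) for one spot (channel order is an assumption)."""
    if cy5 <= 0 or cy3 <= 0:
        raise ValueError("both channel intensities must be positive")
    return math.log2(cy5 / cy3)

# Hypothetical intensities for a single probe/spot
print(log_intensity(1024.0))
print(log_ratio(2000.0, 500.0))
```

A log ratio of 0 means equal signal in both channels; each unit corresponds to a two-fold difference.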

12 Experimental Protocol

13 Microarray Manufacturers
Company | Established | Main Microarray Technology | Human Whole-Genome Array released | Headquarters
Affymetrix | 1992 | GeneChip | 1994 | Santa Clara, CA
Illumina | 1998 | BeadChip | 2005 | San Diego, CA
Roche NimbleGen | 1999 | High-density tiling arrays | | Madison, WI
Agilent | | aCGH, ChIP-chip, custom | 2004 |

14 Array design

15 Affymetrix Microarrays
Manufacturing microarrays for >15 years
25bp probes: 11 individual probes comprise a probe-set, and their signal is combined to estimate gene expression
Whole human genome array has >50,000 probesets
Array surface area 1.28 cm²
3’ expression arrays: probes designed to 3’ end of transcript

16 Recent Developments
Limitations of 3’ array design:
  Assumes 3’ end is representative of entire gene
  Assumes well-defined 3’ end of gene
  Can’t assess splicing events
  Can be difficult to distinguish homologous genes
Whole transcript arrays:
  4-probe probesets designed to each exon
  Gene 1.0 and Exon 1.0 arrays

17 Exon Array Design Picture from Affymetrix

18 Illumina Beadchip Arrays
Beads randomly occupy wells on surface of array
30-40 replicates of each bead type (probe)
Longer probe length, typically one probe per gene

19 Applications of Microarray Technology

20 Microarray Applications
ChIP-chip
Gene Expression
Alternative Splicing
DNA Methylation
microRNA expression
Comparative Genomic Hybridisation
SNP Genotyping

21 Gene Expression
Still the most common use for microarrays
Aim to determine differential expression between groups of samples, e.g. disease and control
Generate hypotheses about the mechanisms underlying the disease of interest

22 Alternative Splicing
Up to 75% of human genes may produce alternative transcripts
Increases protein diversity from a given set of genes
Alternative transcripts from the same gene can produce proteins with different, even opposite, functions (e.g. Bcl-x)
Role in disease: mutations can disrupt splice sites or splicing machinery

23 Alternative Splicing
Affymetrix exon array allows investigation of alternative splicing
Custom arrays with junction probes
Additional layer of analysis

24 Alternative Poly-A Sites
Alters the length of the 3’ UTR, which may change which miRNA target regions are present

25 Alternative Splicing

26 MicroRNAs
Small non-coding RNAs (~22 nt)
Sequence-specific binding to 3’ UTRs
Post-transcriptional gene silencing
Picture from He et al. Nature Reviews Cancer 7, (2007)

27 SNP Arrays
Illumina and Affymetrix
~6 million SNPs genome-wide
Genotype individuals in high-throughput and cost-effective manner
Genome-wide association studies
eQTL studies

28 Tiling Arrays
Applications so far use arrays with probes designed to genes/miRNAs/SNPs of interest
Tiling arrays consist of high-density probes covering one or more regions of the genome
Identify novel transcripts, exons

29 DNA Methylation
Methylation of cytosine bases (CpG islands) in gene promoter regions can silence transcription
Epigenetic mechanism
Two-colour hybridisation

30 ChIP-chip
Method to identify transcription factor binding sites in an unbiased fashion
Cross-link protein (TF) of interest with DNA
Use immuno-precipitation to pull down DNA fragments bound to the protein (enriched sample)
Hybridise with genomic DNA to obtain log-ratio
Again looking for large positive ratios

31 Comparative Genomic Hybridisation
[Figure: trisomy 13 in female compared to reference male]
Detect regions of amplification/deletion (copy number changes)
Feature of cancer: hybridise sample with reference DNA (copy number = 2)
Potential dosage effects on genes in affected regions
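As a worked illustration of what a copy number change looks like on the log-ratio scale, the sketch below computes the expected log2 ratio against a diploid reference. It assumes an idealised, pure sample with no measurement noise; the function name is hypothetical. Real samples, for instance tumour tissue mixed with normal cells, typically show smaller shifts.

```python
import math

def expected_log2_ratio(sample_copies, reference_copies=2):
    """Idealised log2(sample / reference) for a region with the given copy number."""
    if sample_copies <= 0 or reference_copies <= 0:
        raise ValueError("copy numbers must be positive to take a log ratio")
    return math.log2(sample_copies / reference_copies)

# Single-copy loss, normal, trisomy (three copies), four copies
for copies in (1, 2, 3, 4):
    print(copies, round(expected_log2_ratio(copies), 3))
```

Under these assumptions a trisomy gives a ratio of about 0.58 rather than 1, so gains of a single copy are subtler than a doubling.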

32 Analysing Gene Expression Data

33 R and Bioconductor
Powerful, open-source software for statistical analysis and graphical visualisation
Greater functionality provided by software packages contributed by researchers
Bioconductor packages are specifically for genomic data:
  affy
  limma
  vsn

34 Analysis Steps
Check quality of the data
Decide if any samples are outliers
Preprocessing and normalisation
Statistical analysis to find differentially expressed genes
Tools for biological interpretation

35 Data Quality
Looking for good signal and similar metrics across all arrays in the experiment (after normalisation between arrays)
Poor signal could indicate a hybridisation problem or degraded sample
Control probes for hybridisation, labelling and sample can help identify problems

36 Illumina Array Metrics
Average signal
Number of detected genes
Housekeeping genes signal
Biotin controls
Hybridisation controls
Negative control probe signal

37 Processing Data
Background correction
Transform data to log scale (more suitable for statistical analysis)
Normalisation between arrays (adjust for systematic differences such as overall brightness)
Probe-set summarisation (Affymetrix) or summarisation across replicate probes (Illumina)
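The processing steps can be sketched in Python as follows. This is a deliberately simplified illustration, not the method used by the affy or limma packages: background correction is omitted, normalisation is plain median-centring, and probe-set summarisation is a simple median. All function names and intensities are hypothetical.

```python
import math
from statistics import median

def log2_transform(intensities):
    """Put raw (positive) intensities on the log2 scale."""
    return [math.log2(x) for x in intensities]

def median_centre(arrays):
    """Shift each array of log2 values so that all arrays share the same median.

    This removes a global brightness difference between arrays; it is a
    stand-in for the more elaborate normalisation methods used in practice.
    """
    medians = [median(a) for a in arrays]
    target = sum(medians) / len(medians)
    return [[v - m + target for v in a] for a, m in zip(arrays, medians)]

def summarise_probeset(log_values):
    """Combine the probes of one probe-set (or replicate beads) into one value."""
    return median(log_values)

# Hypothetical data: array_b is uniformly twice as bright as array_a
array_a = log2_transform([100, 200, 400, 800, 1600])
array_b = log2_transform([200, 400, 800, 1600, 3200])
norm_a, norm_b = median_centre([array_a, array_b])
print(summarise_probeset(norm_a), summarise_probeset(norm_b))
```

On the log scale a two-fold brightness difference becomes a constant offset of 1, which is why a simple shift is enough to remove it in this toy case.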

38 Exploring Data - Boxplots
[Figure: boxplots; axis label: Signal Intensity]

39 Exploring Data - PCA

40 Outlier Samples
Potential outlier samples will look different to others in the experiment
No definitive rules to decide when to exclude a sample from analysis
Depends on size of experiment
Can be useful to run analysis with and without outlier to assess effect on results
Always re-normalise data excluding any outlier samples before proceeding

41 Outlier Sample

42 PCA indicating outlier sample

43 Filtering
Filtering loses data, but signal from low-intensity probes is noisy and can give false positives
Detection p-values are calculated for each probe based on overlap of its signal with the negative control probe signal distribution
Criteria:
  Detected in all samples/at least one sample
  Detected in at least one group
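One plausible reading of the filtering step is sketched below in Python. It assumes the detection p-value is the fraction of negative-control signals at or above the probe's signal, and that "detected in at least one group" means detected in every sample of that group. Both are assumptions for illustration, as are the function names and the 0.05 threshold.

```python
def detection_p_value(signal, negative_controls):
    """Fraction of negative-control signals that are >= the probe's signal."""
    n_at_or_above = sum(1 for c in negative_controls if c >= signal)
    return n_at_or_above / len(negative_controls)

def keep_probe(signals_by_group, negative_controls, alpha=0.05):
    """Keep a probe if it is detected in all samples of at least one group."""
    for signals in signals_by_group.values():
        if all(detection_p_value(s, negative_controls) < alpha for s in signals):
            return True
    return False

# Hypothetical negative-control signals and one probe measured in two groups
negatives = list(range(80, 120))
probe = {"control": [200, 210], "disease": [100, 95]}
print(keep_probe(probe, negatives))
```

Filtering by group rather than across all samples keeps probes for genes that are switched on in one condition only, which are often the most interesting ones.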

44 Detecting Differentially Expressed Genes
Linear Models for Microarray Data (limma)
Handles analysis of simple and complex experimental designs
For two-group comparisons, analogous to t-test, otherwise ANOVA
Uses information from all genes to estimate variance
Reduces chance of false positives from very low variance genes
More robust for small sample sizes
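The idea of borrowing information from all genes can be illustrated with a minimal Python sketch of variance shrinkage: each gene's variance is pulled towards a prior value shared across genes, weighted by degrees of freedom. In limma the prior variance and prior degrees of freedom are estimated from the data; here they are simply passed in as hypothetical numbers, and the function names are illustrative rather than limma's API.

```python
from math import sqrt

def moderated_variance(gene_var, gene_df, prior_var, prior_df):
    """Weighted average of a gene's own variance and a prior variance."""
    return (prior_df * prior_var + gene_df * gene_var) / (prior_df + gene_df)

def moderated_t(mean_diff, gene_var, gene_df, prior_var, prior_df, n1, n2):
    """Two-group t-like statistic using the shrunken variance."""
    s2 = moderated_variance(gene_var, gene_df, prior_var, prior_df)
    return mean_diff / sqrt(s2 * (1 / n1 + 1 / n2))

# Hypothetical gene: tiny fold change but, by chance, an even tinier variance
ordinary = moderated_t(0.1, 0.0001, 4, prior_var=0.04, prior_df=0, n1=3, n2=3)
shrunken = moderated_t(0.1, 0.0001, 4, prior_var=0.04, prior_df=4, n1=3, n2=3)
print(ordinary, shrunken)
```

With `prior_df=0` the statistic reduces to an ordinary t-statistic, which is inflated by the tiny variance; with shrinkage the same gene no longer stands out.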

45 limma
Fits linear model for each gene
Test whether slope = 0 for each gene and assign p-values
Multiple testing correction: FDR
[Figure: log normalised intensity for Group 1 and Group 2]
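Two points from this slide can be sketched in Python. First, with groups coded 0 and 1, the fitted slope is simply the difference in group means, so testing slope = 0 tests for differential expression. Second, the per-gene p-values need adjusting for multiple testing; the Benjamini-Hochberg procedure is assumed here as the FDR method, since the slide does not name one. Function names and numbers are hypothetical.

```python
def slope_two_groups(group1, group2):
    """Least-squares slope of expression on a 0/1 group indicator."""
    x = [0] * len(group1) + [1] * len(group2)
    y = list(group1) + list(group2)
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical log2 intensities for one gene, and p-values for four genes
print(slope_two_groups([8.0, 8.2, 7.8], [9.0, 9.1, 8.9]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```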

46 Effect of other variables
Wt and Mut groups
Three different litters
Top gene ~5x higher expression in Wt compared to Mut
Similarly expressed across litters in both genotypes

47 Strong litter effect
Overlap between groups
Within litters, consistent pattern of higher expression in Wt vs Mut
Within genotypes, B>C>A: expression depends on litter
Accounting for this variance increases power
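Why accounting for litter increases power can be seen in a small Python sketch that compares an unpaired statistic with one computed on within-litter differences. The numbers are made up for illustration (one Wt and one Mut animal per litter, with B>C>A), and a paired t-statistic is used as a simple stand-in for including litter as a term in the linear model.

```python
from math import sqrt
from statistics import mean, stdev

def unpaired_t(a, b):
    """Two-sample t-like statistic that ignores litter."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se

def paired_t(a, b):
    """t-statistic on within-litter differences (a[i] and b[i] share a litter)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Made-up log2 expression values, ordered by litter A, B, C
wt = [8.0, 10.0, 9.0]
mut = [7.0, 9.1, 7.9]
print(unpaired_t(wt, mut), paired_t(wt, mut))
```

The mean difference is the same in both cases; only the variance it is compared against changes, because the litter-to-litter spread cancels out in the within-litter differences.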

48 Limma Output

49 Limma Output
Small sample size and subtle effects can mean no probes would be considered statistically significant
Probes are ranked in order of evidence for differential expression and can still be explored
Biological interpretation can be the most difficult step; tools are available
