Microarrays and Gene Expression, DTC Bioinformatics Course, 9th February 2010, Helen Lockstone.


1 Microarrays and Gene Expression, DTC Bioinformatics Course, 9th February 2010, Helen Lockstone

2 Overview: background; array design; applications of array technology; steps in data analysis; finding differentially expressed genes; biological interpretation

3 Schedule (topics in order): introduction to microarray technology and applications; break; microarray data analysis; practical; lunch; biological interpretation; break; practical 2

4 Microarrays in the Literature

5 The Central Dogma: microarrays measure the transcriptome

6 Premise of Microarrays: compare gene expression between groups. Differentially expressed genes may provide some biological insight, but microarrays are not a magical solution!

7 Typical Microarray Designs: disease vs control; good prognosis vs poor prognosis; different tumour types; effect of treatment; effect of stimulus; time course; different tissues or stages of development

8 Criticism of Microarrays: seen as non-hypothesis-driven "fishing expeditions". Because microarray experiments are expensive and time-consuming to interpret, they are often published as stand-alone experiments. They produce large amounts of data, and interpretations can be very different (but equally valid). Further experimental work that follows up hypotheses suggested by the array data can produce elegant studies. There is a perception that the data are unreliable, hence the emphasis on validation.

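9 Microarray Repositories: GEO (Gene Expression Omnibus) and ArrayExpress are excellent resources of microarray data. Submissions follow the MIAME (Minimum Information About a Microarray Experiment) guidelines.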

10 What is a Microarray? A glass slide carrying hundreds of thousands of probes arranged in a grid layout. Each probe detects a particular RNA species (transcript). Hybridisation occurs by complementary base-pairing. Measurements are quantitative: the signal from each probe is approximately proportional to the amount of hybridised RNA. The entire genome can be interrogated in a single experiment.

11 Microarray Technology
Probes: cDNA; oligonucleotides; PCR products
Design: targeted to genes; tiling (chromosomes, promoters)
Fabrication method: spotted (robotic printing); photolithography (synthesised in situ)
Type: one-colour (log intensities); two-colour (log ratios)
Labelling molecules: Cy3 (green), Cy5 (red), biotin
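To make the one-colour/two-colour distinction concrete, here is a minimal sketch in Python (used only for illustration; the course itself works in R/Bioconductor). For a two-colour spot it computes the log ratio M = log2(R/G) and the average log intensity A = (log2 R + log2 G)/2; a one-colour array simply uses log2 of the intensity. The function names and example intensities are hypothetical.

```python
import math


def ma_values(red, green):
    """Two-colour spot: return (M, A) from background-corrected intensities.

    M = log2(R/G) is the log ratio (Cy5 vs Cy3); A = (log2 R + log2 G) / 2
    is the average log intensity of the spot.
    """
    if red <= 0 or green <= 0:
        raise ValueError("intensities must be positive to take logs")
    log_r, log_g = math.log2(red), math.log2(green)
    return log_r - log_g, (log_r + log_g) / 2


def one_colour_value(intensity):
    """One-colour array: analyse log2 of the absolute intensity."""
    if intensity <= 0:
        raise ValueError("intensity must be positive to take logs")
    return math.log2(intensity)


# Hypothetical spot: red (Cy5) = 4000, green (Cy3) = 1000, a 4-fold ratio.
m, a = ma_values(4000.0, 1000.0)
print(f"M = {m:.2f}, A = {a:.2f}")  # M = 2.00, A = 10.97
```

Equal red and green signal gives M = 0, so on the log-ratio scale up- and down-regulation are symmetric around zero.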

12 Experimental Protocol

13 Microarray Manufacturers (company: established; main microarray technology; human whole-genome array released; headquarters)
Affymetrix: 1992; GeneChip; 1994; Santa Clara, CA
Illumina: 1998; BeadChip; 2005; San Diego, CA
Roche NimbleGen: 1999; high-density tiling arrays; –; Madison, WI
Agilent: 1999; aCGH, ChIP-chip, custom; 2004; Santa Clara, CA

14 Array design

15 Affymetrix Microarrays: manufacturing microarrays for >15 years. 25-base oligonucleotide probes; 11 individual probes make up a probe-set, and their signals are combined to estimate gene expression. The whole human genome array has >50,000 probe-sets. Array surface area: 1.28 cm². 3’ expression arrays: probes are designed to the 3’ end of the transcript.
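The signals from the probes in a probe-set must be combined into one expression value per array. One widely used approach (the summarisation step of RMA) is Tukey's median polish on log2 probe intensities across arrays. The Python sketch below illustrates that idea under this assumption; it is not the affy package's code, and it skips the background correction and normalisation that normally precede summarisation.

```python
import statistics


def median_polish(matrix, max_iter=10, eps=0.01):
    """Tukey's median polish on a probes x arrays matrix of log2 intensities.

    Fits value = overall + probe effect + array effect + residual, using
    medians so that a few aberrant probes have little influence.
    Returns (overall, probe_effects, array_effects, residuals).
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    z = [[float(v) for v in row] for row in matrix]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Sweep probe (row) medians out of the residuals.
        for i in range(n_rows):
            med = statistics.median(z[i])
            z[i] = [v - med for v in z[i]]
            row_eff[i] += med
        delta = statistics.median(col_eff)
        col_eff = [c - delta for c in col_eff]
        overall += delta
        # Sweep array (column) medians out of the residuals.
        for j in range(n_cols):
            med = statistics.median(z[i][j] for i in range(n_rows))
            for i in range(n_rows):
                z[i][j] -= med
            col_eff[j] += med
        delta = statistics.median(row_eff)
        row_eff = [r - delta for r in row_eff]
        overall += delta
        # Stop when the total absolute residual has stabilised.
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return overall, row_eff, col_eff, z


def summarise_probeset(matrix):
    """Expression estimate for each array: overall + array effect."""
    overall, _, col_eff, _ = median_polish(matrix)
    return [overall + c for c in col_eff]


# Hypothetical probe-set with 3 probes on 2 arrays (log2 scale).
print(summarise_probeset([[5.0, 6.0], [6.0, 7.0], [7.0, 8.0]]))  # [6.0, 7.0]
```

Probe effects (some probes are systematically brighter than others) are absorbed into the row effects, so the per-array estimates reflect only the shift common to all probes.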

16 Recent Developments
Limitations of the 3’ array design: assumes the 3’ region is representative of the entire gene; assumes a well-defined 3’ end of the gene; can’t assess splicing events; can be difficult to distinguish homologous genes.
Whole-transcript arrays: 4-probe probe-sets designed to each exon; Gene 1.0 and Exon 1.0 arrays.

17 Exon Array Design (picture from Affymetrix)

18 Illumina BeadChip Arrays: beads randomly occupy wells on the surface of the array, giving replicates of each bead type (probe). Longer probe length; typically one probe per gene.
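Because each bead type is present in replicate, its signal has to be summarised across beads. The Python sketch below assumes a simple rule: drop beads more than three (unscaled) median absolute deviations from the median, then average the rest. This resembles outlier-removal rules applied to Illumina data, but the exact rule in Illumina's own software may differ, and `summarise_beads` is a hypothetical name.

```python
import statistics


def summarise_beads(intensities, n_mad=3.0):
    """Hypothetical helper: summarise the replicate beads of one bead type.

    Beads further than n_mad (unscaled) median absolute deviations from the
    median are treated as outliers; the remaining beads are averaged.
    Returns (summary value, number of beads kept).
    """
    med = statistics.median(intensities)
    mad = statistics.median(abs(x - med) for x in intensities)
    # At least half of the beads lie within one MAD of the median,
    # so for n_mad >= 1 the kept list is never empty.
    kept = [x for x in intensities if abs(x - med) <= n_mad * mad]
    return statistics.mean(kept), len(kept)


# Hypothetical bead intensities for one probe; 500 mimics a bright artefact.
value, n_kept = summarise_beads([100, 102, 98, 101, 99, 500])
print(value, n_kept)
```

Here the 500 bead is excluded and the remaining five beads average to 100; a plain mean of all six would have been inflated to 166.7.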

19 Applications of Microarray Technology

20 Microarray Applications: gene expression; alternative splicing; microRNA expression; SNP genotyping; DNA methylation; ChIP-chip; comparative genomic hybridisation

21 Gene Expression: still the most common use for microarrays. The aim is to determine differential expression between groups of samples, e.g. disease and control, and so generate hypotheses about the mechanisms underlying the disease of interest.

22 Alternative Splicing: up to 75% of human genes may produce alternative transcripts. This increases protein diversity from a given set of genes. Alternative transcripts from the same gene can produce proteins with different, even opposite, functions (e.g. Bcl-x). Role in disease: mutations can disrupt splice sites or the splicing machinery.

23 Alternative Splicing: the Affymetrix exon array allows investigation of alternative splicing, as do custom arrays with junction probes. This adds an additional layer of analysis.
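One common summary for exon-array data is a splicing index: each exon's signal is normalised to the gene-level signal before the groups are compared, so a change in overall expression of the gene is not mistaken for a splicing change. The Python sketch below assumes log2-scale group means for the exons and the gene are already available; the function names are hypothetical.

```python
def splicing_index(exon_g1, gene_g1, exon_g2, gene_g2):
    """Splicing index for one exon from log2-scale group means.

    (exon - gene) is the exon's normalised inclusion within a group; the
    index is the difference in normalised inclusion between group 1 and 2.
    Values near 0 suggest no change in the relative use of this exon.
    """
    return (exon_g1 - gene_g1) - (exon_g2 - gene_g2)


def splicing_indices(exons_g1, gene_g1, exons_g2, gene_g2):
    """Splicing index for every exon of one gene (exon lists in the same order)."""
    return [splicing_index(e1, gene_g1, e2, gene_g2)
            for e1, e2 in zip(exons_g1, exons_g2)]


# Hypothetical gene: 2-fold higher overall in group 1 (log2 gene level 8 vs 7),
# but the third exon does not follow the gene-level increase.
print(splicing_indices([9.0, 9.5, 6.0], 8.0, [8.0, 8.5, 6.0], 7.0))
# [0.0, 0.0, -1.0]
```

The first two exons rise with the gene and score 0; only the third, whose relative inclusion drops in group 1, is flagged.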

24 Alternative Poly-A Sites: these alter the length of the 3’ UTR and so may change which miRNA target regions are present.

25 Alternative Splicing

26 MicroRNAs: small non-coding RNAs (~22 nucleotides). Sequence-specific binding to 3’ UTRs. Post-transcriptional gene silencing. (Picture from He et al., Nature Reviews Cancer 7 (2007).)

27 SNP Arrays: Illumina and Affymetrix; ~6 million SNPs genome-wide. Genotype individuals in a high-throughput and cost-effective manner. Applications: genome-wide association studies; eQTL studies.

28 Tiling Arrays: the applications so far use arrays with probes designed to genes, miRNAs or SNPs of interest. Tiling arrays instead consist of high-density probes covering a particular region (or regions) of the genome, making it possible to identify novel transcripts and exons.

29 DNA Methylation: methylation of cytosine bases (CpG islands) in gene promoter regions can silence transcription. An epigenetic mechanism. Measured by two-colour hybridisation.

30 ChIP-chip: a method to identify transcription factor binding sites in an unbiased fashion. Cross-link the protein (TF) of interest to DNA. Use immunoprecipitation to pull down DNA fragments bound to the protein (enriched sample). Hybridise together with genomic DNA to obtain a log ratio; again, we are looking for large positive ratios.

31 Comparative Genomic Hybridisation (figure: trisomy 13 in a female sample compared to a reference male). Detects regions of amplification or deletion (copy number changes), a feature of cancer: hybridise the sample together with reference DNA (copy number = 2). Potential dosage effects on genes in affected regions.
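The log ratios expected for simple copy-number changes follow directly from log2(test copies / reference copies). The Python sketch below shows these idealised values, including the sex-chromosome case relevant to a female sample against a male reference; the function name is hypothetical, and real CGH ratios are usually compressed towards 0 (for example by normal-cell contamination of tumour samples).

```python
import math


def expected_log2_ratio(test_copies, reference_copies=2):
    """Idealised log2(test/reference) signal for a genomic region.

    Assumes a pure sample and a perfectly linear response, which real
    data rarely achieve.
    """
    if test_copies == 0:
        return float("-inf")  # homozygous deletion: no test DNA hybridises
    return math.log2(test_copies / reference_copies)


print(round(expected_log2_ratio(3), 2))  # trisomy vs diploid reference: 0.58
print(expected_log2_ratio(1))            # single-copy deletion: -1.0
print(expected_log2_ratio(2, 1))         # X chromosome, female vs male reference: 1.0
```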

32 Analysing Gene Expression Data

33 R and Bioconductor: powerful, open-source software for statistical analysis and graphical visualisation. Greater functionality is provided by software packages contributed by researchers. Bioconductor packages are specifically for genomic data, e.g. affy, limma, vsn.

34 Analysis Steps: check the quality of the data; decide whether any samples are outliers; preprocessing and normalisation; statistical analysis to find differentially expressed genes; tools for biological interpretation

35 Data Quality: look for good signal and similar metrics across all arrays in the experiment (after normalisation between arrays). Poor signal could indicate a hybridisation problem or a degraded sample. Control probes for hybridisation, labelling and sample quality can help identify problems.

36 Illumina Array Metrics: average signal; number of detected genes; housekeeping gene signal; biotin controls; hybridisation controls; negative control probe signal

37 Processing Data: background correction; transform data to the log scale (more suitable for statistical analysis); normalisation between arrays (to adjust for systematic differences such as overall brightness); summarisation over each probe-set (Affymetrix) or across replicate probes (Illumina)
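Normalisation between arrays can be done in several ways. One common choice (used, for example, within RMA) is quantile normalisation, which gives every array the same intensity distribution. Below is a minimal Python sketch, assuming every array lists the same probes in the same order; ties are handled naively here, which real implementations treat more carefully.

```python
def quantile_normalise(arrays):
    """Quantile-normalise a list of arrays of log2 intensities.

    The value at rank k in each array is replaced by the mean, across
    arrays, of the k-th smallest values, so all arrays end up with an
    identical distribution while each keeps its own probe ordering.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of each rank across arrays.
    reference = [sum(s[k] for s in sorted_arrays) / len(arrays)
                 for k in range(n_probes)]
    result = []
    for a in arrays:
        # Probe indices in increasing order of intensity on this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        normalised = [0.0] * n_probes
        for rank, idx in enumerate(order):
            normalised[idx] = reference[rank]
        result.append(normalised)
    return result


# Two hypothetical arrays, three probes each.
print(quantile_normalise([[5, 2, 3], [4, 1, 6]]))
# [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```

After normalisation, boxplots of the arrays line up exactly, which is why boxplots are mainly informative before this step or for spotting arrays that needed unusually large adjustments.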

38 Exploring Data – Boxplots (figure: signal intensity per array)

39 Exploring Data - PCA

40 Outlier Samples: potential outlier samples will look different from the others in the experiment. There are no definitive rules for deciding when to exclude a sample from analysis: it depends on the size of the experiment; it can be useful to run the analysis with and without the outlier to assess its effect on the results; always re-normalise the data excluding any outlier samples before proceeding.
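The slides use PCA to spot outliers. A simpler complementary check, swapped in here for illustration, is each sample's mean Pearson correlation with all the other samples: an outlier tends to correlate poorly with everything else. The Python sketch below assumes each sample is a list of normalised log2 values over the same probes; the threshold is a hypothetical user choice, not a standard cut-off, and the decision to exclude remains a judgement call.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-constant data)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_correlations(samples):
    """For each sample, its mean correlation with every other sample."""
    out = []
    for i, s in enumerate(samples):
        rs = [pearson(s, t) for j, t in enumerate(samples) if j != i]
        out.append(sum(rs) / len(rs))
    return out


def flag_candidates(samples, threshold):
    """Indices of samples whose mean correlation is below the threshold."""
    return [i for i, r in enumerate(mean_correlations(samples)) if r < threshold]


# Hypothetical data: three similar samples and one scrambled sample.
samples = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [1.1, 2.0, 3.1, 3.9, 5.0],
    [0.9, 2.1, 2.9, 4.1, 5.1],
    [5.0, 1.0, 4.0, 2.0, 3.0],
]
print(flag_candidates(samples, threshold=0.0))  # [3]
```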

41 Outlier Sample

42 PCA indicating outlier sample

43 Filtering: some data are lost, but signal from low-intensity probes is noisy and can give false positives. Detection p-values are calculated for each probe from the overlap of its signal with the distribution of negative control probe signals. Possible criteria: detected in all samples or in at least one sample; detected in at least one group.
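The detection p-value and filtering criteria can be sketched in Python as follows. The p-value here is an assumed, simple empirical version (the fraction of negative-control probes whose signal is at least as high as the probe's); Illumina's software uses a related idea but its exact calculation may differ. "Detected in at least one group" is interpreted, as an assumption, as detected in every sample of at least one group. All names are hypothetical.

```python
def detection_pvalue(signal, negative_controls):
    """Fraction of negative-control probes with signal at least as high."""
    if not negative_controls:
        raise ValueError("need at least one negative control probe")
    exceed = sum(1 for neg in negative_controls if neg >= signal)
    return exceed / len(negative_controls)


def passes_filter(pvals_by_group, rule, alpha=0.01):
    """Decide whether to keep one probe.

    pvals_by_group maps group name -> detection p-values (one per sample).
    rule is one of:
      "all_samples" - detected in every sample
      "any_sample"  - detected in at least one sample
      "any_group"   - detected in every sample of at least one group
    """
    detected = {g: [p < alpha for p in ps] for g, ps in pvals_by_group.items()}
    if rule == "all_samples":
        return all(all(d) for d in detected.values())
    if rule == "any_sample":
        return any(any(d) for d in detected.values())
    if rule == "any_group":
        return any(all(d) for d in detected.values())
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical probe: clearly detected in controls, mostly not in disease.
probe = {"control": [0.001, 0.002, 0.0], "disease": [0.4, 0.2, 0.03]}
for rule in ("all_samples", "any_sample", "any_group"):
    print(rule, passes_filter(probe, rule))
```

The stricter the rule, the more probes are removed; a group-based rule keeps genes that are switched on in only one condition, which are often of interest.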

44 Detecting Differentially Expressed Genes: Linear Models for Microarray Data (limma). Handles the analysis of both simple and complex experimental designs. For two-group comparisons it is analogous to a t-test; otherwise to ANOVA. Uses information from all genes to estimate variance: this reduces the chance of false positives from genes with very low variance and is more robust for small sample sizes.
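"Using information from all genes" refers to limma's moderated t-statistic: each gene's residual variance s² (on d degrees of freedom) is shrunk towards a prior variance s0² (worth d0 degrees of freedom), giving s̃² = (d0·s0² + d·s²)/(d0 + d). The Python sketch below shows that formula for a two-group comparison. As a simplifying assumption, d0 and s0² are supplied by the caller, whereas limma estimates them from all genes by empirical Bayes; this is not limma's code.

```python
import math
import statistics


def moderated_t(group1, group2, d0, s0_sq):
    """Two-group moderated t-statistic for one gene (log2 expression values).

    d0 (prior degrees of freedom) and s0_sq (prior variance) are supplied
    here; limma estimates them from all genes on the array. With d0 = 0
    this reduces to the ordinary pooled-variance t-statistic.
    Returns (t, degrees of freedom).
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    d = n1 + n2 - 2  # residual degrees of freedom for this gene
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    s_sq = ss / d
    # Shrink the gene-wise variance towards the prior variance.
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    se = math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, d0 + d


# Hypothetical gene with almost no within-group variation.
g1, g2 = [8.00, 8.01, 8.00], [7.90, 7.91, 7.90]
t_ordinary, _ = moderated_t(g1, g2, d0=0, s0_sq=0.05)
t_moderated, df = moderated_t(g1, g2, d0=4, s0_sq=0.05)
print(round(t_ordinary, 1), round(t_moderated, 1), df)  # 21.2 0.8 8
```

A tiny 0.1 log2 difference looks hugely significant with the ordinary t because the gene's own variance is implausibly small; shrinkage removes that artefact. A p-value would then come from a t distribution with d0 + d degrees of freedom.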

45 limma: fits a linear model for each gene, tests whether the slope = 0 for each gene and assigns p-values, then applies multiple testing correction (FDR). (Figure: log normalised intensity in Group 1 vs Group 2.)
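For FDR correction, the standard choice is the Benjamini-Hochberg procedure: sort the m p-values, multiply the p-value of rank k by m/k, and enforce monotonicity from the largest rank downwards. A minimal Python sketch (same idea as R's `p.adjust(p, method = "BH")`, though not that function's code):

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values never decrease
    # with rank, capping everything at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes.
print([round(q, 3) for q in benjamini_hochberg([0.01, 0.04, 0.03, 0.005])])
# [0.02, 0.04, 0.04, 0.02]
```

With tens of thousands of probes, an unadjusted 0.05 cut-off would admit hundreds of false positives by chance, which is why the adjusted values are the ones usually reported.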

46 Effect of Other Variables: Wt and Mut groups from three different litters. The top gene shows ~5x higher expression in Wt compared to Mut and is similarly expressed across litters in both genotypes.

47 Strong Litter Effect: the groups overlap. Within litters, there is a consistent pattern of higher expression in Wt vs Mut; within genotypes, B > C > A, so expression depends on litter. Accounting for this variance increases power.
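Why accounting for litter increases power can be seen from the residual variance. The Python sketch below compares the residual sum of squares for one gene when only genotype is modelled and when litter is added as an additive effect. The values are hypothetical, chosen to mimic the pattern described (Wt above Mut within every litter, B > C > A), and the calculation assumes a balanced design, where each effect can be estimated as a group mean minus the grand mean.

```python
import statistics


def residual_ss(values, genotype, litter=None):
    """Residual sum of squares after removing genotype (and optionally
    litter) effects with an additive model.

    Assumes a balanced design (each genotype-litter combination has the
    same number of samples).
    """
    grand = statistics.mean(values)

    def effects(labels):
        groups = {}
        for v, lab in zip(values, labels):
            groups.setdefault(lab, []).append(v)
        return {lab: statistics.mean(vs) - grand for lab, vs in groups.items()}

    g_eff = effects(genotype)
    l_eff = effects(litter) if litter is not None else None
    ss = 0.0
    for i, v in enumerate(values):
        fitted = grand + g_eff[genotype[i]]
        if l_eff is not None:
            fitted += l_eff[litter[i]]
        ss += (v - fitted) ** 2
    return ss


# Hypothetical log2 values for one gene.
values = [7, 9, 8, 6, 8, 7]
genotype = ["Wt", "Wt", "Wt", "Mut", "Mut", "Mut"]
litter = ["A", "B", "C", "A", "B", "C"]
print(residual_ss(values, genotype))          # 4.0: litter differences left in the residuals
print(residual_ss(values, genotype, litter))  # 0.0: litter explains the remaining variation
```

Removing litter-to-litter variation shrinks the residual variance (at the cost of extra parameters), so the same genotype difference yields a larger test statistic. In limma, litter would be included in the design matrix, for example `model.matrix(~ litter + genotype)` in R.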

48 Limma Output

49 A small sample size and subtle effects can mean that no probes are considered statistically significant. Probes are still ranked in order of evidence for differential expression and can be explored. Biological interpretation can be the most difficult step, but tools are available.
