2 Outline
- Overview of transcription
- Construction of cDNA libraries
- cDNA sequencing
- Expression analysis via SAGE
- Microarray construction and use in expression analysis
3 We can isolate mRNA and convert it to a stable form (cDNA)
- The “Central Dogma” of Molecular Biology: DNA -> mRNA -> protein
- mRNA -> cDNAs: isolate, reverse transcribe, label
4 Genome in numbers
- Nucleic acid content of an average human cell
- Abundance distribution of mRNA species in a typical mammalian cell
12 cDNA library
- Ideally contains at least one copy of every expressed gene.
- The probability of achieving this is a function of:
  - fragment size: the longer the fragments, the more likely a gene is represented
  - genome size: a smaller genome increases the chance that a gene is represented
  - expression: highly expressed genes are more likely to be represented
- For 99% probability, a mammalian cDNA library must contain ~800,000 clones.
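The ~800,000 figure can be related to the standard library-coverage formula N = ln(1 - P) / ln(1 - f), where f is the fraction of the mRNA pool contributed by the transcript of interest. The slide does not state which abundance it assumes, so the sketch below only back-calculates the rarest fraction that 800,000 clones would capture at 99% probability. The function names are illustrative.

```python
import math

def clones_needed(p, f):
    """Clones required to see a transcript at fraction f at least once with probability p."""
    return math.log(1.0 - p) / math.log(1.0 - f)

def rarest_fraction_covered(p, n):
    """Smallest transcript fraction that n clones capture at least once with probability p."""
    return 1.0 - (1.0 - p) ** (1.0 / n)

# Back-calculate from the slide's numbers (99%, ~800,000 clones)
f = rarest_fraction_covered(0.99, 800_000)
print(f"fraction: {f:.2e} (about 1 in {1 / f:,.0f} mRNAs)")
print(f"round trip: {clones_needed(0.99, f):,.0f} clones")
```

Under this formula, the required library size grows roughly in inverse proportion to the abundance of the rarest transcript one wants to capture.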
13 cDNA sequencing
- The advent of cDNA cloning, combined with the creation of automated sequencers, led to efforts to sequence the entire human transcriptome and to create arrays (on filters) of cDNAs (see reading materials).
- cDNA sequencing was viewed as the fastest way to get at the coding portion of the genome.
- Numerous companies sprang up to sequence and patent cDNAs.
- cDNA sequencing was also used to measure gene expression levels.
14 cDNA sequencing --> expression analysis
Expression level estimates:
- Count the number of occurrences of a given cDNA sequence in a given library; highly expressed genes will have been sequenced more often.
- Use the above (in combination with the total number of sequences in the library) to estimate expression level.
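The counting approach can be sketched in a few lines. The gene names and clone counts below are hypothetical, and the estimate is simply each gene's share of all sequenced clones.

```python
from collections import Counter

def expression_estimates(clone_ids):
    """Relative expression as the fraction of sequenced clones assigned to each gene."""
    counts = Counter(clone_ids)
    total = sum(counts.values())
    return {gene: n / total for gene, n in counts.items()}

# Hypothetical library: each entry is the gene a sequenced clone matched
library = ["geneA"] * 4 + ["geneB"] * 1 + ["geneC"] * 5
estimates = expression_estimates(library)
for gene, fraction in sorted(estimates.items()):
    print(f"{gene}: {fraction:.0%} of clones")
```

With only ten clones, a gene seen once could easily have been missed altogether, which is why estimates for rare transcripts are unreliable at low sequencing depth.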
16 Web based expression analysis - www.pedb.org counting cDNA frequency
17 Serial Analysis of Gene Expression (SAGE)
Concept
- cDNA sequencing is expensive.
- Most mRNA species can be uniquely identified by a short sequence at a defined location in the gene (9 bp tags are unique 95% of the time).
- If we could produce a library of short sequences and ligate them together, then we could sequence the ligated DNA to measure the concentration of each gene's transcripts more efficiently.
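A minimal sketch of the tag idea follows. It assumes the tag is the 9 bp immediately 3' of the last CATG site in each cDNA; the slide names no enzyme or site, so `ANCHOR` is an assumption, and the sequences are made up.

```python
from collections import Counter

TAG_LENGTH = 9
ANCHOR = "CATG"  # assumed anchoring site; not specified on the slide

def possible_tags(length=TAG_LENGTH):
    """Number of distinct tags of the given length over the alphabet A, C, G, T."""
    return 4 ** length

def extract_tag(cdna, anchor=ANCHOR, length=TAG_LENGTH):
    """Return the tag following the 3'-most anchor site, or None if there is none."""
    pos = cdna.rfind(anchor)
    if pos == -1:
        return None
    start = pos + len(anchor)
    tag = cdna[start:start + length]
    return tag if len(tag) == length else None

def count_tags(cdnas):
    """Tally tags across a collection of cDNA sequences, skipping untaggable ones."""
    tags = (extract_tag(c) for c in cdnas)
    return Counter(t for t in tags if t is not None)

# Made-up sequences: two copies of one transcript, one copy of another
cdnas = ["AAACATGGGGTTTAAACCC", "AAACATGGGGTTTAAACCC", "TTCATGACGTACGTAGG"]
print(possible_tags())    # size of the 9 bp tag space
print(count_tags(cdnas))
```

Because each tag is so short, many tags fit into a single sequencing read once they are ligated together, and this is the source of the efficiency gain.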
18 SAGE diagram
[Figure: SAGE scheme. Linker: Primer A/B - TypeII site - Type I site. Labels: A, B, Primer A, Primer B, "Sequence these ->".]
19 Issues with SAGE (and cDNA sequencing for expression analysis)
Low abundance clones
- SAGE
  - In 1995, the estimate was that quantifying genes represented by <100 mRNAs/cell would take a single investigator a few months of work (maybe 10 times quicker today).
  - Cost: assuming even a low estimate of $6/sequencing reaction, 96 lanes * 4 runs/day * 30 days * $6 = $69,000 to measure 460,000 tags (assume 40 tags/run).
- cDNA sequencing
  - Same problem; costs/time maybe times higher.
- Hence, in most cases, expression information about low abundance clones is not accurate in cDNA or SAGE data.
- Leading to the advent of arrays...
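The cost estimate can be checked directly. The sketch below reuses the slide's own numbers and assumes that the 40 tags apply to each sequencing reaction (lane), which is the reading that reproduces the ~460,000 tag total.

```python
LANES_PER_RUN = 96
RUNS_PER_DAY = 4
DAYS = 30
COST_PER_REACTION = 6   # dollars, the slide's "low estimate"
TAGS_PER_REACTION = 40  # assumed to be per lane

reactions = LANES_PER_RUN * RUNS_PER_DAY * DAYS
total_cost = reactions * COST_PER_REACTION
total_tags = reactions * TAGS_PER_REACTION

print(f"{reactions:,} reactions, ${total_cost:,}, {total_tags:,} tags")
print(f"cost per tag: ${total_cost / total_tags:.2f}")
```

The exact products are $69,120 and 460,800 tags, which the slide rounds to $69,000 and 460,000.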
21 Taking advantage of DNA hybridization
[Figure: panels "On the surface", "In solution", "After hybridization"; labels: 4 copies of gene A, 1 copy of gene B.]
22 DNA Arrays
- Spots of DNA arranged in a particular spatial arrangement on a solid support
- Supports: filters (nylon, nitrocellulose), glass, silicon
- Types
  - Spotted or placed: pre-synthesized DNA put onto a surface
  - Synthesized: DNA synthesized directly on the surface
23 The Original DNA Array
- Petri dish with bacterial colonies
- Apply membrane and lift to make a filter containing DNA from each clone.
- Probe and image to identify clones homologous to the probe.
24 Vicki - A manual Gridding tool Gridding tool modifications by : Michèl Schummer
25 Vicki and the gridding frame Frame Design by: Michèl Schummer
27 Types of filter based arrays
- PCR products: ORFs or cDNAs
- Oligos: sometimes, but filters are generally not used for short products; oligos do not immobilize well on membranes
- Living clones
  - Place the membrane on Whatman paper soaked in media; colonies can be grown directly on the arrays
  - Lysis of the colonies followed by cross-linking produces DNA arrays
  - Good for screening large libraries
28 Uses for Filter Based Arrays
- In general, filter based arrays were in vogue about 8-13 years ago in the pre-genomic days.
- Typically cDNA libraries were spotted as clones and the arrays were used to perform comparative expression analysis.
- Detection was typically performed with radioactive labeling/film or phosphorimaging.
- “Interesting clones” were identified (via differential expression) and then sequenced.
- For genomes that have not yet been sequenced, this can still be a cost-effective approach, but rapid sequencing is changing that.
29 Selected cDNA arrays
- With unselected cDNA libraries, clones for highly expressed genes are over-represented on the arrays.
- As time progressed, a large number of cDNAs were sequenced, and hence it became possible to select unique cDNAs and to make arrays on which each spot represented a single gene.
- Around the same time, coatings for glass were developed that retained spotted DNA well.
- This allowed arrays to be produced on glass microscope slides, which in turn allowed for fluorescence-based detection technology.
31 Spotted Arrays
[Figure: a spotting “pen” deposits a drop containing DNA in solution onto a reactive or coated surface.]
32 MD GenIII Arrayer
- Plate hotel holds twelve 384-well plates
- Gridding head, 12 pins
- Slide holder, 36 slides
Features:
- 36 slides in 8 hours
- 7680 genes spotted in duplicate
- Built-in humidity control
33 Glass slides enabled fluorescent detection in 2 (or more) colors
[Figure: Cell Population #1: extract mRNA, make cDNA, label w/ green fluor. Cell Population #2: extract mRNA, make cDNA, label w/ red fluor. Co-hybridize to a slide with DNA from different genes, then scan.]
34 Spotted arrays
- Initially, most spotted arrays were produced by spotting PCR products produced from selected cDNA clones.
- Issues
  - Must have the libraries in hand
  - Must not mix clones up
  - Must perform high throughput PCR to produce DNA to spot (again without mixing things up)
  - LOTS of freezer space to store everything
  - cDNAs are long and cross hybridization is a problem (although it is possible to spot oligos)
  - Quality manufacturing is difficult to maintain
35 Oligo Arrays
- Synthesized or spotted arrays of short oligos of chosen sequence (typically base pairs).
- Synthesis methods: ink jet, light directed.
- Spotting using reactive coupling.
- Used for re-sequencing, genotyping, diagnostics and expression arrays.
- MUCH better than cDNA arrays to distinguish related sequences.
- Only have to store the DNAs OR (better yet) if you synthesize DNA directly on the surface, you only need to store the sequence information (and a few reagents).
36 Basic Oligo Synthesis
[Figure: synthesis cycle on a glass support: coupling, remove protecting group, add next nucleotide.]
The protecting group in the standard chemistry is DMT, which is removable: it is converted to OH by treatment with acid. The surface attachment is necessary to allow unreacted reagents to be washed away prior to removing the next protecting group. When you order oligos, this is how they are synthesized, and the surface is glass beads that are retained in a column.
37 Ink-jets Can be Used to Direct Small Volumes of Liquids to Specific Sites
38 Agilent InkJet Array Technology
[Figure: inkjet firing cycle; labels: Resistor Off, Liquid Vaporizes, Gas Expands, Resistor On, Drop Breaks Off, Reservoir Refills, Fill, < 1 msec.]
- ~44,000 features on 1”x3” slide
- The inkjet microarray manufacturing process, which uses standard high-efficiency phosphoramidite chemistry, provides Agilent precision printing with uniform feature morphology.
- This flexibility and precision allows us to significantly increase the number of probes printed on a standard microarray. We have increased our layout from 22K to 44K.
- If, instead of using ink, one fills the reservoirs with different nucleotides, inkjets can be used to make DNA on a surface.
39 Glass Can be Treated to Produce Hydrophilic “Wells” By pre-treating the surface to create hydrophilic regions surrounded by a hydrophobic surface, small reaction “wells” can be created. This lowers the requirement for alignment of the inkjet nozzles and also helps to confine the droplets applied to the surface.
40 Agilent Printing Facility Same chemistry as on ABI synthesizer happens in tiny little droplets inkjetted onto the arrays to build up the oligos one base at a time. We have quite a lot of flexibility on the number of features and arrays per slide. This photo shows the latest format we’ve developed which contains 44K features on 1x3” slide.
41 Light-directed oligo synthesis The key to this synthesis method was the development (by Steve Fodor) of photolabile protecting groups. This development allows for photolithographic technologies to be applied to chemical synthesis.
42 Number of different DNA sequences as a function of photolithographic resolution
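As a rough illustration of why resolution matters, the sketch below counts how many square features fit in 1 cm^2 at a given feature edge length. It assumes features are tiled with no gaps, and the listed sizes are hypothetical rather than values read from the slide's plot.

```python
def features_per_cm2(feature_size_um):
    """Square features of the given edge length (micrometres) tiled over 1 cm^2."""
    per_side = 10_000 // feature_size_um  # 1 cm = 10,000 micrometres
    return per_side ** 2

# Hypothetical resolutions, coarse to fine
for size_um in (100, 50, 20, 10, 5):
    print(f"{size_um:>3} um features -> {features_per_cm2(size_um):>9,} sequences per cm^2")
```

Halving the feature edge length quadruples the number of distinct sequences that fit in the same area.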
44 Affymetrix Platform
- Each gene is represented by 11 probe pairs of 25 bp oligos
- Each probe pair contains a perfect match and a mismatch to the gene sequence
- Target sample is labeled with a biotinylated nucleotide and detected via a streptavidin-phycoerythrin conjugate
- One sample per array, one-color data
45 Affymetrix Expression Data
Data from the 11 probe pairs are used to calculate an aggregate signal for each gene.
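One simple way to aggregate probe-pair data is to average the perfect-match minus mismatch differences. This is a simplified illustration only: the slide does not describe the summarization algorithm actually used, and the intensities below are made up.

```python
from statistics import mean

def average_difference(perfect_match, mismatch):
    """Mean of (PM - MM) across probe pairs; a simplified aggregate signal."""
    if len(perfect_match) != len(mismatch):
        raise ValueError("each perfect-match probe needs a paired mismatch probe")
    return mean(pm - mm for pm, mm in zip(perfect_match, mismatch))

# Made-up intensities for one gene's 11 probe pairs
pm = [520, 480, 610, 450, 700, 530, 495, 560, 640, 505, 580]
mm = [120, 150, 130, 110, 200, 140, 125, 160, 170, 135, 145]
print(f"aggregate signal: {average_difference(pm, mm):.1f}")
```

Subtracting the mismatch intensity is meant to remove signal from non-specific hybridization before the probe pairs are combined.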
46 Strategies For Array Design
[Figure: known exons and an unknown transcript, interrogated by three strategies.]
- Surrogate strategy: most expression arrays to date
- Annotation strategy: exon arrays, splice variants
- Tiling strategy: unbiased look at the genome
Shift in thinking about how to interrogate the genome. Chip capacity allows for an unbiased approach. This also compares and contrasts previous strategies (expression arrays) with currently possible approaches (exon and tiling). Only an unbiased approach, as used in tiling arrays, investigates previously unannotated regions of the genome and therefore enables new discovery of binding sites or transcribed regions. This is a paradigm shift in how people look at the genome. Previously one had to rely on predictions and annotations.
47 Affymetrix Platform
- Expression arrays: Human, Mouse, Rat, Yeast, E. coli, Drosophila, C. elegans, Dog, Soybean, Plasmodium, Anopheles, Pseudomonas, Arabidopsis, Zebrafish, Xenopus, etc.
- Exon arrays: alternative splicing patterns
- Mapping arrays: SNP analysis, loss of heterozygosity
- Tiling array sets: transcript mapping
- Custom arrays
48 Issues with synthesized oligos
- Repetitive yield: for each reaction cycle, what percentage of the oligos react as intended? Estimated at 95% for the light directed method, 98-99% for the ink jet method.
- (0.95)^20 = 35.8%, (0.98)^20 = 67%
- Net result: Affy arrays are usually 25-mers, ink jet arrays are usually 60-mers.
- For a single oligo, it can be shown that sensitivity plateaus at 50-70 bp.
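The effect of repetitive yield compounds with oligo length. The sketch below follows the slide's convention of raising the per-cycle yield to the number of cycles, and extends it to the 25-mer and 60-mer lengths the slide mentions.

```python
def full_length_fraction(step_yield, cycles):
    """Fraction of oligos that reacted as intended in every one of `cycles` steps."""
    return step_yield ** cycles

for label, step_yield in (("light directed", 0.95), ("ink jet", 0.98), ("ink jet", 0.99)):
    for cycles in (20, 25, 60):
        fraction = full_length_fraction(step_yield, cycles)
        print(f"{label:>14} yield {step_yield:.2f}, {cycles:>2} cycles: {fraction:.1%} full length")
```

This is why a lower per-cycle yield effectively limits the practical oligo length: every additional cycle multiplies the full-length fraction by the step yield again.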
49 Relative merits of different methods of making oligo arrays
- Affy: available first, large catalogue, small feature size possible
- Inkjet: much more flexible to design
- Spotted: less practical for large numbers (> a few 100) of oligos; can be made with standard spotting equipment. Libraries of oligos exist for more common organisms, so oligo deposition is feasible for some organisms.
50 Illumina’s Bead Arrays
- Step 1: Synthesize beads in batches, each batch with a sequence on it. Generally, color code the beads to keep track of which one has what molecule on it.
- Step 2: Etch the ends of optical fibers in a bundle, or circular spots on a glass slide, to create bead-sized depressions.
[Figure: beads carrying example oligo sequences.]
51 Illumina’s Bead Arrays (cont)
- Step 3: Allow beads to self-assemble into an array on the end of the fibers or on the surface.
- These self-assembled arrays can be used for the same applications as other DNA arrays.
- Since the assembly is random, one must over-represent each desired oligo tens of times to ensure that each oligo is represented at least n times on the array.
- Decoding can also be accomplished by hybridizing short labeled oligos to the oligos on each bead. In practice, this is how it is usually done.
- We will discuss Illumina technology in more detail later in the course in the context of genotyping. See
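The need for over-representation can be illustrated with a simple model. It assumes that the number of copies of any one bead type on a randomly assembled array is approximately Poisson distributed around the average redundancy, which is a modeling assumption and not something the slide states.

```python
import math

def prob_at_least(n, mean_copies):
    """P(X >= n) for X ~ Poisson(mean_copies)."""
    below = sum(
        math.exp(-mean_copies) * mean_copies ** k / math.factorial(k)
        for k in range(n)
    )
    return 1.0 - below

# Chance that a given bead type appears at least 5 times, for several redundancies
for redundancy in (5, 10, 20, 30):
    print(f"average {redundancy:>2} copies -> P(at least 5) = {prob_at_least(5, redundancy):.6f}")
```

An array carries many bead types, so the per-type probability must be very close to 1 for every type to meet the minimum. That pushes the required average redundancy well above n.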
52 Detection technologies
- Radiolabeled probes: film or phosphorimagers
- Biotin labeled: post-hybridization detection with SA (streptavidin) labeled with a fluor or an enzyme
- Fluorescent probes: confocal scanning
53 Scanning with a confocal microscope
The dichroic mirror preferentially reflects laser light but allows the fluorescence to pass through. By using a pinhole near the detector, the depth of field is controlled (i.e. the detected fluorescence comes from very near the surface of the slide).
55 2-color Microarray Overview
- Prepare fluorescently labeled probes (control, test)
- Hybridize, wash
- Measure fluorescence in 2 channels (red/green)
- Analyze the data to identify patterns of gene expression
Slide from John Quackenbush, Dana Farber
56 1-color Microarray Overview
- Prepare fluorescently labeled probes (control, test; figure labels: Weed, Bush)
- Hybridize, wash
- Measure fluorescence in 1 channel
- Analyze the data to identify patterns of gene expression
Slide adapted from John Quackenbush, Dana Farber
57 2-color vs. single color
- 2-color was originally designed due to problems in making reproducible arrays, e.g. the ratio on a spot is more reproducible than the absolute intensity if the spot size/concentration changes from array to array.
- With 2 colors, you don’t necessarily get twice as much data, since it is typical to run an extra array in the inverted color scheme.
- Experimental design and cross-experiment comparisons are much more complicated with 2-color arrays.
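Both points can be illustrated with made-up numbers. The sketch assumes that a change in spot size scales both channels by the same factor, and that dye bias adds a constant offset to the log ratio. These are simplifying assumptions, not claims from the slide.

```python
import math

def log_ratio(red, green):
    """log2(red/green) for one spot."""
    return math.log2(red / green)

def dye_swap_average(forward, swapped):
    """Combine a log ratio with its dye-swapped replicate; a constant dye bias cancels."""
    return (forward - swapped) / 2

# Same gene on two arrays; the second spot holds half as much DNA (hypothetical)
array1 = {"red": 8000.0, "green": 2000.0}
array2 = {"red": 4000.0, "green": 1000.0}
print(log_ratio(**array1), log_ratio(**array2))  # intensities differ, ratios agree

# Hypothetical true log ratio of 2.0 with a dye bias of +0.3 on both arrays
true_value, bias = 2.0, 0.3
forward = true_value + bias
swapped = -true_value + bias
print(dye_swap_average(forward, swapped))
```

The dye-swap array is the "extra array in the inverted color scheme", which is why a 2-color design does not double the usable data.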
58 Expression Arrays are a Natural Extension of Genomic Analysis
- Genome studies provide the source material for the arrays, e.g. clones or manufactured DNAs.
- For completely sequenced genomes, arrays allow a comprehensive survey of gene expression.
- This level of analysis is a revolution in biology.
59 Expression Arrays Have a Broad Range of Applicability
- Cancer studies: tumor vs. normal.
- Infectious disease studies: host response to infection, infectious agent gene expression, viral diversity.
- Pharmaceutical studies: drug treated vs. non-treated.
- Environmental: microbial diversity, effects of toxins, effect of growth conditions.
60 Expression Arrays Have a Broad Range of Applicability
- Gene specific studies: deletion (“knockout”) vs. normal, over expression vs. normal.
- Agricultural studies: effects of pesticides, growth conditions, hormones.
- Developmental biology: cells from different areas/stages of developing organisms.
- Many others: any two samples of interest can be compared.
61 Challenges for Planning Good Array Experiments
Experimental Design
- Replicates are necessary and expensive
- A simple experiment may not give a simple answer
- What comparisons should be made?
Data Analysis
- How will differentially expressed genes be identified?
- How will errors be estimated?
- What software does this best?
- How will the data be mined?
62 Where are arrays going?
- As sequencing gets cheaper and cheaper, most assays that are currently done by arrays can be done more effectively by sequencing. Hence, the analytical use of arrays will be replaced by sequencing.
- However, arrays can also be used to enrich for specific genomic regions upstream of sequencing, or can be used to create many sequences for the artificial production of genomes or genomic regions.