Technology and Methods Seminar

1 Technology and Methods Seminar
Tiling Arrays Technology and Methods Seminar March 29, 2007 Before I start, I want you to know that I will post a link to this presentation on my wiki page, so you can access it later. Norman Pavelka Postdoc Rong Li Lab Madelaine Gogol Programmer Analyst Microarray

2 Tiling Arrays - Overview
What is a tiling array? What can I do with it? ChIP-chip CGH Expression Which tiling arrays are available for my experiments? in-house yeast tiling array (design details) Agilent Affymetrix How will we analyze and visualize the data?

3 What is a Tiling Array? A microarray with many probes distributed in an evenly spaced way across an entire genome. Microarrays are basically DNA probes on glass slides. So when you make a microarray, first you have to decide what probes you want to include. Probes on some arrays are biased toward known or predicted genes. Tiling arrays instead blanket a genome to get as much information as possible. Probes can be tiled regularly without consideration of probe properties, or an attempt can be made to pick "good" probes that are unique and share properties with each other. Probes can be tiled at different resolutions.

4 What can I do with tiling arrays?
Map the transcriptome what’s being expressed? ChIP-chip where are proteins binding? CGH what are the differences in genome structure? Other possibilities Map the methylome Genome resequencing Polymorphism discovery I’m going to talk about a few things people have done with tiling arrays, including expression, ChIP-chip, and CGH.

5 Here’s an interesting paper showing an experiment using Affy yeast tiling arrays. I’ll talk about these arrays later – they have 25mer probes on both strands, at high density overlapping by 5 bp. These images are of different regions of the genome, and each dot represents a probe, with the height of the dot indicating expression level. Here are some examples of the cool things they saw – A) splicing, D) two ORFs transcribed together, F) unannotated transcription, G) antisense transcription. So this is one type of experiment you could do with tiling arrays. Next I’m going to talk about ChIP-chip.

6 ChIP-chip In ChIP-chip, you take a DNA-binding protein, fix it to the DNA, cut up the DNA, put an antibody to the protein... Figure labels: Hybridize to Microarray; PCR w/aa-dNTP; Analyze image, calculate ratios.

7 array CGH Hybridize to Microarray
array CGH starts with genomic DNA. Cut it up, hybridize it to an array with DNA from another strain or treatment, and detect copy number changes of whole chromosomes or certain regions of the genome. This is the type of experiment Norman will talk about later. So now that I’ve told you a few things you can do with the arrays, I’ll tell you which platforms are available.

8 In-house yeast tiling array (YOGie)
Covers the yeast genome Just printed resolution ~ 250 bases freely available First I’m going to talk about our in-house yeast tiling array. I designed a lot of the probes on this array. It was just printed in a marathon 17 hour printing session by Brian. The resolution is about 250 bases between probes, and it’s freely available to you.

9 Spotted Microarray Manufacturing
First a brief description of how we manufacture arrays here. We start by designing DNA sequences that are 70 nucleotides long. We order the DNA from an oligo production company, and get back DNA in a bunch of plates. Brian uses the arrayer to print this DNA onto glass slides. Here’s a close up of the print head, a bunch of steel or silicon pins that dip into the plate and then spot the DNA down onto the glass. 261 slides are printed at once.

10 Operon Probe Set 6307 probes, length 70
Designed one per ORF, near 3’ end. YOG arrays (yeast oligonucleotide). YBOX: 3072 new probes.
I’m going to describe the probe sets that make up the new array. The yeast operon probe set is a commercially available set of 6307 probes, designed to have one probe per ORF, near the 3’ end. These probes were used for the YOG arrays, and are now also included on what we call the YOGi arrays and the new enhanced YOGis (which I’ll describe momentarily). I want to add that we now have an additional set called “YBOX” (Yeast Brown lab oligo extension) designed by the Brown lab at Stanford. It contains mer oligos that enhance the operon set. It adds non-coding RNAs, newly characterized ORFs, improved probes for low sequence uniqueness, all yeast introns > 40 bp, gene tiling probes, and centromeric/telomeric/intergenic elements. The YBOX set was recently printed on the YOG1B arrays.
Now I’m going to talk about some probes I designed to add to this operon set to fill out our homemade yeast tiling array. Even if you’re not specifically interested in yeast, probes on every array have been designed through some similar process, so it might be interesting to know the general procedures. The first set we added to the operon set was the intergenic set.

11 Array Oligo Selector (AOS)
Intergenic Probe Set: Design
Design target: yeast intergenic regions (between Gene 1 and Gene 2). Original goal: leave no area > 500 uncovered.
Fasta format 140-mer sequences tiling the intergenic regions
Array Oligo Selector (AOS)
9, mer sequences from the intergenic regions
Within a month of my hire to the institute (December 05), I designed this set. Here are a few details about the design process. We wanted to add probes in the intergenic regions, in a tiling pattern. We have software which, given a sequence, selects the best probe. So probe design really becomes – what regions do I feed the probe design software? In this case, I chose 140-mers evenly spaced throughout each intergenic region. Then I told the software to pick a probe in each one. For this design, we used open source software called Array Oligo Selector, which takes into account uniqueness in genome, sequence complexity, self binding, and melting temperature when selecting probes.
Array Oligo Selector: open source software (Zhu, Bozdech, and DeRisi)
- Uniqueness in genome: BLAT server
- Sequence complexity: difference in bytes between the oligo sequence and its compressed version under the LZW compression algorithm
- Self binding: the oligo sequence aligned against its reverse complement using the Smith-Waterman algorithm (alignment score of the optimum local alignment)
- GC percentage (user-specified limit)
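To make the selection criteria a bit more concrete, here is a minimal Python sketch of three of the per-oligo measurements. It is not AOS code: the function names are hypothetical, and zlib (DEFLATE) is swapped in for the LZW compressor because it ships with Python, so the numbers will not match what AOS reports. The genome uniqueness and Smith-Waterman steps are left out.

```python
import zlib


def gc_percent(seq: str) -> float:
    """Percentage of G and C bases in the sequence."""
    seq = seq.upper()
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def compression_savings(seq: str) -> int:
    """Bytes saved by compressing the sequence.

    Assumption: zlib stands in for LZW here. Larger savings suggest a more
    repetitive, lower-complexity sequence.
    """
    raw = seq.upper().encode("ascii")
    return len(raw) - len(zlib.compress(raw))
```

A self-binding check would align each oligo against `reverse_complement(oligo)`; a homopolymer run should show clearly larger `compression_savings` than a mixed sequence of the same length.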

12 Intergenic Probe Set 9405 70-mer probes
No region greater than 360 left uncovered ~ 220 bases between probes on average Chromosome 3 completely tiled YOGi arrays (YOG+intergenic) Here’s some details on the resulting probe set. In the end, I tiled a little closer than 500, with no region greater than 360 uncovered, and on average about 220 bases between probes. Also, we completely tiled chromosome 3 at this resolution, ignoring any gene boundaries. This intergenic set was printed on our YOGi arrays, which include the operon set + this intergenic set.
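Coverage numbers like these can be checked from a list of probe positions. The sketch below is only illustrative: `spacing_summary` is a hypothetical helper, it assumes all probes sit on one chromosome, and it assumes spacing is measured start to start while the uncovered stretch is measured from the end of one 70-mer to the start of the next. The slide does not say which convention was used.

```python
def spacing_summary(starts, probe_len=70):
    """Summarize tiling density from probe start coordinates.

    Returns the mean start-to-start spacing and the largest stretch
    that is not covered by any probe.
    """
    starts = sorted(starts)
    spacings = [b - a for a, b in zip(starts, starts[1:])]
    # Overlapping probes leave no uncovered stretch, hence the max(0, ...).
    uncovered = [max(0, spacing - probe_len) for spacing in spacings]
    return {
        "mean_spacing": sum(spacings) / len(spacings),
        "max_uncovered": max(uncovered),
    }
```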

13 Array Oligo Selector (AOS)
5’ probe set: Design Goal: fill in gaps left by operon set in 5’ region of gene leave no region > 500 without a probe Targets: ORF regions 5’ of operon probe Array Oligo Selector (AOS) This year we decided to fill in the gaps left in the 5’ region of the gene by the operon set. The goal was again to leave no region > 500 without a probe. We improved on the intergenic design by overlapping the target regions. So the basic procedure looked something like this: Take the gene coordinates and the coordinates of the operon probe, grab the section 5’ of the operon probe and divide it into targets, give those to AOS to design probes. operon probe 5’ 3’ Gene 1
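The procedure described here (take the gene and operon probe coordinates, grab the section 5’ of the probe, divide it into overlapping targets) can be sketched roughly as follows. This is not the script that was actually used: the function names, the 0-based half-open coordinates, and the choice to space targets evenly across the region are all assumptions made for illustration.

```python
import math


def five_prime_region(gene_start, gene_end, strand, probe_start, probe_end):
    """Part of the gene that lies 5' of the operon probe.

    Coordinates are genomic, 0-based, half-open. On the minus strand the
    5' end of the gene is its higher coordinate.
    """
    if strand == "+":
        return gene_start, probe_start
    return probe_end, gene_end


def split_into_targets(start, end, target_len, overlap):
    """Cut a region into evenly spaced targets that overlap by at least `overlap`."""
    if overlap >= target_len:
        raise ValueError("overlap must be smaller than target_len")
    length = end - start
    if length <= target_len:
        return [(start, end)]  # region too short to split
    n = math.ceil((length - overlap) / (target_len - overlap))
    last_start = end - target_len
    starts = [start + round(i * (last_start - start) / (n - 1)) for i in range(n)]
    return [(s, s + target_len) for s in starts]
```

Each resulting interval would then be extracted as sequence and handed to AOS, which picks one probe per target.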

14 5’ probe set: target selection
What size targets? 355mers that overlap by 70. (500-70)/2 = 215; 215 + 70 + 70 = 355. Diagram labels: target 1, target 2, overlap 70, flanks 215 and 215, total 500.
One question was – what size targets should I choose? This time, I had the same constraint of no area > 500 uncovered. Additionally, I wanted to overlap the targets, so that if the best probe were really found in the overlap, it could still be selected. Let’s say I have two design targets and I want to pick the best probe in each one. I want the design to be such that there aren’t more than 500 bases between probes, and I want all areas of the genome to have the chance to become probes, so the targets overlap by 70. To figure out the length of the targets, I just want to know these numbers (click). These are equal to (500-70)/2, or 215. Then I can just add up 215 + 70 + 70 = 355. So the targets should be of length 355.
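The target-length arithmetic can be double-checked with a few lines of Python. This is one reading of the slide, not something stated on it: the worst case is assumed to be two adjacent targets whose 70-mer probes both land flush against the outer ends, and the function names are made up for the example.

```python
def target_length(max_gap=500, probe_len=70, overlap=70):
    """Target length such that probes in adjacent targets are never
    more than `max_gap` bases apart."""
    flank = (max_gap - overlap) / 2  # (500 - 70) / 2 = 215
    return probe_len + flank + overlap  # 70 + 215 + 70 = 355


def worst_case_gap(target_len, probe_len=70, overlap=70):
    """Uncovered distance when both probes sit at the outer target ends.

    Two adjacent targets span 2 * target_len - overlap bases in total;
    subtracting the two probes leaves the gap between them.
    """
    return 2 * target_len - overlap - 2 * probe_len
```

Under these assumptions a 355-base target gives a worst-case gap of exactly 500, which matches the design constraint.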

15 5’ probe set: Reduction Too many probes
Reduce to 6666 or fewer (budget and printing constraints). Probes within 260 bases of each other winnowed by: Tm, binding energy, number of matches to genome.
So I used overlapping targets of 355, but unfortunately that resulted in too many probes – over Due to budget and printing constraints, we didn’t want to order that many, so I decided to just do a second pass and remove some of the probes. I went for probes that were the closest together and removed the “worse” probe from each pair.
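The second pass can be pictured as a small greedy loop. The sketch below is a guess at the general shape, not the script that was used: `Probe`, `winnow`, and the single `score` field are hypothetical, with `score` standing in for whatever combination of Tm, binding energy, and genome matches ranked the probes (higher is better), and all probes are assumed to be on one chromosome.

```python
from typing import NamedTuple


class Probe(NamedTuple):
    start: int    # genomic start coordinate
    score: float  # hypothetical quality score, higher is better


def winnow(probes, min_spacing=260):
    """Repeatedly find the closest pair of neighbouring probes within
    `min_spacing` of each other and drop the lower-scoring one."""
    kept = sorted(probes, key=lambda p: p.start)
    while True:
        closest = None
        for i in range(len(kept) - 1):
            distance = kept[i + 1].start - kept[i].start
            if distance <= min_spacing and (closest is None or distance < closest[0]):
                closest = (distance, i)
        if closest is None:
            return kept  # no pair is too close any more
        i = closest[1]
        worse = i if kept[i].score < kept[i + 1].score else i + 1
        del kept[worse]
```

Rescanning after every removal is slow for thousands of probes but keeps the logic easy to follow; a per-chromosome grouping step would be needed for real data.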

16 5’ probe set 6, mer probes tiles the region of each gene between the operon probe and the 5' end. YOGie arrays (YOG+intergenic+enhanced) And here’s some details about the final set – 6512 probes that tile genes between the operon probe and the 5’ end of the gene. These probes were recently printed on our new enhanced YOGie arrays.

17 In-house yeast tiling array: YOGie
Together, the operon, intergenic, and 5’ sets make up our homemade yeast tiling array Freely available Also includes tight tiles of all centromeres 7 sub-genomic regions kb So in general, our in-house tiling arrays are made up of the operon, intergenic, and 5’ enhancement set. These arrays are freely available, and also include some regions of really tight tiling, of centromeres, and 7 various sub-genomic regions. Next I’m going to talk about the commercially available tiling array platforms from agilent and affymetrix.

18 Agilent Microarray Manufacturing
Here’s some images describing agilent’s array manufacturing process. Agilent was a spin off of HP, and they basically print the probes one base at a time. This means that the process is by nature very customizable and is not tied to a certain pre-determined set of probes. Agilent generally uses 60-mers.

19 Agilent Tiling Arrays
ChIP-on-chip: Arabidopsis Whole Genome; C. elegans Whole Genome; Drosophila Whole Genome; Human CpG Island; Human ENCODE 244K; Human Promoter; Mouse Promoter; Yeast Whole Genome 4 x 44K; Yeast Whole Genome 244K; Zebrafish Expanded Promoter; Zebrafish Proximal Promoter; Custom ChIP-on-chip.
Oligo aCGH: Human Genome 244K; Human Genome 105K; Human Genome 44K; Mouse Genome 244K; Mouse Genome 105K; Mouse Genome 44K.
Agilent has separate arrays for CGH and ChIP, which is apparently due to various controls that are on the different arrays, but it’s not completely clear why. Not all of these are strictly tiling arrays. Some of these ChIP-chip arrays are focused on the pre-determined promoter region and so aren’t completely unbiased. But again, any of these can be customized and changed, which is nice.

20 Agilent Tiling Arrays: formats and cost
Format  Arrays per slide  Per slide  Per hyb
244k    1                 $400       $400
105k    2                 $640       $320
44k     4                 $720       $180
15k     8                 $800       $100
Here are the Stowers negotiated prices for the various Agilent array formats. Agilent prints multiple arrays on a single slide, and each array can perform a different experiment. So the prices here are shown per slide and per hybridization.

21 Agilent Custom Array Design
Take an agilent microarray design Remove some probes Put in your own probes Design using agilent’s web application, earray. You can also design everything from scratch As I’ve mentioned, one nice feature of agilent arrays is the customization. Basically, you can take any standard agilent array design and swap out probes for your own or design from scratch. We’ve already swapped probes twice – once for the Jaspersen lab and once for Proteomics and we are planning more experiments using custom arrays.

22 earray To do the custom array designs and upload probes, you have to use Agilent’s web application, earray. I say “have to” because it’s really slow, you have to use Internet Explorer, and it’s not intuitively designed. Accounts to earray are by invitation only – if you really want an account you can ask Chris or me. But if you want to do a custom design, just come talk to us and I’ll deal with it for you.

23 Affymetrix Microarray Manufacturing
Here’s an image from affy describing their manufacturing process. I won’t describe it too much except to say their density is unsurpassed. Their probes are typically 25 nucleotides, and they can put millions of spots on an array instead of thousands. “The photolithographic process begins by coating a 5″ x 5″ quartz wafer with a light-sensitive chemical compound that prevents coupling between the wafer and the first nucleotide of the DNA probe being created. Lithographic masks are used to either block or transmit light onto specific locations of the wafer surface. The surface is then flooded with a solution containing either adenine, thymine, cytosine, or guanine, and coupling occurs only in those regions on the glass that have been deprotected through illumination. The coupled nucleotide also bears a light-sensitive protecting group, so the cycle can be repeated. In this way, the microarray is built as the probes are synthesized through repeated cycles of deprotection and coupling. The process is repeated until the probes reach their full length, usually 25 nucleotides. Commercially available arrays are typically manufactured at a density of over 1.3 million unique features per array. Depending on the demands of the experiment and the number of probes required per array, each quartz wafer can be diced into tens or hundreds of individual arrays.”

24 Affymetrix Tiling Arrays
Arabidopsis Tiling 1.0R Array C.elegans Tiling 1.0R Array Chromosome 21/ Array Set Chromosome 21/22 2.0R Array Drosophila Tiling 1.0R Array ENCODE Array Human Genome Arrays + Mouse Genome Arrays + S. cerevisiae Tiling 1.0R Array S. pombe Tiling 1.0FR Array Cost ~ 500$ per array, so 500$ per hyb. Here’s affy’s available tiling arrays. Keep in mind, with affy, no real customization is available. Also they cost around 500$ per array.

25 Summary of Tiling Arrays (only yeast shown)
Type                   #spots per array   probe size  resolution     per slide cost  per array cost  customizable?
Homemade YOGie arrays  26,000             70          250 between *  -               -               some effort
Agilent 44K            44,000 (4)         60          160 between    $720            $180            easily
Agilent 244K           244,000            -           overlap by 18  $400            $400            -
Affy                   3,200,000 (pm/mm)  25          overlap by 20  -               $500            no
* 250 = median resolution
Here’s a summary table regarding the various array options – note, this is only for yeast. We have homemade arrays available for other organisms, come talk to us. Now I’m going to move on to talk about data analysis for tiling arrays.

26 Same ChIP-chip experiment – agilent and original YOGi.

27 Tiling array data analysis: still an adventure
Affy TAS (Tiling Analysis Software) Agilent ChIP Analytics & CGH Analytics Other genome browsers, R packages, other people’s software, Do-it-yourself, perl, statistical models Microarrays themselves already produce a lot of data. Tiling arrays by their very nature produce much more information. Different analysis methods exist and are being developed. From an analysis standpoint, it takes us time to figure out the best methods. It also depends on what questions you’re asking in your experiment. According to Chris, the affy tiling analysis software ran for 8 hours per experiment for Norman’s data. ChIP analytics is worse than e-array, in that I have not been able to run it successfully, but I’ve been told they’re releasing a patch, so it could be an option in the future. For analysis beyond commercial methods, we have done a variety of things – including other people’s software and our own methods using R or perl. There’s a lot of methods and software packages that people have developed that we continue to try out.

28 Data Visualization: Genome Browsers: UCSC
A very useful tool for data visualization is a genome browser. This is ucsc genome browser. Here’s a wiki page that has more information about genome browsers in general.

29 Data Visualization: Genome Browsers: UCSC
Here’s an example of how ChIP-chip data looks on a genome browser. The up and down values are the log2 ratios of the intensity of the immunoprecipitated DNA versus the whole cell extract. Genome browsers show your data in context with the rest of the genome, genes, repeats, and everything. They are useful for visual inspection, sanity checks, and eyeball data comparison. I use them all the time.
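As a small illustration of the values being plotted, the sketch below computes a log2 ratio from two intensities and formats it as a bedGraph line, one of the plain-text formats the UCSC browser accepts for custom tracks (chromosome, 0-based start, end, value, separated by tabs). The function names are hypothetical, and nothing here implies this is how the track on the slide was produced.

```python
import math


def log2_ratio(ip, wce):
    """log2 of immunoprecipitated intensity over whole cell extract intensity.

    Positive values mean enrichment in the IP channel.
    """
    if ip <= 0 or wce <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(ip / wce)


def bedgraph_line(chrom, start, end, value):
    """One bedGraph data line: chrom, start, end, value (tab-separated)."""
    return f"{chrom}\t{start}\t{end}\t{value:.3f}"
```

A probe with four times more signal in the IP channel gets a value of 2; one with half the signal gets -1. Real data would normally be background-corrected and normalized before this step.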

30 Data Visualization: Genome Browsers: IGB
Here’s some CGH data from the affy yeast tiling array shown on another genome browser – IGB, integrated genome browser – a java software application started by affy, now open source. There’s a wiki page on this too.

31 Data Analysis: Sliding windows
One intuitive thing to do with ChIP-chip data is find peaks. You can do this using a sliding window approach that slides across the chromosome calculating average log ratio in a window and then defining which regions are unusually high, calling them peaks. There are a few ways people have come up with to do this. There are also some publications that say sliding windows can miss certain things and I’ve been reading about alternative statistical modeling methods to define peaks. ChIPOTle, PeakFinder, R scripts, etc.
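A bare-bones version of the sliding window idea looks something like the sketch below. It is deliberately naive and is not how ChIPOTle or PeakFinder work internally: the window is centred on each probe, the cutoff is a fixed threshold picked by hand rather than a statistical test, and the function names are made up for the example.

```python
def window_means(positions, values, width):
    """Mean log ratio of all probes within width/2 of each probe position."""
    half = width / 2
    means = []
    for center in positions:
        in_window = [v for p, v in zip(positions, values) if abs(p - center) <= half]
        means.append(sum(in_window) / len(in_window))
    return means


def call_peaks(positions, values, width, threshold):
    """Merge runs of consecutive probes whose window mean reaches the
    threshold into (start, end) peak regions. Positions must be sorted."""
    peaks = []
    start = end = None
    for pos, mean in zip(positions, window_means(positions, values, width)):
        if mean >= threshold:
            if start is None:
                start = pos
            end = pos
        elif start is not None:
            peaks.append((start, end))
            start = None
    if start is not None:
        peaks.append((start, end))
    return peaks
```

Averaging over neighbours is what makes a single noisy probe less likely to be called a peak, and also what can blur or miss narrow binding events.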

32 Data Analysis: Annotating and comparing peaks
This web application called galaxy is useful for intersecting genomic regions – doing simple intersections and subtractions of intervals. It’s pretty easy to use and intuitive, so you might want to take a look at it if you are ever dealing with overlapping genomic coordinates. It interfaces nicely with the UCSC genome browser and database.
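For anyone curious what an interval intersection or subtraction actually does, here is a minimal sketch in Python. It does not use or imitate Galaxy’s code; the function names are hypothetical, intervals are half-open (start, end) pairs on a single chromosome, and each input list is assumed to be sorted and non-overlapping.

```python
def intersect(a, b):
    """Regions covered by both interval lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def subtract(a, b):
    """Regions of `a` that are not covered by any interval in `b`."""
    out = []
    for start, end in a:
        cursor = start
        for b_start, b_end in b:
            if b_end <= cursor or b_start >= end:
                continue  # no overlap with what is left of this interval
            if b_start > cursor:
                out.append((cursor, b_start))
            cursor = max(cursor, b_end)
            if cursor >= end:
                break
        if cursor < end:
            out.append((cursor, end))
    return out
```

With peaks as `a` and gene coordinates as `b`, `intersect` gives the parts of peaks inside genes and `subtract` gives the parts that fall in intergenic sequence.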

33 Data Analysis: Average Gene analysis
Profile of binding across an average gene. This is an example of a figure I made from Bing Li’s ChIP-chip data. It was based on a paper from Rick Young’s lab, "Genome-wide Map of Nucleosome Acetylation and Methylation in Yeast". I used perl scripts and database tables to divide each gene into 40 bins, and the surrounding intergenic regions into 20 bins each, then assigned every expression value to the appropriate bin, then averaged all the genes together to get the average expression profile on a gene and the surrounding intergenic regions.
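The binning step can be sketched in Python as below. The original analysis used perl scripts and database tables, so this is only a rough equivalent: the `Gene` record, the function names, the single-chromosome assumption, and the choice to flip minus-strand genes so that bin 0 is always furthest upstream are all assumptions for the example.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    left_start: int  # start of the left-hand intergenic region
    start: int       # gene start (genomic coordinates, half-open)
    end: int         # gene end
    right_end: int   # end of the right-hand intergenic region
    strand: str      # "+" or "-"


def _bin(pos, lo, hi, n_bins):
    # Integer arithmetic avoids floating point rounding at bin edges.
    return (pos - lo) * n_bins // (hi - lo)


def profile_bin(pos, gene, gene_bins=40, flank_bins=20):
    """Bin index (0 = furthest upstream) for a probe position, or None."""
    if gene.left_start <= pos < gene.start:
        idx = _bin(pos, gene.left_start, gene.start, flank_bins)
    elif gene.start <= pos < gene.end:
        idx = flank_bins + _bin(pos, gene.start, gene.end, gene_bins)
    elif gene.end <= pos < gene.right_end:
        idx = flank_bins + gene_bins + _bin(pos, gene.end, gene.right_end, flank_bins)
    else:
        return None
    total = gene_bins + 2 * flank_bins
    return idx if gene.strand == "+" else total - 1 - idx


def average_profile(genes, probes, gene_bins=40, flank_bins=20):
    """Average probe value per bin over all genes; None for empty bins."""
    total = gene_bins + 2 * flank_bins
    sums, counts = [0.0] * total, [0] * total
    for gene in genes:
        for pos, value in probes:
            idx = profile_bin(pos, gene, gene_bins, flank_bins)
            if idx is not None:
                sums[idx] += value
                counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]
```

Because every gene is rescaled to the same 80 bins, long and short genes contribute equally to the averaged profile.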

34 Summary
Tiling arrays: ChIP-chip, CGH, expression. Which ones are available: in-house, Agilent, Affy. Data analysis. Future... And now for the future...

35 The Future of Tiling Arrays
The resolution continues to increase... Where is the future of tiling arrays leading? Moore’s law (Gordon Moore, co-founder of Intel): # of transistors on an Intel chip doubles every two years... Here we see the # of features on an Affy chip following the same trend. I don’t think our homemade array resolution is increasing quite this fast, but we are continually improving the process and experimenting with new prototypes of pins and printing methods. Agilent is also continually improving their feature density. Future improvement on tiling arrays may allow for completely tiling with overlaps an entire large genome on a single array, making it practical to use tiling arrays as a universal platform for a variety of unbiased genomic studies.

36 Other Future Genomic Technology
454 and Solexa/Illumina “Next Generation” sequencing “Sequence everything in the tube” Shares some things with tiling arrays even more unbiased vast quantities of data analysis methods are being developed While we’re talking about the future of genomics, I have to mention another technology that shares some aspects with tiling arrays, and that’s the 454/Solexa “next gen” sequencing. The basic concept of sequencing everything in the tube. This technology is even more unbiased than tiling arrays, also generates vast quantities of data, and analysis methods are still being developed.

37 Thanks! Microarray Bioinformatics All the labs that use microarrays! Bing Li Workman lab Jennifer Bupp Jaspersen lab Norman Pavelka Rong Li Lab 5th floor Building 2 Bioinformatics Microarray N Real quickly before I hand it over to Norman, thanks to my group and my other favorite group, bioinformatics, and all the following people... Please come talk to us up in microarray and bioinformatics, here’s a map showing where we sit. We will help you plan your experiments ahead of time to get the best results in the end. I’d also like to briefly advertise the next Technology and Methods seminar... Allison Karin Me Brian Chris

38 Technology & Methods Seminar
“Adventures in Electron Microscopy” Rhonda Allen Histology Thursday, April 26th, 1:00 p.m. Classroom (1st floor, Administration Building) Schedule with abstracts and previous presentation slides can be found on: K:\Weekly Seminar Schedule\Thursday -- Technology & Methods Information regarding previous seminars can be found at: Norman’s going to talk about his CGH experiments using affymetrix yeast tiling arrays. If you have any questions, I can answer them now or after Norman’s talk.

