Localization Analysis 11/07/07. Microarray probes are oligonucleotide sequences with regular spacing covering a whole genomic region. chromosome Tiling.

Presentation transcript:

1 Localization Analysis 11/07/07

2 Tiling arrays: microarray probes are oligonucleotide sequences spaced at regular intervals to cover a whole genomic region of a chromosome.

3 Tiling Arrays

4 Typical applications: (1) Comparative Genomic Hybridization (aCGH): copy number variation; (2) RNA analysis: transcript structure, transcript discovery, etc.; (3) location analysis: nuclease sensitivity; (4) location analysis: chromatin immunoprecipitation (ChIP). NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end.

5 Spike-in experiments: we can find linkers as short as 7 bp. [Figure labels: location of labeled PCR product; measured red/green ratio]

6 Experimental Determination of Cross-Hybridization. Spike in PCR product: (1+1)/1 > (1+n)/n for n > 1, so cross-hybridizing probes will detect less enrichment experimentally.
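The inequality on this slide can be checked numerically. The sketch below assumes that n stands for the number of genomic sequences a probe hybridizes to and that the spike adds one extra copy of a single target; both readings are interpretations of the slide, and the function name is hypothetical.

```python
def expected_spike_ratio(n_targets: int, spike_copies: float = 1.0) -> float:
    """Expected spiked/unspiked signal ratio for one probe.

    Assumption: the probe binds n_targets genomic sequences equally well,
    and the spike-in adds spike_copies extra copies of just one of them.
    """
    if n_targets < 1:
        raise ValueError("a probe must have at least one target")
    return (spike_copies + n_targets) / n_targets


# A unique probe (n = 1) reports a 2-fold ratio; a probe with four
# targets reports only 1.25-fold for the same spike.
ratios = {n: expected_spike_ratio(n) for n in (1, 2, 4)}
```

Under these assumptions the ratio falls towards 1 as n grows, which is why cross-hybridizing probes under-report enrichment.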

7 Spike-in data

8 Array CGH Technology

9 Genome-wide measurement of DNA copy number alteration by array CGH Pollack J R et al. PNAS 2002;99: ©2002 by The National Academy of Sciences

10 DNA copy number alteration across chromosome 8 by array CGH Pollack J R et al. PNAS 2002;99: ©2002 by The National Academy of Sciences

11 Typical applications: (1) Comparative Genomic Hybridization (aCGH): copy number variation; (2) RNA analysis: transcript structure, transcript discovery, etc.; (3) location analysis: nuclease sensitivity; (4) location analysis: chromatin immunoprecipitation (ChIP). NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end.

12 RNA vs genomic [Figure labels: 5’ UTR; 3’ UTR]

13 Tiling of the Hox loci – mRNA vs. genomic

14

15

16 ZY Xu et al. Nature 000, 1-5 (2009) doi: /nature07728 Transcript maps.

17 Typical applications: (1) Comparative Genomic Hybridization (aCGH): copy number variation; (2) RNA analysis: transcript structure, transcript discovery, etc.; (3) location analysis: nuclease sensitivity; (4) location analysis: chromatin immunoprecipitation (ChIP). NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end.

18 DNaseI HS profiling

19 DHS profiling identifies promoters, enhancers, and insulators

20 Isolation of nucleosomal DNA

21

22 Typical applications: (1) Comparative Genomic Hybridization (aCGH): copy number variation; (2) RNA analysis: transcript structure, transcript discovery, etc.; (3) location analysis: nuclease sensitivity; (4) location analysis: chromatin immunoprecipitation (ChIP). NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end.

23 Experimental Protocol. Step 1: crosslink (fix protein with DNA). Step 2: sonication (break DNA). Kim and Ren 2007

24 Experimental Protocol. Step 1: crosslink (fix protein with DNA). Step 2: sonication (break DNA). Step 3: immunoprecipitation (pull down target protein by specific antibody). Kim and Ren 2007

25 Experimental Protocol. Step 1: crosslink (fix protein with DNA). Step 2: sonication (break DNA). Step 3: immunoprecipitation (pull down target protein by specific antibody). Step 4: hybridization (hybridize input and pulled-down DNA on microarray). Kim and Ren 2007

26 Chromatin Immuno-precipitation

27 Tiling Array Data. Each TF binding signal is represented by multiple probes, so more sophisticated statistical tools are needed. Kim and Ren 2007

28 Tiling arrays provide high resolution for identifying bound fragments. Overlapping 25-mer fragments. Boyer et al

29 Mapping histone modifications

30 Chromatin’s primary structure

31 OK, now what? The analysis method depends strongly on how widespread the thing being examined is, and on whether you have a guess regarding its localization. CGH: Just look! TF ChIP-chip, DHS: peak-finding algorithms (BUT BUT BUT). RNA, chromatin marks: Hidden Markov Models, aggregation plots.

32 CGH Array Segmentation. Key idea: most probe targets have the same copy number as their neighbors, so we can average over neighbors. Key issue: when is a difference real? Recommended programs: DNACopy (solid statistical basis; slow); StepGram (heuristic; fast).
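A minimal sketch of the "average over neighbors" idea, assuming the input is a list of per-probe log ratios in genomic order. This is plain moving-average smoothing, not the segmentation algorithm used by DNACopy or StepGram; the function name and the `half_window` parameter are hypothetical.

```python
from statistics import mean


def neighbor_average(log_ratios, half_window=2):
    """Replace each probe's log ratio by the mean of itself and its neighbors.

    The window shrinks at the ends of the array, so edge probes are
    averaged over fewer values.
    """
    n = len(log_ratios)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half_window)
        hi = min(n, i + half_window + 1)
        smoothed.append(mean(log_ratios[lo:hi]))
    return smoothed


# A step from 0 to 1 (e.g. a single-copy gain) is blurred over the
# window, which is why deciding when a difference is real is the key issue.
profile = neighbor_average([0.0, 0.0, 0.0, 1.0, 1.0, 1.0], half_window=1)
```

Smoothing reduces probe-level noise but also blurs true breakpoints, so a dedicated segmentation program is still needed to place them.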

33 Methods: moving average t-test (Keles et al. 2004); HMM (Li et al. 2005; Yuan et al. 2005); Tilemap (Ji and Wong 2005); MAT (Johnson et al. 2006).

34 Keles’ method. Calculate a two-sample t-statistic at each probe i. [Figure labels: Y2, Y1; ChIP signal, input signal] Keles et al. 2004

35 Keles’ method. Calculate a two-sample t-statistic at each probe i. [Figure labels: Y2, Y1; ChIP signal, input signal; window w] Moving average scan statistic.
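The two steps named on this slide can be sketched as follows. The sketch assumes an unequal-variance (Welch) form of the two-sample t-statistic and replicate measurements of ChIP and input signal at every probe; the exact form used by Keles et al. is not given in the transcript, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def two_sample_t(chip, control):
    """Welch-style t-statistic for one probe.

    chip and control are replicate measurements (at least two each,
    with non-zero variance overall, otherwise the division fails).
    """
    se = sqrt(variance(chip) / len(chip) + variance(control) / len(control))
    return (mean(chip) - mean(control)) / se


def scan_statistic(t_values, w):
    """Moving average of per-probe t-statistics over w consecutive probes."""
    return [mean(t_values[i:i + w]) for i in range(len(t_values) - w + 1)]


# One t-statistic per probe, then one scan value per window of w probes.
t_per_probe = [two_sample_t(c, k) for c, k in [
    ([3.0, 5.0], [1.0, 3.0]),
    ([6.0, 8.0], [1.0, 3.0]),
    ([2.0, 4.0], [1.0, 3.0]),
]]
scan = scan_statistic(t_per_probe, w=2)
```

Averaging over the window rewards runs of adjacent probes that all show enrichment, which is what a real bound fragment is expected to produce.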

36 Multiple hypothesis testing. Multiple hypothesis testing needs to be considered to control false positive error rates. What is the null distribution of this statistic?

37 Multiple hypothesis testing. Assume the statistic has a t-distribution and approximate it by a normal distribution. Alternatively, a resampling method can be used to estimate the null distribution.

38 ChIPOTle: a simple method for identifying ‘bound’ genomic fragments (Buck et al. 2005). Assumption: a real binding site will have a distribution of bound fragments encapsulating it. Therefore, true positives will likely have multiple, contiguous fragments with high signal. 1. Walk across tiled genomic probes with a user-defined window size. 2. Calculate the mean signal intensity within each window. 3. Estimate a p-value of binding (Bonferroni-corrected) based on a standard error model or by permuting the dataset.
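The three steps above can be sketched as below. This is an illustration of the sliding-window idea, not ChIPOTle's own code: it assumes the null log ratios are Gaussian with mean zero and a standard deviation `null_sd` that the caller has already estimated, and all names are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean


def window_p_values(log_ratios, window, null_sd):
    """Return (start index, window mean, Bonferroni-corrected p) per window.

    Assumption: under the null, probe log ratios are independent
    N(0, null_sd**2), so a window mean has standard error
    null_sd / sqrt(window).
    """
    n_windows = len(log_ratios) - window + 1
    results = []
    for start in range(n_windows):
        w_mean = mean(log_ratios[start:start + window])
        z = w_mean / (null_sd / sqrt(window))
        p = 0.5 * erfc(z / sqrt(2))          # one-sided upper-tail p-value
        results.append((start, w_mean, min(1.0, p * n_windows)))
    return results


# Two adjacent high probes stand out; flat windows get corrected p = 1.
calls = window_p_values([0.0, 0.0, 5.0, 5.0, 0.0, 0.0], window=2, null_sd=1.0)
```

The permutation alternative mentioned on the slide would replace the Gaussian tail with the fraction of shuffled datasets whose window mean is at least as large.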

39 BUT: “Extensive low-affinity transcriptional interactions in the yeast genome.” Amos Tanay, Genome Research 2006

40 OK, what about more continuous data like RNA or chromatin marks?

41 Inferring nucleosomes: HMM

42

43 A Hidden Markov Model objectively identifies nucleosome positions

44 Hidden Markov Models for Identifying Bound Fragments. HMMs are trained on known data to recognize different states (e.g., bound vs. unbound fragments) and the probability of moving between those states. Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50 bp binding sequence. You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long. Once trained, an HMM can be used to identify the ‘hidden’ states in an unknown dataset, based on the known characteristics of each state (‘emission probabilities’) and the probability of moving between states (‘transition probabilities’). Reference: “A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences”, Li, Meyer, Liu

45 Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50 bp binding sequence. You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long. [Figure: HMM state diagram with “Unbound 25mer” and “Bound 25mer” states. Observations: I = intensity units > 10,000; i = intensity units < 10,000. Emission probabilities shown: P(I) = 0.2, P(i) = 0.8 (one state); P(I) = 0.8, P(i) = 0.2 (three states). Transition probabilities shown: 0.5, 1.0, 0, 0.7, 0.3, 1.0.]

46 Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50 bp binding sequence. You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long. Given the data, an HMM will consider many different models and give back the optimal model. [Figure: HMM state diagram with “Unbound 25mer” and “Bound 25mer” states, labeled with emission probabilities and transition probabilities. Emission probabilities shown: P(I) = 0.2, P(i) = 0.8 (one state); P(I) = 0.8, P(i) = 0.2 (three states). Transition probabilities shown: 0.5, 1.0, 0, 0.7, 0.3, 1.0.]
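A small sketch of how hidden states can be recovered from a probe series, using the Viterbi algorithm on a two-state model. The emission values 0.2/0.8 come from the slide, but assigning P(I) = 0.2 to the unbound state is an assumption, and the start and transition probabilities below are hypothetical because the diagram's arrows cannot be recovered from the transcript. The slide's model has several bound states; this sketch collapses them into one.

```python
STATES = ("unbound", "bound")

# Hypothetical start and transition probabilities (not from the slide).
START = {"unbound": 0.9, "bound": 0.1}
TRANS = {
    "unbound": {"unbound": 0.9, "bound": 0.1},
    "bound": {"unbound": 0.3, "bound": 0.7},
}
# Emission values from the slide; the state assignment is assumed.
# "I" = intensity > 10,000, "i" = intensity < 10,000.
EMIT = {
    "unbound": {"I": 0.2, "i": 0.8},
    "bound": {"I": 0.8, "i": 0.2},
}


def viterbi(observations):
    """Return the most probable state path for a sequence of 'I'/'i' symbols."""
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (START[s] * EMIT[s][observations[0]], [s]) for s in STATES}
    for obs in observations[1:]:
        new_best = {}
        for s in STATES:
            prob, path = max(
                ((best[p][0] * TRANS[p][s], best[p][1]) for p in STATES),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * EMIT[s][obs], path + [s])
        best = new_best
    return max(best.values(), key=lambda item: item[0])[1]


# Three consecutive high-intensity probes are decoded as a bound fragment.
path = viterbi(list("iiIIIii"))
```

For long probe series the products underflow, so a practical version would sum log probabilities instead of multiplying raw ones.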

47 Other types and uses of microarrays: aCGH. CGH (comparative genomic hybridization) looks at cytogenetic abnormalities; genomic DNA is hybridized to the array; often uses large clones (e.g., BACs) as array features.

48 Validation of data. There’s no way that all of your microarray data can be validated. It’s strongly recommended that any key findings be verified by independent means. Northern blots and quantitative RT-PCR are the typical ways of doing this; real-time, quantitative RT-PCR is generally the method of choice.

49 Chromatin’s primary structure

50 One way to turn this 1D trace into 2D is via “averageogram”
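A minimal sketch of the averaging step behind an “averageogram” (aggregation plot), assuming a per-probe signal along the genome and a list of anchor positions, such as nucleosome-free regions, to align on. The function name and parameters are hypothetical, and the sketch only produces the averaged profile, not the plot.

```python
from statistics import mean


def averageogram(signal, anchors, flank):
    """Average the signal in windows of +/- flank probes around each anchor.

    Anchors too close to either end of the signal are skipped so that
    every window has the same length.
    """
    windows = [
        signal[a - flank:a + flank + 1]
        for a in anchors
        if a - flank >= 0 and a + flank < len(signal)
    ]
    if not windows:
        raise ValueError("no anchor has a complete window")
    # zip(*windows) groups values at the same offset from the anchor.
    return [mean(column) for column in zip(*windows)]


# Two aligned loci with the same shape average to that shape.
trace = [0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 1.0, 5.0, 1.0, 0.0]
profile = averageogram(trace, anchors=[2, 7], flank=1)
```

Stacking the individual windows as rows, instead of averaging them, gives a two-dimensional view of the same aligned data.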

51 H4 K16 Acetyl, aligned by NFR

52

53 Beyond Transcription [Figure labels: % nucleosomes (Printed Arrays); % exchange events (Printed Arrays)]

54

55

56 Multiple visualizations of tiling data

57 RNA-Seq. Lockhart and Winzeler 2000; Wang et al. 2009

58 RNA-Seq. Whole Transcriptome Shotgun Sequencing: sequencing cDNA using next-generation (“NexGen”) technology. Revolutionary tool for transcriptomics: more precise measurements; ability to do large-scale experiments with little starting material.

59 RNA-Seq Experiment Wang et al. 2009

60 Mapping. Create unique scaffolds: the algorithms are harder with such short reads.

61 Unbiased sequencing of the yeast transcriptome Yassour M et al. PNAS 2009;106: ©2009 by National Academy of Sciences

62 Mapping. Place reads onto a known genomic scaffold: requires a known genome and depends on the accuracy of the reference.

63 Ab initio assembly of a transcript catalog Yassour M et al. PNAS 2009;106: ©2009 by National Academy of Sciences

64 Biases Wang et al. 2009

65 What the data look like

66 Superimposing channels Giresi et al, Genome Res. 10

67 Experimental Design for Microarrays. There are a number of important experimental design considerations for a microarray experiment: technical vs biological replicates; amplification of RNA; dye swaps; reference samples.

68 Experimental Design for Microarrays. Technical vs biological replicates: technical replicates are repeat hybridizations using the same RNA isolate; biological replicates use RNA isolated from separate experiments/experimental organisms. Although technical replicates can be useful for reducing variation due to hybridization, imaging, etc., biological replicates are necessary for a properly controlled experiment.

69 Experimental Design for Microarrays. Amplification of RNA: linear amplification methods can be used to increase the amount of RNA so that microarray experiments can be performed using very small numbers of cells. It’s not clear to what degree this affects results, especially with respect to rare transcripts, but it seems to be generally OK if done correctly.

70 Experimental Design for Microarrays. Dye swaps: when using 2-color arrays, it’s important to hybridize replicates using a dye-swap strategy in which the colors (labels) are reversed between the two replicates. This is because there can be biases in hybridization intensity due to which dye is used (even when the sequence is the same). [Figure: samples S1 and S2 on two arrays with the labels reversed]
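A worked sketch of why reversing the labels helps, assuming the dye bias acts as the same multiplicative factor on both arrays (a simplification; real dye bias can depend on intensity and probe). The function name is hypothetical.

```python
from math import log2


def dye_swap_log_ratio(red1, green1, red2, green2):
    """Combine a dye-swap pair into one log2(S1/S2) estimate.

    Array 1: S1 labeled red, S2 labeled green.
    Array 2: labels reversed (S2 red, S1 green).
    A constant dye bias adds the same offset to both log ratios,
    so it cancels in the difference.
    """
    m_forward = log2(red1 / green1)   # log2(S1/S2) + bias
    m_swapped = log2(red2 / green2)   # log2(S2/S1) + bias
    return (m_forward - m_swapped) / 2


# True S1/S2 = 4, with the red dye reading 2x too bright on both arrays:
# array 1 reads 8 vs 1, array 2 reads 2 vs 4.
estimate = dye_swap_log_ratio(8.0, 1.0, 2.0, 4.0)   # log2(4) = 2
```

Either array alone would give a biased value here (3 or 1 on the log2 scale); only the combined estimate recovers 2.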

71 Experimental Design for Microarrays. Reference samples: one common strategy is to use a reference sample in one channel on each array. This is usually something that will hybridize to most of the features (e.g., a complex RNA mixture). Using a reference sample allows comparisons to be made between different experimental conditions, as each is compared to the common reference. [Figure: arrays S1 vs. R, S2 vs. R, S3 vs. R; compare S1/R vs. S2/R vs. S3/R]
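The comparison through a common reference amounts to subtracting log ratios, as in this small sketch. It assumes the same reference material is used on every array and ignores array-to-array normalization; the function names are hypothetical.

```python
from math import log2


def log_ratio_vs_reference(sample, reference):
    """log2(sample / reference) for one feature on one array."""
    return log2(sample / reference)


def indirect_comparison(sample_a, ref_a, sample_b, ref_b):
    """Estimate log2(A/B) from two arrays that share a reference.

    log2(A/R) - log2(B/R) = log2(A/B) when R is the same material
    on both arrays.
    """
    return (log_ratio_vs_reference(sample_a, ref_a)
            - log_ratio_vs_reference(sample_b, ref_b))


# S1 is 4x the reference on its array and S2 equals the reference on
# its array, so S1 is estimated at 4x S2 (log2 = 2).
difference = indirect_comparison(800.0, 200.0, 100.0, 100.0)
```

Because every sample is expressed relative to the same reference, any pair of conditions can be compared even though they were never hybridized together.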

72 Experimental Design for Microarrays. The bottom line is that you should discuss your experimental design with a statistician before going ahead and beginning your experiments. It’s usually too late and too expensive to change the design once you’ve begun!

73 MIAME (Minimum Information About a Microarray Experiment). When you publish a microarray experiment, you are expected to make available the following minimal information. This allows others to evaluate your data and compare it to other experimental results. EXPERIMENT DESIGN: type, factors, number of arrays, reference sample, qc, database accession (ArrayExpress, GEO). SAMPLES USED, PREPARATION AND LABELING. HYBRIDIZATION PROCEDURES AND PARAMETERS. MEASUREMENT DATA AND SPECIFICATIONS: quantitations, hardware & software used for scanning and analysis, raw measurements, data selection and transformation procedures, final expression data. ARRAY DESIGN: platform type, features and locations, manufacturing protocols or commercial p/n.

