Localization Analysis


1 Localization Analysis
11/07/07

2 Tiling arrays Microarray probes are oligonucleotide sequences with regular spacing covering a whole genomic region. chromosome

3 Tiling Arrays

4 Typical applications:
Comparative Genomic Hybridization (aCGH) – copy number variation
RNA analysis: transcript structure, transcript discovery, etc.
Location analysis: nuclease sensitivity
Location analysis: chromatin immunoprecipitation (ChIP)
NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

5 Spike-in experiments – we can find linkers as short as 7 bp
[Plot axes: measured red/green ratio vs. location of labeled PCR product]

6 Experimental Determination of Cross-Hybridization
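Spike in PCR product: a unique probe doubles its signal, (1+1)/1 = 2, while a probe that also hybridizes to other genomic sites (n > 1 targets in total) gains proportionally less, (1+n)/n < 2. Cross-hybridizing probes will therefore detect less enrichment experimentally.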
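The arithmetic behind this slide can be checked with a few lines of Python. This is only a sketch under a simple assumption that is not stated on the slide: each genomic target contributes one unit of signal to a probe, and the spiked-in PCR product adds one more unit. The function name `observed_enrichment` is made up for illustration.

```python
def observed_enrichment(background_targets: float, spike_units: float = 1.0) -> float:
    """Ratio of (background + spike) signal to background-only signal for one probe.

    Assumption: every genomic target and the spike-in each add one unit of signal.
    """
    return (background_targets + spike_units) / background_targets


# n = 1 is a unique probe; n > 1 is a probe that also cross-hybridizes elsewhere.
for n in (1, 2, 5, 10):
    print(n, observed_enrichment(n))
# prints:
# 1 2.0
# 2 1.5
# 5 1.2
# 10 1.1
```

The more places a probe hybridizes, the closer its measured ratio gets to 1, so the drop in spike-in enrichment can be read as a rough indicator of cross-hybridization.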

7 Spike-in data

8 Array CGH Technology

9 Genome-wide measurement of DNA copy number alteration by array CGH
Genome-wide measurement of DNA copy number alteration by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different numbers of X chromosomes, for breast cancer cell lines, and for breast tumors. Each row represents a different cell line or tumor, and each column represents one of 6,691 different mapped human genes present on the microarray, ordered by genome map position from 1pter through Xqter. Moving average (symmetric 5-nearest neighbors) fluorescence ratios (test/reference) are depicted using a log2-based pseudocolor scale (indicated), such that red luminescence reflects fold-amplification, green luminescence reflects fold-deletion, and black indicates no change (gray indicates poorly measured data). (b) Enlarged view of DNA copy number profiles across the X chromosome, shown for cell lines containing different numbers of X chromosomes. Pollack J R et al. PNAS 2002;99: ©2002 by The National Academy of Sciences

10 DNA copy number alteration across chromosome 8 by array CGH
DNA copy number alteration across chromosome 8 by array CGH. (a) DNA copy number profiles are illustrated for cell lines containing different numbers of X chromosomes, for breast cancer cell lines, and for breast tumors. Breast cancer cell lines and tumors are separately ordered by hierarchical clustering to highlight recurrent copy number changes. The 241 genes present on the microarrays and mapping to chromosome 8 are ordered by position along the chromosome. Fluorescence ratios (test/reference) are depicted by a log2 pseudocolor scale (indicated). Selected genes are indicated with color-coded text (red, increased; green, decreased; black, no change; gray, not well measured) to reflect correspondingly altered mRNA levels (observed in the majority of the subset of samples displaying the DNA copy number change). The map positions for genes of interest that are not represented on the microarray are indicated in the row above those genes represented on the array. (b) Graphical display of DNA copy number profile for breast cancer cell line SKBR3. Fluorescence ratios (tumor/normal) are plotted on a log2 scale for chromosome 8 genes, ordered along the chromosome. Pollack J R et al. PNAS 2002;99: ©2002 by The National Academy of Sciences

11 Typical applications:
Comparative Genomic Hybridization (aCGH) – copy number variation
RNA analysis: transcript structure, transcript discovery, etc.
Location analysis: nuclease sensitivity
Location analysis: chromatin immunoprecipitation (ChIP)
NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

12 RNA vs genomic 3’ UTR 5’ UTR

13 Tiling of the Hox loci – mRNA vs. genomic

14

15

16 Transcript maps. ZY Xu et al. Nature 000, 1-5 (2009) doi: /nature07728

17 Typical applications:
Comparative Genomic Hybridization (aCGH) – copy number variation
RNA analysis: transcript structure, transcript discovery, etc.
Location analysis: nuclease sensitivity
Location analysis: chromatin immunoprecipitation (ChIP)
NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

18 DNaseI HS profiling

19 DHS profiling identifies promoters, enhancers, and insulators

20 Isolation of nucleosomal DNA
Cut in half

21

22 Typical applications:
Comparative Genomic Hybridization (aCGH) – copy number variation
RNA analysis: transcript structure, transcript discovery, etc.
Location analysis: nuclease sensitivity
Location analysis: chromatin immunoprecipitation (ChIP)
NOTE: ALL of these things can also be done by deep sequencing, which we will briefly cover towards the end

23 Experimental Protocol
Step 1: crosslink protein with DNA
Step 2: sonication (break DNA)
Kim and Ren 2007

24 Experimental Protocol
Step 1: crosslink (fix protein with DNA)
Step 2: sonication (break DNA)
Step 3: immunoprecipitation (pull down target protein by specific antibody)
Kim and Ren 2007

25 Experimental Protocol
Step 1: crosslink (fix protein with DNA)
Step 2: sonication (break DNA)
Step 3: immunoprecipitation (pull down target protein by specific antibody)
Step 4: hybridization (hybridize input and pulled-down DNA on microarray)
Kim and Ren 2007

26 Chromatin Immuno-precipitation

27 Tiling Array Data Each TF binding signal is represented by multiple probes. Need more sophisticated statistical tools. Kim and Ren 2007

28 Tiling arrays provide high resolution for identifying bound fragments
Overlapping 25-mer fragments Boyer et al. 2005

29 Mapping histone modifications

30 Chromatin’s primary structure

31 OK, now what?
Analysis method strongly depends on how widespread the thing being examined is, and on whether you have a guess regarding its localization.
CGH: Just look!
TF ChIP-chip, DHS: peak finding algorithms (BUT BUT BUT).
RNA, chromatin marks: Hidden Markov Models, aggregation plots

32 CGH Array Segmentation
Key idea: Most probe targets have the same copy number as their neighbors, so we can average over neighbors.
Key issue: when is a difference real?
Recommended programs:
DNACopy – solid statistical basis; slow
StepGram – heuristic; fast
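To make the "when is a difference real?" question concrete, here is a minimal Python sketch that looks for a single breakpoint in a run of log2 ratios by maximizing a pooled two-sample t statistic. It is a deliberately simplified stand-in, not the algorithm used by DNACopy or StepGram; the function name `best_breakpoint` and the `min_size` parameter are illustrative assumptions.

```python
import math
from statistics import mean


def best_breakpoint(log_ratios, min_size=3):
    """Return (k, t): the split index k that maximizes |t| between
    log_ratios[:k] and log_ratios[k:], using a pooled-variance t statistic.

    Simplified illustration only; real segmentation tools handle many
    breakpoints and assess significance more carefully.
    """
    n = len(log_ratios)
    best_k, best_t = None, 0.0
    for k in range(min_size, n - min_size + 1):
        left, right = log_ratios[:k], log_ratios[k:]
        m_left, m_right = mean(left), mean(right)
        ss = sum((x - m_left) ** 2 for x in left) + sum((x - m_right) ** 2 for x in right)
        se = math.sqrt(ss / (n - 2) * (1 / len(left) + 1 / len(right)))
        if se == 0:
            continue  # noise-free segments: t is undefined, skipped in this sketch
        t = (m_right - m_left) / se
        if abs(t) > abs(best_t):
            best_k, best_t = k, t
    return best_k, best_t


# Five probes near log2 ratio 0 followed by five probes near log2 ratio 1 (a gain).
profile = [0.0, 0.1, -0.1, 0.05, -0.05, 1.0, 1.1, 0.9, 1.05, 0.95]
print(best_breakpoint(profile))
```

Averaging within each side of the split is exactly the "average over neighbors" idea; the t statistic is one way to ask whether the two averages really differ.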

33 Methods
Moving average t-test (Keles et al. 2004)
HMM (Li et al. 2005; Yuan et al. 2005)
Tilemap (Ji and Wong 2005)
MAT (Johnson et al. 2006)

34 Keles’ method
Calculate a two-sample t-statistic
[Figure labels: ChIP signal, input signal, Y2, Y1, probe index i]
Keles et al. 2004

35 Keles’ method
Calculate a two-sample t-statistic
Moving average scan-statistic
[Figure labels: ChIP signal, input signal, Y2, Y1, probe index i]
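The two steps named on these slides can be sketched in Python as follows. This is an illustration under assumptions, not the exact procedure of Keles et al.: the per-probe statistic is written in the Welch (unequal-variance) form, the window is a centered moving average that shrinks at the ends of the region, and the names `welch_t`, `scan_statistic`, and `half_width` are invented here.

```python
import math
from statistics import mean, variance


def welch_t(chip, control):
    """Two-sample t statistic for one probe: ChIP replicates vs. input replicates.

    Needs at least two replicates per group and non-zero variance.
    """
    se = math.sqrt(variance(chip) / len(chip) + variance(control) / len(control))
    return (mean(chip) - mean(control)) / se


def scan_statistic(t_values, half_width=1):
    """Centered moving average of per-probe t statistics along the chromosome."""
    smoothed = []
    for i in range(len(t_values)):
        window = t_values[max(0, i - half_width): i + half_width + 1]
        smoothed.append(mean(window))
    return smoothed


# One probe measured in three ChIP and three input replicates (log intensities).
print(welch_t([2.0, 2.2, 1.8], [0.1, -0.1, 0.0]))
# Smoothing a short run of per-probe statistics.
print(scan_statistic([0, 3, 6, 3, 0], half_width=1))
```

Averaging over neighboring probes rewards the pattern expected from a real binding event, where several adjacent probes are enriched together, and damps single noisy probes.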

36 Multiple hypothesis testing
Multiple hypothesis testing needs to be considered to control false positive error rates. What is the null distribution of this statistic?

37 Multiple hypothesis testing
Assume the statistic has a t-distribution, and approximate it by a normal distribution. Alternatively, a resampling method can be used to estimate the null distribution.
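As a rough sketch of the normal approximation: if each per-probe statistic is treated as approximately standard normal and independent under the null (an assumption made here for illustration; neighboring probes and overlapping windows are in practice correlated), then the mean of w statistics has standard deviation 1/sqrt(w). A one-sided p-value and a Bonferroni correction then follow directly. All function names below are hypothetical.

```python
import math


def upper_tail_p(z):
    """One-sided p-value P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def window_p_value(window_mean_t, window_size):
    """Approximate p-value for the mean of `window_size` per-probe statistics.

    Assumes the statistics are independent N(0, 1) under the null, so the
    window mean is rescaled by sqrt(window_size) before looking up the tail.
    """
    return upper_tail_p(window_mean_t * math.sqrt(window_size))


def bonferroni(p, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_tests)


p = window_p_value(window_mean_t=2.5, window_size=3)
print(p, bonferroni(p, n_tests=10000))
```

Bonferroni is the most conservative choice; the resampling alternative mentioned on the slide replaces the normal lookup with an empirical null distribution.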

38 ChIPOTle: a simple method for identifying ‘bound’ genomic fragments
(Buck et al. 2005)
Assumption: a real binding site will have a distribution of bound fragments encapsulating it. Therefore, true positives will likely have multiple, contiguous fragments with high signal.
Walk across tiled genomic probes with a user-defined window size
Calculate mean signal intensity within each window
Estimate p-value of binding (Bonferroni-corrected) based on a standard error model or by permuting the dataset
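The three steps above can be sketched in Python using the permutation option. This is a toy version written for illustration and is not ChIPOTle's actual code: the null distribution is built by shuffling probe values and pooling the window means from every shuffle, and the names `window_means` and `permutation_p_values` are assumptions.

```python
import bisect
import random
from statistics import mean


def window_means(signal, size, step=1):
    """Mean signal in each sliding window of `size` consecutive probes."""
    return [mean(signal[i:i + size]) for i in range(0, len(signal) - size + 1, step)]


def permutation_p_values(signal, size, n_perm=200, seed=0):
    """Return (window start, window mean, Bonferroni-corrected p) for each window.

    The null is estimated by shuffling probe order, which destroys the
    contiguity that a real bound region is assumed to have.
    """
    rng = random.Random(seed)
    observed = window_means(signal, size)
    shuffled = list(signal)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.extend(window_means(shuffled, size))
    null.sort()
    results = []
    for start, m in enumerate(observed):
        exceed = len(null) - bisect.bisect_left(null, m)  # null means >= observed
        p = (exceed + 1) / (len(null) + 1)                # add one so p is never 0
        results.append((start, m, min(1.0, p * len(observed))))
    return results


# Log ratios along a tiled region; probes 5 to 7 form a contiguous enriched block.
ratios = [0.1, -0.2, 0.0, 0.3, -0.1, 2.5, 3.0, 2.8, 0.2, -0.3, 0.1, 0.0, -0.1, 0.2, 0.0, -0.2]
best = min(permutation_p_values(ratios, size=3), key=lambda r: r[2])
print(best)
```

Because shuffling rarely puts several high probes side by side, windows over a genuinely contiguous enriched block stand out against the permuted null.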

39 BUT: Extensive low-affinity transcriptional interactions in the yeast genome Amos Tanay Genome Research 2006

40 OK, what about more continuous data like RNA or chromatin marks?

41 Inferring nucleosomes: HMM

42

43 A Hidden Markov Model objectively identifies nucleosome positions

44 Hidden Markov Models for Identifying Bound Fragments
HMMs are trained on known data to recognize different states (e.g., bound vs. unbound fragments) and the probability of moving between those states.
Once trained, an HMM can be used to identify the ‘hidden’ states in an unknown dataset, based on the known characteristics of each state (‘emission probabilities’) and the probability of moving between states (‘transition probabilities’).
Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50bp binding sequence. You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.
Example: “A hidden Markov model for analyzing ChIP-chip experiments on genome tiling arrays and its application to p53 binding sequences” Li, Meyer, Liu

45 Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50bp binding sequence. You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.
[State diagram: one Unbound 25mer state followed by three Bound 25mer states]
Transition probabilities shown on the arrows: P = 1.0, 0.5, 0.3, 0, 0.5, 1.0, 0.7
Emission probabilities, Unbound 25mer: P(I) = 0.2, P(i) = 0.8
Emission probabilities, each Bound 25mer: P(I) = 0.8, P(i) = 0.2
I = Intensity units > 10,000; i = Intensity units < 10,000

46 Emission Probabilities
Example: ChIP-chip data from a tiling microarray identifying regions bound to a transcription complex with a known 50bp binding sequence. You expect that a bound fragment will have high signal on the array and that the bound fragment will be 2-3 probes long.
[State diagram: one Unbound 25mer state followed by three Bound 25mer states]
Transition probabilities shown on the arrows: P = 1.0, 0.5, 0.3, 0, 0.5, 1.0, 0.7
Emission probabilities, Unbound 25mer: P(I) = 0.2, P(i) = 0.8
Emission probabilities, each Bound 25mer: P(I) = 0.8, P(i) = 0.2
Given the data, an HMM will consider many different models and give back the optimal model
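How an HMM "gives back the optimal model" can be illustrated with a small Viterbi decoder in Python. The emission probabilities are the ones on the slide. Everything else is an assumption made for this sketch: the slide's four-state chain is collapsed to two states (unbound, bound), and the start and transition probabilities are invented, because the arrow labels cannot be reliably assigned from the transcript.

```python
import math

STATES = ("unbound", "bound")
# From the slide: I = intensity > 10,000, i = intensity < 10,000.
EMIT = {"unbound": {"I": 0.2, "i": 0.8}, "bound": {"I": 0.8, "i": 0.2}}
# Assumed for illustration only (not taken from the slide).
START = {"unbound": 0.9, "bound": 0.1}
TRANS = {"unbound": {"unbound": 0.9, "bound": 0.1},
         "bound": {"unbound": 0.3, "bound": 0.7}}


def viterbi(observations):
    """Most probable hidden state path for a string of 'I'/'i' observations."""
    score = {s: math.log(START[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}
    back_pointers = []
    for obs in observations[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = score[prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            pointers[s] = prev
        score = new_score
        back_pointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back_pointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


print(viterbi("iiIIIii"))  # three adjacent high-intensity probes
print(viterbi("iiIii"))    # a single isolated high-intensity probe
```

With these assumed parameters, a run of three high probes is decoded as bound while a lone high probe is decoded as unbound, which matches the expectation that a bound fragment spans 2-3 probes.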

47 Other types and uses of microarrays: aCGH
CGH (comparative genomic hybridization) looks at cytogenetic abnormalities
genomic DNA hybridized to array
often uses large clones (e.g., BACs) as array features

48 Validation of data There’s no way that all of your microarray data can be validated. It’s strongly recommended that any key findings be verified by independent means. Northern blots and quantitative RT-PCR are the typical ways of doing this; real-time, quantitative RT-PCR is generally the method of choice.

49 Chromatin’s primary structure

50 One way to turn this 1D trace into 2D is via “averageogram”
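An "averageogram" (aggregation plot) can be sketched in a few lines of Python: cut a fixed-size window out of the signal around each anchor point (for example each nucleosome-free region), stack the windows, and average column by column. The function name `aggregate_profile`, the `flank` parameter, and the choice to skip anchors near the ends are assumptions of this sketch, and strand orientation is ignored.

```python
from statistics import mean


def aggregate_profile(signal, anchors, flank):
    """Average the signal at each offset from -flank to +flank around the anchors.

    Anchors closer than `flank` probes to either end of the signal are skipped.
    """
    windows = [
        signal[a - flank: a + flank + 1]
        for a in anchors
        if a - flank >= 0 and a + flank < len(signal)
    ]
    if not windows:
        return []
    # zip(*windows) walks the stacked windows column by column.
    return [mean(column) for column in zip(*windows)]


trace = [0, 1, 5, 1, 0, 0, 2, 6, 2, 0]
print(aggregate_profile(trace, anchors=[2, 7, 9], flank=1))  # prints [1.5, 5.5, 1.5]
```

Keeping the stacked windows as rows before averaging gives the 2D view (one row per anchored locus), and the column means give the familiar averaged curve.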

51 H4 K16 Acetyl, aligned by NFR

52

53 Beyond Transcription % exchange events (Printed Arrays) % nucleosomes

54

55

56 Multiple visualizations of tiling data

57 RNA-Seq Lockhart and Winzeler 2000 Wang et al. 2009

58 RNA-Seq Whole Transcriptome Shotgun Sequencing
Sequencing cDNA using next-generation ("NexGen") sequencing technology
Revolutionary tool for transcriptomics
More precise measurements
Ability to do large-scale experiments with little starting material

59 RNA-Seq Experiment Wang et al. 2009

60 Mapping Create unique scaffolds
Harder algorithms with such short reads

61 Unbiased sequencing of the yeast transcriptome
Unbiased sequencing of the yeast transcriptome. (A) Distribution of reads mapped to the PAP1 locus. Shown are SGD annotations (downloaded at November 2007) (8), and mapped reads (red, W strand; blue, C strand). Additional tracks plot the cumulative number of reads covering each base position (yellow, YPD; light blue, HS). Full data can be accessed at and is visualized using the University of California, Santa Cruz, genome browser (22). (B) Distribution of reads matched to the genome. Of the 26,050,414 reads sequenced in YPD (Left), 13,424,957 (52%, blue) were uniquely mapped to a single genomic locus, 6,144,595 (23%, green) were mapped to several locations, and 6,480,862 (25%, yellow) could not have been aligned, and were later used to detect splice junctions. Similar numbers were found after a HS (Right). Yassour M et al. PNAS 2009;106: ©2009 by National Academy of Sciences

62 Mapping Place reads onto a known genomic scaffold
Requires known genome and depends on accuracy of the reference
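Placing reads on a known reference can be illustrated with a toy exact-match mapper in Python. It sorts reads into uniquely mapped, multiply mapped, and unmapped, and accumulates per-base coverage from the unique ones. This is a sketch under strong simplifying assumptions (all reads the same length, forward strand only, no mismatches or spliced reads); real aligners are far more elaborate, and the function names are invented here.

```python
from collections import defaultdict


def build_index(reference, read_length):
    """Map every substring of length `read_length` to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - read_length + 1):
        index[reference[pos:pos + read_length]].append(pos)
    return index


def classify_reads(reads, reference):
    """Return (counts, coverage) using exact matching of equal-length reads."""
    read_length = len(reads[0])
    index = build_index(reference, read_length)
    counts = {"unique": 0, "multiple": 0, "unmapped": 0}
    coverage = [0] * len(reference)
    for read in reads:
        hits = index.get(read, [])
        if len(hits) == 1:
            counts["unique"] += 1
            for pos in range(hits[0], hits[0] + read_length):
                coverage[pos] += 1  # only unique reads contribute to coverage
        elif hits:
            counts["multiple"] += 1
        else:
            counts["unmapped"] += 1
    return counts, coverage


counts, coverage = classify_reads(["ACGT", "TTGA", "GGGG", "CGTT"], "ACGTACGTTTGA")
print(counts)
print(coverage)
```

Any error or gap in the reference shows up here as reads landing in the unmapped or multiple categories, which is why the accuracy of the reference matters.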

63 Ab initio assembly of a transcript catalog
Ab initio assembly of a transcript catalog. (A) Outline of steps in the catalog construction pipeline. (B) Segmentation of a contiguously transcribed region into 2 regions of distinct expression levels corresponding to the genes YBR287W and APM3. When using YPD reads alone, both genes exhibit similar coverage and thus cannot be segmented. However, in HS, they are differentially expressed, and hence by combining observations from both conditions the automatic segmentation procedure (see Materials and Methods) correctly separates them to 2 units. Tracks from top to bottom: SGD annotations (blue), our catalog (green), read coverage at YPD (yellow), and read coverage at HS (blue). (C) Detection of splice junctions. Full and gapped reads mapped to the RIM1 genomic locus. Tracks are as in B, together with gapped reads (connected segments), our putative splice junctions (in red and blue), including the junction orientations as estimated by donor and acceptor sequence motifs (arrows). As shown, our procedure identifies the exact coordinates and orientation of the known splice site. Yassour M et al. PNAS 2009;106: ©2009 by National Academy of Sciences

64 Biases Wang et al. 2009

65 What the data look like

66 Superimposing channels
Giresi et al, Genome Res. 10

67 Experimental Design for Microarrays
There are a number of important experimental design considerations for a microarray experiment:
technical vs biological replicates
amplification of RNA
dye swaps
reference samples

68 Experimental Design for Microarrays
Technical vs biological replicates
technical replicates are repeat hybridizations using the same RNA isolate
biological replicates use RNA isolated from separate experiments/experimental organisms
Although technical replicates can be useful for reducing variation due to hybridization, imaging, etc., biological replicates are necessary for a properly controlled experiment

69 Experimental Design for Microarrays
Amplification of RNA
Linear amplification methods can be used to increase the amount of RNA so that microarray experiments can be performed using very small numbers of cells.
It’s not clear to what degree this affects results, especially with respect to rare transcripts, but it seems to be generally OK if done correctly.

70 Experimental Design for Microarrays
Dye swaps
When using 2-color arrays, it’s important to hybridize replicates using a dye-swap strategy in which the colors (labels) are reversed between the two replicates. This is because there can be biases in hybridization intensity due to which dye is used (even when the sequence is the same).
[Diagram: samples S1 and S2 hybridized twice, with the labels reversed]
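Why the swap helps can be shown with a short worked example in Python. It assumes, purely for illustration, that the dye effect is a constant multiplicative bias on one channel, so it becomes an additive term on the log2 scale and cancels when the two arrays are combined. The function name and the intensity values are made up.

```python
import math


def dye_swap_estimate(array1, array2):
    """Combine a dye-swapped pair of arrays.

    array1 = (S1 in red, S2 in green); array2 = (S2 in red, S1 in green).
    Returns (estimated log2(S1/S2), estimated log2 dye bias).
    """
    m1 = math.log2(array1[0] / array1[1])  # log2(S1/S2) + bias
    m2 = math.log2(array2[0] / array2[1])  # log2(S2/S1) + bias
    return (m1 - m2) / 2, (m1 + m2) / 2


# True levels: S1 = 400, S2 = 100 (4-fold, log2 = 2); the red channel reads 2x too bright.
print(dye_swap_estimate((800, 100), (200, 400)))  # prints (2.0, 1.0)
```

Without the swap, the first array alone would report an 8-fold difference; half the difference of the two log ratios recovers the true 4-fold change.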

71 Experimental Design for Microarrays
Reference samples
One common strategy is to use a reference sample in one channel on each array. This is usually something that will hybridize to most of the features (e.g., a complex RNA mixture).
Using a reference sample allows comparisons to be made between different experimental conditions, as each is compared to the common reference.
[Diagram: S1, S2, and S3 each hybridized against reference R] Compare S1/R vs. S2/R vs. S3/R
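The indirect comparison through a common reference amounts to subtracting log ratios: log2(S1/R) - log2(S2/R) = log2(S1/S2). A minimal Python sketch, with invented intensity values and hypothetical function names, assuming the reference behaves the same way on every array:

```python
import math


def log_ratios_vs_reference(arrays):
    """arrays maps a sample name to (sample intensity, reference intensity) on its array."""
    return {name: math.log2(sample / ref) for name, (sample, ref) in arrays.items()}


def compare(log_ratios, a, b):
    """log2(a/b), obtained indirectly as log2(a/R) - log2(b/R)."""
    return log_ratios[a] - log_ratios[b]


# The reference channel reads differently on each array (e.g. overall array brightness).
arrays = {"S1": (800, 200), "S2": (100, 50), "S3": (300, 300)}
ratios = log_ratios_vs_reference(arrays)
print(compare(ratios, "S1", "S2"))  # prints 1.0
```

Because every sample is expressed relative to the same reference, array-to-array differences in overall intensity drop out of the comparison.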

72 Experimental Design for Microarrays
The bottom line is that you should discuss your experimental design with a statistician before going ahead and beginning your experiments. It’s usually too late and too expensive to change the design once you’ve begun!

73 MIAME (Minimum Information About a Microarray Experiment)
When you publish a microarray experiment, you are expected to make available the following minimal information. This allows others to evaluate your data and compare it to other experimental results:
• EXPERIMENT DESIGN: type, factors, number of arrays, reference sample, qc, database accession (ArrayExpress, GEO)
• SAMPLES USED, PREPARATION AND LABELING
• HYBRIDIZATION PROCEDURES AND PARAMETERS
• MEASUREMENT DATA AND SPECIFICATIONS: quantitations, hardware & software used for scanning and analysis, raw measurements, data selection and transformation procedures, final expression data
• ARRAY DESIGN: platform type, features and locations, manufacturing protocols or commercial p/n

