“Hotspot” algorithm (chr5:131,975,056-132,012,092). Idea: gauge enrichment of tags relative to a local background model based on the number of tags in a 50kb surrounding window.

1 “Hotspot” algorithm
Idea: gauge enrichment of tags relative to a local background model based on the number of tags in a 50kb surrounding window.
[Figure: hotspots at chr5:131,975,056-132,012,092; height = score]

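2 “Hotspot” algorithm
Enrichment is measured as a z-score based on a binomial null model.
[Figure: a 250 bp window containing n tags, nested inside a 50kb window containing N tags]
Each tag in the large window is considered an “experiment,” with probability of success (landing in the smaller window), adjusted for uniquely mapping bases,
$$p = \frac{\text{uniquely mappable bases in the 250 bp window}}{\text{uniquely mappable bases in the 50kb window}}$$
Given N tags in the large window, the expected number of tags in the smaller window is
$$E[n] = Np$$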
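A minimal sketch of the success probability and expected count, assuming a hypothetical per-base boolean mask `mappable` (True where a base is uniquely mappable) and windows centered on a given position. The names `success_probability` and `expected_count` are illustrative, not from the original implementation. When every base is uniquely mappable, p reduces to 250/50,000 = 0.005.

```python
import numpy as np


def success_probability(mappable: np.ndarray, center: int,
                        small: int = 250, large: int = 50_000) -> float:
    """p = mappable bases in the small window / mappable bases in the large window.

    `mappable` is a hypothetical boolean array with one entry per base,
    True where the base is uniquely mappable.
    """
    def mappable_bases(width: int) -> int:
        lo = max(0, center - width // 2)
        hi = min(len(mappable), center + width // 2)
        return int(mappable[lo:hi].sum())

    return mappable_bases(small) / mappable_bases(large)


def expected_count(N: int, p: float) -> float:
    """Expected number of tags in the small window under Binomial(N, p)."""
    return N * p


# Fully mappable 100 kb region, windows centered at position 50,000.
mask = np.ones(100_000, dtype=bool)
p = success_probability(mask, center=50_000)
print(p, expected_count(1_000, p))
```

Because p is a ratio of mappable bases, masking out unmappable sequence in both windows leaves p unchanged when the masked fraction is the same in each.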

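3 “Hotspot” algorithm
[Figure: a 250 bp window containing n tags, nested inside a 50kb window containing N tags]
Given N tags in the large window, the expected number of tags in the smaller window is
$$E[n] = Np$$
The standard deviation for the expected number of tags in the smaller window is
$$\sigma = \sqrt{Np(1-p)}$$
And the z-score for the observed number of tags in the smaller window is
$$z = \frac{n - Np}{\sqrt{Np(1-p)}}$$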
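The z-score formula translates directly into code. This is a sketch with a hypothetical function name `binomial_z` and made-up illustrative counts (1,000 tags in the 50kb window, 15 in the 250 bp window, p = 0.005).

```python
import math


def binomial_z(n: int, N: int, p: float) -> float:
    """z-score of n observed tags in the small window under Binomial(N, p)."""
    mean = N * p                        # expected count: Np
    sd = math.sqrt(N * p * (1.0 - p))   # standard deviation: sqrt(Np(1-p))
    return (n - mean) / sd


# Illustrative (hypothetical) counts.
z = binomial_z(15, 1_000, 0.005)
print(f"z = {z:.2f}")  # z = 4.48
```

Here the expected count is 5, so 15 observed tags sit about 4.5 standard deviations above the local background.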

4 “Hotspot” algorithm
Each tag gets a z-score computed from the 250bp and 50kb windows centered on it. A hotspot is a succession of tags within a 250bp window, each with a z-score greater than 2. The hotspot is scored with the z-score of the 250bp window centered on those tags.
[Figure: a called hotspot]
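One way to sketch the per-tag scoring and grouping, assuming uniform mappability (p = 250/50,000) and a hypothetical merge rule: a passing tag joins the current hotspot if it lies within 250 bp of the previous passing tag. The slides do not spell out the exact merge rule, and `count_in`, `window_z`, and `call_hotspots` are illustrative names.

```python
import bisect
import math

SMALL, LARGE = 250, 50_000   # window widths from the slides
THRESHOLD = 2.0              # per-tag z-score cutoff from the slides


def count_in(tags, center, width):
    """Number of sorted positions in [center - width/2, center + width/2)."""
    lo = bisect.bisect_left(tags, center - width // 2)
    hi = bisect.bisect_left(tags, center + width // 2)
    return hi - lo


def window_z(tags, center):
    """Binomial z-score of the 250 bp window against the 50 kb window."""
    n = count_in(tags, center, SMALL)
    N = count_in(tags, center, LARGE)
    p = SMALL / LARGE            # assumes every base is uniquely mappable
    sd = math.sqrt(N * p * (1 - p))
    return (n - N * p) / sd if sd > 0 else 0.0


def call_hotspots(tags):
    """Return (start, end, score) for runs of tags whose z-score exceeds 2."""
    tags = sorted(tags)
    runs = []
    for t in tags:
        if window_z(tags, t) <= THRESHOLD:
            continue
        # Hypothetical merge rule: join the current run if this passing tag
        # lies within 250 bp of the run's last passing tag.
        if runs and t - runs[-1][-1] < SMALL:
            runs[-1].append(t)
        else:
            runs.append([t])
    # Score each hotspot with the 250 bp window centered on its tags.
    return [(r[0], r[-1], window_z(tags, (r[0] + r[-1]) // 2)) for r in runs]


background = list(range(0, 100_000, 500))   # one tag every 500 bp
cluster = [50_100 + i for i in range(20)]   # 20 tags packed into 20 bp
print(call_hotspots(background + cluster))
```

The evenly spaced background tags never reach z > 2, while the cluster (plus the background tag at 50,000 that shares its 250 bp window) forms a single hotspot.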

5 Examples of different kinds of hotspots
1. Monsters
2. Noisy regions

6 Shadowed hotspots
Problem: regions of very high enrichment can inflate the background for neighboring regions, deflating their z-scores.
[Figure: chr1:604,351-609,350, shown twice; the second panel is the same region, rescaled]
The smaller hotspots in this region would be highly significant in isolation, but are missed due to shadowing by the monster.
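A small worked example of shadowing, using the binomial z-score with p = 250/50,000 (uniform mappability assumed). All tag counts below are hypothetical, chosen only to show the effect of adding a monster's tags to the 50kb background.

```python
import math


def binomial_z(n, N, p=250 / 50_000):
    """Binomial z-score; default p assumes a fully mappable 50 kb window."""
    return (n - N * p) / math.sqrt(N * p * (1 - p))


n_small = 5          # tags in a modest 250 bp hotspot (hypothetical)
N_background = 100   # ordinary tags in its 50 kb window (hypothetical)
N_monster = 5_000    # extra tags from a nearby monster (hypothetical)

alone = binomial_z(n_small, N_background)
shadowed = binomial_z(n_small, N_background + N_monster)
print(f"in isolation: z = {alone:.2f}; next to the monster: z = {shadowed:.2f}")
```

The same 5 tags score about 6.4 on their own, but the monster raises the expected count to 25.5, pushing the z-score below zero.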

7 Shadowed hotspots
Solution: implement a two-pass hotspot detection scheme.
1. Run the first pass of hotspot detection.
2. Delete all tags falling in the first-pass hotspots.
3. Compute new hotspots with the deleted background.
4. Combine hotspots from the first and second passes, and re-score all of them using the deleted background: all 50kb windows include only tags from the deleted background.
[Figure panels: Pass 1; Deleted background; Pass 2]
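The four steps above can be sketched as follows, under stated assumptions: uniform mappability, the same hypothetical 250 bp merge rule used for single-pass calling, and re-scoring that takes the 250 bp count from all tags but the 50kb count only from the deleted background. All function names are illustrative.

```python
import bisect
import math

SMALL, LARGE, THRESHOLD = 250, 50_000, 2.0


def count_in(sorted_pos, center, width):
    """Number of sorted positions in [center - width/2, center + width/2)."""
    lo = bisect.bisect_left(sorted_pos, center - width // 2)
    hi = bisect.bisect_left(sorted_pos, center + width // 2)
    return hi - lo


def window_z(tags, background, center):
    """250 bp count from `tags`; 50 kb count from `background`."""
    n = count_in(tags, center, SMALL)
    N = count_in(background, center, LARGE)
    p = SMALL / LARGE                    # assumes uniform mappability
    sd = math.sqrt(N * p * (1 - p))
    return (n - N * p) / sd if sd > 0 else 0.0


def call_hotspots(tags, background):
    """(start, end) intervals of passing tags, merged if < 250 bp apart."""
    intervals = []
    for t in tags:
        if window_z(tags, background, t) <= THRESHOLD:
            continue
        if intervals and t - intervals[-1][1] < SMALL:
            intervals[-1][1] = t
        else:
            intervals.append([t, t])
    return [tuple(iv) for iv in intervals]


def two_pass_hotspots(tags):
    tags = sorted(tags)
    # 1. First pass on all tags.
    first = call_hotspots(tags, tags)
    # 2. Delete every tag that falls inside a first-pass hotspot.
    deleted = [t for t in tags if not any(s <= t <= e for s, e in first)]
    # 3. Second pass on the deleted background.
    second = call_hotspots(deleted, deleted)
    # 4. Combine both passes (identical intervals kept once) and re-score:
    #    250 bp counts use all tags, 50 kb counts use only the deleted background.
    combined = sorted(set(first) | set(second))
    return [(s, e, window_z(tags, deleted, (s + e) // 2)) for s, e in combined]


background = list(range(0, 100_000, 500))           # one tag every 500 bp
monster = [30_000 + i % 200 for i in range(2_000)]  # 2,000 tags in 200 bp
modest = [40_100 + i for i in range(8)]              # 8 tags, 10 kb away
print(two_pass_hotspots(background + monster + modest))
```

With these made-up tags, the first pass finds only the monster; the modest cluster appears once the monster's tags are removed from the background. Overlapping (but non-identical) intervals from the two passes are not merged in this sketch.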

8 Hotspots are robust to regions of duplication
[Figure: chr8:129,897,976-130,347,975, with zoomed views of chr8:130,151,726-130,201,725 and chr8:129,904,851-129,979,850; track: called peaks (height = z-score)]
Disparate peak heights, but comparable z-scores.

9 Random Tags
As a null model for FDR calculations, we generate tags uniformly over the uniquely mappable (for 27-mers) bases of the genome, using the same number of tags for the observed and random data.
[Figure panels: observed tags; random tags]
The random data still coalesce into hotspots.
[Figure panels: observed hotspots; random hotspots]
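A minimal sketch of the random-tag null model, assuming a hypothetical per-base boolean mask in which True marks a base where the 27-mer is uniquely mappable. Positions are drawn uniformly with replacement using NumPy's `Generator.choice`; the function name `random_tags` and the toy mask are illustrative.

```python
import numpy as np


def random_tags(mappable: np.ndarray, n_tags: int, seed: int = 0) -> np.ndarray:
    """Draw n_tags positions uniformly (with replacement) from mappable bases.

    `mappable` is a hypothetical per-base boolean mask marking bases where
    the 27-mer is uniquely mappable.
    """
    candidates = np.flatnonzero(mappable)       # indices of mappable bases
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(candidates, size=n_tags, replace=True))


observed = np.array([120, 450, 451, 452, 980])  # hypothetical observed tags
mask = np.zeros(1_000, dtype=bool)
mask[::2] = True                                # pretend only even bases map
null = random_tags(mask, n_tags=len(observed))  # same count as observed
print(null)
```

Sampling with replacement lets random tags stack on the same base, which is one reason uniformly placed tags can still coalesce into hotspots.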

10 Properties of Random Tags
Still lots of hotspots: 146,752 in the random data (same number of tags as observed) versus 395,433 in the observed data (GM).

11 Properties of Random Tags
Enriched in promoters?! (Yes, slightly, since uniquely mappable 27-mers are enriched in promoters.)
[Figure: average tag density vs. distance to Tx start sites]

12 FDR Calculations Using Random Tags
$$\mathrm{FDR}(T) = \frac{\#\{\text{random peaks with } z \ge T\}}{\#\{\text{observed peaks with } z \ge T\}}$$
This is probably conservative, since the numerator is likely an overestimate of the number of false positives in the observed data.
[Figure panels: observed; random]
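The FDR ratio can be computed directly from the two lists of peak z-scores. `fdr_at` follows the formula above; `threshold_for_fdr` is a hypothetical helper that scans the observed z-scores as candidate thresholds and returns the smallest one meeting a target FDR. The toy z-scores are made up.

```python
def fdr_at(threshold, observed_z, random_z):
    """FDR(T) = #random peaks with z >= T / #observed peaks with z >= T."""
    n_obs = sum(1 for z in observed_z if z >= threshold)
    n_rand = sum(1 for z in random_z if z >= threshold)
    return n_rand / n_obs if n_obs else float("nan")


def threshold_for_fdr(target, observed_z, random_z):
    """Smallest observed z-score T with FDR(T) <= target, or None."""
    for t in sorted(set(observed_z)):
        if fdr_at(t, observed_z, random_z) <= target:
            return t
    return None


observed_z = list(range(1, 11))   # hypothetical observed peak z-scores
random_z = [1, 2, 3]              # hypothetical random peak z-scores
print(fdr_at(2, observed_z, random_z), threshold_for_fdr(0.1, observed_z, random_z))
```

Because the random and observed sets use the same number of tags, the raw count ratio needs no extra normalization.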

13 Extending to multiple cell types
Call a location multi-cell verified (MCV) if hotspot peaks from different cell types overlap there (after widening peaks to 300bp). Score these MCV zones with the maximum z-score over the cell-type peaks. MCV peaks are then identified by looking at the summed density in the zones. Repeat with multiple random datasets to get random MCV peaks for FDR calculations.
[Figure: MCV zones, summed density, and MCV peaks at chr5:131,585,550-131,597,894 (GM and BJ)]
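A sketch of the MCV-zone step under one possible reading: each peak is given as a hypothetical (summit position, z-score) pair and widened to 300 bp around its summit, overlapping widened peaks are chained into zones, and a zone counts as MCV when it contains peaks from at least two cell types. The summed-density step that identifies MCV peaks is not implemented here, and `mcv_zones` is an illustrative name.

```python
from typing import Dict, List, Tuple


def mcv_zones(peaks: Dict[str, List[Tuple[int, float]]], width: int = 300):
    """Return (start, end, score) for zones where peaks of >= 2 cell types overlap.

    `peaks` maps a cell type to (summit position, z-score) pairs; each peak is
    widened to `width` bp centered on its summit (an assumption).
    """
    half = width // 2
    fat = sorted((pos - half, pos + half, z, cell)
                 for cell, plist in peaks.items() for pos, z in plist)
    zones = []
    for start, end, z, cell in fat:
        if zones and start < zones[-1]["end"]:    # overlaps the open zone
            zone = zones[-1]
            zone["end"] = max(zone["end"], end)
            zone["cells"].add(cell)
            zone["score"] = max(zone["score"], z)  # max z over cell-type peaks
        else:
            zones.append({"start": start, "end": end, "cells": {cell}, "score": z})
    return [(zn["start"], zn["end"], zn["score"])
            for zn in zones if len(zn["cells"]) >= 2]


peaks = {"GM": [(1_000, 5.0), (5_000, 3.0)],   # hypothetical peaks
         "BJ": [(1_200, 7.5), (9_000, 4.0)]}
print(mcv_zones(peaks))
```

Only the GM and BJ peaks near position 1,000 overlap after widening, so they form the single MCV zone, scored with the larger z-score (7.5).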

