1 3D Chromosome Organization: Statistical challenges and opportunities for analyzing Hi-C data. Zhaohui Steve Qin, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University

2 Transcription regulation

3 However … Long-range chromosomal interactions; transcriptional factory; chimeric events

4 Chromosome folding: How can a two-meter-long polymer fit into a nucleus ten micrometers (10^-5 m = 0.00001 m) in diameter?

5 Chromosome folding. http://en.wikipedia.org/wiki/Chromosome

6 “… deep things in science are not found because they are useful; they are found because it was possible to find them” -- Robert Oppenheimer

7 Chromosome Conformation Capture (3C). Dekker et al., Science 2002; Naumova and Dekker, J of Cell Science 2010. Fine scale: (0-kb)

8 3C-on-chip/Circular 3C (4C); 5C. Naumova and Dekker, J of Cell Science 2010. Fine scale: (0-kb); intermediate: (0-Mb)

9 Naumova and Dekker, J of Cell Science 2010. Fine scale: (0-kb); intermediate: (0-Mb); whole genome


13 [Table: inter-chromosomal read-count matrix, upper triangular, with rows and columns chr1 through chr22, chrX, and chrY]

14 What are the main findings?

15 In Lieberman-Aiden et al. Genomes can be decomposed into compartments A and B. Fractal globule, not equilibrium globule.

16 In Sexton et al. Genome partitioned into physical domains. Domain structure highly connected with epigenetic activities.

17 In Dixon et al. Topological domains. Stable across cell types. Highly conserved across species. Domain boundaries enriched with insulators.

18 In Hou et al. Differences between domain boundary and interior, in terms of gene density, TF and epigenetic factor concentration.

19 Challenges: Quality control and pre-processing of the reads. Is there any bias in the data and, if so, how should it be normalized? Is it possible to infer the 3-dimensional chromosomal structure from the Hi-C data and, if so, how?

20 Hi-C data preprocessing (Imakaev et al. 2012). Diagram labels: restriction enzyme cutting site; restriction enzyme cut fragment; self-ligation reads; dangling reads; PCR amplification reads; random breaking reads; random break; valid reads; downstream analysis
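
The read categories on this slide can be illustrated with a small filter. This is a minimal sketch, not the procedure of Imakaev et al.: the `ReadPair` fields, the label names, and the rules (same-fragment pairs facing inward are treated as dangling reads, same-fragment pairs facing outward as self-ligation reads, exact repeats as PCR amplification reads) are assumptions made for illustration, and the random-break filter is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical record for one mapped paired-end read."""
    chrom1: str
    pos1: int
    strand1: str  # '+' or '-'
    frag1: int    # index of the restriction fragment holding end 1
    chrom2: str
    pos2: int
    strand2: str
    frag2: int    # index of the restriction fragment holding end 2


def classify_pairs(pairs):
    """Label each pair; the rules are illustrative assumptions."""
    seen = set()
    labels = []
    for p in pairs:
        key = (p.chrom1, p.pos1, p.strand1, p.chrom2, p.pos2, p.strand2)
        if key in seen:
            # Same coordinates and strands as an earlier pair.
            labels.append("pcr_duplicate")
            continue
        seen.add(key)
        if p.chrom1 == p.chrom2 and p.frag1 == p.frag2:
            # Order the two ends by position to read their orientation.
            (_, left), (_, right) = sorted(
                [(p.pos1, p.strand1), (p.pos2, p.strand2)]
            )
            if left == "+" and right == "-":
                labels.append("dangling_end")    # ends face each other
            elif left == "-" and right == "+":
                labels.append("self_ligation")   # ends face away
            else:
                labels.append("other_same_fragment")
        else:
            labels.append("valid")
    return labels


example_pairs = [
    ReadPair("chr1", 100, "+", 5, "chr1", 300, "-", 5),
    ReadPair("chr1", 100, "-", 5, "chr1", 300, "+", 5),
    ReadPair("chr1", 100, "+", 5, "chr2", 900, "-", 40),
    ReadPair("chr1", 100, "+", 5, "chr2", 900, "-", 40),
]
example_labels = classify_pairs(example_pairs)
```

Only pairs labeled `valid` would go on to downstream analysis in this sketch.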

21 Systematic biases in the data (Yaffe and Tanay, 2011): restriction enzyme, GC content, mappability

22 Methods for Hi-C bias reduction. Normalization (equal ‘visibility’, no assumption on biases): iterative correction and eigenvector decomposition (ICE) (Imakaev et al., 2012); sequential component normalization (SCN) (Cournac et al., 2012). Correction (posit a statistical model on biases): Yaffe & Tanay’s method (Yaffe & Tanay, 2011), fragment level (4 kb, 10^12), 420 parameters; HiCNorm (Hu et al., 2012), any resolution level (1 Mb, 10^6), 3 parameters.
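
The "equal visibility" idea behind the normalization methods can be sketched as matrix balancing: rescale rows and columns until every locus has the same total contact count. The function below is a simplified illustration in the spirit of iterative correction, not the published ICE implementation; the function name, defaults, and stopping rule are assumptions, and the input is assumed to be a symmetric, non-negative matrix with off-diagonal contacts.

```python
def iterative_correction(counts, n_iter=200, tol=1e-9):
    """Balance a symmetric contact matrix so row sums become equal.

    Returns (corrected, bias) with counts[i][j] equal to
    corrected[i][j] * bias[i] * bias[j] up to rounding.
    """
    n = len(counts)
    m = [[float(v) for v in row] for row in counts]
    bias = [1.0] * n
    for _ in range(n_iter):
        row_sums = [sum(row) for row in m]
        positive = [s for s in row_sums if s > 0]
        if not positive:
            break
        mean_sum = sum(positive) / len(positive)
        # Relative visibility of each locus; empty rows are left alone.
        scale = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= scale[i] * scale[j]
            bias[i] *= scale[i]
        if max(abs(s - 1.0) for s in scale) < tol:
            break
    return m, bias


# Toy example: a symmetric signal distorted by per-locus biases.
true_signal = [[2.0, 1.0, 1.5], [1.0, 2.0, 1.2], [1.5, 1.2, 2.0]]
locus_bias = [0.5, 1.0, 2.0]
raw = [
    [true_signal[i][j] * locus_bias[i] * locus_bias[j] for j in range(3)]
    for i in range(3)
]
corrected, estimated_bias = iterative_correction(raw)
```

After correction all rows of `corrected` have the same sum, which is the sense in which no explicit model of the bias sources is needed.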

23 Motivation and the key assumption: The number of paired-end reads spanning two loci is inversely proportional to the 3D spatial distance between them (obtained from fluorescence in situ hybridization (FISH)). Lieberman-Aiden et al., 2009
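
Taken literally, the key assumption turns read counts into relative distances. The sketch below only illustrates that relationship; the function name, the `scale` constant, and the locus labels are hypothetical, and the absolute scale of the distances is arbitrary.

```python
def counts_to_distances(counts, scale=1.0):
    """Map contact counts to distances under d = scale / count."""
    distances = {}
    for pair, n_reads in counts.items():
        # A zero count gives no finite distance under this assumption.
        distances[pair] = scale / n_reads if n_reads > 0 else None
    return distances


example_counts = {("L1", "L2"): 40, ("L1", "L3"): 10, ("L2", "L3"): 0}
example_distances = counts_to_distances(example_counts, scale=10.0)
```

In this toy input the pair with four times as many reads is placed four times closer.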

24 Bayesian statistical model. Variables: number of reads between two loci; 3D Euclidean distance between the two loci; number of enzyme cut sites in a locus; mean GC content in a locus; mean mappability score in a locus.
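
One way to combine these quantities is a Poisson log-linear model. The form below is a sketch under stated assumptions: the symbols (u_ij for the read count, d_ij for the distance, e_i, g_i, and m_i for the cut-site count, GC content, and mappability of locus i, and the beta coefficients) are illustrative labels and are not necessarily the notation on the slide.

```latex
u_{ij} \sim \mathrm{Poisson}(\theta_{ij}), \qquad
\log \theta_{ij} = \beta_0 + \beta_1 \log d_{ij}
  + \beta_{\mathrm{enz}} \log(e_i e_j)
  + \beta_{\mathrm{gcc}} \log(g_i g_j)
  + \beta_{\mathrm{map}} \log(m_i m_j)
```

With beta_1 = -1 and the bias terms held fixed, the expected count is inversely proportional to the distance, which matches the key assumption of the preceding slide.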

25 Real Hi-C data from Lieberman-Aiden et al. 2009: d(L2, L4) = 1.4042, d(L2, L3) = 1.9755, significant

26 mESC: HindIII vs. NcoI

27 Two compartment model

28 Whole Chromosome Model. Lieberman-Aiden et al., 2009; Naumova and Dekker, 2010

29 Other Features (Chromosome 2): compartment, gene density, gene expression, chromatin accessibility, lamina interaction, DNA replication time, H3K36me3, H3K27me3, H3K4me3, H3K9me3, H4K20me3, RNA polymerase II

30 References
Hu M, Deng K, Selvaraj S, Qin ZS, Ren B, Liu JS. (2012) HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics 28, 3131-3133. http://www.people.fas.harvard.edu/~junliu/HiCNorm/
Hu M, Deng K, Qin ZS, Dixon J, Selvaraj S, Fang J, Ren B, Liu JS. (2013) Bayesian inference of three-dimensional chromosomal organization. PLoS Computational Biology 9(1):e1002893. http://www.people.fas.harvard.edu/~junliu/BACH/
Hou C, Li L, Qin ZS, Corces VG. (2012) Gene Density, Transcription and Insulators Contribute to the Partition of the Drosophila Genome into Physical Domains. Mol Cell 48, 471-484 (with preview article: Xu and Felsenfeld (2012) Order from Chaos in the Nucleus. Mol Cell 48, 327-328).
Dixon JR, Selvaraj S, Yue F, Kim A, Li Y, Shen Y, Hu M, Liu JS, Ren B. (2012) Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature 485, 376-380.

31 Acknowledgements: Ming Hu, Ke Deng, Jun S. Liu, Jesse Dixon, Siddarth Selvaraj, Bing Ren, Li, Chunhui Hou, Victor Corces

