Epigenetics 12/05/07 Statisticians like data.


1 Epigenetics 12/05/07 Statisticians like data.
Don’t emphasize method too much; it is not to your advantage. Don’t exaggerate. Speak more clearly. In the next slide, explain epigenetics.

2 Epigenetic regulation is critical for cell differentiation
Epithelial cell (right); liver cell (left)

3 Gene imprinting

4 More examples of epigenetic regulation

5 Epigenetic mechanisms
DNA methylation; histone modification; nucleosome positions

6 DNA methylation
Alberts et al. Molecular Biology of the Cell

7 Methylated genes are silenced

8 Probable mechanisms for DNA methylation-induced silencing
The DNA methylation marker directly interferes with TF binding. The DNA methylation marker is recognized by proteins that cause chromatin structure changes.

9 DNA in the nucleus is complexed with histones to form nucleosomes
Figure scale labels: 1 bp (0.3 nm); 11 nm; 30 nm; 10,000 nm.
Notes: Mention linker DNA. Say that its length is variable. Keep it real short! Don’t say everything in the figure. The nucleosome is the fundamental repeating unit in chromatin.

10 Histone modification
Histone tails can be covalently modified in multiple ways at multiple sites (acetyl, ubiquityl, methyl, phosphoryl).
Luger et al. Nature, (1997); Felsenfeld and Groudine, Nature, (2003)

11 How histone modification is inherited
Histone methylation marks may be inherited by local concentration. The exact mechanism for inheritance is unknown. Whether histone modification is inherited at all has not been proved.

12 Transcriptional regulation by chromatin
Figure labels: nucleosome positioning; histone modification; TF; TF target site.

13 DNA methylation, histone modification, chromatin
Figure labels: H3K9me3; HP1; H4K16ac.

14 Epigenetic reprogramming during development
Methylation marks are erased during cleavage. Methylation of the paternal genome is actively stripped within hours of fertilization. The maternal genome is passively demethylated at a slower rate. De novo methylation occurs after implantation. Another round of demethylation occurs during differentiation. DNA methylation is essential for development.

15 Epigenetic reprogramming can reverse tumorigenesis
Figure 1. Two-step cloning procedure to produce mice from cancer cells. Different tumor cells were used as donors for nuclear transfer into enucleated oocytes. Resultant blastocysts were explanted in culture to produce ES cell lines. The tumorigenic and differentiation potential of these ES cells was assayed in vitro by inducing teratomas in SCID mice (1), and in vivo by injecting cells into diploid (2) or tetraploid (3) blastocysts to generate chimeras and entirely ES-cell-derived mice, respectively.
Hochedlinger et al. Genes & Dev, (2004)

16 Cancer and histone modification
Chin, Nature (1998)

17 Cancer and chromatin
BRG1, the motor component of the SWI/SNF chromatin-remodeling complex, is mutated in multiple cell lines (Wong et al. 2000): prostate DU145; lung A-427; prostate TSU-Pr-1; lung NCI-H1299; breast ALAB; pancreas Hs 700T. This suggests that BRG1 may be a tumor suppressor protein.

18 Genomic view of epigenetic regulation
How to detect genome-wide patterns of epigenetic markers? How do epigenetic factors regulate genome-wide gene expression? How is the distribution of genome-wide epigenetic markers regulated?

19 Log (mononuc/genomic)
1. Tile microarray: 20 bp offset, 50-mers, Chr III promoters.
2. Hybridize mononucleosomal DNA vs naked genomic DNA.
3. Compute Log (mononuc/genomic).
Notes: Green stuff doesn’t have linker DNA. Resolution is 20 bp. Nucleosome signals span multiple probes. Midlog-phase yeast; mononucleosomal DNA is purified by MNase. Don’t say I didn’t do experiments. We first filter out promoters containing highly repetitive sequences. Then ~100 promoters are randomly chosen. ~100 promoters correspond to cell-cycle genes.
Q: How to filter repetitive sequences? A: Highly repetitive sequences are not tiled. 5 or more contiguous probes with perfect matches. 30 contigs.
Q: What kind of arrays? A: Pat Brown arrays. Glass. 25,000 probes.
Yuan et al., Science, (2005)
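The tiling and log-ratio steps on this slide can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the function names, the 0-based coordinates, the optional pseudocount, and the choice of base 2 (borrowed from the "Log2 Ratio" axis label on slide 21) are all assumptions.

```python
import math

def tile_starts(region_length, probe_length=50, offset=20):
    """Start coordinates (0-based, an assumption) of probes tiled across a region.

    Defaults follow the slide: 50-mer probes placed every 20 bp.
    """
    return list(range(0, region_length - probe_length + 1, offset))

def log2_ratios(mononuc, genomic, pseudocount=0.0):
    """Per-probe log2(mononucleosomal / genomic) intensity ratio.

    The pseudocount is a hypothetical guard against zero intensities;
    with the default of 0.0 every intensity must be strictly positive.
    """
    if len(mononuc) != len(genomic):
        raise ValueError("both channels need one intensity per probe")
    return [math.log2((m + pseudocount) / (g + pseudocount))
            for m, g in zip(mononuc, genomic)]

# Made-up intensities for four probes covering a 110 bp region.
starts = tile_starts(110)
ratios = log2_ratios([4.0, 1.0, 2.0, 2.0], [1.0, 2.0, 2.0, 1.0])
for start, ratio in zip(starts, ratios):
    print(start, ratio)
```

Positive values mark probes enriched in the mononucleosomal sample; negative values mark probes depleted relative to naked genomic DNA.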

20 Nucleosome positioning in yeast
Figure labels: MFA2; HIS3; CHA1; MATa; centromere; predicted positioned nucs; literature positioned nucs; fuzzy nucs.
Notes: Fuzzy nucleosomes are real. Here is how it looks in our data. MFA2 (Watson) is the mating pheromone a-factor, made by a cells. HIS3 (Watson) catalyzes the sixth step in histidine biosynthesis; transcription is regulated by Gcn4p. CHA1 (Crick) catalyzes the degradation of both L-serine and L-threonine; required to use serine or threonine as the sole nitrogen source.
Yuan et al., Science, (2005)

21 Stereotyped pattern Aligned by ATG
Average signal (aligned by ATG codon) shows a regular pattern.
Figure labels: Log2 Ratio (y-axis); Distance to ATG (x-axis); 95% CI.
Notes: You might expect that nucleosome positions at different promoters all look different. But look. Nucleosome positioning has a common pattern, suggesting there may be a basic principle underlying nucleosome positioning. Show alignment with respect to NFRs. Inter-nucleosome distance 160~170 bp. Predict the length of 5’ UTR.
Yuan et al., Science, (2005)
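A minimal sketch of the averaging behind this slide: each promoter's log2 ratios are indexed by distance to the ATG codon, then averaged position by position with a 95% confidence interval. The input layout (one dict per promoter) and the normal-approximation interval (mean ± 1.96 standard errors) are assumptions made for illustration; the slide does not say how the interval was computed.

```python
import math
import statistics

def average_aligned(profiles, z=1.96):
    """Average ATG-aligned profiles position by position.

    profiles: list of dicts mapping distance-to-ATG (bp) -> log2 ratio,
              one dict per promoter (hypothetical input layout).
    Returns {distance: (mean, ci_low, ci_high)}, sorted by distance.
    """
    by_distance = {}
    for profile in profiles:
        for distance, value in profile.items():
            by_distance.setdefault(distance, []).append(value)

    summary = {}
    for distance in sorted(by_distance):
        values = by_distance[distance]
        mean = statistics.fmean(values)
        if len(values) > 1:
            # Normal-approximation half-width: z * standard error of the mean.
            half = z * statistics.stdev(values) / math.sqrt(len(values))
        else:
            half = float("nan")  # no interval from a single promoter
        summary[distance] = (mean, mean - half, mean + half)
    return summary

# Two made-up promoters probed at -40 and -20 bp relative to the ATG.
print(average_aligned([{-40: 1.0, -20: 0.0}, {-40: 3.0, -20: 0.0}]))
```

Keying on distance rather than array index lets promoters with different probe coverage contribute wherever they overlap.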

22 Transcription factor binding sites (TFBSs) are likely to be nucleosome-depleted
TFBSs tend to be nucleosome-depleted. Motif sites that are unbound in our condition but bound in other conditions also tend to be nucleosome-depleted. Motif sites that are always unbound do not have the nucleosome-depletion property.
Notes: Show one color at a time. Highly transcribed genes tend to be more delocalized in ORF.
Q: Why does bound (other) also have a strong signal? A: Maybe nucleosome depletion also makes accessible the TFBSs that are used in other conditions. Thus it gives the potential for activity, not the activity itself.
Yuan et al., Science, (2005)

23 Histone modification in yeast
Liu et al., PLoS Biology, (2005)

24 Co-regulated histone modifications
Liu et al., PLoS Biology, (2005)

25 Nucleosome positioning in human
Ozsolak et al., Nat Biotech, (2007)

26 Histone modification in human
Guenther et al., Cell, (2007)

27 Distinct histone modification pattern in Embryonic Stem (ES) cells
Figure labels: Gene; ES; Differentiated cell type 1, 2, …, n.
ES cells contain both repressive and active markers. Differentiated cells contain either repressive or active markers but not both.
H3K27 methylation: repressive. H3K4 methylation: active.
Bernstein et al. Cell (2006)

28 Euchromatin and heterochromatin

29 Large-scale chromatin domain
Rinn et al. Cell (2007)

30 Large-scale chromatin domain
ENCODE, Nature, 2007

31 Large-scale chromatin domain
Figure labels: Open; Closed. ENCODE, Nature, 2007

32 Large-scale chromatin domain
Figure labels: Open; Closed. ENCODE, Nature, 2007

33 DNA methylation in human
Eckhardt et al. Nat Gen. (2007)

34 DNA-methylation pattern in human
Figure 1. Type and distribution of amplicons. In total, we analyzed 2,524 amplicons from six distinct categories: 43.7% 5′-UTRs, 22.5% evolutionary conserved regions (ECR), 14.3% intronic regions, 13.3% exonic regions, 3.6% Sp1 transcription factor binding sites and 2.6% ‘other’.
Eckhardt et al. Nat Gen. (2007)

35 Histone modification
Histone tails can be covalently modified in multiple ways at multiple sites (acetyl, ubiquityl, methyl, phosphoryl).
Luger et al. Nature, (1997); Felsenfeld and Groudine, Nature, (2003)

36 Histone code hypothesis
“… multiple histone modifications, acting in a combinatorial or sequential fashion on one or multiple histone tails, specify unique downstream functions …” (Strahl and Allis, Nature, 2000)
Notes: Don’t get into a long discussion of the code. Simply, different combinations can have different effects. Don’t get into details of Dion’s experiment. Simply, mutagenesis suggests that the code is probably much simpler. H4-lysine acetylation seems to be cumulative.
A remarkable hypothesis proposed by Strahl and Allis is that … But this hypothesis also leads to a dilemma: since the number of possible combinations of histone modifications is overwhelming, how can we possibly decode the histone modifications? On the other hand, there is plenty of evidence that the “histone code” is not as complicated as conjectured. For example, our group mutated H4 tail lysines to arginine, which mimics unacetylatable lysine, in all possible combinations. The overall effect seems to be cumulative rather than combinatorial.

37 Statistical assessment of the global impact of histone acetylation on gene expression
Integrative analysis using multiple genomic data resources (sequence, gene expression, histone modification).
Linear regression model: y_i expression; A_ij acetylation; S_i promoter sequence.
Key is to estimate sequence-dependent regulatory effects. If the model fits well, then it suggests it is not so complicated.
Notes: Data come from …; expand on sequence part.
Yuan et al. Gen Bio (2006)
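The slide's formula did not survive in this transcript. One way to read the listed symbols, assuming a purely additive model, is sketched below. The intercept \alpha, the acetylation coefficients \beta_j and the error term \varepsilon_i are assumed names, not taken from the slide.

```latex
y_i = \alpha + \sum_j \beta_j A_{ij} + f(S_i) + \varepsilon_i
```

Here y_i is the expression of gene i, A_{ij} is its acetylation level at site j, and f(S_i) is the sequence-dependent regulatory effect of its promoter.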

38 Estimating sequence dependent regulation effects
Linear regression model with transcription factor binding motifs. S_ij: motif score.
Scan motifs (MDscan, AlignACE). Filter out insignificant motifs (RSIR). f(S_i) is linear.
R^2 is about 0.27, which is reasonably good for this kind of data. Including interaction coefficients increases R^2 by less than 0.01. Repressors have negative coefficients; e.g., RFX1 has negative coefficients.
Notes: The effects of the motifs are fitted from data. Do repressors correspond to negative weights? Say linear model of sequences. Change S_ij to motif scores and explain: S_ij looks similar to S_i, which it is not. Say a few words about Beer-Tavazoie’s motifs. Are they better? One RSIR direction is selected.
Q: What if there is more than one RSIR direction? Would it still help to include the variables corresponding to both directions?
A: Yes. RSIR is only an exploratory tool. Andrew Gelman did an experiment using x^2 + y^2 = 1 to generate data, and a linear model can fit the data very well. The presence of more than one RSIR direction can be caused by 1) a nonlinear effect, or 2) a linear effect but an inaccurate SIR direction estimate. In the first case, the variables in the 2nd SIR direction are important factors and should be included in the model. On the other hand, it will be difficult to estimate the full nonlinear effect, so we use the simplified linear model as a proxy. In the second case, the variables selected based on the 1st SIR direction are unreliable. Therefore, using these variables alone may actually ignore some important factors.
R-square is about 0.3.
Yuan et al. Gen Bio (2006)
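To make the R^2 figures on this slide concrete, here is a small sketch of an ordinary least-squares fit with an intercept and its R^2. It assumes NumPy and uses synthetic motif scores; the function name and the data are hypothetical, and it does not reproduce the MDscan, AlignACE or RSIR steps.

```python
import numpy as np

def fit_linear(X, y):
    """Ordinary least squares with an intercept.

    X: (n_genes, n_features) matrix, e.g. motif scores S_ij.
    y: length-n_genes vector, e.g. expression y_i.
    Returns (coefficients with the intercept first, R^2).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return coef, 1.0 - ss_res / ss_tot

# Synthetic example: two motifs, the second acting like a repressor.
rng = np.random.default_rng(0)
scores = rng.normal(size=(500, 2))
expression = 0.5 * scores[:, 0] - 0.8 * scores[:, 1] + rng.normal(size=500)
coef, r2 = fit_linear(scores, expression)
print(coef, r2)
```

In a fit like this, a repressor-like motif shows up as a negative coefficient, which is the reading the slide gives for RFX1.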

39 Performance of the linear regression model

40 Performance of the linear regression model

41 Performance of the linear regression model

42 Cumulative effect of histone acetylation
Test whether including quadratic interaction between different acetylation sites would improve model performance.
Figure labels: quadratic interaction; p-value for quadratic interaction coefficients (g_jk); statistically insignificant.
Notes: Write out the formula on top. The question is whether including quadratic interaction terms would improve model performance. Coding region acetylation may not be regulatory but serve as a mark (don’t discuss unless pressed). Data available at three sites.
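One common way to ask whether interaction terms help is a nested-model F-test: fit the additive model, fit the model with pairwise products A_ij * A_ik added, and compare residual sums of squares. The sketch below assumes NumPy and synthetic data, and it is not necessarily the test used on the slide, which reports p-values for individual coefficients. The function names are hypothetical.

```python
import itertools
import numpy as np

def add_interactions(A):
    """Append all pairwise products A[:, j] * A[:, k] (j < k) as extra columns."""
    A = np.asarray(A, dtype=float)
    pairs = itertools.combinations(range(A.shape[1]), 2)
    products = [A[:, j] * A[:, k] for j, k in pairs]
    return np.column_stack([A] + products)

def residual_ss(X, y):
    """Residual sum of squares and parameter count of an OLS fit with intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return float(residuals @ residuals), design.shape[1]

def interaction_f_statistic(A, y):
    """F statistic comparing the additive model with the interaction model.

    Returns (F, df_numerator, df_denominator). Needs at least two sites,
    otherwise there are no interaction terms to test.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    rss_additive, p_additive = residual_ss(A, y)
    rss_full, p_full = residual_ss(add_interactions(A), y)
    df_num = p_full - p_additive
    df_den = len(y) - p_full
    f_stat = ((rss_additive - rss_full) / df_num) / (rss_full / df_den)
    return f_stat, df_num, df_den

# Synthetic data: three acetylation sites with purely cumulative effects.
rng = np.random.default_rng(1)
acetylation = rng.normal(size=(200, 3))
expression = acetylation @ np.array([1.0, 2.0, 3.0]) + rng.normal(size=200)
print(interaction_f_statistic(acetylation, expression))
```

The p-value is the upper tail of the F distribution with the returned degrees of freedom. A small F statistic, and hence a large p-value, is what a cumulative rather than combinatorial effect would look like.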

43 Reading List
Strahl and Allis 2000: proposed the histone code hypothesis.
Bernstein et al. 2007: an up-to-date review of epigenomics.
Yuan et al. 2005: nucleosome positions in yeast.
Yuan et al. 2006: statistical analysis of histone-related gene expression.

