Investigate Variation of Chromatin Interactions in Human Tissues Hiren Karathia, PhD., Sridhar Hannenhalli, PhD., Michelle Girvan, PhD.


1 Investigate Variation of Chromatin Interactions in Human Tissues Hiren Karathia, PhD., Sridhar Hannenhalli, PhD., Michelle Girvan, PhD.

2 Introduction of Hi-C experiment In-silico Analysis

3 Objectives
- Develop a general pipeline for Hi-C data processing
- Detect gene-centric Hi-C interactions across different cell types
- Differentiate ubiquitous versus tissue-specific gene-gene interactions
- Quantify spatial proximity of genes in pathways and quantify pathway proximity across multiple cell lines
- Investigate correlation between pathway proximity and pathway activity (approximated by expression of pathway genes)
- MORE….

4 Summary
Outlining the Hind-III fragment distribution of the human genome (Slides 7 & 8)
- These slides display the numbers of in-silico Hind-III fragments (Hind-III recognizes AAGCTT) in the human genome. Downstream Hi-C analyses are based on these fragments.
Hi-C data processing (Slides 9 & 10)
- List of samples processed. The crucial steps are normalization and filtration of the Hi-C interactions.
- Filtration: Removal of technical biases from the Hi-C data. These biases include GC%, ligation preferences (self-ligations), and unequal tag densities.
- Normalization: The expected number of Hi-C reads between two given regions is calculated as a background, under the assumption that interaction probability decreases with increasing distance between the two regions.
- Selection of significant interactions: Significant interactions are selected based on the difference between the observed and the expected number of reads (odds ratio), with significance cut-offs (P-value: 0.001 & 0.05).
Annotate the significant Hi-C interactions (Slide 11)
- Annotation of Hi-C interactions with Hg19 genomic features (gene structures, promoter, intergenic & non-coding regions).
Non-redundant genes in Hi-C interactions (Slide 12)
- Select all annotated genes and promoters involved in a significant Hi-C interaction. The slide shows the numbers of genes and promoters in replicates of all tissues.
Non-redundant Hi-C interactions across the tissues and replicates (Slide 13)
- Hi-C interactions whose end-points are mapped on different genomic features in either replicate of all the tissues.
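The selection step described above (observed versus expected reads, filtered at a P-value cut-off) can be illustrated with a minimal sketch. The transcript does not say which statistical test was used, so the Poisson upper-tail test below is an assumption chosen for simplicity, and the function names and toy counts are hypothetical.

```python
import math


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected); assumes expected > 0."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def select_significant(interactions, p_cutoff=0.001):
    """Keep (name, observed/expected ratio, p-value) for interactions under the cut-off.

    `interactions` is an iterable of (name, observed_reads, expected_reads).
    """
    kept = []
    for name, observed, expected in interactions:
        p_value = poisson_upper_tail(observed, expected)
        if p_value <= p_cutoff:
            kept.append((name, observed / expected, p_value))
    return kept


# Toy input: region-pair label, observed reads, expected reads (illustrative only).
toy = [("regionA-regionB", 30, 5.0), ("regionC-regionD", 6, 5.0)]
for name, ratio, p_value in select_significant(toy, p_cutoff=0.001):
    print(name, round(ratio, 2), p_value)
```

Raising `p_cutoff` to 0.05 corresponds to the less stringent of the two cut-offs listed in the summary.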

5 Summary
Inter-tissue comparison of Hi-C interactions (Slide 14)
- Merged the gene-gene Hi-C interactions of all tissue replicates and searched for interactions that are unique to a single tissue and those that are shared by a pair of tissues (Figure-A). Figure-B shows the number of gene-gene interactions found in common across a given number of tissues.
KEGG Pathways Analysis (Slide 15)
- KEGG pathways with fewer than 5 annotated genes were excluded. Edge fraction was used to quantify spatial proximity of the genes in a pathway.
Z-score distribution of the KEGG Pathways (Slides 16 & 17)
- Edge fractions (and their z-scores based on 500 length-controlled random gene sets) were calculated for ALL pathways in ALL cell types.
Inter-tissue comparison of pathway proximity (Slide 18)
- Unique and shared pathways with spatial proximity are shown for two z-score thresholds.
Heat-map for the Pathways Hi-C analysis (Slide 19)
- The heatmap shows z-scores of all the pathways in 6 tissues. The pathways are clustered based on the Manhattan distance of the z-score vectors.

6 Future Work
Finding Hi-C interactions at lower stringency: Because read coverage is low in a few tissues, very few significant interactions are detected there. We will repeat the analyses with a lower interaction significance cutoff (updated slide 10).
Processing RNA-Seq: Matched RNA-Seq data are available for 4 tissues. We will test the hypothesis that the spatial proximity of pathways correlates with the expression of pathway genes.
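One way the proximity-versus-expression hypothesis could be tested is with a rank correlation across pathways within a tissue. The slides do not name a correlation measure, so Spearman correlation is an assumption here, and the z-scores and expression values below are made-up toy numbers.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data: one proximity z-score and one mean expression value per pathway.
proximity_z = [0.1, 0.5, 2.0, 3.0]
mean_expression = [1.0, 4.0, 9.0, 20.0]
print(spearman(proximity_z, mean_expression))
```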

7 Hind-III RE Sites on Annotated Hg19 Genome
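An in-silico Hind-III digest of the kind counted on this slide can be sketched as follows. The recognition site AAGCTT comes from the summary slide; the cut position (A^AGCTT) is the known Hind-III cleavage point. The function name and toy sequence are hypothetical, and a real analysis would iterate over the Hg19 chromosomes rather than a short string.

```python
def hindiii_fragments(sequence, site="AAGCTT", cut_offset=1):
    """Return half-open (start, end) coordinates of in-silico restriction fragments.

    cut_offset=1 places the cut after the first base of the site (A^AGCTT).
    """
    seq = sequence.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return [
        (bounds[i], bounds[i + 1])
        for i in range(len(bounds) - 1)
        if bounds[i + 1] > bounds[i]
    ]


# Toy sequence with two Hind-III sites (illustrative only).
toy_sequence = "GGGAAGCTTCCCAAGCTTTT"
print(hindiii_fragments(toy_sequence))
```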

8 Distribution of RE sites in cell line sample

9 Hi-C data processing
Workflow diagram, files and steps: Sample Fastq files; BAM file; Merged BAM file; Sorted BAM file; De-duplicated file; Separate Hi-C interacting Reads; SAM file; Select Significant Interactions; Annotate the Interactions; Normalize Hi-C reads; Pathways Analysis; Gene centric Analysis
Workflow diagram, tools: BWA; Samtools; Picard tool; HOMER tools; In-house Python Scripts
Sample table (columns on the slide: Tissue ID, Tissue Source, DNA, RNA):
HEK293: Kidney Cell Line (Replicate 1 & 2)
hESC: Embryonic Stem Cell Line (Replicate 1 & 2)
IMR90: Lung Fibroblast Cell Line (Replicate 1 & 2)
BT483: Mammary Gland Cell Line (Replicate 1 & 2)
GM06990: B-Lymphocyte Cell Line (Replicate 1 & 2)
RWPE1: Prostate Epithelial Cell Line (Replicate 1 & 2)

10 Normalization & Filtration of Hi-C interactions
N = estimated total number of reads
n = estimated total number of interaction reads at each region
f = expected frequency of Hi-C reads as a function of distance
Diagram labels: Random Interactions; Interactions after Normalization and Filtration process; Select Significant Intra/Inter chromosomal interactions; Annotate the Interactions
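The formula that combines N, n and f did not survive in the transcript. A minimal sketch, assuming the background model takes the form expected(i, j) = f(distance) * n_i * n_j / N (the form used in HOMER-style background models, which the workflow slide suggests but does not confirm), might look like this. The decay function is a hypothetical placeholder, not the empirical distance curve.

```python
def expected_reads(n_i, n_j, total_reads, distance, freq_by_distance):
    """Expected Hi-C reads between regions i and j.

    Assumed form: f(distance) * n_i * n_j / N, with
    n_i, n_j = estimated interaction reads at each region,
    total_reads = N, and freq_by_distance = f.
    """
    return freq_by_distance(distance) * n_i * n_j / total_reads


def toy_distance_decay(distance, scale=1e4):
    """Hypothetical stand-in for f: interaction frequency falls with distance."""
    return 1.0 / (1.0 + distance / scale)


# Toy numbers (illustrative only): the same pair of regions at two separations.
print(expected_reads(100, 200, 1000, 0, toy_distance_decay))
print(expected_reads(100, 200, 1000, 10_000, toy_distance_decay))
```

The observed read count for a pair of regions is then compared against this expectation to decide whether the interaction is significant.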

11 Annotation of Hi-C interactions on Genomic Structures, e.g., HEK293 Tissue

12 Genes & Promoters in Hi-C interactions

13 Genomic features on end points of Hi-C interactions

14 Inter-tissue Hi-C gene-gene interactions
Figure A; Figure B
Diagonal values represent interactions unique to a tissue. Other values represent interactions shared between 2 specific tissues.
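The two figures on this slide can be approximated by a small tally over per-tissue interaction sets. This is a sketch under assumptions: "shared between 2 specific tissues" is read here as the plain intersection of the two tissues' interaction sets (regardless of other tissues), gene pairs are treated as unordered, and the toy gene names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def sharing_summary(tissue_interactions):
    """Summarize gene-gene interaction sharing across tissues.

    Returns (unique, shared, tissues_per_interaction):
    unique[t]   = interactions seen only in tissue t (Figure A diagonal),
    shared[a,b] = interactions present in both a and b (Figure A off-diagonal),
    tissues_per_interaction[k] = interactions found in exactly k tissues (Figure B).
    """
    sets = {
        tissue: {tuple(sorted(pair)) for pair in pairs}
        for tissue, pairs in tissue_interactions.items()
    }
    membership = Counter()
    for pairs in sets.values():
        membership.update(pairs)
    unique = {
        tissue: sum(1 for pair in pairs if membership[pair] == 1)
        for tissue, pairs in sets.items()
    }
    shared = {
        (a, b): len(sets[a] & sets[b]) for a, b in combinations(sorted(sets), 2)
    }
    tissues_per_interaction = Counter(membership.values())
    return unique, shared, tissues_per_interaction


# Toy input (illustrative gene pairs only).
toy = {
    "HEK293": [("A", "B"), ("C", "D")],
    "IMR90": [("B", "A"), ("E", "F")],
    "hESC": [("A", "B")],
}
unique, shared, per_k = sharing_summary(toy)
print(unique, shared, dict(per_k))
```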

15 Pathways analysis
Evaluate the Edge-fraction property for its statistical correlation with spatial proximity
E(f) = set of observed gene-gene interactions in a pathway
Ea(f) = possible gene-gene interactions of all the genes in a pathway
Z-score of the Edge-Fractions calculated from randomly selected length-controlled genes
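The edge-fraction formula itself is not in the transcript. A minimal sketch, assuming edge fraction = |E(f)| / |Ea(f)| with Ea(f) taken as all unordered gene pairs in the pathway, and assuming "length-controlled" means each random gene is drawn from the same gene-length bin as the pathway gene it replaces, could look like this. The binning scheme, function names and toy data are hypothetical; the default of 500 random sets follows the summary slide.

```python
import random
import statistics
from collections import Counter
from itertools import combinations


def edge_fraction(genes, interactions):
    """Observed gene-gene interactions divided by all possible gene pairs.

    `interactions` is a set of frozenset({gene_a, gene_b}).
    """
    genes = sorted(set(genes))
    possible = len(genes) * (len(genes) - 1) // 2
    if possible == 0:
        return 0.0
    observed = sum(
        1 for a, b in combinations(genes, 2) if frozenset((a, b)) in interactions
    )
    return observed / possible


def edge_fraction_zscore(genes, interactions, length_bin, n_random=500, seed=0):
    """Z-score of a pathway's edge fraction against length-matched random gene sets.

    `length_bin` maps every gene in the background to a gene-length bin label.
    """
    rng = random.Random(seed)
    genes = sorted(set(genes))
    by_bin = {}
    for gene, bin_id in length_bin.items():
        by_bin.setdefault(bin_id, []).append(gene)
    need = Counter(length_bin[g] for g in genes)  # genes required per length bin
    null = []
    for _ in range(n_random):
        sample = [
            g for bin_id, k in need.items() for g in rng.sample(by_bin[bin_id], k)
        ]
        null.append(edge_fraction(sample, interactions))
    spread = statistics.pstdev(null)
    if spread == 0:
        return float("nan")
    return (edge_fraction(genes, interactions) - statistics.mean(null)) / spread


# Toy example: genes A, B, C all interact; D, E, F are background genes.
toy_interactions = {frozenset(p) for p in [("A", "B"), ("A", "C"), ("B", "C")]}
toy_bins = {g: 0 for g in "ABCDEF"}
print(edge_fraction(["A", "B", "C"], toy_interactions))
print(edge_fraction_zscore(["A", "B", "C"], toy_interactions, toy_bins))
```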

16 Pathways analysis for Gene-Gene Interactions
Panel labels: 49456 Interactions; 14524 Interactions; 30356 Interactions

17 Pathways analysis for Gene-Gene Interactions
Panel labels: 20018 Interactions; 50841 Interactions; 10088 Interactions

18 Inter-tissue Hi-C pathways interactions
Panel labels: Z-score >= 1; Z-score >= 2

19 Heat-map for the Pathways Hi-C analysis
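The summary states that pathways in the heat-map are clustered by the Manhattan distance of their z-score vectors. The sketch below only computes those pairwise distances, which a hierarchical clustering routine could then consume; the clustering method and linkage are not given in the slides. Pathway names and z-scores are toy values, with 3 tissues instead of the 6 in the deck.

```python
from itertools import combinations


def manhattan(u, v):
    """Manhattan (L1) distance between two equal-length z-score vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def pairwise_manhattan(z_by_pathway):
    """Map each unordered pathway pair to the Manhattan distance of its z-score vectors."""
    return {
        (p, q): manhattan(z_by_pathway[p], z_by_pathway[q])
        for p, q in combinations(sorted(z_by_pathway), 2)
    }


# Toy z-score vectors, one value per tissue (illustrative only).
toy_z = {
    "P1": [1.0, 2.0, 0.0],
    "P2": [1.5, 2.0, 0.5],
    "P3": [-2.0, 0.0, 3.0],
}
distances = pairwise_manhattan(toy_z)
closest_pair = min(distances, key=distances.get)
print(distances, closest_pair)
```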

