Genome-wide analysis of noncoding regulatory mutations in cancer. Nils Weinhold, Anders Jacobsen, Nikolaus Schultz, Chris Sander & William Lee

1 Genome-wide analysis of noncoding regulatory mutations in cancer
Nils Weinhold, Anders Jacobsen, Nikolaus Schultz, Chris Sander & William Lee
(1) Computational Biology Program, Memorial Sloan Kettering Cancer Center, New York, New York, USA. (2) Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Copenhagen, Denmark. (3) Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, New York, USA.
June 03, 2015. Ka-Kyung Kim, Postdoctoral Researcher, Yonsei Biomedical Science Institute, Yonsei University College of Medicine

2 Genomic Structure of Gene
Figure: Numbers of Functional Genomic Elements (1.7%)
Enhancer, promoter: regulate gene expression (activate or regulate transcription of a gene).
TATA box, initiator: bound by regulatory factors involved in transcriptional initiation.
Exon: makes up the coding region of mature RNA transcripts.
Intron: removed by RNA splicing.
3′ UTR: important for RNA transcript stability; regulates gene expression post-transcriptionally.

3 Pathogenic variants in non-coding regions
Makrythanasis P and Antonarakis SE. Clin Genet. 2013;84:

4 Non-coding variations
The majority of disease-associated SNPs are in noncoding regions; noncoding variations act through transcriptional control.
Nature 473:43-49 (2011): Mapping and analysis of chromatin state dynamics in nine human cell types.
Science 337 (2012): Systematic localization of common disease-associated variation in regulatory DNA.
Cell 152 (2013): Integrative eQTL-based analyses reveal the biology of breast cancer risk loci.
Cell 155 (2013): Super-enhancers in the control of cell identity and disease.

5 Non-coding variations
TERT promoter mutations generate de novo consensus binding motifs for E-twenty-six (ETS) transcription factors and occur in 50 of 70 (71%) melanomas, as well as in 24 cases (16%) of bladder and hepatocellular cancer cells.

6 The obesity-associated noncoding sequences within FTO are functionally connected, at megabase distances, with the homeobox gene IRX3

7 Enhancer: integrated method for predicting enhancer targets

8

9 Nils Weinhold (1,3), Anders Jacobsen (1,3), Nikolaus Schultz (1), Chris Sander (1) and William Lee (1,2)

10 Non-coding variation
Maturation of sequencing technologies.
Limitations of previous computational approaches to non-coding variation: reliance on nucleotide conservation and small sample sizes.
This study: comprehensive analysis of somatic mutations from whole-genome sequences (WGS) of 863 cancer patients collected from The Cancer Genome Atlas (TCGA) and other public sources.

11 Large-scale Genomics Projects
Cancer genomics projects: The Cancer Genome Atlas (TCGA) and the International Cancer Genome Consortium (ICGC).
These projects focus on genomic variation in the coding sequences of tumor genomes, and most studies rely heavily on targeted exome sequencing, so understanding of somatic variation in coding regions has improved significantly.
The protein-coding component of the genome accounts for less than 2% of the total sequence, yet there is very little information on how non-coding variation affects cancer development.
Even well-studied cancer types such as non-small-cell lung cancer still have significant subpopulations with no observable "driver" mutation.
The Encyclopedia of DNA Elements (ENCODE) project estimates that roughly 80% of the human genome has some sort of biochemical functionality.
Somatic mutations in non-coding regions are frequent, and disease-associated genomic variation is commonly located in regulatory elements.
Khurana E, et al. Integrative annotation of variants from 1092 humans: application to cancer genomics. Science. 2013;342:
Maurano MT, et al. Systematic localization of common disease-associated variation in regulatory DNA. Science. 2012;337:1190–5.
Huang FW, et al. Highly recurrent TERT promoter mutations in human melanoma. Science. 2013;339:957–9.
Horn S, et al. TERT promoter mutations in familial and sporadic melanoma. Science. 2013;339:959–61.
The GENCODE project: high-quality reference gene annotation and experimental validation for human and mouse genomes (Gene/Transcript Biotypes in GENCODE & Ensembl).

12 Large-scale Genomics Projects
The FANTOM5 project: finds general rules for how cells change from one cell type to another. FANTOM5 Data Hub on the UCSC Genome Browser.
GTEx Project: provides a comprehensive atlas of gene expression and regulation across multiple human tissues, enabling studies of expression quantitative trait loci (eQTLs), alternative splicing, and the tissue specificity of gene regulatory mechanisms, and aiding the interpretation of genome-wide association studies (GWAS). Data are also available through the database of Genotypes and Phenotypes (dbGaP).

13

14 Methods
Calling mutations: the intersection of the somatic mutation calls made by MuTect and Strelka was used, requiring ≥2 mutant alleles for whole-exome sequence data. Samples with >500,000 mutations were excluded. The analysis focused on single-nucleotide substitutions, so structural variants were not considered.
Defining noncoding regions of interest: promoter regions were defined as the genomic intervals ranging from 2,000 bp upstream to 200 bp downstream of all transcription start sites. Enhancers were taken from 66,944 enhancer-to-gene associations (27,493 unique regions) reported in a previous study (Nature 507, 455–461 (2014)), using the inferred middle position of each enhancer region ±200 bp. Regions overlapping ORFs (±5 bp) were removed from the collection of regions of interest to avoid mutation bias from protein-coding regions. Regions corresponding to 429 annotated immunoglobulin loci (±50 kb) were also removed to avoid bias from immune system-coupled somatic hypermutation.
Identification of hotspot mutations: all mutations within 50 bp of each other were merged into hotspot clusters using BEDTools, and clusters with only one or two mutations were removed. A P value was calculated for each cluster using the negative binomial distribution, taking into account the length of the candidate hotspot, the number of mutations in the cluster and a background mutation rate for the cluster. The negative binomial distribution is the discrete probability distribution of the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified number of failures occurs (for example, when rolling a die, "1" can be counted as a failure and every other result as a success).
URLs: CGHub; Broad Genome Data Analysis Center (GDAC) Firehose; data from Alexandrov et al. (ref. 15), ftp://ftp.sanger.ac.uk/pub/cancer/AlexandrovEtAl.
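The hotspot step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: clustering only mimics the idea of merging mutations within 50 bp (it does not reproduce BEDTools exactly), and the expected count per cluster (background rate × cluster length × number of samples) and the dispersion parameter `size` are hypothetical choices.

```python
import math

MAX_GAP_BP = 50      # mutations within 50 bp of each other are merged
MIN_MUTATIONS = 3    # clusters with only 1-2 mutations are discarded


def merge_hotspots(positions, max_gap=MAX_GAP_BP, min_mutations=MIN_MUTATIONS):
    """Group mutation positions on one chromosome into candidate hotspots.

    Consecutive sorted positions at most `max_gap` bp apart join the same
    cluster; clusters smaller than `min_mutations` are dropped.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [c for c in clusters if len(c) >= min_mutations]


def nbinom_sf(k, size, p):
    """P(X >= k) for a negative binomial X counting failures before the
    `size`-th success, success probability p (mean = size * (1 - p) / p)."""
    cdf = 0.0
    for x in range(k):
        log_pmf = (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
                   + size * math.log(p) + x * math.log1p(-p))
        cdf += math.exp(log_pmf)
    return max(0.0, 1.0 - cdf)  # 1 - CDF loses precision for very small P values


def hotspot_pvalue(cluster, background_rate, n_samples, size=10.0):
    """Upper-tail P value for a cluster, assuming (hypothetically) an expected
    mutation count of background_rate * cluster_length * n_samples."""
    length = cluster[-1] - cluster[0] + 1
    mean = background_rate * length * n_samples
    if mean <= 0:
        raise ValueError("background rate must be positive")
    p = size / (size + mean)  # chosen so that the distribution's mean equals `mean`
    return nbinom_sf(len(cluster), size, p)


hotspots = merge_hotspots([100, 120, 160, 500, 530, 1000])
print(hotspots)
print(hotspot_pvalue(hotspots[0], background_rate=1e-6, n_samples=863))
```

Here the toy positions 100, 120 and 160 form one retained cluster, while 500/530 and 1000 are discarded as clusters of fewer than three mutations. Because the tail is computed as 1 − CDF, extremely small P values (such as those reported for TERT) would need a log-space or library routine instead.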

15 Methods Testing regions of interest for mutation recurrence
Local approach: 10-kb flanking regions upstream and downstream of each region of interest were extracted, excluding ORFs to reduce mutation bias from nearby protein-coding regions.
Global approach: nucleotide mutation frequencies were estimated from the other regions in the same category.
For each region or gene, the maximum of the FDRs from the individual global and local tests was used.
Test statistic: k, the number of mutated samples, was evaluated against a binomial distribution (n, p_i), where n is the total number of samples with mutation data and p_i is the estimated sample mutation rate for region of interest i under the null hypothesis that the region is not recurrently mutated. p_i depends on the effective length L_i of the region (with ORF overlap subtracted) and on q_i, the estimated nucleotide mutation rate for the region under the null hypothesis.
Transcription factor analysis: mutations were counted as creating ETS transcription factor binding sites if the nucleotide substitution created a novel ETS transcription factor core response element (TGCC>TTCC), and as disrupting ETS binding sites if they altered an existing ETS core response element (TTCC>TGCC). For each region of interest that contained more than one mutation in an ETS binding site, an empirical P value was computed by comparing the observed count statistic to a reference distribution of count statistics. → Could be extended to other TFBSs in addition to ETS.
Expression analysis: performed using RNA sequencing raw counts from TCGA; P values are reported from a negative binomial test in the edgeR package. SDHD promoter mutations were analyzed in depth in a set of TCGA melanoma samples with a read depth of ≥15.
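A minimal Python sketch of the binomial recurrence test described above, under explicit assumptions. The formula linking q_i to p_i is not given on the slide; here we assume p_i = 1 − (1 − q_i)^L_i, i.e. the probability that a sample carries at least one mutation among L_i independently mutating positions. The local estimate of q_i (flank mutations divided by flank nucleotides surveyed across all samples), the Benjamini-Hochberg FDR step, and all region names and counts are hypothetical illustrations.

```python
import math

N_SAMPLES = 863          # tumor-normal pairs in the study
FLANK_LENGTH = 20_000    # 10 kb upstream + 10 kb downstream (ORF removal ignored here)


def per_sample_rate(q, effective_length):
    """Assumed p_i: probability that a sample has >= 1 mutation in a region of
    `effective_length` bp when each nucleotide mutates independently with rate q."""
    return 1.0 - (1.0 - q) ** effective_length


def local_rate(flank_mutations, flank_length=FLANK_LENGTH, n_samples=N_SAMPLES):
    """Hypothetical local estimate of q: flank mutations per surveyed flank nucleotide."""
    return flank_mutations / (flank_length * n_samples)


def binomial_sf(k, n, p):
    """P(K >= k) for K ~ Binomial(n, p); exact sum, adequate for n in the hundreds."""
    return sum(math.comb(n, j) * p**j * (1.0 - p) ** (n - j) for j in range(k, n + 1))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values (FDR), returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy input: name -> (mutated samples k, effective length L_i, flank mutations, global q)
regions = {
    "region_A": (12, 400, 35, 3e-6),
    "region_B": (2, 400, 35, 3e-6),
    "region_C": (3, 2200, 17, 1e-6),
}

p_local, p_global = [], []
for k, length, flank_mut, q_global in regions.values():
    p_local.append(binomial_sf(k, N_SAMPLES, per_sample_rate(local_rate(flank_mut), length)))
    p_global.append(binomial_sf(k, N_SAMPLES, per_sample_rate(q_global, length)))

# Conservative combination: keep the larger of the local and global FDRs.
fdr = [max(a, b) for a, b in zip(benjamini_hochberg(p_local), benjamini_hochberg(p_global))]
for name, value in zip(regions, fdr):
    print(f"{name}: FDR = {value:.3g}")
```

Taking the maximum of the two FDRs means a region is reported only if it stands out both against its own genomic neighborhood and against other regions of the same category.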

16 Results: Assessing the genomic landscape of non-coding mutations
The genome-wide mutation burden varied between different cancer types, consistent with previous observations in exome sequencing studies.
Figure 1 Summary of data and methods. (a) Tumor samples by disease type. Boldface: TCGA. ALL, acute lymphoblastic leukemia; AML, acute myeloid leukemia; CLL, chronic lymphocytic leukemia. (b) Mean mutation frequency and 95% confidence interval across samples (n = 858) by type of genomic region. CDS, coding sequence.
Mutations in transcribed regions, including coding sequences (CDS), introns, and 3′ and 5′ UTRs, were observed at similar frequencies (Figure 1b), suggesting a role for transcription-coupled repair (refs. 8, 17).
Intergenic regions, which are less implicated in gene regulation and possibly under weaker selective constraint, carried the highest mutational burden of all regions investigated here (Mann-Whitney P < 2.2e-16).

17 Hotspot analysis: small regions that frequently contained mutations
Regional recurrence analysis: annotated regions that contained numerous mutations
Transcription factor analysis: ETS transcription factor binding sites disrupted or created by mutation
(c) Workflow for the identification of recurrent noncoding mutations in regulatory regions of interest. Our approach integrates mutation calls from 863 tumor-normal pairs and regulatory regions of interest, which are tested for noncoding mutations using 3 distinct analyses. Hotspot analysis detects recurrent mutations that are often very focal. Regional recurrence analysis identifies annotated regions of interest that are enriched for mutation throughout the entire region. Transcription factor analysis searches for regions that contain recurrent mutations within transcription factor binding sites.

18 Results: Mutation hotspot
TERT promoter mutations: TERT encodes the catalytic subunit of telomerase. This was the most significant hotspot (P = ), with 2 highly recurrent mutations in multiple samples across cancer types at chr5: and chr5: , both C>T substitutions, as previously reported.
PLEKHS1 promoter mutations: PLEKHS1 is an uncharacterized gene that has not previously been linked to tumorigenesis; its pleckstrin homology domain suggests a role for the protein in intracellular signaling. Significant mutations (P = ) at chr10: and chr10: are C>T transitions located palindromically to each other.
Several significant hotspots were linked to STAG3, BCL2, TCL1A, AGAP5, TRMT10C, TNK2 and WDR74; many of these genes have previously been associated with cancer.
Hotspots in the promoter and 5′ UTR of BCL2 are significant as clusters of several mutations within the same sample (average 2.2 mutations per mutated sample). These are all in B-cell lymphoma samples and are likely a result of targeted somatic hypermutation at hypervariable regions.

19 Figure 2 Hotspot analysis
Figure 2 Hotspot analysis. (a) Significance of mutation hotspots in noncoding regulatory regions. (b) Mutation hotspot in the promoter region of PLEKHS1, including 2 highly recurrent sites (with 11 and 12 mutations) located at the center of a palindromic sequence. Bar chart: frequency of the hotspot mutation in individual cancer types. Gray curve: mutation density across the region. Values shown in the figure: 4.6 × 10⁻⁸⁰ and 1.1 × 10⁻¹²⁷.

20 Results: Regional Recurrence Analysis
Somatic mutations are frequently distributed across the entire open reading frame.
Local approach: compares regional mutation rates to the overall mutation frequency in the genomic neighborhood.
Global approach: compares mutation rates for regions in the same category (promoter or 3′ UTR) and with similar DNA replication timing.
Together, these approaches identified larger, more frequently mutated genomic regions.
The 5′ UTR (P < 5.1 × 10⁻⁸) and promoter (P < 3.6 × 10⁻⁹) of WDR74 were highly enriched for mutations clustered across numerous positions (Figure 3b); WDR74 expression was not significantly different in mutated samples. WDR74 contains WD40 repeats, a domain found in proteins involved in a variety of biological processes, including cell cycle control and apoptosis, and mutations in this region are more common than previously known.
Other frequently mutated noncoding regions were found in genes such as SGK1, DHX16 and SDHD (Supplementary Tables 9–12). The 5′ end of the SDHD gene, which encodes subunit D of the succinate dehydrogenase complex, contained multiple mutations in putative ETS (E26 transformation-specific) family transcription factor binding sites.

21 Figure 3 Regional recurrence analysis.
(a) Significance of recurrent mutations in regulatory regions of interest. (b) Strong enrichment of mutations in the promoter region of WDR74 in contrast to the remainder of the gene sequence (figure label: "more often affected by mutation").

22 Results: Transcription factor analysis
Mutations in the regulatory regions of TERT, ANKRD53, TAF11, ERLIN2, MEF2C, KRT4 and SDHD create novel binding sites for ETS transcription factors (CTCC>TTCC).
Promoter mutations in ETS binding sites alter regulation of SDHD. SDHD mutations can cause paraganglioma, a benign tumor of the head and neck. Recurrent mutations in the TERT promoter create a novel ETS binding site, whereas mutations in the SDHD promoter damage existing ETS binding sites. Tumors with SDHD promoter mutations had significantly reduced expression of the SDHD gene (P = 0.004, Figure 4b).
ETS family transcription factors with binding activity in the SDHD promoter: EHF, ELF1 and ETS1. Only ELF1 expression was significantly positively correlated with SDHD expression in the subset of 42 SDHD-proficient samples without promoter mutation (Figure 4c, P < ). Tumor samples with SDHD promoter mutations showed no correlation between SDHD and ELF1 mRNA levels (P = 0.35), suggesting that SDHD promoter mutations impair transcriptional regulation by ELF1 (Figure 4c).
Samples with SDHD promoter mutations had significantly shorter overall survival than a reference group of 88 melanoma samples (P = 0.005, Figure 4d).
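To make the "creates" versus "disrupts" distinction concrete, here is a minimal Python sketch. It assumes the ETS core response element is the 4-mer TTCC on either strand (GGAA when read on the opposite strand), matching the TTCC notation on the slides; published analyses typically use richer motif models, and the function names and toy sequences are hypothetical, not real promoter sequence.

```python
ETS_CORES = {"GGAA", "TTCC"}  # assumed 4-bp ETS core; TTCC is GGAA read on the opposite strand


def cores_covering(seq, pos):
    """Start indices of ETS core matches in `seq` whose 4-bp window includes index `pos`."""
    first = max(0, pos - 3)
    last = min(pos, len(seq) - 4)
    return {s for s in range(first, last + 1) if seq[s:s + 4] in ETS_CORES}


def classify_ets(ref_seq, pos, alt_base):
    """Classify a single-nucleotide substitution at 0-based `pos` of `ref_seq`.

    A substitution that both removes one core and adds another at the same
    site is reported as "no change" in this simplified sketch.
    """
    ref_seq, alt_base = ref_seq.upper(), alt_base.upper()
    if ref_seq[pos] == alt_base:
        raise ValueError("alternative base equals the reference base")
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    in_ref = bool(cores_covering(ref_seq, pos))
    in_alt = bool(cores_covering(alt_seq, pos))
    if in_alt and not in_ref:
        return "creates"
    if in_ref and not in_alt:
        return "disrupts"
    return "no change"


# Toy sequences (hypothetical, not real promoter sequence):
print(classify_ets("CCCCTCCGGG", 3, "T"))  # creates: CTCC>TTCC
print(classify_ets("ACTTCCGA", 3, "C"))    # disrupts: TTCC>TCCC
print(classify_ets("AAAAAAA", 3, "C"))     # no change
```

Only windows overlapping the mutated base are inspected, so a core elsewhere in the sequence does not affect the call.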

The box plot displays the third and first quartiles (top and bottom of the boxes), the median (band inside the boxes), and the highest and lowest points within 1.5 times the interquartile range of the upper and lower quartiles (whiskers). Figure 4 Transcription factor analysis. Mutations in the promoter region of SDHD disrupt ETS transcription factor binding sites in melanoma cancer genomes. (a) Three recurrently mutated sites in the promoter region of SDHD, each one altering a separate, highly conserved ETS recognition site. (b) SDHD mRNA expression is lower in melanoma samples with SDHD promoter mutations (n = 13) than in tumor samples with wild-type (WT) SDHD (n = 42) (negative binomial test). (c) mRNA expression for ELF1 (ETS transcription factor) and SDHD is positively correlated in samples without SDHD promoter mutation (n = 42; blue) and is not correlated in samples with SDHD promoter mutation (n = 13; red). (d) Survival analysis shows that overall survival is significantly lower for samples with SDHD promoter mutation (n = 12) than in the reference group (n = 88).
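The whisker rule in the caption (Tukey's 1.5 × IQR convention) can be computed as below. This is a small sketch, assuming quartiles are interpolated with Python's `statistics.quantiles(..., method="inclusive")`; plotting tools differ in their quartile conventions, so exact box edges may vary slightly from the published figure. The function name is hypothetical.

```python
import statistics


def box_plot_stats(values):
    """Quartiles, median and Tukey-style whiskers for one group of values."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),   # lowest point within 1.5 x IQR of Q1
        "whisker_high": max(inside),  # highest point within 1.5 x IQR of Q3
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

For this toy input the IQR is 4, so 100 lies beyond the upper fence of 13 and is drawn as an outlier, while the upper whisker stops at 8.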

24 Discussion
Comprehensive analysis of whole-genome sequencing data from 863 individuals with cancer to characterize the landscape of noncoding mutations in cancer.
Intergenic regions are more often affected by mutation than transcribed regions and regions in close proximity to the coding sequence, such as introns, promoters, enhancers and UTRs.
Three distinct types of analysis were used to identify regions of interest significantly affected by mutation: hotspot analysis focused on small regions that frequently contained mutations; regional recurrence analysis identified annotated regions that contained numerous mutations; and transcription factor analysis nominated regions with ETS transcription factor binding sites that were disrupted or created by mutation.
Significant findings were identified by multiple methods. Promoter mutations in the TERT gene were found by all three methods. Hotspot analysis identified highly recurrent mutations in PLEKHS1, which occur at the center of a perfectly palindromic sequence. The SDHD promoter mutation was only moderately significant in regional recurrence analysis but was subsequently substantiated by transcription factor binding site analysis.
Recurrent mutations in three distinct ETS response elements were associated with loss of correlation with ETS transcription factor (ELF1) expression at the mRNA level and with shorter survival times for the affected individuals => this analysis should be applied to all known conserved binding sites.
Many cancer types were represented by fewer than 50 samples, which limits detection to regions that are mutated at high frequency in individual tumor types or across several different tumor types => similar analyses on larger sets of samples from individual tumor types will provide additional insights.
Interrogation and interpretation of noncoding mutations will become more accurate and more important as whole-genome sequencing data become more widely available.
Not considered: cell-type and developmental-stage specificity, networks (pathways), CNVs, noncoding RNAs and methylation.

25 Methods for annotating and prioritizing noncoding mutations

26

27 non-coding categories
Data S7 lists the functions of the target genes for variants with score > 4.
Figure S7 Broad and high-resolution categories. The numbers of sub-categories within each category are shown in brackets.

28 Filtering variants against 1000 Genomes Phase I data
FunSeq filters SNVs against the 1000 Genomes Phase I database using a user-defined MAF (minor allele frequency) threshold.
Scoring scheme for non-coding variants: for non-coding SNVs, FunSeq uses the results of this paper to score variants. A variant is assigned an additional score of 1 for each of the following categories that apply to it:
1. ENCODE annotation: the variant is in a region annotated by ENCODE.
2. In sensitive region: the variant is in a sensitive region.
3. In ultrasensitive region: the variant is in an ultrasensitive region.
4. Motif-breaking: the variant breaks a known TF motif.
5. Target gene known: the variant is in a gene promoter, or the target gene of the enhancer in which it occurs is known.
6. Target is hub: the assigned target is a hub.
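The additive scheme above can be illustrated with a short Python sketch. This is not FunSeq's code or input format: the record fields, the function names and the choice to keep variants whose 1000 Genomes MAF is at or below the threshold are hypothetical assumptions; only the six one-point categories come from the slide.

```python
from dataclasses import dataclass


@dataclass
class NoncodingVariant:
    """Hypothetical annotation record for one noncoding SNV."""
    variant_id: str
    maf_1000g: float               # MAF in 1000 Genomes Phase I (0.0 if absent)
    in_encode_annotation: bool
    in_sensitive_region: bool
    in_ultrasensitive_region: bool
    breaks_tf_motif: bool
    target_gene_known: bool
    target_is_hub: bool


def passes_maf_filter(variant, maf_threshold):
    """Assumed rule: keep variants that are rare (or absent) in 1000 Genomes."""
    return variant.maf_1000g <= maf_threshold


def noncoding_score(variant):
    """One point for each of the six categories that applies (bools sum as 0/1)."""
    return sum([
        variant.in_encode_annotation,
        variant.in_sensitive_region,
        variant.in_ultrasensitive_region,
        variant.breaks_tf_motif,
        variant.target_gene_known,
        variant.target_is_hub,
    ])


variants = [
    NoncodingVariant("var1", 0.0, True, True, True, True, True, False),
    NoncodingVariant("var2", 0.2, True, False, False, True, False, False),
]
for v in variants:
    if passes_maf_filter(v, maf_threshold=0.01):
        print(v.variant_id, noncoding_score(v))
```

With a threshold of 0.01, only var1 survives the filter; its score of 5 would place it among the "score > 4" variants mentioned for Data S7.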

29

30 Demo

31 Output screen

32 Output screen: results table

33 Output screen: results table

