Genome Biology & Applied Bioinformatics Mehmet Tevfik DORAK, MD PhD

1 Genome Biology & Applied Bioinformatics Mehmet Tevfik DORAK, MD PhD
SNP Sets Mehmet Tevfik DORAK, MD PhD

2 Schedule

3 Outline
I. Proxy SNP (ssSNP) sets (HaploReg, VADE)
II. Independent SNP Sets: RegulomeDB, CADD, Variant Set Enrichment Analysis (GRAIL)
III. wANNOVAR, AVIA
ssSNP = Statistically similar (correlated)

4 Proxy SNP Set Generation
HaploReg generates a set of statistically similar SNPs (ssSNPs) for each query SNP, together with their functional annotations. The Set Options tab allows setting the desired r² threshold (0.80 by default). eQTL results are included. For the best outcome, choose the text file output option and examine the results in a spreadsheet.
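
As a minimal sketch of examining the text output outside a spreadsheet, the snippet below keeps only the rows at or above an r² threshold. It assumes a tab-delimited table with a header row; the column names (`rsID`, `r2`) and the sample rows are illustrative assumptions and should be checked against the header of the file actually downloaded.

```python
import csv
import io

def filter_proxies(handle, r2_min=0.80, id_col="rsID", r2_col="r2"):
    """Return (SNP id, r2) pairs with r2 >= r2_min from a tab-delimited table.

    id_col and r2_col are assumed column names, not confirmed ones.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    kept = []
    for row in reader:
        r2 = float(row[r2_col])
        if r2 >= r2_min:
            kept.append((row[id_col], r2))
    return kept

# Made-up rows standing in for a downloaded text file.
sample = (
    "chr\tpos\tr2\trsID\n"
    "6\t100\t1.0\trsA\n"
    "6\t250\t0.85\trsB\n"
    "6\t900\t0.42\trsC\n"
)
proxies = filter_proxies(io.StringIO(sample))
print(proxies)
```

For a real file, pass `open(path, newline="")` instead of the in-memory sample.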

5 Proxy SNP Set
Once an ssSNP set is generated, its members can be annotated with the tools listed in the next few slides to assess their functionality and choose the most likely causal SNP. The ssSNP list (and the HaploReg results) can also be used to compile a list of target genes for a gene set enrichment analysis (to capture combinatorial effects of multiple variants in linkage disequilibrium). Most tools that generate ssSNP lists only include SNPs within 500 kb of the lead SNP. Going beyond that may be necessary, for example for HLA region SNPs; this can be achieved by downloading selected SNP genotypes from the 1000 Genomes Project (1KG; Ferret is helpful) and running the LD analysis in Haploview.
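
To make the r² threshold concrete, here is a small sketch of how r² between two biallelic SNPs can be computed from phased haplotypes, using r² = D² / (pA(1 − pA) pB(1 − pB)) with D = pAB − pA·pB. It assumes phased 0/1 alleles and made-up data; it is not a description of how Haploview or HaploReg compute LD internally.

```python
def ld_r2(haplotypes):
    """Compute r2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, where a and b are 0/1 alleles
    carried on the same phased haplotype at SNP A and SNP B.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r2 is undefined when either SNP is monomorphic")
    return d * d / denom

# Made-up haplotypes: one perfectly correlated pair, one partially correlated pair.
perfect = [(1, 1)] * 4 + [(0, 0)] * 6
partial = [(1, 1)] * 3 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 5

r2_perfect = ld_r2(perfect)
r2_partial = ld_r2(partial)
print(round(r2_perfect, 3), round(r2_partial, 3))
```

With the default threshold of 0.80, the first pair would count as statistically similar and the second would not.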

6 Independent SNP Sets
Unlike ssSNP sets (which include SNPs correlated with one another), we may also want to analyse a set of independent SNPs. These may be GWAS results, such as SNPs that have shown associations. Such SNPs can be assessed for causality using a variety of tools. One of them is RegulomeDB, which provides a functionality score for the regulatory effects of SNPs in non-coding regions.

7 Independent SNP Sets
CADD combines more than 80 features to provide a single summary score. However, it requires uploading a list of SNPs in VCF format. Results are provided in a file together with all 80+ features for each SNP (including PolyPhen and SIFT scores). CADD has been superseded by DANN and EIGEN, but unlike CADD, these two cannot be accessed via a web interface.
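
A minimal sketch of turning a SNP list into VCF-style lines is shown below. The variants and coordinates are made up, and the unused QUAL, FILTER and INFO columns are filled with the VCF missing-value placeholder `.`; the upload page's own instructions take precedence over this layout.

```python
def to_vcf_lines(variants):
    """Build VCF text lines from (chrom, pos, snp_id, ref, alt) tuples."""
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for chrom, pos, snp_id, ref, alt in variants:
        # "." marks fields that carry no information in this minimal file.
        lines.append("\t".join([str(chrom), str(pos), snp_id, ref, alt, ".", ".", "."]))
    return lines

# Hypothetical variants for illustration only.
example_variants = [
    ("1", 12345, "rsA", "A", "G"),
    ("6", 67890, "rsB", "C", "T"),
]
vcf_lines = to_vcf_lines(example_variants)
print("\n".join(vcf_lines))
```

Writing `"\n".join(vcf_lines)` to a `.vcf` file gives a file that can be inspected in a text editor before upload.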

8 Independent SNP Sets
To obtain CADD scores for up to 100 thousand variants, prepare a VCF file (an explanation is given on this page) and upload it. Make sure you check the box at the bottom of the screen to get details of all features for all variants (rather than just the CADD scores). The results can be downloaded from the site. The scaled CADD scores are given in the last column of the results file (called PHRED). Link:
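
Once the results file is downloaded, the scaled scores can be ranked with a few lines of code. This sketch relies only on the point made above, that the scaled score is the last column; it further assumes a tab-delimited file whose comment and header lines start with `#` and whose first four columns are chromosome, position, reference and alternative allele. The sample rows are made up.

```python
import io

def top_scaled_scores(handle, n=10):
    """Return the n rows with the highest value in the last (scaled score) column."""
    rows = []
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue  # skip comment/header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref, alt = fields[:4]  # assumed column order
        rows.append((chrom, int(pos), ref, alt, float(fields[-1])))
    rows.sort(key=lambda r: r[-1], reverse=True)
    return rows[:n]

# Made-up results standing in for a downloaded file.
sample_results = (
    "## example results\n"
    "#Chrom\tPos\tRef\tAlt\tRawScore\tPHRED\n"
    "1\t100\tA\tG\t0.5\t8.1\n"
    "2\t200\tC\tT\t3.2\t24.7\n"
)
top = top_scaled_scores(io.StringIO(sample_results), n=1)
print(top)
```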

9 VSE Analysis for SNPs (GRAIL)
VSE = Variant set enrichment Links: GRAIL: PAPER:

10 VSE Analysis for SNPs (GRAIL)
Figure 1. Gene Relationships Among Implicated Loci (GRAIL) method consists of four steps.
(A) Identifying genes in disease regions. For each independent associated SNP or CNV from a GWAS, GRAIL defines a disease region; then GRAIL identifies genes overlapping the region. In this region there are three genes. We use gene 1 (pink arrow) as an example.
(B) Assess relatedness to other human genes. GRAIL scores each gene contained in a disease region for relatedness to all other human genes. GRAIL determines gene relatedness by looking at words in gene references; related genes are defined as those whose abstract references use similar words. Here gene 1 has word counts that are highly similar to gene A but not to gene B. All human genes are ranked according to text-based similarity (green bar), and the most similar genes are considered related.
(C) Counting regions with similar genes. For each gene in a disease region, GRAIL assesses whether other independent disease regions contain highly significant genes. GRAIL assigns a significance score to the count. In this illustration gene 1 is similar to genes in three of the regions (green arrows), including gene A.
(D) Assigning a significance score to a disease region. After all of the genes within a region are scored, GRAIL identifies the most significant gene as the likely candidate. GRAIL corrects its significance score for multiple hypothesis testing, to assign a significance score to the region.
Links: GRAIL: PAPER:
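
The idea in step (B), ranking genes by how similar their word counts are, can be illustrated with a toy example. The sketch below uses cosine similarity on raw word counts as a stand-in; this metric, the gene names and the word counts are all assumptions for illustration and do not reproduce GRAIL's actual scoring.

```python
import math
from collections import Counter

def cosine(c1, c2):
    """Cosine similarity between two word-count vectors stored as Counters."""
    dot = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)

# Made-up word counts from abstracts referencing each gene.
gene_1 = Counter({"immune": 5, "cytokine": 3, "receptor": 2})
others = {
    "gene A": Counter({"immune": 4, "cytokine": 4, "signal": 1}),
    "gene B": Counter({"collagen": 6, "matrix": 3, "receptor": 1}),
}

# Rank the other genes by text-based similarity to gene 1, most similar first.
ranked = sorted(
    ((name, cosine(gene_1, counts)) for name, counts in others.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(ranked)
```

In this toy data, gene A shares most of its vocabulary with gene 1 and ranks first, mirroring the figure's example.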

11 More Advanced Tools
AVIA and wANNOVAR are advanced tools that provide much more detail than other web-based tools. Both use the ANNOVAR database and provide a web interface to it. Prepare the required input files as described (samples are provided), submit them, and retrieve your results. AVIA: wANNOVAR:

12 … Looking forward …..

13 Course material: http://www.dorak.info/genbiol/course.html

14

