SNP Resources: Finding SNPs, Databases and Data Extraction Mark J. Rieder, PhD SeattleSNPs Variation Workshop March 20-21, 2006.


Genotype - Phenotype Studies
Typical Approach: “I have candidate gene/region and samples ready to study. Tell me what SNPs to genotype.”
Other questions:
How do I know I have *all* the SNPs?
What is the validation/quality of the SNPs that are known?
Are these SNPs informative in my population/sample?
What do I need to know for selecting the “best” SNPs?
How do I pick the “best” SNPs?
What information do I need to characterize a SNP for genotyping?

Minimal SNP information for genotyping/characterization
What is the SNP? Flanking sequence and alleles, in FASTA format:
  >snp_name
  ACCGAGTAGCCAG [A/G] ACTGGGATAGAAC
  dbSNP reference SNP # (rs #)
Where is the SNP mapped? Exon, promoter, UTR, etc. (picture of the gene with the SNP mapped to the gene structure)
How was it discovered? Method.
What assurances do you have that it is real? Validated how?
What population? African, European, etc.
What is the allele frequency of each SNP? Common (>10%) or rare.
Are other SNPs associated (redundant)? Genotyping data!
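As a rough illustration of the flanking-sequence notation on this slide, the sketch below splits a record of the form `>snp_name LEFT [A/G] RIGHT` into its parts. The function name `parse_snp_record` and the returned dictionary keys are hypothetical, and the pattern assumes each flank is one unbroken run of bases; real dbSNP FASTA output may be laid out differently.

```python
import re

# Pattern for the slide's notation: >name, 5' flank, [allele/allele], 3' flank.
# Assumption: flanks are single unbroken runs of A/C/G/T/N.
SNP_RECORD = re.compile(
    r">(?P<name>\S+)\s+"
    r"(?P<left>[ACGTN]+)\s*"
    r"\[(?P<alleles>[ACGTN-]+(?:/[ACGTN-]+)+)\]\s*"
    r"(?P<right>[ACGTN]+)",
    re.IGNORECASE,
)


def parse_snp_record(text):
    """Return the name, flanking sequences and alleles of one SNP record."""
    match = SNP_RECORD.search(text)
    if match is None:
        raise ValueError("text does not look like '>name LEFT [A/G] RIGHT'")
    return {
        "name": match.group("name"),
        "left_flank": match.group("left").upper(),
        "alleles": match.group("alleles").upper().split("/"),
        "right_flank": match.group("right").upper(),
    }


record = parse_snp_record(">snp_name ACCGAGTAGCCAG [A/G] ACTGGGATAGAAC")
print(record["name"], record["alleles"])
```

Keeping the flanks and alleles as separate fields makes it easier to hand the record to an assay-design tool, which typically needs the two flanks and the allele pair individually.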

Finding SNPs: Databases and Extraction
How do I find and download SNP data for analysis/genotyping?
1. Entrez Gene - dbSNP - Entrez SNP
2. HapMap Genome Browser
3. SeattleSNPs PGA Candidate gene website
4. Web applications and other tools: NIEHS, PolyPhen, ECR Browser

NCBI - Database Resource IL1B

Finding SNPs: Where do I start?

NCBI - Entrez Gene (LocusLink replacement)

Finding SNPs: Entrez Gene

dbSNP Geneview

Finding SNPs: dbSNP validation (HapMap verified; by 2hit-2allele)

Finding SNPs: dbSNP database

Entrez SNP - dbSNP genotype retrieval

Finding SNPs - Gene Genotype Report

Graphic display of genotype data - Visual Genotype

Finding SNPs - Gene Genotype Report

Minimal SNP information for genotyping/characterization: dbSNP - data is there

Entrez Gene Entry - Entrez SNP

Entrez SNP - direct dbSNP querying

Entrez SNP - Parseable Multi-SNP reports

Entrez SNP - Search Limiting Capabilities IL1B

Entrez SNP - Search Limits

Entrez SNP - Search Limiting Capabilities

Entrez SNP - More Limit Searching

Entrez SNP - Query Term Capabilities

Entrez SNP - Search Terms Fields

More advanced queries: 2[CHR] AND "coding nonsynon"[FUNC]

Entrez SNP - Search Terms Fields
More advanced queries:
  2[CHR] AND "coding nonsynonymous"[FUNC] AND "PGA-UW-FHCRC"[HANDLE]
Note: Can also use wildcard (*) characters and the AND, OR, and NOT operators.
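A minimal sketch of how a field-tagged query like the one on this slide could be assembled in code. The helper names `field_term` and `and_query` are hypothetical. The field tags ([CHR], [FUNC], [HANDLE]) are the ones shown on the 2006 slides and may have changed since. The E-utilities `esearch` URL is not part of the slides; it is included only as one assumed way to submit the query programmatically, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumption: NCBI E-utilities ESearch endpoint; not mentioned in the slides.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(value, tag):
    """Format one term as value[TAG], quoting values with spaces or punctuation."""
    text = value if value.isalnum() else f'"{value}"'
    return f"{text}[{tag}]"


def and_query(*terms):
    """Join terms with the AND operator."""
    return " AND ".join(terms)


query = and_query(
    field_term("2", "CHR"),                      # chromosome 2
    field_term("coding nonsynonymous", "FUNC"),  # function class
    field_term("PGA-UW-FHCRC", "HANDLE"),        # submitter handle
)
print(query)

# Build (but do not fetch) a search URL against the SNP database.
url = ESEARCH_URL + "?" + urlencode({"db": "snp", "term": query, "retmax": 20})
print(url)
```

Building the string from parts keeps the quoting consistent when terms are swapped in and out, for example when changing the chromosome or dropping the handle restriction.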

Entrez SNP - Advanced Queries

Minimal SNP information for genotyping/characterization: EntrezSNP - better!

Finding SNPs - Entrez SNP Summary
1. dbSNP is useful for investigating detailed information on a small number of SNPs, and it's good for a picture of the gene.
2. Entrez SNP is a direct, fast database for querying SNP data.
3. Data from Entrez SNP can be retrieved in batches for many SNPs.
4. Entrez SNP data can be “limited” to specific subsets of SNPs and formatted in plain text for easy parsing and manipulation.
5. More detailed queries can be formed using specific “field tags” for retrieving SNP data.


Finding SNPs: HapMap Browser

Finding SNPs: HapMap Genotypes

Finding SNPs: HapMap Browser


Finding SNPs: HapMap Browser
1. HapMap data sets are useful because individual genotype data can be used to determine optimal genotyping strategies (tagSNPs) or perform population genetic analyses (linkage disequilibrium).
2. Data are specific to those produced by the project (not all of dbSNP). HapMap data is also available in dbSNP.
3. HapMap data (Phase II) can be accessed pre-release, before it appears in dbSNP.
4. Easier visualization of data and direct access to SNP data, individual genotypes, and LD analysis.
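To make the link between linkage disequilibrium and redundant SNPs concrete, here is a small sketch of the standard r² statistic for two biallelic SNPs. It assumes phased haplotypes are already available (individual genotypes, as downloaded from HapMap, would first need phasing), and the haplotype lists are invented toy data, not HapMap results. The function name `r_squared` is hypothetical.

```python
def r_squared(haplotypes):
    """LD r^2 for two biallelic SNPs.

    haplotypes: list of (allele_at_snp1, allele_at_snp2), one per chromosome.
    """
    n = len(haplotypes)
    ref1, ref2 = haplotypes[0]  # arbitrary reference allele at each SNP
    p1 = sum(h[0] == ref1 for h in haplotypes) / n
    p2 = sum(h[1] == ref2 for h in haplotypes) / n
    p12 = sum(h == (ref1, ref2) for h in haplotypes) / n
    denom = p1 * (1 - p1) * p2 * (1 - p2)
    if denom == 0:
        raise ValueError("at least one SNP is monomorphic in this sample")
    d = p12 - p1 * p2  # disequilibrium coefficient D
    return d * d / denom


# Toy data (hypothetical): the two SNPs always travel together.
linked = [("A", "C")] * 4 + [("G", "T")] * 4
# Toy data (hypothetical): every allele combination is equally frequent.
unlinked = [("A", "C"), ("A", "T"), ("G", "C"), ("G", "T")]

print(r_squared(linked))
print(r_squared(unlinked))
```

A pair with r² close to 1 carries nearly the same information, so genotyping one of the two (a tagSNP) is usually enough; a pair with r² near 0 is not redundant.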


Finding SNPs: SeattleSNPs Candidate Genes pga.gs.washington.edu

HapMap Compatible

Finding SNPs: SeattleSNPs Candidate Genes

SNP_pos Ind_ID allele1 allele2
(repeat for all individuals, then repeat for the next SNP)
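The four-column layout on this slide (SNP position, individual ID, allele 1, allele 2) is simple to parse. The sketch below tallies allele frequencies per SNP and applies the >10% "common" threshold mentioned earlier in the talk. It assumes whitespace-separated columns and no missing-data codes; the positions, individual IDs and genotypes are made-up toy values, and the function names are hypothetical.

```python
from collections import Counter, defaultdict

# Toy genotype data (hypothetical), one line per individual per SNP.
TOY_GENOTYPES = """\
1001 D001 A A
1001 D002 A G
1001 D003 G G
1001 D004 A A
2050 D001 C C
2050 D002 C C
2050 D003 C C
2050 D004 C T
"""


def allele_frequencies(lines):
    """Return {snp_pos: {allele: frequency}} from 4-column genotype lines."""
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.split()
        # Skip blank lines, header lines and anything malformed.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        pos, _individual, allele1, allele2 = fields
        counts[int(pos)].update([allele1, allele2])
    return {
        pos: {allele: n / sum(c.values()) for allele, n in c.items()}
        for pos, c in counts.items()
    }


def minor_allele_frequency(freqs):
    """Frequency of the rarer allele; 0.0 if only one allele was seen."""
    return min(freqs.values()) if len(freqs) > 1 else 0.0


frequencies = allele_frequencies(TOY_GENOTYPES.splitlines())
for pos, freqs in sorted(frequencies.items()):
    maf = minor_allele_frequency(freqs)
    label = "common" if maf > 0.10 else "rare"
    print(pos, freqs, f"MAF={maf:.3f}", label)
```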

SIFT = Sorting Intolerant From Tolerant: evolutionary comparison of non-synonymous SNPs
PolyPhen = Polymorphism Phenotyping: structural protein characteristics and evolutionary comparison

PolyPhen: Polymorphism Phenotyping - prediction of functional effect of human nsSNPs
Physical and comparative analyses used to make predictions
Uses SwissProt annotations to identify known domains
Calculates a substitution probability from BLAST alignments of homologous and orthologous sequences
Ranks substitutions on a scale of predicted functional effects from “benign” to “probably damaging”

PolyPhen: Polymorphism Phenotyping - prediction of functional effect of human nsSNPs tux.embl-heidelberg.de/ramensky/

Finding SNPs: SeattleSNPs Candidate Genes

pga.gs.washington.edu

Finding SNPs: NIEHS SNPs Candidate Genes egp.gs.washington.edu

ECR Browser: Evolutionary Conserved Regions
Aligns sequences to Mouse, Rat, Dog, Opossum, Chicken, Fugu and Drosophila
Gene annotations from UCSC Genome Browser
Easy retrieval of ECR sequences and alignments
Pre-computed transcription factor binding sites

Human-mouse alignment Fasta sequences

ECR Browser: Evolutionary Conserved Regions Transcription Factor Binding Sites from Transfac

Finding SNPs: Databases and Extraction
Entrez SNP
  Direct access to dbSNP data - versatile and flexible querying
HapMap Browser (hapmap.org)
  Access to large scale genotype data
  Rapid/early access on HapMap website
  Browsers provide visualization and other analysis tools
SeattleSNPs (pga.gs.washington.edu)
  Candidate gene focused - inflammation - HLBS phenotypes
  Comprehensive SNP data from resequencing
  Early access - prior to dbSNP release
Other Resources: NIEHS SNPs (egp.gs.washington.edu), PolyPhen, ECR (with TransFac)