LS-SNP: Large-scale annotation of coding non-synonymous SNPs based on multiple information sources - Bioinformatics, April 2005



Motivation
Over 9 million SNPs in dbSNP, with little functional annotation
nsSNPs are of critical importance for disease and drug sensitivity
Prediction of functional SNPs
–enables targeting of SNPs to be genotyped in candidate gene studies
–helps identify the causative SNP among SNPs that are in LD

Aims
Identify candidate functional SNPs in
–gene
–haplotype
–pathway
Map nsSNPs onto protein sequences, functional pathways and comparative structure models

Predictions of SNP function
Predict positions where nsSNPs
–rule based: destabilize proteins, or interfere with the formation of domain-domain interfaces or with protein-ligand binding
–supervised learning (SVM): severely affect human health

Methods - pipeline
SNP-protein mapping
Sequence to structure (experimentally derived)
–genomic sequence, protein sequence, protein structure
SNP prediction annotations combine:
–rule based
–supervised learning (SVM)

SNP annotations - rule based
Destabilizing (Sunyaev et al., 2001) if:
–RSA (relative solvent accessibility) 0.75
–RSA > 50% and difference in accessible surface propensities > 2
–RSA < 25% and charge change
–variant involves a proline in a helix
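The rule list above can be sketched as a small checker. This is a minimal illustration, not the LS-SNP implementation: the `SiteFeatures` record and the rule names are hypothetical, RSA is assumed to be a percentage, and the propensity difference is assumed to be supplied as a magnitude. The first rule is left out because its condition is incomplete on the slide.

```python
from dataclasses import dataclass


@dataclass
class SiteFeatures:
    """Hypothetical per-variant features (names are assumptions)."""
    rsa: float               # relative solvent accessibility, in percent
    propensity_diff: float   # magnitude of the accessible surface propensity difference
    charge_change: bool      # substitution changes the side-chain charge
    proline_in_helix: bool   # variant involves a proline in a helix


def destabilizing_rules(site: SiteFeatures) -> list:
    """Return the names of the rules from the slide that fire for this site."""
    fired = []
    # Exposed position with a large propensity difference.
    if site.rsa > 50 and site.propensity_diff > 2:
        fired.append("exposed_propensity_change")
    # Buried position with a change of charge.
    if site.rsa < 25 and site.charge_change:
        fired.append("buried_charge_change")
    # Proline in a helix.
    if site.proline_in_helix:
        fired.append("proline_in_helix")
    return fired


if __name__ == "__main__":
    print(destabilizing_rules(SiteFeatures(10.0, 0.0, True, False)))
```

Returning the list of fired rules, rather than a single yes/no, keeps the evidence visible when several rules apply to one site.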

Rule based (cont.)
Interference with a domain-domain interface is predicted if:
–any of the 4 rules holds, and
–the residue is <= 6 Å from an atom in an adjacent domain
An effect on protein-ligand binding is predicted if:
–any of the 4 rules holds, and
–the residue is ligand-binding, i.e. <= 5 Å from a HETATM (not covalently bonded to the protein, not one of the 20 amino acids, and not in a water molecule)
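The two distance criteria above can be sketched as follows. This is only an illustration under stated assumptions: atoms are plain (x, y, z) tuples in Å, the function names are hypothetical, and the HETATM list is assumed to have been filtered already (no covalently bonded groups, no amino acids, no water).

```python
import math

INTERFACE_CUTOFF = 6.0  # Å, to an atom in an adjacent domain (from the slide)
LIGAND_CUTOFF = 5.0     # Å, to a HETATM (from the slide)


def any_within(atoms_a, atoms_b, cutoff):
    """True if any atom in atoms_a lies within cutoff of any atom in atoms_b."""
    return any(math.dist(a, b) <= cutoff for a in atoms_a for b in atoms_b)


def contact_annotations(residue_atoms, adjacent_domain_atoms, hetatm_atoms, rule_fired):
    """Combine a fired destabilizing rule with the two distance criteria."""
    return {
        "domain_interface": rule_fired
        and any_within(residue_atoms, adjacent_domain_atoms, INTERFACE_CUTOFF),
        "ligand_binding": rule_fired
        and any_within(residue_atoms, hetatm_atoms, LIGAND_CUTOFF),
    }


if __name__ == "__main__":
    residue = [(0.0, 0.0, 0.0)]
    neighbour = [(0.0, 0.0, 5.5)]  # 5.5 Å away: inside 6 Å, outside 5 Å
    print(contact_annotations(residue, neighbour, neighbour, rule_fired=True))
```

The all-pairs loop is quadratic in the number of atoms, which is acceptable for a single residue; a spatial index would be the usual choice at genome scale.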

SNP annotations - supervised learning (SVM)
Train an SVM to discriminate between monogenic disease nsSNPs from OMIM and neutral SNPs from dbSNP
(measure of strain)
(chemical similarity)

SVM – training dataset
1457 disease-associated
–variants in Swiss-Prot and OMIM
2504 neutral
–neutral variants according to rules
fold cross-validation
–train on subsets 1 and 2, test on 3
–repeated 10 times
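The train/test scheme above can be sketched with the standard library alone. Several things here are assumptions, not taken from the slide: three subsets (inferred from "train on subsets 1 and 2, test on 3"), rotation of the held-out subset, a fresh shuffle per repeat, and the function name `repeated_three_fold`.

```python
import random


def repeated_three_fold(items, repeats=10, seed=0):
    """Yield (train, test) splits: 3 subsets, each held out once, per repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = list(items)
        rng.shuffle(shuffled)
        # Deal the shuffled items into three near-equal subsets.
        folds = [shuffled[i::3] for i in range(3)]
        for held_out in range(3):
            test = folds[held_out]
            train = [x for i, fold in enumerate(folds) if i != held_out for x in fold]
            yield train, test


if __name__ == "__main__":
    # 1457 disease-associated + 2504 neutral variants, represented by indices.
    variants = list(range(1457 + 2504))
    splits = list(repeated_three_fold(variants))
    print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Averaging a metric over the repeats is what yields a mean with a spread, which is the form in which the accuracy figures are reported.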

SVM – training dataset (cont.)
The absolute value of the SVM output gives the confidence
Exclude low-confidence predictions
–accuracy of 80.5% (±0.3%)
–false positives 19.7% (±0.2%)
–false negatives 18.7% (±0.8%)
–122 rejected on low confidence
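Rejecting low-confidence predictions by the magnitude of the classifier output can be sketched like this. The threshold value, the sign convention (positive meaning disease-associated) and the function name are all assumptions for illustration; the slide gives none of them.

```python
def classify_with_rejection(decision_value, min_confidence=0.5):
    """Label a variant from a signed SVM output, rejecting values near zero.

    min_confidence is a hypothetical cutoff; positive output is assumed
    to mean disease-associated.
    """
    if abs(decision_value) < min_confidence:
        return "rejected"
    return "disease" if decision_value > 0 else "neutral"


if __name__ == "__main__":
    for value in (1.8, -0.9, 0.1):
        print(value, classify_with_rejection(value))
```

Raising the cutoff trades coverage for accuracy: more variants are rejected, and the remaining calls sit further from the decision boundary.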

Results - mapping
SNP to protein mapping
–28,043 (21,255 dbSNP) validated coding nsSNPs
–70,147 (54,048 dbSNP) including non-validated

Results - structure
13,391 (53%) proteins have modelled domains with equivalent residues
13,062 (19%) nsSNPs (all)
8725 (31%) nsSNPs (validated)
–67 nsSNPs appear in more than one protein (alternative splicing)
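As a quick cross-check, the two nsSNP percentages are consistent with the mapping totals on the previous slide. Treating those totals (70,147 and 28,043) as the denominators is an assumption; the slide does not state what the percentages are relative to.

```python
def percent(part, whole):
    """Percentage of whole represented by part, rounded to the nearest integer."""
    return round(100 * part / whole)


# Mapped nsSNPs (previous slide) and those with structural coverage (this slide).
all_mapped, validated_mapped = 70_147, 28_043
all_structural, validated_structural = 13_062, 8_725

print(percent(all_structural, all_mapped))              # 19
print(percent(validated_structural, validated_mapped))  # 31
```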

Results - function
1886 destabilizing nsSNPs (structural rules 1-4)
1317 monogenic disease-associated nsSNPs by SVM
–comparative models
–conservation
–substitution properties

Web resource
KEGG pathway, SNP ID (rs), HUGO, Swiss-Prot
filter
–SCOP
–Swiss-Prot
–KEGG
–UCSC
–PDBsum
–ModBase

genomic sequence, protein sequence

structure

SNP prediction annotations

Discussion - data quality
Validated or non-validated SNPs?
–multiple independent submissions
–submitter confirmation
–alleles observed in at least 2 chromosomes
–submission to HapMap
Report non-validated and validated SNPs, with an option to filter

Discussion - ligands
The local structural environment of each SNP-ligand contact cannot be evaluated by the pipeline
All contacts are reported
–some will not be biologically interesting
e.g. a SNP in proximity to glycerol will have no functional effect, but in glycerol kinase the SNP could be important

Discussion - structural annotations
ModSNP: 4109 structural annotations; 70% sequence identity cutoff
LS-SNP: structural annotations for 13,062 dbSNP rsIDs (4907 validated); no sequence identity cutoff
–instead, a score (0-1) is given, based on sequence identity and model assessment (average identity ~28%)

Discussion - structural annotations
‘…because structure annotations are models, use properties that depend on correct fold assignment and good target-template alignments, as opposed to atomic-level structural details such as loss of salt bridges, hydrogen bonds or disulphide bonds.’

Discussion - structural annotations
Not possible to model effects such as changes in backbone geometry or small side-chain alterations

Case study - Glutathione S-Transferase
GSTs play a key role in cellular detoxification
–domain interface
–buried charge change
–unfavourable change in accessible surface potential at a buried position
–conserved in mouse, rat, chicken
The combination of information sources builds a convincing case

Caveats
Only updated twice a year
Dependent on structure (comparative modelling)
–allowing predictions without structure data would have increased the numbers
No option to add your own SNPs
No indication of which predictors are best
–or which combinations of predictors
Domain-domain or ligand-binding effects are flagged, but with no indication of how damaging they might be
Next version will have HapMap SNPs
SVM: monogenic only
Chose a small subset of Sunyaev's rules; conservation?