LS-SNP: Large-scale annotation of coding non-synonymous SNPs based on multiple information sources - Bioinformatics, April 2005.

1 LS-SNP: Large-scale annotation of coding non-synonymous SNPs based on multiple information sources - Bioinformatics, April 2005

2 Motivation: Over 9 million SNPs in dbSNP, with little functional annotation. nsSNPs are of critical importance for disease and drug sensitivity. Prediction of functional SNPs enables targeting of SNPs to be genotyped in candidate gene studies, and helps identify the causative SNP among SNPs that are in LD.

3 Aims: Identify candidate functional SNPs in a –gene –haplotype –pathway. Map nsSNPs onto protein sequences, functional pathways and comparative structure models.

4 Predictions of SNP function: Predict positions where nsSNPs –rule based: destabilize proteins, or interfere with formation of domain-domain interfaces or with protein-ligand binding –supervised learning (SVM): severely affect human health

5 Methods - pipeline: SNP-protein mapping. Sequence to structure (experimentally derived) –genomic sequence, protein sequence, protein structure. SNP prediction annotations combine: –rule based –supervised learning (SVM)

6 SNP annotations - rule based: destabilizing (Sunyaev et al., 2001) if: –RSA (relative solvent accessibility) … 0.75 –RSA > 50% and difference in accessible surface propensities > 2 –RSA < 25% and charge change –variant involves a proline in a helix
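The four rules on this slide can be written down as a small checker. This is a minimal sketch and not the LS-SNP implementation: the `Substitution` record and its field names are hypothetical, and the first rule is incomplete in the transcript, so the code assumes it reads "RSA < 25% and propensity difference > 0.75", which is a guess by analogy with the second rule.

```python
from dataclasses import dataclass


@dataclass
class Substitution:
    """Hypothetical per-variant record; field names are illustrative only."""
    rsa_percent: float        # relative solvent accessibility, 0-100
    propensity_diff: float    # difference in accessible surface propensities
    charge_change: bool       # substitution changes the residue charge
    proline_in_helix: bool    # variant involves a proline in a helix


def destabilizing_rules(s: Substitution) -> list[int]:
    """Return the numbers (1-4) of the slide's rules that fire for s."""
    fired = []
    # Rule 1: ASSUMED form; the condition is garbled in the transcript.
    if s.rsa_percent < 25 and s.propensity_diff > 0.75:
        fired.append(1)
    # Rule 2: exposed residue with a large propensity difference.
    if s.rsa_percent > 50 and s.propensity_diff > 2:
        fired.append(2)
    # Rule 3: buried residue with a charge change.
    if s.rsa_percent < 25 and s.charge_change:
        fired.append(3)
    # Rule 4: proline in a helix.
    if s.proline_in_helix:
        fired.append(4)
    return fired


# A buried residue whose only notable property is a charge change.
print(destabilizing_rules(Substitution(10.0, 0.2, True, False)))
```

Returning the list of fired rules, not just a yes/no flag, keeps the evidence for each prediction visible.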

7 Rule based (cont.): Interference with a domain-domain interface is predicted if: –any of the 4 rules applies and –the residue is within ≤ 6 Å of an atom in an adjacent domain. An effect on protein-ligand binding is predicted if: –any of the 4 rules applies and –the residue is ligand-binding, i.e. within ≤ 5 Å of a HETATM (not covalently bonded to the protein, not one of the 20 amino acids, and not in a water molecule)
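The two distance criteria reduce to a minimum-distance test between atom sets. The sketch below is an illustration under stated assumptions, not the pipeline's code: the function names are hypothetical, atoms are plain `(x, y, z)` tuples in Å, and the HETATM list is assumed to be pre-filtered as the slide describes (no water, no standard amino acids, no covalently bonded groups).

```python
import math

Atom = tuple[float, float, float]  # (x, y, z) coordinates in angstroms


def min_distance(atoms_a: list[Atom], atoms_b: list[Atom]) -> float:
    """Smallest atom-atom distance between two sets (inf if either is empty)."""
    return min(
        (math.dist(a, b) for a in atoms_a for b in atoms_b),
        default=math.inf,
    )


def interferes_with_interface(fired_rules: list[int],
                              residue_atoms: list[Atom],
                              adjacent_domain_atoms: list[Atom],
                              cutoff: float = 6.0) -> bool:
    """At least one destabilizing rule fired AND within 6 A of the adjacent domain."""
    return bool(fired_rules) and min_distance(residue_atoms, adjacent_domain_atoms) <= cutoff


def affects_ligand_binding(fired_rules: list[int],
                           residue_atoms: list[Atom],
                           hetatm_atoms: list[Atom],
                           cutoff: float = 5.0) -> bool:
    """At least one destabilizing rule fired AND within 5 A of a (pre-filtered) HETATM."""
    return bool(fired_rules) and min_distance(residue_atoms, hetatm_atoms) <= cutoff


residue = [(0.0, 0.0, 0.0)]
print(interferes_with_interface([3], residue, [(0.0, 0.0, 5.9)]))
print(affects_ligand_binding([3], residue, [(0.0, 0.0, 7.0)]))
```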

8 SNP annotations - supervised learning (SVM): train an SVM to discriminate between monogenic disease nsSNPs from OMIM and neutral SNPs from dbSNP (measure of strain) (chemical similarity)

9 SVM – training dataset: 1457 disease-associated –variants in Swiss-Prot and OMIM. 2504 neutral –neutral variants according to rules 1-4. 3-fold cross-validation –train on subsets 1 and 2, test on 3 –repeated 10 times
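The evaluation scheme on this slide, 3-fold cross-validation repeated 10 times, can be sketched as an index generator. The slide does not say how the folds were drawn, so reshuffling before each repeat and the absence of class stratification are assumptions here, and `repeated_three_fold` is a hypothetical name.

```python
import random


def repeated_three_fold(n_items: int, repeats: int = 10, seed: int = 0):
    """Yield (repeat, fold, train_indices, test_indices) for repeated 3-fold CV.

    Assumption: the data are reshuffled before each repeat; the slide does not
    state this, nor whether folds were stratified by class.
    """
    rng = random.Random(seed)
    for rep in range(repeats):
        indices = list(range(n_items))
        rng.shuffle(indices)
        # Deal the shuffled indices into three near-equal folds.
        folds = [indices[k::3] for k in range(3)]
        for k in range(3):
            test = folds[k]
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield rep, k, train, test


# 1457 disease-associated + 2504 neutral variants, as on the slide.
n_variants = 1457 + 2504
splits = list(repeated_three_fold(n_variants))
print(len(splits))  # 10 repeats x 3 folds
```

Each split trains on two thirds of the variants and tests on the remaining third, matching "train on subsets 1 and 2, test on 3".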

10 SVM – training dataset: the absolute value of the SVM output gives the confidence; low-confidence predictions are excluded –accuracy of 80.5% (±0.3%) –false positives 19.7% (±0.2%) –false negatives 18.7% (±0.8%) –122 rejected on low confidence
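Rejecting low-confidence predictions by the absolute value of the classifier output might look like the sketch below. The threshold value, the sign convention (positive means disease-associated) and the function name are all assumptions made for illustration; the slide gives none of them.

```python
def classify_with_rejection(scores: list[float], threshold: float) -> list[str]:
    """Label each SVM-style score, rejecting those with |score| below threshold.

    Assumptions: positive scores mean 'disease', negative mean 'neutral';
    the threshold is a free parameter, not a value from the paper.
    """
    labels = []
    for score in scores:
        if abs(score) < threshold:
            labels.append("rejected")   # too close to the decision boundary
        elif score > 0:
            labels.append("disease")
        else:
            labels.append("neutral")
    return labels


print(classify_with_rejection([1.2, -0.1, -0.9, 0.05], threshold=0.2))
```

Accuracy and error rates are then computed only over the predictions that were not rejected.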

11 Results - SNP-to-protein mapping: –28,043 (21,255 dbSNP) validated coding nsSNPs –70,147 (54,048 dbSNP) including non-validated

12 Results - structure: 13,391 (53%) proteins have modelled domains with equivalent residues. 13,062 (19%) nsSNPs (all). 8725 (31%) nsSNPs (validated) –67 nsSNPs appear in more than one protein (alternative splicing)
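The nsSNP percentages on this slide match the mapping totals from the previous slide, if those totals are taken as the denominators. That pairing is an assumption; the slides do not state it. A quick check:

```python
# Counts copied from slides 11 and 12; which total belongs under which
# count is an assumption, not something the slides state.
all_mapped, all_with_structure = 70_147, 13_062
validated_mapped, validated_with_structure = 28_043, 8_725

pct_all = 100 * all_with_structure / all_mapped
pct_validated = 100 * validated_with_structure / validated_mapped

print(f"all nsSNPs with structure:       {pct_all:.1f}%")
print(f"validated nsSNPs with structure: {pct_validated:.1f}%")
```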

13 Results - function: 1886 destabilizing nsSNPs (structural rules 1-4). 1317 monogenic disease-associated nsSNPs by SVM –comparative models –conservation –sub properties

14 Web resource: KEGG pathway, SNP ID (rs), HUGO, Swiss-Prot; filter –SCOP –Swiss-Prot –KEGG –UCSC –PDBsum –ModBase

15 genomic seq protein seq

16 structure

17 snp prediction annotations

18 Discussion - data quality: validated/non-validated SNPs? –multiple independent submissions –submitter confirmation –alleles observed in at least 2 chromosomes –submission to HapMap. Report non-validated and validated SNPs, with an option to filter

19 Discussion - ligands: the local structural environment of each SNP-ligand pair cannot be evaluated by the pipeline, so all contacts are reported –some will not be biologically interesting, e.g. a SNP in proximity of glycerol will have no functional effect, but in glycerol kinase the SNP could be important

20 Discussion - structural annotations: ModSNP: 4109 structural annotations, 70% sequence identity cutoff. LS-SNP: structural annotations for 13,062 dbSNP rsIDs (4907 validated), no sequence identity cutoff –instead, a score (0-1) is given based on sequence identity and model assessment (average identity ~28%)

21 Discussion - structural annotations: ‘…because structure annotations are models, use properties that depend on correct fold assignment and a good target-template alignment, as opposed to atomic-level structural details such as loss of either salt bridges or hydrogen or disulphide bonds.’

22 Discussion - structural annotations: not possible to model effects such as changes in backbone geometry or small side-chain alterations

23 Case study - Glutathione S-Transferase: GSTs play a key role in cellular detoxification –domain interface –buried charge change –unfavourable change in accessible surface potential at buried position –conserved in mouse, rat, chicken. The combination of information sources builds a convincing case

24 Caveats: only updated twice a year; dependent on structure (comparative modelling) –allowing predictions without structure data would have increased numbers; no option to add your own SNPs; no idea as to which predictors are best –combinations of predictors; domain-domain or ligand binding is flagged, but with no indication of how damaging this might be; next version will have HapMap SNPs; SVM – monogenic only; chose a small subset of Sunyaev's rules - conservation?
