Preparing published variants with Mutalyzer webservices Gerard C.P. Schaafsma Department of Human Genetics.


Tuesday 8 March 2011, Work discussion

Why investigate the possibility of loading published data into LOVD databases?
● Pilot project for loading GenomeNL and/or 1000 Genomes data
● Loading data from exome-capture project(s)
● Showcase for (editors of) journals
● Load data into empty databases (e.g. those created for Mendelian genes)

Pilot data source
Bell, C.J. et al., 2011. “Carrier Testing for Severe Recessive Diseases by Next-Generation Sequencing”. Science Translational Medicine 3, 65ra4.

Description of data
● Preconception carrier testing for 448 severe recessive childhood diseases
● Target enrichment and next-generation sequencing of 7717 regions from 437 target genes
● 104 DNA samples
● Subset: disease mutations with > 5% incidence and reported in HGMD

Data from authors: Excel file

Problems encountered
● Making these data accessible, i.e. available in electronic format, in a database, using correct genomic coordinates
● Missing information
● Incorrect information
● Inconsistent notation

Core information in LOVD: variant data

What we got from the authors:
● a chromosome number: 13
● a genome position relative to the human genome build 18:
● the mutant allele: G
● the gene: ATP7B

What else do we need/want?
● the original allele: A
● a reference sequence: NM_
● a coding DNA position relative to this reference sequence: c.3419T>C (reversed: the gene lies on the reverse strand, so the genomic A>G is read as T>C)
● the (predicted) protein change: p.(Val1140Ala)
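The strand reversal in this example can be illustrated with a few lines of Python. This is only a sketch of the idea for single nucleotide substitutions, not the script used in the project; the function name `to_transcript_strand` is made up for illustration.

```python
# Complement table for single nucleotides.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_transcript_strand(ref, alt, reverse_strand):
    """Return (ref, alt) as read on the strand of the transcript.

    Hypothetical helper: for a gene on the reverse strand, both alleles
    of a single nucleotide substitution are complemented.
    """
    if reverse_strand:
        return COMPLEMENT[ref], COMPLEMENT[alt]
    return ref, alt


# Genomic alleles from the slides: original allele A, mutant allele G.
ref, alt = to_transcript_strand("A", "G", reverse_strand=True)
substitution = f"{ref}>{alt}"
print(substitution)  # T>C, matching c.3419T>C
```

In the actual workflow this conversion is left to Mutalyzer; the sketch only shows why the alleles in the c. description differ from the genomic ones.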

Tools
● Webservices: programmatic access to a remote program (use functionality located elsewhere in a local program/script)
  ➢ Ensembl Perl API
  ➢ LOVD RESTful / Atom webservice
  ➢ Mutalyzer 2.0 SOAP webservices
● All used in a Python script, including the Database API (DBAPI) to store data in a MySQL table

Which webservice for what?
● Ensembl: original allele
● LOVD: is there a database for a given gene, and if so, which reference sequence is used
● Mutalyzer: (longest) transcript ID, HGVS variant description, protein prediction
● Python Database API: used to store data in MySQL tables

Mutalyzer webservices used in script
● For genes without a reference sequence in an LOVD database:
  - getTranscriptsByGeneName(build, gene): provides one or more transcript IDs: NM_
● To choose the longest transcript:
  - transcriptInfo(LOVD_ver, build, accNo): provides transcript start and stop and CDS stop positions: trans_start = -157, trans_stop = 6485, CDS_stop = 4398
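How the longest transcript could be picked from such values is sketched below. It assumes that trans_start and trans_stop mark the two ends of the transcript in c. numbering and that "longest" means the largest span between them; the slides do not state the exact criterion. The accession keys and the second set of values are made up for illustration.

```python
def transcript_length(info):
    """Length of a transcript from its start and stop in c. numbering.

    c. numbering has no position 0, so with a negative start and a
    positive stop, stop - start counts every position exactly once.
    """
    return info["trans_stop"] - info["trans_start"]


def longest_transcript(infos):
    """Return the accession whose transcript spans the most positions."""
    return max(infos, key=lambda accession: transcript_length(infos[accession]))


infos = {
    # Values shown on the slide.
    "TRANSCRIPT_A": {"trans_start": -157, "trans_stop": 6485, "CDS_stop": 4398},
    # Made-up values for a second, shorter transcript.
    "TRANSCRIPT_B": {"trans_start": -100, "trans_stop": 5000, "CDS_stop": 4000},
}

best = longest_transcript(infos)
print(best, transcript_length(infos[best]))
```

Ranking by CDS_stop instead would select the transcript with the longest coding sequence, which may be the more relevant criterion in some settings.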

Mutalyzer webservices used in script (cont.)
● To get a converted position (i.e. g. → c. positions):
  - numberConversion(build, variant): provides an HGVS variant description: c.3419T>C
● To check the HGVS variant description and predict a protein description:
  - runMutalyzer(variant): provides a predicted protein description: p.(Val1140Ala)
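Both calls take a variant description as a string. The sketch below shows one way such strings could be assembled and taken apart for single nucleotide substitutions. It does not call Mutalyzer; the helper names, the placeholder accessions and the genomic position are all made up, and the exact input form the webservice expects is an assumption.

```python
def genomic_description(accession, position, ref, alt):
    """Build a genomic (g.) description of a substitution.

    Hypothetical helper; the accession is passed in because the
    reference to use depends on the genome build.
    """
    return f"{accession}:g.{position}{ref}>{alt}"


def split_description(description):
    """Split 'REFERENCE:variant' into its reference and variant parts."""
    reference, _, variant = description.partition(":")
    return reference, variant


# Placeholder accession and position, for illustration only.
g_description = genomic_description("NC_EXAMPLE.1", 12345, "A", "G")
print(g_description)

# A converted description as it might come back, with a placeholder accession.
reference, variant = split_description("NM_EXAMPLE.1:c.3419T>C")
print(reference, variant)
```

Keeping the reference and the variant part separate makes it easy to store them in different columns later.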

Script outline
● adapt the tab-delimited input file
● insert values in MySQL table
● for each gene, use LOVD webservice for transcript ID
  - if not found, use Mutalyzer to find transcript IDs
  - use Mutalyzer to determine longest transcript ID
● write chromosome number, start and end positions to intermediate file
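The per-gene lookup with its fallback can be sketched as follows. The three callables stand in for the LOVD and Mutalyzer webservice calls and are replaced here by local stubs, so nothing in this block contacts a real service; all names, genes and accessions are hypothetical.

```python
def resolve_transcript(gene, lovd_lookup, mutalyzer_transcripts, pick_longest):
    """Return a transcript accession for `gene`, or None if none is found.

    LOVD is asked first; only if it has no reference sequence for the
    gene are the Mutalyzer candidates fetched and the longest one chosen.
    """
    accession = lovd_lookup(gene)
    if accession is not None:
        return accession
    candidates = mutalyzer_transcripts(gene)
    if not candidates:
        return None
    return pick_longest(candidates)


# Local stubs with made-up data, standing in for the webservices.
lovd_db = {"GENE_A": "NM_EXAMPLE_A.1"}
mutalyzer_db = {"GENE_B": ["NM_EXAMPLE_B.1", "NM_EXAMPLE_B.2"]}
lengths = {"NM_EXAMPLE_B.1": 3000, "NM_EXAMPLE_B.2": 4200}


def lovd_lookup(gene):
    return lovd_db.get(gene)


def mutalyzer_transcripts(gene):
    return mutalyzer_db.get(gene, [])


def pick_longest(accessions):
    return max(accessions, key=lengths.__getitem__)


results = {
    gene: resolve_transcript(gene, lovd_lookup, mutalyzer_transcripts, pick_longest)
    for gene in ("GENE_A", "GENE_B", "GENE_C")
}
print(results)
```

Passing the lookups in as arguments keeps the control flow testable without network access.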

Script outline (cont.)
● use this file for Perl script to get original alleles from Ensembl
● use Mutalyzer to get HGVS variant descriptions in c. notation (c.3419T>C)
● use Mutalyzer to check these descriptions and to get protein descriptions, p.(Val1140Ala)
● add all acquired info to MySQL table with Python database API
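The last step, adding the acquired information through the Python database API, might look roughly like this. To keep the example self-contained it uses the standard-library sqlite3 module with an in-memory database instead of MySQL; the table and column names are assumptions, not those of the original script.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE variants (
           id INTEGER PRIMARY KEY,
           chromosome TEXT,
           gene TEXT,
           mutant_allele TEXT,
           original_allele TEXT,
           c_description TEXT,
           p_description TEXT
       )"""
)

# Values as supplied by the authors (the genomic position is omitted here).
conn.execute(
    "INSERT INTO variants (chromosome, gene, mutant_allele) VALUES (?, ?, ?)",
    ("13", "ATP7B", "G"),
)

# Add the information acquired from Ensembl and Mutalyzer.
conn.execute(
    """UPDATE variants
       SET original_allele = ?, c_description = ?, p_description = ?
       WHERE gene = ? AND mutant_allele = ?""",
    ("A", "c.3419T>C", "p.(Val1140Ala)", "ATP7B", "G"),
)
conn.commit()

row = conn.execute(
    "SELECT original_allele, c_description, p_description FROM variants"
).fetchone()
print(row)
```

With a MySQL driver the structure stays the same, but the parameter placeholder is typically `%s` rather than `?`.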

Data flow

To do:
● adapt script to make it suitable for variant types other than single nucleotide substitutions
● extend script with a Mutalyzer webservice providing exon/intron numbers
● replace hard-coded variables
  - column names
● automatically load data into LOVD databases

Acknowledgements
● Martijn Vermaat
● Jeroen Laros
● Ivo Fokkema
● Peter Taschner
● Johan den Dunnen