Integrated sequence analysis pipeline provides one-stop solution for identifying disease-causing mutations Cougar Hao Hu, MPIMG.

1 Integrated sequence analysis pipeline provides one-stop solution for identifying disease-causing mutations Cougar Hao Hu, MPIMG

2 Medical Re-sequencing Analysis Pipeline
(MERAP)

3 Performance Evaluation
Columns: Total read length (Gbp); Q20 read length (%); Aligned read length (%); Uniquely aligned read length (%)

Columns: RefSeq_Gene; RefSeq_ID; Chromosome; Exon_Start; Exon_End; Exon Coverage; Exon Mean Depth
HES4 NM_021170 1 934342 934812 87.13
934906 934993 57.56
935072 935167 111.62
935246 935552 26.04
NM_ 934344 87.38 50.36
ISG15 NM_005101 948847 948956 117.93
949364 949919 109.42
AGRN NM_198576 955503 955753 12.09
957581 957842 106.98
970657 970704 45.97
976045 976260 69.2

Columns: RefSeq Gene; Coding Length; Coding_Coverage; Coding Mean Depth; Transcript Length; Transcript Coverage; Transcript Mean Depth
BARD1 NM_000465 2334 2594 141.83
BBS7 NM_018190 2019 2617 107.74
BRAT1 NM_152743 2466 80.403 3013 70.3
BSCL2 NM_ 1389 2006 89.21

Columns: Coding Size (Mbp); Coding Coverage; Transcript Size (Mbp)
62.265

Columns: Coding Coverage (Depth_Cutoff); Transcript Coverage (Depth_Cutoff)
Depth cutoffs: 3X, 10X, 20X, 30X, 40X, 50X
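The coverage-by-depth-cutoff columns on this slide can be illustrated with a short sketch. It assumes that "coverage at NX" means the fraction of positions whose read depth is at least N; the slide does not define this, and the function name and the example depths below are hypothetical, not taken from MERAP.

```python
def coverage_at_cutoffs(depths, cutoffs=(3, 10, 20, 30, 40, 50)):
    """Return (mean depth, {cutoff: fraction of positions with depth >= cutoff}).

    Assumption: "coverage at NX" is the fraction of positions covered by
    at least N reads. This is an illustrative definition, not MERAP's code.
    """
    n = len(depths)
    if n == 0:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / n
    coverage = {c: sum(1 for d in depths if d >= c) / n for c in cutoffs}
    return mean_depth, coverage


# Hypothetical per-base depths for a short region (not data from the slides).
example_depths = [0, 2, 5, 12, 25, 31, 44, 52, 60, 9]
mean_depth, coverage = coverage_at_cutoffs(example_depths)

print(f"mean depth: {mean_depth:.2f}")
for cutoff, fraction in coverage.items():
    print(f"{cutoff}X: {fraction:.0%}")
```

The same function could be applied per exon, per coding region, or per transcript by passing the depths of the corresponding positions.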

4 SNVFinder
Error sources: library construction, sequencing, image reading, alignment, neighboring indels
Correlation between mappability and error rate
Biased positions of indel-induced SNV artifacts

5 IndelFinder

6 CNVFinder

7 Variant spectrum identified

8 Logit score for missense pathogenicity
Involves seven predictors
Control samples: missense variants with allele frequency > 10% in 1000Genome and ESP6500
Case samples: disease-causing missense variants from HGMD
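As a rough illustration of how a logit score combines several predictors, the sketch below computes a linear score from seven feature values and compares it with the cutoff of 3.57 quoted in the results example of this deck. The weights, intercept, and feature values are hypothetical placeholders; the slide does not give the fitted coefficients or name the seven predictors, so this is not the MERAP model.

```python
import math

# Hypothetical coefficients for seven predictors (placeholders, not MERAP's).
WEIGHTS = [0.5, 0.5, 1.0, 1.0, 2.0, 0.25, 0.25]
INTERCEPT = -1.0
CUTOFF = 3.57  # cutoff quoted in the deck's results example (FDR < 0.01)


def logit_score(features, weights=WEIGHTS, intercept=INTERCEPT):
    """Linear predictor of a logistic model: intercept + sum(w_i * x_i)."""
    if len(features) != len(weights):
        raise ValueError("expected one feature value per weight")
    return intercept + sum(w * x for w, x in zip(weights, features))


def probability(score):
    """Convert a logit score to a probability with the logistic function."""
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical, already-normalized predictor values for one missense variant.
features = [2.0, 4.0, 1.0, 0.0, 1.0, 4.0, 0.0]
score = logit_score(features)
passes = score >= CUTOFF

print(f"logit score {score:.3f}, probability {probability(score):.4f}, "
      f"passes cutoff: {passes}")
```

In a real fit, the coefficients would be estimated by logistic regression on the case and control sets described on the slide.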

9 Example of MERAP results
Columns: Field name; Detail; Example; Note

Sample
    Detail: Family and patient ID
    Example: M999_1274
    Note: Family ID M999, patient ID 1274
Variant
    Detail: Gene name, RefSeq ID, protein length, HGNC ID, gene description, variant coordinate in terms of genome, cDNA and protein
    Example: HPD(NM_002150,393aa,HGNC:5147,4-hydroxyphenylpyruvate dioxygenase): g.12: G>C,c.1005C>G,p.I335M
    Note: Multiple isoforms of RefSeq genes are shown together, separated by '|'.
Supporting Reads
    Detail: Number of non-redundant reads supporting the variant
    Example: 76
Allele Percentage
    Detail: Allele percentage of the variant
    Example: 0.98
Quality
    Detail: Phred-like quality score (0-40)
    Example: 40
Affected/Cohort
    Detail: Subjects in the cohort, incidence of the variant, homozygote frequency, heterozygote frequency
    Example: 371|3|1|2
    Note: In a cohort of 371 subjects, the variant is observed three times: once as a homozygote and twice as a heterozygote.
Linkage Interval
    Detail: Definition of linkage interval, LOD score, length of interval, the variant location in the interval
    Example: Homozygous|3.1|-----*-----
    Note: In a homozygous interval with LOD score 3.1 and length 5.5Mb, the variant is located at the center of the interval. '-' stands for a length unit of 0.5Mb; '*' marks the location of the variant and also counts as a length unit of 0.5Mb.
HGMD
    Detail: HGMD match of the variant
    Example: DM;HPD;Tyrosinaemia 3;Hum Genet:v.106,p.654,y.2000
    Note: The variant has a match in HGMD with the classification DM (disease causing). The host gene is HPD, the associated phenotype is Tyrosinaemia 3, and it is reported in Hum Genet (volume: 106, page: 654, year: 2000).
OMIM
    Detail: OMIM match of the variant
    Example: TYROSINEMIA,TYPE III
dbSNP
    Detail: dbSNP match of the variant
    Example: rs
1000Genome
    Detail: 1000Genome match of the variant
    Example: HOM_REF:HOM_VAR:HET=1090:0:2;AF=0.0009;AMR=0.01;ASN=0;AFR=0;EUR=0
    Note: The incidences of the variant as homozygous wild type, homozygous variant and heterozygous variant are 1090, 0 and 2, respectively. The allele frequency of the variant in the population is 0.0009, with ethnic-specific frequencies of 0.01, 0, 0 and 0 in the American, Asian, African and European populations, respectively.
ESP
    Detail: ESP match of the variant
    Example: N/A
Grantham
    Detail: Grantham score for the amino acid change
    Example: 10
phyloP
    Detail: phyloP score for base conservation
    Example: 2.547
GERP
    Detail: GERP score for base conservation
    Example: 3.25
SIFT
    Detail: SIFT prediction and score
    Example: damaging( )
PolyPhen2
    Detail: PolyPhen2 prediction and score
    Example: probably damaging(0.978)
MutationTaster
    Detail: MutationTaster prediction and score
    Example: disease causing( )
CDD
    Detail: CDD match and score
    Example: cl14632:Glo_EDI_BRP_like superfamily( )|
    Note: Multiple matches are shown together, separated by '|'.
Logit
    Detail: Integrated pathogenicity score by Logit modeling
    Example: 4.172
    Note: Passes the cutoff of 3.57, where FDR<0.01.
LOF
    Detail: Whether the gene is loss-of-function tolerant
    Example: N
    Note: N means negative, P means positive.
Interaction
    Detail: Interaction partner of the gene
    Example: HPD<->IKBKG
    Note: The gene interacts physically with IKBKG.
Disease
    Detail: Known diseases caused by the gene
    Example: Hawkinsinuria;Tyrosinaemia 3
Inheritance Model
    Detail: Proposed inheritance model for the disease
    Example: Recessive
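Several of the example values above are delimiter-packed strings. The sketch below shows one way such fields might be unpacked, using the example strings from this slide. The function names and dictionary keys are hypothetical, and the field layouts are inferred only from the examples and notes shown here, not from MERAP documentation.

```python
def parse_cohort(field):
    """Unpack 'subjects|incidence|homozygotes|heterozygotes'."""
    subjects, incidence, hom, het = (int(x) for x in field.split("|"))
    return {"subjects": subjects, "incidence": incidence,
            "homozygotes": hom, "heterozygotes": het}


def parse_linkage(field, unit_mb=0.5):
    """Unpack 'model|LOD|track', where each track character spans unit_mb
    and '*' marks the unit that contains the variant."""
    model, lod, track = field.split("|")
    return {"model": model,
            "lod": float(lod),
            "length_mb": len(track) * unit_mb,
            "variant_offset_mb": track.index("*") * unit_mb}


def parse_1000genome(field):
    """Unpack genotype counts and allele frequencies from a ';'-separated
    string whose first item pairs ':'-separated labels with counts."""
    first, *rest = field.split(";")
    labels, counts = first.split("=")
    genotype_counts = dict(zip(labels.split(":"),
                               (int(c) for c in counts.split(":"))))
    frequencies = {}
    for item in rest:
        key, value = item.split("=")
        frequencies[key] = float(value)
    return genotype_counts, frequencies


cohort = parse_cohort("371|3|1|2")
linkage = parse_linkage("Homozygous|3.1|-----*-----")
genotypes, freqs = parse_1000genome(
    "HOM_REF:HOM_VAR:HET=1090:0:2;AF=0.0009;AMR=0.01;ASN=0;AFR=0;EUR=0")

print(cohort)
print(linkage)
print(genotypes, freqs)
```

Keeping one small parser per field makes it easy to adjust each layout independently if the actual output format differs from these examples.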

10 Variant detection compared with GATK


14 Thanks!

