1 Copyright © 2004 Synamatix sdn bhd (538481-U) Applications of a Novel Structured Pattern Database Technology for Analysis of Data from Second Generation Sequencers

2 Copyright © 2007 Synamatix Sdn. Bhd. (538481-U) Synamatix introductions. Dr. Arif Anwar, General Manager: 14+ years post-Ph.D.; California and UK genomics background; B.Sc. (Hons.) Genetics, U. of London; Ph.D. Genetics, UCL, U. of London and U. of Oxford.

3 Life and Death

4 Genomics 2007-2020: skilled people; biotechnology; genome centres; drug discovery; personalised drugs; integrated genomics healthcare; foods and livestock; medical; nutraceuticals; cosmeceuticals; 2nd-generation DNA sequencers; bio-security.

5 Personalised medicine. The ultimate aim is predictability, and genetic testing is already active. 80% of healthcare costs are incurred at the chronic stage. [Figure: cost (not just $) against disease progression, across the predictive, diagnostic and chronic stages.]

6 Personalised medicine. It is much better and easier to treat “wellness” than “sickness”. [Figure: reversibility (%, 0 to 100) against disease progression with age (30 to 60 years), across the predictive, diagnostic and chronic stages.]

7 Where is the science today?

8 DNA sequencing time and cost ($) are crashing. Output per run: 3730x, 2.76 MB; FLX, 80 MB; 1G, 1000 MB.
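The per-run figures on this slide can be put in proportion with simple ratios. This is only a sketch: it takes the three numbers exactly as quoted, assumes they are in the same unit, and says nothing about run time or cost per base.

```python
# Per-run output as quoted on the slide (unit as given: MB per run).
runs_mb = {"3730x": 2.76, "FLX": 80.0, "1G": 1000.0}

def fold_increase(baseline: str, platform: str) -> float:
    """How many times more output per run `platform` gives than `baseline`."""
    return runs_mb[platform] / runs_mb[baseline]

for name in ("FLX", "1G"):
    print(f"{name}: {fold_increase('3730x', name):.0f}x the 3730x output per run")
```

With these inputs the FLX gives roughly 29 times, and the 1G roughly 362 times, the per-run output of the 3730x.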

9 A parallel revolution is required: the cost and speed of DNA sequencing must be matched by the cost and speed of data analysis, the focus of Synamatix R&D.

10 Synamatix solutions built on the SynaBASE platform. The CORE database platform is accessed through a command-line interface and supports tool development and data analysis. Applications: SynaRex Bulk, SynaProbe Bulk, SynaSearch Bulk, SynaMer, SXoligosearch, SXSequenceRefs, SXLRESearch, SXParse and another 20+ apps. www.MGRC.com.my

11 Synamatix approach for next-gen sequence data. Inputs: 454, Illumina and Sanger reads, plus SOLiD, Helicos and others. Reference genomes are held in the CORE database platform. Bioinformatics: SynaSearch Bulk, SXoligosearch, SynaMer and another 20+ apps. Presentation: nested GUI mapping and analysis viewer. Mining: pre-dispositions, diagnostics and therapeutics.

12 Strategy for fast mapping of 454 reads. 1st pass, high-speed searching: SynaSearch was run to query against a SynaBASE of the human genome using high-stringency settings. 2nd pass, increased-sensitivity searching: using lower-stringency parameters, sensitive searches were conducted to find divergent sequences. 3rd pass, repeat searching: the remaining sequences, suspected to be repeats, were searched using long pattern seeds. More than 3 billion bp were mapped in 6 hours, approx. 200-fold faster than BLAST and MegaBLAST, using 1 CPU.
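The three-pass strategy can be illustrated with a toy cascade. This is a minimal sketch, not SynaSearch: it assumes a plain k-mer index over a Python string instead of a SynaBASE, ungapped Hamming-distance verification, and hypothetical seed lengths and mismatch limits in `PASSES`. Reads that hit more than one location in the first two passes are deferred as suspected repeats and are only accepted, with all their locations, in the third pass.

```python
from collections import defaultdict

def build_index(ref: str, k: int) -> dict[str, list[int]]:
    """Map every k-mer of the reference to the positions where it starts."""
    index: dict[str, list[int]] = defaultdict(list)
    for i in range(len(ref) - k + 1):
        index[ref[i:i + k]].append(i)
    return index

def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def search(read: str, ref: str, index, k: int, max_mm: int) -> list[int]:
    """Seed with every k-mer of the read, then verify each candidate ungapped."""
    hits = set()
    for off in range(len(read) - k + 1):
        for pos in index.get(read[off:off + k], []):
            start = pos - off
            if 0 <= start and start + len(read) <= len(ref):
                if hamming(read, ref[start:start + len(read)]) <= max_mm:
                    hits.add(start)
    return sorted(hits)

# Hypothetical settings: (pass name, seed length, max mismatches, accept multi-hits?)
PASSES = [
    ("1st pass: high stringency", 12, 0, False),
    ("2nd pass: increased sensitivity", 6, 3, False),
    ("3rd pass: repeats, long seeds", 16, 1, True),
]

def cascade(reads: list[str], ref: str):
    """Run the passes in order; each pass only sees reads the previous ones left."""
    results, remaining = {}, list(reads)
    for name, k, max_mm, allow_multi in PASSES:
        index = build_index(ref, k)
        still_unmapped = []
        for read in remaining:
            hits = search(read, ref, index, k, max_mm)
            if hits and (len(hits) == 1 or allow_multi):
                results[read] = (name, hits)
            else:
                still_unmapped.append(read)  # unmapped, or a suspected repeat
        remaining = still_unmapped
    return results, remaining
```

The design point the slide makes is that the cheap, strict pass handles most reads, so the slower, more permissive searches only run on what is left.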

13 Faster than MegaBLAST. [Figure: speed of SynaSearch, SynaSearch with a seed size (mml) of 28, and MegaBLAST in mapping 20,000 454 reads to the human genome (NCBI36).]

14 Higher sensitivity than MegaBLAST. [Figure: percent coverage of 20,000 454 reads against the human genome (NCBI36) with SynaSearch, SynaSearch with a seed size (mml) of 28, and MegaBLAST.]

15 Existing approach for Illumina reads: it can only handle 2 errors in the read, performs poorly if the read length is above 30, and insertions and deletions cause the algorithm to crash. [Figure: accuracy against read length, with 27 marked on the length axis.]
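One common way an ungapped aligner can guarantee finding reads with up to two mismatches is the pigeonhole principle: split the read into three segments, and at least one of them must match the reference exactly. The slide does not describe the existing tool's internals, so this is only an illustration of why such designs carry a hard mismatch limit and cannot absorb indels; the function name is hypothetical. In this sketch an indel simply causes the read to be missed.

```python
def find_with_pigeonhole(read: str, ref: str, max_mm: int = 2) -> list[int]:
    """Start positions where `read` aligns to `ref` with <= max_mm substitutions.

    Splitting the read into max_mm + 1 segments guarantees that, if an
    alignment with at most max_mm substitutions exists, at least one
    segment matches exactly. Each exact segment hit is then verified.
    """
    n_seg = max_mm + 1
    seg_len = len(read) // n_seg
    hits = set()
    for s in range(n_seg):
        lo = s * seg_len
        hi = len(read) if s == n_seg - 1 else lo + seg_len
        seg = read[lo:hi]
        pos = ref.find(seg)
        while pos != -1:
            start = pos - lo  # implied start of the whole read
            if 0 <= start <= len(ref) - len(read):
                window = ref[start:start + len(read)]
                if sum(a != b for a, b in zip(read, window)) <= max_mm:
                    hits.add(start)
            pos = ref.find(seg, pos + 1)
    return sorted(hits)
```

A third substitution breaks every segment, and a single deletion shifts the rest of the read out of register, so in both cases no candidate survives verification.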

16 Synamatix application for Illumina data: uses a weighted profile search; can handle gaps, insertions and deletions; has no read-length limit; leverages the Solexa PRB file. [Figure: accuracy against read length, with 27 marked on the length axis.]
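A weighted profile search can be sketched as a gapped alignment in which each read position is scored by its per-base quality rather than by a single called base. This is not the Synamatix algorithm. It assumes each PRB cycle supplies four Solexa-scaled scores in A, C, G, T order, converts them back to probabilities with Q = 10·log10(p / (1 − p)), scores them as log-odds against a uniform 0.25 background, and uses a hypothetical linear gap penalty of −4.

```python
import math

BASES = "ACGT"
BACKGROUND = 0.25  # assumed uniform base composition

def solexa_to_prob(q: float) -> float:
    """Invert the Solexa score Q = 10*log10(p / (1 - p))."""
    return 1.0 / (1.0 + 10.0 ** (-q / 10.0))

def profile_from_prb(cycles: list[list[float]]) -> list[dict[str, float]]:
    """Per-cycle [A, C, G, T] scores -> per-cycle log-odds for each base.

    Normalising the four probabilities to sum to 1 is an assumption here.
    """
    profile = []
    for scores in cycles:
        probs = [solexa_to_prob(q) for q in scores]
        total = sum(probs)
        profile.append({b: math.log(p / total / BACKGROUND)
                        for b, p in zip(BASES, probs)})
    return profile

def profile_align(profile: list[dict[str, float]], ref: str,
                  gap: float = -4.0) -> tuple[float, int]:
    """Semi-global alignment: the whole profile aligns, reference ends are free.

    Returns (best score, 0-based index of the last reference base used).
    """
    m = len(ref)
    prev = [0.0] * (m + 1)  # empty profile may start anywhere in ref
    for column in profile:
        cur = [prev[0] + gap] + [0.0] * m
        for j in range(1, m + 1):
            cur[j] = max(prev[j - 1] + column[ref[j - 1]],  # profile vs ref base
                         prev[j] + gap,                      # gap in reference
                         cur[j - 1] + gap)                   # gap in read
        prev = cur
    best_j = max(range(m + 1), key=lambda j: prev[j])
    return prev[best_j], best_j - 1
```

A cycle whose four scores are equal contributes zero whatever the reference base is, so a likely miscall costs nothing, while confident calls are rewarded or penalised strongly; gaps let the same read align across a deletion.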

17 Free on-line version

18 Increased sensitivity

19 Indels are important

20 Indels are important. Proportion of indels and substitutions in Homo sapiens databases:
Human Gene Mutation Database: 30% indels, 70% substitutions
Overlapping BACs: 21% indels, 79% substitutions
Chromosome 22: 18% indels, 82% substitutions

21 Distribution of gaps

22 An example of a read missed by ELAND

23 Using quality scores

24 Using quality scores

25 Longer reads give higher specificity

26 Longer reads give higher specificity
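A back-of-envelope calculation shows why longer reads are more specific. Under the simplifying assumption that the genome is a uniform random sequence of length G searched on both strands, the expected number of chance exact matches of a read of length L is about E = 2G / 4^L. Real genomes are far from random, which is why repeats still matter; the 3.1e9 bp human genome length used here is an approximation.

```python
def expected_random_hits(read_len: int, genome_len: float = 3.1e9,
                         strands: int = 2) -> float:
    """Expected chance exact occurrences of a random read in a random genome."""
    return strands * genome_len / 4 ** read_len

for L in (12, 16, 20, 27, 36):
    print(f"{L:>3} bp: {expected_random_hits(L):.3g} expected chance hits")
```

Each extra base divides the expected number of chance hits by four, so a 16 bp read is expected to match somewhere by chance about 1.4 times, while a 20 bp read is already well below one chance hit.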

27 Main benefits of SXOligoSearch: hundreds of times faster on eukaryote-sized genomes; more reads aligned to unique locations; gapped alignments; more mismatches allowed per read; reporting of alignments to repeats, which improves read-density analysis and the identification of large deletion polymorphisms; no read-length limit, although it is most suitable for oligonucleotides < 60 bp.
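One plausible reading of why reporting alignments to repeats helps read-density analysis: if multi-mapped reads are discarded, repeat regions look empty and mimic large deletions. The sketch below is hypothetical, not SXOligoSearch's behaviour. It assumes each read arrives as a list of placements, shares a read placed at n locations as 1/n to each, and flags windows below an assumed 20% of the median density.

```python
from statistics import median

def window_density(alignments: list[list[int]], genome_len: int,
                   window: int) -> list[float]:
    """Read starts per window; a read placed at n locations adds 1/n to each."""
    n_windows = (genome_len + window - 1) // window
    density = [0.0] * n_windows
    for placements in alignments:
        if not placements:
            continue  # unaligned read
        share = 1.0 / len(placements)
        for pos in placements:
            density[pos // window] += share
    return density

def candidate_deletions(density: list[float], min_fraction: float = 0.2) -> list[int]:
    """Windows whose density falls below min_fraction of the median window."""
    cutoff = min_fraction * median(density)
    return [i for i, d in enumerate(density) if d < cutoff]
```

With repeat placements kept, only a genuinely empty window is flagged; with them dropped, the repeat copies are flagged as well.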

28 “Point of care” personal genomes. SynaBASE uses a single CPU in a single integrated platform, with no specialised HW or algorithm-specific accelerators. Software solutions start from $483.00 per Gbp of sequence generated. Savings of up to $220,000.00 per year: fewer consumables and other running costs.

29 Long v short n-mers: advantages and disadvantages of a 100-mer. Positives (+ve): fewer false positives; improvement in final assembly. Negatives (-ve): errors in reads may lead to false negatives; slow to process with conventional software.
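The false-negative risk of long n-mers follows from a simple formula. Assuming sequencing errors occur independently at a constant per-base rate e, the probability that a k-mer is error-free, and can therefore match exactly, is (1 − e)^k. The 1% error rate below is an illustrative assumption, not a figure from the slide.

```python
def p_error_free(k: int, error_rate: float) -> float:
    """Probability that k consecutive bases contain no sequencing error,
    assuming independent errors at a constant per-base rate."""
    return (1.0 - error_rate) ** k

for k in (32, 100):
    print(f"k={k}: {p_error_free(k, 0.01):.3f} chance of an error-free k-mer at 1% error")
```

At an assumed 1% error rate this gives about 0.725 for a 32-mer and 0.366 for a 100-mer, so roughly two in three true 100-mer overlaps would be missed by an exact-match search.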

30 Overlapper for assembly pre-processing. The original user data set and requirement was to find all overlapping exact 100-mers in 50 million 1 kb sequencing reads (i.e. 50 billion bp) and to report n-mers with a frequency > 2 and < m. Using conventional software and approaches, the user took 500 hours and 1.5 TB of disk space to find all 100-mer overlaps. Hence the standard approach limits usage to 32-mers, even though longer mers help bridge repetitive regions.
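The reporting requirement above can be stated as a short in-memory sketch: count every exact k-mer and keep those with 2 < count < m. This is a toy, not SynaMer, and nothing like how 50 billion bp would be processed. The 32-mer ceiling is not explained on the slide; one common reason in k-mer tools is that a 32-mer packs into a single 64-bit word at 2 bits per base, which `pack_2bit` (a hypothetical helper) illustrates.

```python
from collections import Counter

def kmers_in_band(reads: list[str], k: int, m: int) -> dict[str, int]:
    """Count every exact k-mer across all reads; keep those with 2 < count < m."""
    counts = Counter(read[i:i + k]
                     for read in reads
                     for i in range(len(read) - k + 1))
    return {kmer: c for kmer, c in counts.items() if 2 < c < m}

def pack_2bit(kmer: str) -> int:
    """Pack a k-mer into an integer at 2 bits per base (A=0, C=1, G=2, T=3)."""
    code = {"A": 0, "C": 1, "G": 2, "T": 3}
    value = 0
    for base in kmer:
        value = (value << 2) | code[base]
    return value
```

A 32-mer of all T packs to 2^64 − 1, the largest 64-bit value; a 33-mer already needs 66 bits, so longer k-mers fall off the fast fixed-width path in such tools.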

31 Longer n-mer size leads to better assembly. [Figure, panels A and B: low-complexity region.] A shorter overlap results in more false positives; a longer overlap results in fewer false positives, and the final assembly is improved.

32 Using SynaMer, run time does not increase with longer n-mers

33 Summary of SynaMer: 30 million 1 kb reads took 2 + 3 hours on a dual-CPU Itanium machine, with temporary files of less than 200 GB; 100-fold faster than conventional “overlappers”; allows the use of longer n-mers; potentially increases the quality of the assembly.

34 Sanger read mapping. Aims: to map whole-genome shotgun reads from a mammalian genome to the human genome, to facilitate genome assembly using Synamatix and public tools; and to compare the sensitivity, specificity and performance advantages of Synamatix technologies. Results: in comparison to BLASTz, SynaSearch is 219-fold faster, finds 11% more true positives, finds 17% more unique hits to queries, and has a higher specificity (113% fewer false positives, and fewer multiple placements per read: 2.7 v 5.3). Benefits: enables significant enhancements in workflow throughput, since SynaSearch requires only 1 search process whereas BLASTz requires the genome to be separated into 5 MB chunks and apportioned across multiple processors; results in better assemblies of new genomes.
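For contrast, the BLASTz-style workflow described on this slide, splitting the genome into 5 MB chunks and spreading them over processors, can be sketched as follows. The overlap between neighbouring chunks is an assumption added here so that a read spanning a chunk boundary is not lost; the slide does not say whether any overlap was used, and both function names are hypothetical.

```python
def make_chunks(genome_len: int, chunk: int = 5_000_000,
                overlap: int = 10_000) -> list[tuple[int, int]]:
    """Half-open (start, end) intervals covering the genome.

    Neighbouring chunks share `overlap` bases (an assumption of this sketch).
    """
    if overlap >= chunk:
        raise ValueError("overlap must be smaller than the chunk size")
    step = chunk - overlap
    spans = []
    start = 0
    while True:
        end = min(start + chunk, genome_len)
        spans.append((start, end))
        if end == genome_len:
            break
        start += step
    return spans

def assign(spans: list[tuple[int, int]], n_workers: int) -> list[list[tuple[int, int]]]:
    """Round-robin assignment of chunks to worker processes."""
    buckets: list[list[tuple[int, int]]] = [[] for _ in range(n_workers)]
    for i, span in enumerate(spans):
        buckets[i % n_workers].append(span)
    return buckets
```

Every chunk boundary adds bookkeeping (duplicate hits in overlaps, merging results across workers) that a single search process over one index avoids, which is the throughput point the slide makes.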

35 ROI. SynaBASE uses a single CPU and is a single integrated platform, with no specialised HW or algorithm-specific accelerators. Extra coverage equivalent to consumable savings: Illumina, 12%; 454, 17%; Sanger, 11%.

36 Summary. 2nd-generation sequencing technology is causing the cost of genome sequencing to tumble and its throughput to rise. Synamatix is ready TODAY to handle genome assembly and differentiation analysis of all types of reads (454, Solexa, Sanger, SOLiD, Helicos and others) with higher performance, increased sensitivity and more flexibility.

37 Acknowledgements: Karim Hercus, MD; Colin Hercus, CTO; Poh Yang Ming, Bioinformatics; Zayed Albertyn, Bioinformatics; Ali Reza, Bioinformatics; Elaine Mardis; Jarret Glasscock; Granger Sutton.

