1
SPAdes: Saint Petersburg Genome Assembler Pavel Pevzner Department of Computer Science and Engineering University of California at San Diego Laboratory for Algorithmic Biology (LAB) Saint Petersburg Academic University Russian Academy of Sciences
2
Establishing a World-Class Computational Biology Lab in a Country without a Single World-Class Computer Science or Biology Department
Step 1. Clone a successful US center in Russia. Step 2. Replicate a leading US educational program in Russia. Step 3. Rely on an unlimited supply of talent in Russia.
Step 1. Focus on areas where Russia has a lead. Step 2. Implement uniquely Russian educational initiatives. Step 3. Rely on a very limited pool of young talented mathematicians and computer scientists, the traditional strength of Russian science.
SPAdes: Saint Petersburg Genome Assembler (Dmitry, Anton, Alexey, Yana, Irina, Anton, Andrey, Sergey)
4
Establishing a World-Class Computational Biology Lab in a Country without a Single World-Class Computer Science or Biology Department
Step 1. Clone a successful US center in Russia. Step 2. Replicate a leading US educational program in Russia. Step 3. Rely on an unlimited supply of talent in Russia.
Step 1. Focus on areas where Russia has a lead. Step 2. Implement uniquely Russian educational initiatives. Step 3. Rely on a limited pool of young talented quantitative scientists, the traditional strength of Russian science.
SPAdes: Saint Petersburg Genome Assembler (Dmitry, Anton, Alexey, Yana, Irina, Anton, Andrey, Sergey)
– SPAdes was cited ≈200 times, the most cited paper in Russia since 2012
– SPAdes has been installed in 1000+ labs worldwide
– the LAB is now fully funded by industry (Roche, AstraZeneca, Life Technologies, EMC, IRRI)
5
Mission of the LAB Biomedically transformative approaches that currently occupy a small niche because computational approaches for their solution remain unknown Innovative biomedical technologies that cannot be interpreted without advanced algorithmic analysis Proteogenomics approaches at the intersection of computational proteomics and genomics Bringing algorithmic, genomics, and proteomics expertise to develop new COMPUTATIONAL TECHNOLOGIES for biomedical applications through state-of-the-art software engineering.
6
Transforming the LAB into a world-leading bioinformatics group
Research and Innovation: Introduction of two disruptive bioinformatics technologies (single cell genomics and top-down antibody sequencing) into thousands of biomedical studies.
Education: Contributing to interdisciplinary computer science and developing a Bioinformatics Program spanning biology and computer science.
Conference hub: RECOMB Algorithmic Biology and RECOMB Bioinformatics Education conferences in 2012.
Turning the LAB into an international center of online education: online bioinformatics education as a Massive Open Online Course (MOOC) through Coursera with 32,000 students in Fall 2013 (joint US-Russian project).
Addressing critical technology gaps:
– Single cell genomics is a rapidly developing technology with important applications in personalized microbiomics and cancer studies.
– Therapeutic antibodies are the fastest growing segment of the pharmaceutical industry, and one that is relatively poorly developed in Russia.
7
Michael Snyder Reversed his Own Diabetes by Conducting The Most Extensive Medical Diagnostics Ever (Cell, February 2012)
6000 proteins and 1000 metabolites are measured every month! (experiment continues today)
8
Sequencing of Microbiome and Individual Tumor Cells
Human genome: 10^4 human proteins
Human microbiome: 10^6 bacterial proteins
Tumor genome: profiling INDIVIDUAL tumor cells
9
Recent Breakthroughs in Single Cell Genomics
– Sequencing phased human chromosomes (Yang et al., PNAS 2011)
– Tracing tumor evolution (Navin et al., Nature 2011)
– Studying tumor heterogeneity (Dalerba et al., Nature Biotech. 2011)
– Characterizing single cell transcriptome (Islam et al., Genome Res. 2011)
– Genome-wide haplotyping (Fan et al., Nature Biotech. 2011)
– Analyzing uncultivated single cell organisms and revealing the “dark matter of life” (Yoon et al., Science, Youssef et al., AIM 2011, Chitsaz et al., Nature Biotech, 2011)
[Figure: Tumor single cell sequencing (from Navin et al., Nature 2011)]
10
Bacterial Single Cell Genomics
– Sequencing phased human chromosomes (Yang et al., PNAS 2011)
– Tracing tumor evolution (Navin et al., Nature 2011)
– Studying tumor heterogeneity (Dalerba et al., Nature Biotech. 2011)
– Characterizing single cell transcriptome (Islam et al., Genome Res. 2011)
– Genome-wide haplotyping (Fan et al., Nature Biotech. 2011)
– Analyzing uncultivated single cell organisms and revealing the “dark matter of life” (Yoon et al., Science, Youssef et al., AIM 2011, Chitsaz et al., Nature Biotech, 2011)
11
Bacterial Single Cell Genomics in 2014: 1000 Single Cells a Day
– Sequencing phased human chromosomes (Yang et al., PNAS 2011)
– Tracing tumor evolution (Navin et al., Nature 2011)
– Studying tumor heterogeneity (Dalerba et al., Nature Biotech. 2011)
– Characterizing single cell transcriptome (Islam et al., Genome Res. 2011)
– Genome-wide haplotyping (Fan et al., Nature Biotech. 2011)
– Analyzing uncultivated single cell organisms and revealing the “dark matter of life” (Yoon et al., Science, Youssef et al., AIM 2011, Chitsaz et al., Nature Biotech, 2011)
– Sequencing of diverse pathogens in hospital environment (McLean et al., Genome Research, 2013)
– Clinical sequencing of Chlamydia (Seth-Smith et al., Genome Research, 2013)
– Sequencing “dark matter of life” (McLean et al., PNAS 2013)
In 2014, scientists at Oxford used SPAdes to assemble isolates of the Middle East Respiratory Syndrome (MERS) virus, the cause of the ongoing epidemic.
12
Variation in a CRISPR (bacterial immune system)
Previous strains of P. gingivalis vs. P. gingivalis from a sink
Two previous strains and the new strain have 100% identical repeat sequences. However, the genomes vary in the number of repeat instances, number of spacer sequences, and spacer identity. Spacers represent a history of foreign phages and plasmids, which bacteria use for an immunity-like response.
[Figure labels: ATCC3327; Region not present]
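The comparison described on this slide (identical repeats, differing spacers) can be illustrated with a small sketch. Everything here is hypothetical: the repeat, the spacers, and the two "strains" are made-up toy strings, not P. gingivalis data, and the exact string matching used below assumes perfectly conserved repeats, which real CRISPR annotation tools do not require.

```python
def spacers_in_array(array, repeat):
    """Return the spacers that sit between consecutive copies of `repeat`."""
    parts = array.split(repeat)
    # parts[0] and parts[-1] are flanking sequence outside the array, not spacers.
    return parts[1:-1]


def compare_arrays(spacers_a, spacers_b):
    """Summarize which spacers two strains share and which are unique to each."""
    set_a, set_b = set(spacers_a), set(spacers_b)
    return {
        "shared": [s for s in spacers_a if s in set_b],
        "only_a": [s for s in spacers_a if s not in set_b],
        "only_b": [s for s in spacers_b if s not in set_a],
    }


# Toy data (assumption): both strains carry the same repeat but different spacers.
REPEAT = "GTTTC"
strain_a = "AA" + REPEAT + "ACGTAC" + REPEAT + "TTGACA" + REPEAT + "CC"
strain_b = REPEAT + "TTGACA" + REPEAT + "GGATCC" + REPEAT + "CATCAT" + REPEAT

spacers_a = spacers_in_array(strain_a, REPEAT)
spacers_b = spacers_in_array(strain_b, REPEAT)
report = compare_arrays(spacers_a, spacers_b)

print(strain_a.count(REPEAT), strain_b.count(REPEAT))  # number of repeat instances
print(report)
```

In this toy case the two arrays differ in all three ways the slide lists: the number of repeat instances (3 vs. 4), the number of spacers (2 vs. 3), and spacer identity (only one spacer is shared).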
13
When Did Single Cell Sequencing Start?
14
Nicolaas de Bruijn July 9, 1918 - February 17, 2012
15
De Bruijn Graphs
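For readers without the slide image, here is a minimal textbook sketch of the de Bruijn graph idea: every distinct k-mer in the reads becomes an edge from its (k-1)-mer prefix to its (k-1)-mer suffix, and a sequence is recovered by walking an Eulerian path. This is not SPAdes's implementation; the function names and the toy reads are assumptions for illustration, and the sketch ignores sequencing errors, uneven coverage, repeats, and reverse complements.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, edges are distinct k-mers."""
    kmers = {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}
    graph = defaultdict(list)
    for kmer in sorted(kmers):  # sorted only to make the output deterministic
        graph[kmer[:-1]].append(kmer[1:])
    return dict(graph)


def eulerian_path(graph):
    """Return one Eulerian path (Hierholzer's algorithm); assumes one exists."""
    remaining = {node: list(targets) for node, targets in graph.items()}
    balance = defaultdict(int)  # out-degree minus in-degree
    for node, targets in graph.items():
        balance[node] += len(targets)
        for target in targets:
            balance[target] -= 1
    # Start at the node with one extra outgoing edge, if there is one.
    start = next((n for n, b in balance.items() if b == 1), next(iter(graph)))
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Glue a path of overlapping (k-1)-mers back into one sequence."""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGGCGT", "GGCGTGC", "GTGCAAT"]  # made-up, error-free toy reads
graph = de_bruijn_graph(reads, k=4)
contig = spell(eulerian_path(graph))
print(contig)  # ATGGCGTGCAAT
```

The toy reads overlap cleanly and contain no repeated (k-1)-mers, so the graph is a single path and the walk is unambiguous. Real data, and single cell data in particular, breaks these assumptions.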
16
Nicolaas de Bruijn July 9, 1918 - February 17, 2012
17
De Bruijn Assemblers
– Idury and Waterman, JCB 1995
– PP, Tang, Waterman, PNAS 2001 (Euler)
– PP, Tang, Tesler, Genome Res. 2004 (A-Bruijn assembly)
– Chaisson and PP, Genome Res. 2008 (Euler-SR)
– Zerbino and Birney, Genome Res. 2008 (Velvet)
– Simpson et al., Genome Res. 2008 (ABySS)
– Butler et al., Genome Res. 2008, Gnerre et al., Genome Res. 2011 (ALLPATHS)
– Li et al., Genome Res. 2010 (SOAPdenovo)
– and others …
None of them works well with single cell data. No error correction tool works well with single cell data.
18
SPAdes Assembler
19
Bankevich et al., JCB 2012, Pham et al., RECOMB 2012, Vyahhi et al., WABI 2012, Nikolenko et al., BMC Bioinformatics 2012, Nurk et al., JCB 2013, Gurevich et al., Bioinformatics, 2013, McLean et al., Genome Research, 2013, PNAS 2013, Coates et al., PLOS One, 2014, Prjibelski et al., Bioinformatics, 2014
20
Computer Science Center in Saint Petersburg
Founded by the Algorithmic Biology Lab alumnus Alexander Kulikov in 2012
21
Bioinformatics Institute in Saint Petersburg
Founded by the Algorithmic Biology Lab alumnus Nikolay Vyahhi in 2012
Implement a Russian-US educational initiative that would benefit students all over the world
22
Rosalind: Joint Russian-US Educational Project
Online bioinformatics education platform
Founded by my students Nikolay Vyahhi (Russia) and Phillip Compeau (US)
– 20,000+ active students from 120+ countries
– 330,000+ programs submitted by students
– used in Rosalind Classrooms by 100+ professors worldwide
– chosen as an engine for a Coursera course with 32,000+ students
In 2013, Nikolay Vyahhi founded an online education start-up company, Stepic.
23
Massive Open Online Courses (MOOCs) Revolution Started in 2012 7 million students at Coursera alone
24
Massive Open Online Courses From MOOCs to ???
25
Massive Adaptive Interactive Text Massive Open Online Courses From MOOCs to MAITs
26
Russian-US MAIT Development Team
27
Bioinformatics Algorithms MAIT
Given as a MOOC in Fall 2013 and co-taught by Nikolay Vyahhi (the first Russian instructor at Coursera).
Featured among the top 3 Math & Science MOOCs in the world at CourseTalk.org
MoocBook published in June 2014; 3 months later: adopted at 9 universities
29
High-cost Development Team
Kolowich 2013: Average time spent developing a MOOC is ~100s of hours
30
High-cost Development Team
Kolowich 2013: Average time spent developing a MOOC is ~100s of hours
5,000 hours already spent on
31
End of a Classroom As I Know It: My MAIT is a Better Educator than Me! From class syllabus: “Class meeting times will not be lecture time”
32
What Is MAIT?
1. Development team
2. Interactivity and sophisticated assessment software with elements of gamification
3. Adaptiveness
33
High-cost Development Team
34
Kolowich 2013: Average time spent developing a MOOC is ~100s of hours
35
High-cost Development Team
Kolowich 2013: Average time spent developing a MOOC is ~100s of hours
5,000 hours already spent on
36
“Relative Difficulty of Our MOOC” 12.5 hours work a week on average!
37
“Please Rate Your Overall Satisfaction” One of very few MOOCs with perfect 5-star rating on CourseTalk
38
Quantum Magazine (circulation 200,000)
39
Proposal for Quantum MAIT
Introduction to Mathematics MAIT for secondary education
– Every mathematical concept is presented as a computer game that a student has to win to proceed further.
– Every educational breakdown is addressed through additional modules and game hints.
– Incorporating elements of the Russian educational system as exemplified by the Quantum magazine, Russian math olympiad culture, the Russian system of math schools for gifted children, etc.
40
Quantum MAIT: Finances and Timeline
First product (MATH and CS for Gifted Students): released 2 years after the start of the company
Initial investment: 10 million dollars
– 4 principal content developers in the US and Russia will supervise a distributed group of 10-15 content developers for individual mathematical/CS challenges. Many of these content developers will be located in Russia, working as consultants, thus bypassing the need to establish an affiliate Russian company and capturing the rich educational tradition in Russia.
– Each content developer will be paired with a software engineer familiar with web programming and gaming software.
– 5 software engineers (content delivery and web programming)
– 4 software engineers (gamification)
– 1 artist
41
Quantum MAIT: Team PP, Bioinformatics Algorithms MAIT experience, knowledge of Russian math education tradition, first job was a high-school teacher Phillip Compeau, Ph.D., project scientist at UCSD, Bioinformatics Algorithms co-author, Rosalind co-founder, extensive tutoring experience in math Alexander Kulikov, Ph.D., Director of Computer Science Center in Saint Petersburg, founder of Rosalind Algorithms, leading Russian MOOC developer, knowledge of Russian math education tradition
42
Quantum MAIT: Team PP, Bioinformatics Algorithms MAIT experience, knowledge of Russian math education tradition, first job was a high-school teacher Phillip Compeau, Ph.D., project scientist at UCSD, Bioinformatics Algorithms co-author, Rosalind co-founder, extensive tutoring experience in math Alexander Kulikov, Ph.D., Director of Computer Science Center in Saint Petersburg, founder of Rosalind Algorithms, leading Russian MOOC developer, knowledge of Russian math education tradition Stanislav Smirnov, Professor at EPFL (Lausanne) and Saint Petersburg University, Fields Medal winner, one of the top mathematicians in the world, first job was a high-school teacher Ruben Abagyan, Professor at UCSD and owner of MolSoft, knowledge of Russian math education tradition
46
Pevzner, Snob, 2011: Russia should establish a distributed Russian National University and fund 200 top Russian scientists at the level of 1 million dollars a year How to Save Russian Science
47
Pevzner, Snob, 2011: Russia should establish a distributed Russian National University and fund 200 top Russian scientists at the level of 1 million dollars a year Three years later: Newly established Russian Science Foundation funds 199 top Russian scientists at the level of 20 million rubles a year. How to Save Russian Science How to Save Russian Education
48
Putting Names to Faces
Shadaj (14-year-old): “I got really excited about DNA...”