CRISPR-Cas systems: exploiting genome context to infer novel functions


1 CRISPR-Cas systems: exploiting genome context to infer novel functions
An exercise in system thinking
Eugene V Koonin
National Center for Biotechnology Information, NIH, Bethesda, MD
Symposium on “The Brave New World of Smart Data and Semantics in the Life Sciences”
Wageningen University, January 24, 2019

2 Evolutionary Genomics Research group at the NCBI

3 The way CRISPR works
Makarova et al. Nature Rev Microbiol 2011

4 CRISPR-Cas adaptive immunity system
CRISPR – Clustered Regularly Interspaced Short Palindromic Repeats
Cas – CRISPR ASsociated proteins
crRNA – CRISPR RNA, the guide RNA
Mohanraju et al. Science 2016

5 CRISPR-Cas as a system
Generic system properties:
- Defined set of core components
- Extensive, elaborate, specific interactions between components
- Non-linear causality
- Clear purpose: antivirus immunity
- Emergent properties: purpose achieved only through interaction between modules
Specific features:
- Modularity
- Enormous architectural flexibility: multiple ancillary components plugged in
- Functional flexibility: mechanistic diversification, re-purposing

6 “Current” classification of CRISPR-Cas systems
Multisubunit crRNA-effector complexes (Cascade)
Single-subunit crRNA-effector complexes (Cas9-like)
Type V introduced
Makarova et al., NRMicro 2015

7 Computational pipeline for comprehensive discovery of Class 2 CRISPR-Cas
Shmakov et al, Mol Cell 2015; Shmakov et al, NRMicro 2017

8 New CRISPR-Cas classification: Class 1
Makarova et al, CRISPR J 2018

9 New CRISPR-Cas classification: Class 2
Makarova et al, CRISPR J 2018

10 Discovery of novel CRISPR-Cas systems in genomic and metagenomic databases
At least 10 new subtypes of CRISPR-Cas systems with substantial architectural and functional diversity discovered:
- from 16 to 26 CRISPR subtypes
- from 4 to ~14 subtypes of Class 2, and counting: new opportunities for genome-editing, RNA-targeting and regulatory tools
Cas gene list amendment (effector gene renaming):
- Cas12: TnpB, no HNH nuclease. Cpf1=Cas12a, C2c1=Cas12b, C2c3=Cas12c…V-U=???
- Cas13: 2xHEPN. C2c2=Cas13a, C2c4=Cas13b, C2c5=Cas13c…
Shmakov et al, NRMicro 2017

11 Comparison of Class 2 CRISPR-Cas effector nucleases: substantial functional diversity
Columns: nuclease domains; tracrRNA; PAM/PFS; substrate; target cut
Type II/Cas9: TnpB/RuvC + HNH; tracrRNA: yes; PAM: 3’, GC-rich; dsDNA; blunt ends
Type V-A/Cas12a (Cpf1): TnpB/RuvC + Nuc; tracrRNA: no; PAM: 5’, AT-rich; staggered ends, 5’ overhangs
Type V-B/Cas12b (C2c1): TnpB/RuvC
Type VI-A/Cas13a (C2c2): 2xHEPN; PFS: 5’, non-G; ssRNA; U-specific RNA cuts + collateral RNA cleavage
Type VI-B/Cas13b + Csx27/28: 2xHEPN; PFS: 5’, non-U, 3’ NAN, NNA; ssRNA; regulated U-specific RNA cuts + collateral RNA cleavage
Substantial diversity despite the common overall design
Diverse opportunities for genome editing/engineering
Type V-A: simplest known system
Type VI-A,B: specific RNA cleavage
Applications developed for Cas12a,b and Cas13a,b
Screening of growing genome collections continues
Shmakov et al, NRMicro 2017; Smargon et al, Mol Cell 2017
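To make the PAM-position difference in this comparison concrete, here is a minimal Python sketch that scans one DNA strand for candidate targets with either a 3’ PAM (Cas9-like) or a 5’ PAM (Cas12a-like). The specific PAM strings (`NGG`, `TTTV`), the 20-nt protospacer length, and the function names are illustrative assumptions; the slide only states "3’, GC-rich" and "5’, AT-rich".

```python
import re

# IUPAC codes needed for the assumed example PAMs (NGG, TTTV).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]", "V": "[ACG]"}


def pam_regex(pam):
    """Translate an IUPAC PAM string into a regular-expression fragment."""
    return "".join(IUPAC[base] for base in pam)


def find_targets(seq, pam, pam_side, spacer_len=20):
    """Return (start, protospacer) pairs found on the given strand only.

    pam_side="3prime": protospacer is followed by the PAM (Cas9-like).
    pam_side="5prime": PAM is followed by the protospacer (Cas12a-like).
    The sequence is assumed to be uppercase A/C/G/T.
    """
    pam_re = pam_regex(pam)
    if pam_side == "3prime":
        pattern = f"(?=([ACGT]{{{spacer_len}}}){pam_re})"
    elif pam_side == "5prime":
        pattern = f"(?={pam_re}([ACGT]{{{spacer_len}}}))"
    else:
        raise ValueError("pam_side must be '3prime' or '5prime'")
    # The lookahead makes matches zero-width, so overlapping sites are all reported.
    return [(m.start(1), m.group(1)) for m in re.finditer(pattern, seq)]


if __name__ == "__main__":
    toy = "TTTC" + "A" * 20 + "TGG"  # hypothetical toy sequence
    print(find_targets(toy, "NGG", "3prime"))
    print(find_targets(toy, "TTTV", "5prime"))
```

The sketch ignores the reverse strand, mismatches, and the RNA-targeting PFS rules of the type VI effectors; it only illustrates which side of the protospacer the motif sits on.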

12 Source of diversity: Independent origins of type V effectors from transposon-encoded TnpB nucleases
Shmakov et al, NRMicro 2017

13 Proposed path to maturity
Koonin, Makarova, Phil Trans Roy Soc, in press

14 Systematic exploration of small Cas12 proteins, likely evolutionary intermediates between TnpB and “mature” Cas12
Yan WX, Hunnewell P, Alfonse LE, Carte JM, Keston-Smith E, Sothiselvam S, Garrity AJ, Chong S, Makarova KS, Koonin EV, Cheng DR, Scott DA. Functionally diverse type V CRISPR-Cas systems. Science Jan 4;363(6422):88-91

15 Diverse activities of type V subtypes: TnpB-Cas12 intermediates
Robust ssRNA cleavage requiring RuvC domain
no tracrRNA
Collateral ssRNA/DNA cleavage
Yan WX, Hunnewell P, Alfonse LE, Carte JM, Keston-Smith E, Sothiselvam S, Garrity AJ, Chong S, Makarova KS, Koonin EV, Cheng DR, Scott DA. Functionally diverse type V CRISPR-Cas systems. Science Jan 4;363(6422):88-91

16 Complementary study of small Cas12
Harrington LB, Burstein D, Chen JS, Paez-Espino D, Ma E, Witte IP, Cofsky JC, Kyrpides NC, Banfield JF, Doudna JA. Programmed DNA destruction by miniature CRISPR-Cas14 enzymes. Science Oct 18
Here we present a set of CRISPR-Cas systems from uncultivated archaea that contain Cas14, a family of exceptionally compact RNA-guided nucleases ( amino acids). Despite their small size, Cas14 proteins are capable of targeted single-stranded DNA (ssDNA) cleavage without restrictive sequence requirements.

17 Exploitation of genome context to discover new CRISPR-associated functions including non-defense ones

18 Subtype VI-D: new opportunities for RNA editing – and a new insight into CRISPR regulation
Arbor Biotechnologies: David Cheng, David Scott, Winston Yan
Putative accessory proteins: WYL, a regulatory domain, found in contexts similar to or together with CARF
Yan WX, Chong S, Zhang H, Makarova KS, Koonin EV, Cheng DR, Scott DA. Cas13d Is a Compact RNA-Targeting Type VI CRISPR Effector Positively Modulated by a WYL-Domain-Containing Accessory Protein. Mol Cell Apr 19;70(2):
Konermann S, Lotfy P, Brideau NJ, Oki J, Shokhirev MN, Hsu PD. Transcriptome Engineering with RNA-Targeting Type VI-D CRISPR Effectors. Cell Apr 19;173(3):

19 Cas13d: Sister group of Cas13a
Yan WX, Chong S, Zhang H, Makarova KS, Koonin EV, Cheng DR, Scott DA. Cas13d Is a Compact RNA-Targeting Type VI CRISPR Effector Positively Modulated by a WYL-Domain-Containing Accessory Protein. Mol Cell Apr 19;70(2):

20 Subtype VI-D: phylogeny and locus organization
Cas13d: the smallest RNA-targeting CRISPR effector – by far
Figure 1. Subtype VI-D CRISPR-Cas Systems and Diversity of Type VI Subtypes
(A) Schematic representation of a maximum likelihood tree topology for a selected subset of Cas13d, with the genomic arrangement of the genes encoding predicted protein components of subtype VI-D system components shown to the right. Each locus sequence is identified by a protein accession or gene number with the species name provided where available. Key proteins and CRISPR arrays are color coded as follows: blue, Cas13d; light orange, WYL-domain-containing protein; light blue, Cas1; green, Cas2; dark gray/gray, CRISPR array.
(B) Schematic tree comparing the different type VI subtype locus structures. Gene arrows are shown roughly proportional to size. Labels denote the following: HTH, helix-turn-helix domain; WYL, WYL domain; HEPN, HEPN nuclease domain; TM, transmembrane domains of Csx27–Csx28. Key proteins and CRISPR arrays are color coded as follows: blue, Cas13d; gray, Csx accessory proteins (differentiated by colored domains); light blue, Cas1; green, Cas2; dark gray/gray, CRISPR array. Figure S1 contains the sequence alignment and phylogeny of Cas13d compared to Cas13a, and Figure S2 has the same analysis for the WYL domain proteins.
(C) Size comparison for Cas13 proteins from the four type VI subtypes; error bars specify the mean and standard deviation.
Yan WX, Chong S, Zhang H, Makarova KS, Koonin EV, Cheng DR, Scott DA. Cas13d Is a Compact RNA-Targeting Type VI CRISPR Effector Positively Modulated by a WYL-Domain-Containing Accessory Protein. Mol Cell Apr 19;70(2):
Copyright © 2018 Elsevier Inc.

21 Stimulatory effect of the WYL domain protein on Cas13d
Yan WX, Chong S, Zhang H, Makarova KS, Koonin EV, Cheng DR, Scott DA. Cas13d Is a Compact RNA-Targeting Type VI CRISPR Effector Positively Modulated by a WYL-Domain-Containing Accessory Protein. Mol Cell Apr 19;70(2):
WYL domains can also regulate type I systems:
Hein S, Scholz I, Voß B, Hess WR. Adaptation and modification of three CRISPR loci in two closely related cyanobacteria. RNA Biol May;10(5):852-64
“All six systems are associated with a gene for a possible transcriptional repressor. Indeed, we identified one of these genes, sll7009, as encoding a negative regulator specific for the CRISPR1 subtype I-D system in Synechocystis sp PCC6803”

22 Discovering the functions of CRISPR “ancillary” proteins
The Cas10-CARF signaling pathway
non-specific RNA degradation/toxicity? PCD?
Niewoehner O, Garcia-Doval C, Rostøl JT, Berk C, Schwede F, Bigler L, Hall J, Marraffini LA, Jinek M. Type III CRISPR-Cas systems produce cyclic oligoadenylate second messengers. Nature Aug 31;548(7669):
Kazlauskiene M, Kostiuk G, Venclovas Č, Tamulaitis G, Siksnys V. A cyclic oligonucleotide signaling pathway in type III CRISPR-Cas systems. Science. 2017 Aug 11;357(6351):
Koonin EV, Makarova KS. Discovery of Oligonucleotide Signaling Mediated by CRISPR-Associated Polymerases Solves Two Puzzles but Leaves an Enigma. ACS Chem Biol Sep 27

23 Origin of CRISPR effector module from a signal transduction system, possibly a suicidal one
Koonin, Makarova, Phil Trans Roy Soc in press

24 Discovering the functions of CRISPR “ancillary” proteins
Csn2 ATPase
Arslan et al. Double-strand DNA end-binding and sliding of the toroidal CRISPR-associated protein Csn2. Nucleic Acids Res. 2013
Bernheim et al. Inhibition of NHEJ repair by type II-A CRISPR-Cas systems in bacteria. Nat Commun. 2017
Makarova, Koonin, unpublished

25 With the growth of microbial genome databases, new CRISPR ancillary proteins can be expected to pop up
Can we systematically identify ancillary genes in CRISPR-cas loci, predict protein functions and distinguish functional associations from spurious ones?
Shmakov, Makarova, Wolf, Severinov, Koonin, PNAS 2018

26 CRISPRicity: a “guilt by association” approach
360 families
Shmakov, Makarova, Wolf, Severinov, Koonin, PNAS 2018
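The slide does not spell out how the CRISPRicity score is computed. A minimal sketch, assuming it is the fraction of a gene family's occurrences that fall inside CRISPR-cas loci, might look like the following. The input format, the function name, and the family labels are hypothetical.

```python
from collections import Counter


def crispricity(occurrences):
    """Fraction of each family's occurrences that lie in a CRISPR-cas locus.

    occurrences: iterable of (family_id, in_crispr_locus) pairs, one per
    gene occurrence across all genomes (assumed input format).
    """
    total = Counter()
    linked = Counter()
    for family, in_locus in occurrences:
        total[family] += 1
        if in_locus:
            linked[family] += 1
    return {family: linked[family] / total[family] for family in total}


if __name__ == "__main__":
    # Hypothetical toy data: famA is mostly CRISPR-linked, famB mostly not.
    toy = [("famA", True)] * 3 + [("famA", False)] + \
          [("famB", True)] + [("famB", False)] * 4
    for family, score in sorted(crispricity(toy).items()):
        print(family, round(score, 2))
```

A family with a high score and many occurrences is a candidate CRISPR-linked gene under this reading; the cutoffs used to arrive at the 360 families are not given on the slide.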

27 Outstanding questions
How do we decide whether a gene is functionally linked to CRISPR-Cas system or just encoded in a “defense island”? How do we rule out “historical” synteny conservation?

28 CorA (50 protein clusters): membrane connection of subtype III-B
CorA, structure
CorA, metal ion transporter (MIT) superfamily
Tan K, Sather A, Robertson JL, Moy S, Roux B, Joachimiak A. Structure and electrostatic property of cytoplasmic domain of ZntB transporter. Protein Sci. 2009 Oct;18(10):
DHH – nanoRNA nuclease, metal-dependent
CorA and DHH, model
Uemura Y, Nakagawa N, Wakamatsu T, Kim K, Montelione GT, Hunt JF, Kuramitsu S, Masui R. Crystal structure of the ligand-binding form of nanoRNase from Bacteroides fragilis, a member of the DHH/DHHA1 phosphoesterase family of proteins. FEBS Lett Aug 19;587(16):
LabA – NYN domain RNA nuclease, metal-dependent
Anantharaman V, Aravind L. The NYN domains: novel predicted RNAses with a PIN domain-like fold. RNA Biol Jan-Mar;3(1):18-27.
Function?
- CorA is a phage DNA transporter
- CorA is a sensor of phage entry through the membrane
- CorA is a metal transporter and regulates activities of metal-dependent nucleases
- CorA/DHH is a suicide machine, making membrane leaky and cleaving cellular RNAs if CRISPR immunity fails

29 CorA: horizontal mobility together with III-B effector module
Insertion of unrelated or distantly related type III systems next to cas6 gene
No cas1, cas2 and CRISPR arrays
Cas6 stays in situ and could be aligned on the nucleotide level
Flanking genes mostly stay the same
Puigbò P, Makarova KS, Kristensen DM, Wolf YI, Koonin EV. Reconstruction of the evolution of microbial defense systems. BMC Evol Biol. 2017

30 New membrane-associated CARF (29 clusters)
>WP_ hypothetical protein [Thermocrinis ruber]
MWKFWEIELKHFKTLLESGKLDDHIEGLYSQFWELPPSHQYELVKYSKEKEVFPSIQTFRKVFKVSEETAVKFFKEKHITFE FPVVSSDGKGELIKAVAIKNLKEVITNLKNIKRHFNPIKEFLKTGFAVFFDREFAGASFQLPTVLNLYVENLPQDALFGAIDKK GNIKSVDGIEEKKKLAKELGLRLVEPYYLSTVDDLKAWFDAESYDVPLYITKTQDRWEGEFKSFLKATGISKEQLTKLEVLS GLETKPIITGQLAGDVWKNVLEEFWRRFKETEQKLHNKERF HIAINGPVALAFAIGVLFGSQKP FVFYHYQNNIYHPITVENVRELKERKESLEKIEQHFQKGGKSLVVMLSFAHHEMESDVKNYISRKVENPSYLLLRAKSS GNIAVEDMKEVATETASVIQNIRREHSFEDF HFFLSTPVPIAFMGGLSFGHY GEGYIYNYAGGTYEPVVSFSFLKALREGKYVLSEV
Some have only membrane-associated CARF domain
CARF – OligoA-binding domain, CRISPR associated Rossmann fold
Membrane CARF, model
Makarova KS, Anantharaman V, Grishin NV, Koonin EV, Aravind L. CARF and WYL domains: ligand-binding regulators of prokaryotic defense systems. Front Genet Apr 30;5:102
Niewoehner O, Garcia-Doval C, Rostøl JT, Berk C, Schwede F, Bigler L, Hall J, Marraffini LA, Jinek M. Type III CRISPR-Cas systems produce cyclic oligoadenylate second messengers. Nature Aug 31;548(7669):
Kazlauskiene M, Kostiuk G, Venclovas Č, Tamulaitis G, Siksnys V. A cyclic oligonucleotide signaling pathway in type III CRISPR-Cas systems. Science Aug 11;357(6351):
Function? Membrane-associated stress signaling?

31 Membrane protein specific for III-B in Actinobacteria (3 clusters)
No HEPN, CARF alone

32 Complex organization of type III loci: challenge for further study of CRISPR biology beyond adaptive immunity

33 Type IV-B and CysH-like proteins (10 clusters)
Genera: Actinomyces, Aureimonas, Cellulosimicrobium, Gordonia, Haloterrigena, Leptospira, Mycobacterium, Rhodococcus, Rubinisphaera, Ruminococcus, Spirochaeta, Tetrasphaera, Thermoanaerobacter, Xylanimonas
CysH belongs to the adenine nucleotide alpha hydrolases superfamily (ATP sulphurylases, tRNA methyl transferase, Universal Stress Response protein)
Presence/absence of cysH in type IV loci (+/- 20 genes from type IV genes)
VIP2 - ART superfamily (NAD:arginine ADP-ribosyltransferase-like)
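The presence/absence check within +/- 20 genes of the type IV genes can be sketched as a simple window scan over an ordered gene list. This is an illustration under stated assumptions: the contig is treated as linear, genes are identified by name only, and the names `csf1` and `cysH` in the toy data stand in for whatever annotation labels are actually used.

```python
def genes_in_window(gene_order, anchor_names, window=20):
    """Return non-anchor genes lying within `window` genes of any anchor.

    gene_order: gene names in chromosomal order on one linear contig.
    anchor_names: set of names that mark the locus of interest.
    """
    nearby = set()
    for i, gene in enumerate(gene_order):
        if gene in anchor_names:
            lo = max(0, i - window)
            hi = min(len(gene_order), i + window + 1)
            nearby.update(range(lo, hi))
    return [gene_order[i] for i in sorted(nearby)
            if gene_order[i] not in anchor_names]


def has_gene_near(gene_order, anchor_names, query, window=20):
    """True if `query` occurs within `window` genes of an anchor gene."""
    return query in genes_in_window(gene_order, anchor_names, window)


if __name__ == "__main__":
    # Hypothetical contig: anchor at index 10, cysH at index 25.
    contig = [f"g{i}" for i in range(50)]
    contig[10] = "csf1"
    contig[25] = "cysH"
    print(has_gene_near(contig, {"csf1"}, "cysH"))
    print(has_gene_near(contig, {"csf1"}, "cysH", window=10))
```

Circular replicons and strand orientation are ignored here; a real scan would also need to merge overlapping windows per locus before counting presence or absence.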

34 STAND NTPase linked to interference-deficient I-E system
STAND: Signal Transduction ATPases with Numerous Domains
Leipe DD, Koonin EV, Aravind L. STAND, a class of P-loop NTPases including animal and plant regulators of programmed cell death: multiple, complex domain architectures, unusual phyletic patterns, and evolution by horizontal gene transfer. J Mol Biol. 2004

35 Phylogenetic analysis of Cas5 family from I-E systems: Cas1-less loci

36 Branch A gene neighborhood analysis: convergent loss of Cas3/interference
Degradation and loss of Cas3/ interference

37 Distinct STAND NTPase associated with Cas1-less I-E systems in actinobacteria
STAND-like NTPases are known to be involved in apoptosis in eukaryotes, but the bacterial homologs remain virtually uncharacterized. These systems lack Cas3 and, accordingly, the interference capacity.
Leipe DD, Koonin EV, Aravind L. STAND, a class of P-loop NTPases including animal and plant regulators of programmed cell death: multiple, complex domain architectures, unusual phyletic patterns, and evolution by horizontal gene transfer. J Mol Biol. 2004
The Cas5 phylogeny shows that the branch associated with the STAND NTPase includes primarily Cas1-less loci (Branch A). In addition, in this subtree, there are at least 3 large clades of Cas1-less I-E systems (branches B, C, D). All sequences in these branches are from actinobacteria. Branch A includes a small subtree where Cas1-less systems are linked to another STAND-like NTPase, which is only distantly related to the first one. The second STAND-like NTPase is fused to a caspase, a protease known to be involved in apoptosis in eukaryotes.
Several complete genomes that encode these Cas1-less systems lack other CRISPR-Cas systems.
Defense without interference, via PCD? Non-defense signal transduction roles?

38 Differentiating functional association from historical synteny conservation or spurious co-occurrence through coevolution analysis
Type III loci
For each pair of loci, derive distances from the trees of: 1) universal marker (16S), 2) CRISPR effector, 3) candidate ancillary cas gene
$L_i, L_j$: $\mathit{Eff}_i \times \mathit{Eff}_j$ distance (effector module); $\mathit{MyGene}_i \times \mathit{MyGene}_j$ distance; $\mathit{16S\ rRNA}_i \times \mathit{16S\ rRNA}_j$ distance
where $\mathit{Eff}_i$, $\mathit{Eff}_j$ and $\mathit{MyGene}_i$, $\mathit{MyGene}_j$ are comparable (alignable)
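Deriving pairwise distances from a tree can be sketched as follows. The slide does not say which tree distance was used; this example assumes the patristic distance (sum of branch lengths along the path between two leaves) and represents the tree as a hypothetical child-to-(parent, branch length) mapping rather than a parsed Newick file.

```python
from itertools import combinations

# Hypothetical toy tree: child -> (parent, branch length to parent).
EXAMPLE_TREE = {
    "root": (None, 0.0),
    "n1": ("root", 0.1),
    "A": ("n1", 0.2),
    "B": ("n1", 0.3),
    "C": ("root", 0.5),
}


def path_to_root(tree, leaf):
    """Map each node on the leaf-to-root path to its distance from the leaf."""
    dist = {leaf: 0.0}
    node, d = leaf, 0.0
    while tree[node][0] is not None:
        parent, length = tree[node]
        d += length
        dist[parent] = d
        node = parent
    return dist


def patristic(tree, a, b):
    """Sum of branch lengths on the path between leaves a and b."""
    up_a = path_to_root(tree, a)
    node, d = b, 0.0
    # Climb from b until reaching a node that is also an ancestor of a.
    while node not in up_a:
        parent, length = tree[node]
        d += length
        node = parent
    return up_a[node] + d


def pairwise_distances(tree, leaves):
    """Patristic distance for every unordered pair of the given leaves."""
    return {(a, b): patristic(tree, a, b) for a, b in combinations(leaves, 2)}


if __name__ == "__main__":
    for pair, d in pairwise_distances(EXAMPLE_TREE, ["A", "B", "C"]).items():
        print(pair, round(d, 3))
```

Running this once per tree (16S rRNA, effector, candidate gene), with leaves keyed by locus, would give three distance values for each pair of loci, which is the input the comparison needs.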

39 Functional association vs synteny through pairwise distance comparison
Under the assumption of gene reshuffling, we need to show both correlation and reshuffling: high correlation/coevolution between the putative new ancillary gene and the cas genes (association), and low correlation between the same gene and the species tree (16S rRNA).
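The decision rule described here can be sketched as a comparison of two Pearson correlations over the per-pair distances. The thresholds (`high=0.8`, `low=0.5`), the label strings, and the function names are illustrative assumptions, not values taken from the analysis.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def classify(gene_d, eff_d, rrna_d, high=0.8, low=0.5):
    """Compare candidate-gene distances with effector and 16S rRNA distances.

    Each argument lists one distance per pair of loci, in the same order.
    The thresholds are arbitrary placeholders.
    """
    r_eff = pearson(gene_d, eff_d)
    r_rrna = pearson(gene_d, rrna_d)
    if r_eff >= high and r_rrna < low:
        label = "coevolves with effector, reshuffled relative to species tree"
    elif r_eff >= high:
        label = "correlated with both: consistent with vertical inheritance"
    else:
        label = "no clear association"
    return r_eff, r_rrna, label


if __name__ == "__main__":
    # Hypothetical distances for five pairs of loci.
    gene = [1.0, 2.0, 3.0, 4.0, 5.0]
    eff = [1.1, 2.0, 2.9, 4.2, 5.1]
    rrna = [3.0, 1.0, 4.0, 1.0, 5.0]
    print(classify(gene, eff, rrna))
```

Because the same loci appear in many pairs, the pairwise distances are not independent observations, so the coefficients are better read as descriptive summaries than as a formal significance test.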

40 Cas5-Cas10
Panels: Cas5 vs Cas10; 16S rRNA vs Cas10; 16S rRNA vs Cas5
0.84 correlation between cas5 and cas10
Low correlation with 16S rRNA (0.36, 0.34)
3,218,150 loci pairs found (100k sampled)
ClustDist – distance between pairs of Cas5
Cas10Dist – distance between pairs of Cas10
RRNADist – distance between 16S rRNA

41 New ancillary genes: CorA
Panels: CorA vs Cas10; 16S rRNA vs Cas10; 16S rRNA vs CorA
R=0.86 correlation between the genes
Weak correlation with 16S rRNA
2,969 loci pairs found
ClustDist – distance between pairs of CorA
Cas10Dist – distance between pairs of Cas10
RRNADist – distance between 16S rRNA
Clear evidence of coevolution with CRISPR-Cas

42 New ancillary proteins: Membrane CARF
Panels: CARF vs Cas10; 16S rRNA vs Cas10; 16S rRNA vs RT
High correlation for every component
282 pairs of loci
ClustDist – distance between pairs of CARF proteins
Cas10Dist – distance between pairs of Cas10
rRNADist – distance between 16S rRNA
Apparent vertical evolution of new accessory gene

43 Systematic procedure developed for detection of CRISPR-associated genes
- flexible parameters, approaches for detecting functionally relevant associations, fully generalizable
Core cas genes already known but many diverged subfamilies remain undetected
Many new CRISPR accessory genes identified
Type III systems are most interesting/complex:
- great majority of new genes
- membrane association
- various forms of signaling
What is special about Type III?
- ancestral?
- functional versatility?

44 Non-interfering CRISPR-Cas systems: no enzymes for target cleavage
Programmed cell death/signal transduction? – type I-E, STAND ATPase-containing
Transposon integration? – “minimal” type I-F (and some type I-B), Tn7-like transposon-encoded
Peters JE, Makarova KS, Shmakov S, Koonin EV. Recruitment of CRISPR-Cas systems by Tn7-like transposons. PNAS 2017
Plasmid maintenance? – type IV
Makarova et al. An updated evolutionary classification of CRISPR-Cas systems. NRMicro unpublished
Gene expression regulation? – type V-U5 – inactivated RuvC domains
Shmakov et al. Diversity and evolution of class 2 CRISPR-Cas systems. NRMicro 2017
Shmakov, Makarova, Wolf, Severinov, Koonin, PNAS 2018 in press + unpublished

45 Take home
The basic CRISPR mechanisms are reasonably well understood, and the most common CRISPR variants are known, although a plethora of interesting new ones remain to be discovered.
However, we are only starting to scratch the surface of the broader CRISPR biology, particularly connections with microbial signal transduction.
There are many ways CRISPR systems are used in the arms race and far beyond; these remain to be investigated.
Genome context analysis/guilt by association/icity metrics (to be refined) offer many clues. Attempts to complement these with deep learning are underway.

46 Collaborators
http://www.ncbi.nlm.nih.gov/research/groups/koonin/
NCBI, Evolutionary Genomics Group: Kira Makarova, Sergey Shmakov, Yuri Wolf, Guilhem Faure
Feng Zhang and lab (Broad/MIT)
Konstantin Severinov (Skolkovo/Rutgers)
Joe Peters (Cornell)
Winston Yan, David Scott, David Cheng, Shaorong Chong, Huaibin Zhang (Arbor Biotechnologies, Cambridge, MA)
Funding: NIH intramural program

