Rapid quantification and taxonomic classification of a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins using a microarray.

CEB - ESD - LBNL
Todd DeSantis, Sonya Murray, Jordan Moberg, Gary Andersen
Carol Stone (DSTL, U.K.)
What bugs are in my sample?

The ponderings of a toddler
–Why must Mom confiscate my “Hello Kitty” blanket on laundry day?
–Will the swings be wet at the park?
–How will this sausage impact the diversity in my lower G.I. bacterial community?
–Will I inhale any archaeal microorganisms when I visit the hot springs?
Gianna DeSantis

Every discarded water sample, geological core, or spent air filter is lost data. But who wants to do all the work?
–Culture? Anaerobes? Non-cultivable? Safety?
–Analysis of nucleic acids isolated from environment
Must classify or sort heterogeneous nucleic acids into bins.
–Restriction Fragment Length Polymorphisms (RFLP)
–Single Stranded Conformation Polymorphisms (SSCP)
–Temp/Denat Gradient Gel Electrophoresis (T/DGGE)
–Sequencing
»Provides taxonomic nomenclature
»Estimates the relative abundance
»Need to create, clone, & process hundreds of samples
Can we create a simple, quantitative, comprehensive microbial test?

Outline
–Goals
–Experimental Approach
–Organization of rDNA sequences into taxa (CASCADE-P)
–Assigning sets of probes for each taxon
–Using 16S GeneChip for quantitative aerosol analysis

Project Overview
Goal
–Create a single microarray capable of detecting and quantifying bacterial and/or archaeal organisms in a complex sample.
Approach
–Combinatorial power of multiple probes for sequence-specific hybridization

16S rRNA gene (16S rDNA)
Used to identify and classify organisms by gene sequence variations.
Variations have been used in design of DNA probes for the detection of:
–taxonomic domains, divisions, groups …
–specific organisms

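The Ribosome
[Figure: rDNA is transcribed into rRNA (functional molecule); the ribosome has a large subunit (LSU) and a small subunit (SSU); the SSU rRNA is 16S or 18S.]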

The Ribosome
Folded secondary structure
Essential functional component
Conserved spans
–structure must be retained for viability
–targeted for universal/group-specific PCR primers and probes
Variable regions
–spans not fundamental to the folded structure
–receive less pressure from natural selection
–probed for genus and species level discrimination

What could be amplified?
Universal 16S PCR primers → complex population of amplicons.
Must define the targets to consider as the Potential Amplicon Set.

[Figure: SSU rDNA drawn 5’ to 3’ with labels pA, Ccomp, and 1492R; the region interrogated on the chip is marked at positions 1390 and 1507.]
20 base DNA signature segments on chip = probe set.
Sample reacts only with complementary signature sequences on chip.
First generation rDNA Array uses 85-base highly variable region of ribosomal DNA.

CASCADE-P: Comprehensive Aligned Sequence Construction for Automated Design of Effective Probes
http://greengenes.llnl.gov/16S
Igor Dubosarskiy –Java implementations
Tim Harsch –RDBMS consultations
Lisa Corsetti –Apache module management
Kevin Melissare –Graphics

Hierarchical Phylocodes

2.30.9.2.10
5th Level: C.ACETOBUTYLICUM_SUBGROUP
4th Level: C.BOTULINUM_GROUP
3rd Level: CLOSTRIDIUM_AND_RELATIVES
2nd Level: GRAM_POSITIVE_BACTERIA
1st Level: BACTERIA
Members:
–Clostridium collagenovorans DSM 3089 (T)
–Clostridium sardiniensis ATCC 33455 (T)
–Clostridium acetobutylicum ATCC 824 (T)
–Clostridium acetobutylicum DSM 792 (T)
–Clostridium acetobutylicum ATCC 824 (T)
–Clostridium acetobutylicum NCDO 1712
–Clostridium acetobutylicum DSM 1731

2.28.3.27.2
5th Level: ESCHERICHIA_SUBGROUP
4th Level: ENTERICS_AND_RELATIVES (Group)
3rd Level: GAMMA_SUBDIVISION
2nd Level: PROTEOBACTERIA
1st Level: BACTERIA
Members:
–U85138      clone ACK-SA7
–AE000452    Escherichia coli str. K-12
–Er.trachep  Erwinia tracheiphila LMG 2906 (T)
–E.coliK12   Escherichia coli [gene=rrnG gene]
–Haf.alvei3  Hafnia alvei
–S.tymuriu3  Salmonella typhimurium str. Stm1
–Shi.boydii  Shigella boydii
–AF084835    str. KN4
–S.enterit4  Salmonella enteritidis str. SE22
–S.ptyphi6   Salmonella paratyphi
–S.typhi3    Salmonella typhi str. St111
–S.bovismrb  Salmonella bovis morbificans Sbm1
–Alt.agrlyt  Alterococcus agarolyticus str. ADT3
–Shi.flxne2  Shigella flexneri ATCC 29903 (T)

Chip Taxa
Avoid groupings based on historical nomenclature.
Sequence-dependent classification by transitive similarity clustering.
Each sequence must end up in exactly 1 taxon.
if x R y & y R z ⇒ x R z
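The transitive rule on this slide can be illustrated with a small union-find sketch. This is only an illustration under assumptions: the relation R is supplied as a list of pairs, and the function name and the toy items are hypothetical, not taken from the CASCADE-P code.

```python
def transitive_clusters(items, related_pairs):
    """Group items so that x R y and y R z place x, y and z in one cluster."""
    parent = {item: item for item in items}

    def find(x):
        # Walk up to the cluster representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for x, y in related_pairs:
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x  # merge the two clusters

    clusters = {}
    for item in items:
        clusters.setdefault(find(item), []).append(item)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical relation: A R B and B R C chain into one taxon; D R E forms another.
taxa = transitive_clusters(["A", "B", "C", "D", "E"],
                           [("A", "B"), ("B", "C"), ("D", "E")])
print(taxa)
```

Because every item has exactly one representative, each sequence lands in exactly one taxon, which is the constraint stated on the slide.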

Assigning Probes for GeneChip Microarray
Select probe sets for each taxon.
Ideal Probe
–Present in all sequences of the taxon
–Not present outside the taxon
–Unable to cross-hybridize (X-hybe) with seqs in other taxa
Ideal Mis-match Control Probe
–Unable to X-hybe to any sequence

Finding groupings
[Figure: grid of sequences A – O (rows) against probes 1 – 24 (columns).]
Consider A – O to be 16S sequences.
Consider 1 – 24 to be probes already embedded on the chip.
First, associate all available probes with all available sequences.
Let probe similarities drive sequence groupings.


Progressive Transitive Clustering

DEFINE:
upp (useful probe pair): a PM,MM pair where the 20-mer PM complements all intra-cluster sequences AND the central 16-mer of PM does not complement any extra-cluster sequences AND the central 16-mer of the MM does not complement any sequence. Probe pairs are reassessed whenever the sequence clusters are altered.
nGBupp: number of upps for a cluster; these probe pairs globally differentiate a cluster from all other sequences.
L: the value of nGBupp which must be met for a cluster to be locked.
nPW uppA: number of useful probe pairs which pair-wise differentiate clustA from clustB.
nPW uppB: number of useful probe pairs which pair-wise differentiate clustB from clustA.
m: the value of nPW upp which must be met to inhibit two clusters from merging.

FOR L (11 .. 4) DO
    FOR m (1 .. 10) DO
        Determine nGBupp for each cluster;
        Lock all clusters where nGBupp ≥ L;
        Pair-wise compare non-locked clusters (clustA, clustB);
        UNLESS (nPW uppA ≥ m AND nPW uppB ≥ m)
            Merge sequences of clustA and clustB into one cluster;
        END UNLESS
    END FOR
    Uncluster non-locked clusters;
END FOR

650 clusters found
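The upp definition can be sketched as a predicate plus a counter for nGBupp. This covers only the probe-pair test, not the full L/m loop. It assumes probes and sequences are written on the same strand, so "complements" reduces to a substring match, and that probe pairs are supplied ready-made; all function names and toy sequences are hypothetical.

```python
def central_16(probe):
    """Central 16 bases of a 20-mer probe (two bases trimmed from each end)."""
    return probe[2:18]


def is_upp(pm, mm, cluster, others):
    """Apply the three upp conditions to one (PM, MM) probe pair."""
    pm_in_all = all(pm in seq for seq in cluster)
    pm_core_outside = any(central_16(pm) in seq for seq in others)
    mm_core_anywhere = any(central_16(mm) in seq for seq in cluster + others)
    return pm_in_all and not pm_core_outside and not mm_core_anywhere


def n_gb_upp(probe_pairs, cluster, others):
    """Count the probe pairs that globally differentiate the cluster."""
    return sum(is_upp(pm, mm, cluster, others) for pm, mm in probe_pairs)


# Hypothetical toy data: pm1 occurs in both cluster sequences and nowhere else.
pm1 = "ACGTTGCAAGCTTAGGCATC"
mm1 = pm1[:9] + "C" + pm1[10:]          # central G swapped for its complement
cluster = ["TT" + pm1 + "GG", "CA" + pm1 + "AT"]
outside = "G" * 10 + "C" * 10 + "A" * 10
pm2 = outside[:20]                      # a probe taken from the outside sequence
mm2 = pm2[:9] + "C" + pm2[10:]
others = [outside]

count = n_gb_upp([(pm1, mm1), (pm2, mm2)], cluster, others)
print(count)
```

In the pseudocode, a cluster would be locked once this count reaches the current value of L.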

Approach: Custom Affymetrix GeneChip
MATCH: cctagcatgCattctgcata
MISMATCH: cctagcatgGattctgcata
–Massive parallelism: up to 500,000 probes in a 1.28 cm² array
–Identification of multiple species in a mixed population
–Single nucleotide mismatch resolution
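The match/mismatch pair on this slide differs only at the capitalised base, which is swapped for its complement. A minimal sketch of that substitution follows, assuming the mismatch always sits at position 10 of the 20-mer, as in the slide's example; the function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm, position=10):
    """Return the MM partner of a PM probe by complementing one base.

    `position` is 1-indexed; 10 matches the capitalised base in the slide's example.
    """
    i = position - 1
    swapped = COMPLEMENT[pm[i].upper()]
    # Keep the slide's convention: lower case everywhere except the swapped base.
    return pm[:i].lower() + swapped + pm[i + 1:].lower()


pm = "cctagcatgCattctgcata"
mm = mismatch_probe(pm)
print(mm)
```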

General Protocol
–Sample sources: Air, Soil, Feces, Blood, Water
–Nucleic acid: rRNA, gDNA
–Universal 16S rDNA PCR
–GeneChip: contains probes adhered to glass surface in grid pattern.

Locating Hybridization Events
[Figure: array feature with a 50 µ scale label, showing repeated copies of the probe sequence ACGGTCGA. Workflow labels: PCR Amplify DNA, Fractionate DNA, Biotin End-label, Hybridize.]

Parameter             Frankia    Clostridium
Positive fraction     1.00       0.64
Average difference    3720       625
[Figure: PM and MM probe intensities for Frankia sp. str. G48 and Clostridium butyricum.]
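The two parameters in this table can be sketched from PM/MM intensities as follows. The slide does not define them, so this assumes one common convention: a probe pair counts as positive when PM exceeds MM by both a minimum difference and a minimum ratio, and the average difference is the plain mean of PM minus MM. The thresholds, function name, and intensities below are hypothetical.

```python
def score_probe_set(pm, mm, min_diff=100.0, min_ratio=1.3):
    """Return (positive fraction, average difference) for one probe set.

    min_diff and min_ratio are illustrative thresholds, not values from the slides.
    """
    diffs = [p - m for p, m in zip(pm, mm)]
    positive = sum(1 for p, m in zip(pm, mm)
                   if p - m > min_diff and p > min_ratio * m)
    return positive / len(diffs), sum(diffs) / len(diffs)


# Hypothetical intensities (arbitrary units) for four probe pairs.
pm_intensities = [1200.0, 900.0, 400.0, 300.0]
mm_intensities = [200.0, 300.0, 380.0, 350.0]
positive_fraction, average_difference = score_probe_set(pm_intensities, mm_intensities)
print(positive_fraction, average_difference)
```

Multiplying the positive fraction by 100 gives a percent of positive pairs per probe set.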

Can the chip detect more than one analyte?

Combinatorial scoring of “Probe Sets” is able to categorize mixed samples.

OTU                  % pos pairs
2.30.7.12.1.013 *    100
2.30.7.12.1.014      46 – 57
2.30.7.12.1.015      54 – 61
2.30.7.12.1.016      39 – 54
2.30.7.12.1.017      18
2.30.7.12.2.002      11
2.30.7.12.2.003      14
2.30.7.12.2.005      14 – 32
2.30.7.12.2.006      18 – 32
2.30.7.12.2.007      21 – 25
2.30.7.12.2.008      14 – 29
2.30.7.12.3.001      7 – 25
2.30.7.12.3.002      8
2.30.7.12.3.003      4
2.30.7.12.3.004      7 – 11
2.30.7.12.3.005      4 – 14
2.30.7.12.3.006      11
2.30.7.12.3.007      14 – 29
2.30.7.12.3.008      7
2.30.7.12.3.009      4 – 11
2.30.7.12.3.010      0 – 4
2.30.7.12.4.001      21 – 36
2.30.7.12.4.004 *    100
2.30.7.12.4.005      0 – 11
2.30.7.12.4.006      29 – 54
2.30.7.12.4.007      11 – 14
2.30.7.12.4.008      11

* Rows labeled on the slide as the S. aureus spike and the B. anthracis spike.
Percent of probe-pairs scored positive for each probe set in the Staphylococcus Group. Hybridization results from spike-in experiment done in triplicate.
Sonya Murray, Aubree Hubbel

Application Example
Does air filter sample processing affect detection?
–Method 1
»Wash particles from filter with SDS
»Digest particles with lysozyme
»Purify DNA using Qiagen kit
–Method 2
»Pulverize filter and particles with bead mill, SDS, P:C:ISA
»Purify DNA using MoBio kit and Sephacryl column

Bead beating allowed greater diversity to be detected.

Quantitative Analysis
Could the concentration of each amplicon in a sample be measured by fluorescence intensity?
Experimental setup for 20 point Latin Square calibration:

SPIKE CONCENTRATION (pM in Hybridization Solution)
Experiment   Oc.oenos   Fer.nod   Sap.grand   M.neuro   H2O   Environmental amplicons*
1            5          13        31          74        No    Yes
2            13         31        74          143       No    Yes
3            31         74        143         5         No    Yes
4            74         143       5           13        No    Yes
5            143        5         13          31        No    Yes
6            0          0         0           0

* 18 µL of products from 30 cycle universal 16S PCR of gDNA extracted from U.K. air sample.
Sonya Murray, Carol Stone
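The spike layout for experiments 1 to 5 follows a cyclic shift of the five concentration levels across the four spikes, giving the 20 calibration points. A short sketch that regenerates that layout is below; the function name is hypothetical, and the levels and spike labels are copied from the slide.

```python
def latin_square(levels, n_spikes):
    """Cyclic design: row r, column c gets levels[(r + c) mod number of levels]."""
    return [[levels[(row + col) % len(levels)] for col in range(n_spikes)]
            for row in range(len(levels))]


levels_pm = [5, 13, 31, 74, 143]                       # pM in hybridization solution
spikes = ["Oc.oenos", "Fer.nod", "Sap.grand", "M.neuro"]
design = latin_square(levels_pm, len(spikes))

for experiment, row in enumerate(design, start=1):
    print(experiment, dict(zip(spikes, row)))
```

Each spike is measured once at every level, and no two spikes share a level within one experiment.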

Experiment   Oo              Fn             Sg             Mn
1            5 (5474)        13 (16069)     31 (31805)     74 (124732)
2            13 (7885)       31 (61185)     74 (81107)     143 (115237)
3            31 (58912)      74 (70317)     143 (98235)    5 (8759)
4            74 (101803)     143 (69529)    5 (7789)       13 (11530)
5            143 (149869)    5 (4534)       13 (16228)     31 (56103)
6            n.a.            n.a.           n.a.           n.a.

Final concentration of spike in hybridization in pM. Values in parentheses are the resulting hybridization signal in arbitrary units (a.u.) obtained from the Latin Square experiments. All spikes were added to 18 µL of products of 30 cycle universal SSU PCR of gDNA extracted from air samples using Method 2.

Data were log2 transformed and fitted by linear least squares regression.
Pearson’s correlation coefficient was significant (df = 18).
95% confidence intervals were calculated according to the National Measurement System’s Valid Analytical Measurement Programme (VAM).

Environmental community is measured with confidence intervals.
Conf Interval:
$$ \mathrm{Conc} \pm \frac{t \cdot \mathrm{RSE}}{b} \sqrt{\frac{1}{m} + \frac{1}{n} + \frac{(Y - y)^2}{b^2 (n-1) s_x^2}} $$
b = slope from regression
Y = mean of 6 replicate measurements
m = number of repeat measurements = 6
n = number of calibration points = 20
y = mean of the HybScores for the 20 points used for calibration
t = critical value obtained from t-table for 18 d.f. for 95% = 1.734 (the one-tailed 95% value)
RSE = residual standard error of calibration points = 0.56
s_x = standard deviation of the conc. for the 20 points used for calibration
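The calibration and interval calculation can be sketched with the 20 Latin Square points from the results table. This is a sketch under assumptions: both concentration and signal are log2 transformed, signal is regressed on concentration, the bracketed term of the interval sits under a square root (the usual calibration-interval form), and the sample signal passed in at the end is a made-up value. Function and variable names are hypothetical.

```python
import math

# (spike concentration in pM, hybridization signal in a.u.) from the Latin Square table
points = [
    (5, 5474), (13, 16069), (31, 31805), (74, 124732),
    (13, 7885), (31, 61185), (74, 81107), (143, 115237),
    (31, 58912), (74, 70317), (143, 98235), (5, 8759),
    (74, 101803), (143, 69529), (5, 7789), (13, 11530),
    (143, 149869), (5, 4534), (13, 16228), (31, 56103),
]

x = [math.log2(conc) for conc, _ in points]      # log2 concentration
y = [math.log2(signal) for _, signal in points]  # log2 hybridization score
n = len(points)
x_bar, y_bar = sum(x) / n, sum(y) / n

sxx = sum((xi - x_bar) ** 2 for xi in x)         # equals (n - 1) * s_x**2
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx  # slope
a = y_bar - b * x_bar                                               # intercept
rse = math.sqrt(sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))


def conc_interval(mean_log2_signal, m, t):
    """Estimated log2 concentration with lower and upper interval bounds."""
    estimate = (mean_log2_signal - a) / b
    half_width = (t * rse / b) * math.sqrt(
        1 / m + 1 / n + (mean_log2_signal - y_bar) ** 2 / (b ** 2 * sxx))
    return estimate - half_width, estimate, estimate + half_width


# Hypothetical sample: mean signal of 20000 a.u. over m = 6 replicates, t from the slide.
low, mid, high = conc_interval(math.log2(20000), m=6, t=1.734)
print(f"slope={b:.3f} RSE={rse:.3f}")
print(f"concentration (pM): {2 ** low:.1f} to {2 ** high:.1f}, estimate {2 ** mid:.1f}")
```

The bounds are computed on the log2 scale and converted back to pM only for display, so the resulting interval is asymmetric around the estimate in pM.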

Summary
The SSU microarray was able to rapidly quantify and taxonomically classify a complex consortium of rDNA amplicons from both prokaryotic and eukaryotic origins.

Acknowledgements
–Gary Andersen: group leader
–Carol Stone: sample collection, hybridization
–Sonya Murray: hybridizations
