Fundamentals of Forensic DNA Typing


Fundamentals of Forensic DNA Typing, Chapter 10: STR Typing and Data Interpretation. Slides prepared by John M. Butler, June 2009.

Chapter 10 – Data Interpretation: Chapter Summary. Once peaks are produced in a multi-colored electropherogram, STR genotyping and data interpretation involve deciphering true STR alleles from possible instrumental or biological artifacts. Peaks are typically sized using a “Local Southern” algorithm that uses two peaks from the internal size standard on either side of the sample peak. Only peaks above a detection threshold set by the user will be recognized and recorded in the genotyping software output. Resultant peaks that are sized in comparison to the internal size standard and are above the detection threshold are then converted to STR repeat numbers through comparison to a sequenced allelic ladder provided with each STR kit. A high degree of precision must exist from sample to sample so that sample alleles can be reliably compared to the allele rungs in the allelic ladder. Typically a sizing bin of +/- 0.5 bp is used around each allele in the STR allelic ladder. Off-ladder alleles, also known as microvariants, that contain nucleotide changes or insertions or deletions in the STR repeat region or immediate flanking regions are known to exist and can be determined with a high-precision CE instrument. Other biological artifacts that can complicate STR data interpretation include stutter products, non-template addition, tri-allelic patterns, allele dropout due to “null alleles”, and STR allele mutation. Instrumental or technology-related artifact peaks can arise from sample contaminants, electrical spikes, dye blobs, and matrix (color separation) failure due to off-scale data. Partial profiles or DNA mixtures can further complicate data interpretation. Ultimately, an analyst must decide whether two DNA profiles match or can be excluded from coming from the same biological source.

Data Transfer: Following data collection, the data (.fsa) files are typically transferred from the lab computer to one in an office where data analysis is performed. A USB thumb drive permits rapid and easy transfer of data files.

Data Analysis: The analyst carefully reviews the DNA data (electropherogram), checks the software genotype calls, and edits out artifacts. Software designates sample genotypes via comparison to an allelic ladder (a mixture of common allele possibilities).

[Figure: multi-color electropherogram of an STR profile plotted against DNA size (bp). Dye channels: 6FAM (blue), VIC (green), NED (yellow), PET (red), and LIZ (orange) for the GS500 LIZ size standard. Loci labeled: D8S1179, D21S11, D7S820, CSF1PO, D3S1358, TH01, D13S317, D16S539, D2S1338, D19S433, D18S51, TPOX, VWA, AMEL, D5S818, FGA.]

STR Results: Individuals will differ from one another in terms of their STR profile. The STR genotype can then be put into an alphanumeric form for searching on a DNA database. What would be entered into a DNA database for searching: 16,17-17,18-21,22-12,14-28,30-14,16-12,13-11,14-9,9-11,13-6,6-8,8-10,10
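The alphanumeric form above can be handled programmatically. The following is a minimal Python sketch, not a database format specification: the function names are hypothetical, alleles are kept as strings so that microvariant labels such as 9.3 would survive, and no locus names are attached because the slide does not state the locus order.

```python
def parse_profile(profile):
    """Split 'a,b-c,d-...' into a list of (allele1, allele2) string pairs."""
    genotypes = []
    for locus in profile.split("-"):
        allele1, allele2 = locus.split(",")
        genotypes.append((allele1, allele2))
    return genotypes


def format_profile(genotypes):
    """Rebuild the hyphen-separated search string from allele pairs."""
    return "-".join(f"{a},{b}" for a, b in genotypes)


# Search string taken from the slide; the locus order is not given there.
PROFILE = "16,17-17,18-21,22-12,14-28,30-14,16-12,13-11,14-9,9-11,13-6,6-8,8-10,10"

genotypes = parse_profile(PROFILE)
# Loci where both alleles are the same (apparent homozygotes).
homozygous = [g for g in genotypes if g[0] == g[1]]
```

Keeping the profile as a list of pairs makes locus-by-locus comparison between two profiles a simple element-wise check.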

Data is Tabulated: The number of repeats observed for each locus is tabulated. This data format is stored in databases and used for comparisons/matches. Finally, a case report is written based on the tabulated STR genotype calls.

Steps Involved in STR Genotyping. Steps: data collection; color separation; peak identification; peak sizing; comparison to allelic ladder; genotype assignment to alleles; peak editing to remove artifact calls; data review by analyst/examiner; confirmation of results by second analyst/examiner. Inputs: matrix file (spectral calibration); user-defined thresholds; internal size standard; allelic ladder sample. Software: GeneScan software; Genotyper software; GeneMapperID software; expert systems (e.g., FSS-i3, TrueAllele). John M. Butler (2009) Fundamentals of Forensic DNA Typing, Figure 10.1. Figure 10.1 Steps involved in the genotyping process for STR profile determination. Software packages for DNA fragment analysis and STR genotyping perform much of the actual analysis, but extensive review of the data by trained analysts/examiners is often required.

Thresholds are set to separate signal from noise; in other words, are we confident that a peak is real? Signal peak height is measured in relative fluorescence units (RFUs) that are related to the amount of DNA present in the sample loaded onto the analysis instrument. Detection thresholds typically vary from 50 RFU to 200 RFU. http://projects.nfstc.org/gallery/main.php?g2_itemId=739

Thresholds for Measuring DNA Data: These thresholds for reliable data are determined through validation studies. Detection (analytical) threshold: dependent on instrument sensitivity; ~50 RFU (relative fluorescence units); impacted by instrument baseline noise. Dropout (stochastic) threshold: dependent on biological sensitivity; ~150-200 RFU; important in mixture interpretation. In the illustration, a peak above 50 RFUs is called (deemed “reliable”), while a peak below it, within the baseline noise, is NOT called (deemed “unreliable”).

Peak Detection Thresholds. [Figure 10.2 labels: analytical threshold at 50 RFUs and interpretation threshold at 150 RFUs. Below the analytical threshold (baseline noise): peak not considered reliable. Between the two thresholds: peak reliable, but only used for exclusions. Above the interpretation threshold: peak reliable, can be used for inclusions.] Values shown for example purposes only (should be based empirically on a lab’s internal validation). John M. Butler (2009) Fundamentals of Forensic DNA Typing, Figure 10.2. Figure 10.2 Two peak detection thresholds are used by many forensic DNA laboratories. Below the analytical threshold, signal observed is not considered reliable and not recorded as a peak by the data analysis software. A peak with a height in relative fluorescence units (RFUs) above the interpretation threshold is used for inclusionary (statistical) purposes. A peak above the analytical but below the interpretation threshold is only used for exclusionary purposes. Some labs set their interpretation threshold to be equal to their analytical threshold.
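The two-threshold scheme in Figure 10.2 can be sketched as a small Python function. This is an illustration only: the function name and the returned labels are hypothetical, the defaults of 50 and 150 RFU are the example values from the figure (a lab would use its own validated values), and treating a peak exactly at a threshold as being above it is an assumption.

```python
def classify_peak(height_rfu, analytical=50, interpretation=150):
    """Classify a peak height (RFU) against two example thresholds.

    Assumption: a height equal to a threshold counts as meeting it.
    """
    if height_rfu < analytical:
        return "not reliable"       # within baseline noise, not recorded
    if height_rfu < interpretation:
        return "exclusion only"     # reliable, used only for exclusions
    return "inclusion"              # reliable, usable for inclusions
```

Setting `interpretation` equal to `analytical` reproduces the case noted in the caption, where a lab uses a single threshold and the middle category disappears.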

DNA Data Quality: The raw DNA data itself does not have quality scores directly attached to it. Only the STR allele designations are stored, without an indication of data quality. Checks and balances exist in the entire system to try to ensure good quality data. Retesting of the offender database sample is performed when a DNA database hit is observed.

Data Interpretation Issues (Artifact Peaks vs. Allele Peaks): pull-up; stutter; “N” peaks; off-ladder alleles; tri-alleles; allelic drop-out; degradation; inhibition/primer binding site mutation; mixture; mutation. SOURCE: AFDIL training slides

If pull-up is present and is due to excess fluorescence, the problem may be corrected by diluting the PCR product. SOURCE: AFDIL training slides

RFU Reporting Threshold SOURCE: AFDIL training slides

Peak Sizing with an Internal Size Standard. [Figure 10.3 panels: (a) internal size standard peaks at 35, 50, 75, 100, 139, 150, 160, 200, 250, 300, 340, 350, 400, 450, 490, and 500 bp plotted against time (minutes), with the region depicted below marked; (b) sizing curve of DNA size versus data point, with DNA fragment peaks in the sample sized at 147.32 bp and 165.05 bp.] DNA fragment peaks are sized based on the sizing curve produced from the points on the internal size standard. John M. Butler (2009) Fundamentals of Forensic DNA Typing, Figure 10.3. Figure 10.3 Peak sizing with DNA fragment analysis. An internal size standard, such as GS500-ROX (a), is analyzed along with the DNA sample and used to calibrate the peak data points to their DNA size (b). This standard is labeled with a different color fluorescent dye, in this case ROX (detected as red), so that it can be spectrally distinguished from the STR alleles which are labeled in other colors.
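The “Local Southern” sizing mentioned in the chapter summary can be sketched in Python as follows. This is a hedged illustration rather than the vendor software's implementation: it assumes the commonly described form of the method, in which the reciprocal curve L = L0 + c / (m - m0) is fitted through three neighboring size-standard peaks, once with two peaks below and one above the unknown and once with one below and two above, and the two size estimates are averaged. All function names and the data points in the usage lines are hypothetical.

```python
import bisect


def southern_fit(points):
    """Fit L = L0 + c / (m - m0) exactly through three (mobility, size) points.

    Raises ZeroDivisionError if the three points are collinear.
    """
    (m1, l1), (m2, l2), (m3, l3) = points
    a = ((l1 - l2) / (l2 - l3)) * ((m3 - m2) / (m2 - m1))
    m0 = (m3 - a * m1) / (1.0 - a)
    c = (l1 - l2) * (m1 - m0) * (m2 - m0) / (m2 - m1)
    l0 = l1 - c / (m1 - m0)
    return c, m0, l0


def local_southern_size(standard, mobility):
    """Size an unknown peak from a size standard sorted by mobility.

    standard: list of (mobility, size_bp) pairs in increasing mobility.
    """
    mobilities = [m for m, _ in standard]
    i = bisect.bisect_left(mobilities, mobility)  # first standard at or above
    if i < 2 or i > len(standard) - 2:
        raise ValueError("need two size-standard peaks on each side")
    estimates = []
    # Window 1: two peaks below + one above; window 2: one below + two above.
    for window in (standard[i - 2:i + 1], standard[i - 1:i + 2]):
        c, m0, l0 = southern_fit(window)
        estimates.append(l0 + c / (mobility - m0))
    return sum(estimates) / 2.0


# Hypothetical data points (scan numbers) for six size-standard peaks.
example_standard = [(1500, 100), (1900, 139), (2010, 150),
                    (2115, 160), (2520, 200), (3010, 250)]
example_size = local_southern_size(example_standard, 1990)
```

Because only the nearest standard peaks are used, a local distortion in migration affects only the fragments sized in that region, which is one reason a local fit is often preferred over a single global curve.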

Peak Color Separation and Sizing and STR Genotyping. [Figure 10.4 labels: data from the CE instrument (prior to color separation and peak sizing) for the allelic ladder, the PCR-amplified sample, and the internal size standard; color separation and peak sizing yield color-separated and sized allele peaks for each locus; all ladder alleles and all sample alleles are sized using the internal size standard; genotyping allele bins (+/- 0.5 bp around each ladder allele) are shown for alleles (# repeats) 10 through 15; genotyping is performed by comparing the allelic ladder to sample results, giving Locus 1 genotype = 12, 14.] John M. Butler (2009) Fundamentals of Forensic DNA Typing, Figure 10.4. Figure 10.4 Genotyping is performed through a comparison of sized peaks from PCR-amplified samples to allele size bins. These allele bins are defined with the genotyping software using size information from an allelic ladder run with each batch of samples. Any peak falling in a particular dye color and allele bin size range is designated as an allele for that locus. Peaks in both the allelic ladder and the PCR-amplified samples are sized using the same internal size standard so that they may be compared to one another.

Comparison of Allelic Ladder to Samples to Convert Size into Allele Repeat Number: A +/- 0.5 bp bin is defined around each allele. Example differences between sample and ladder allele sizes: -0.02 bp and +0.05 bp.
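The bin comparison can be sketched in Python as below. This is a simplified illustration under stated assumptions: the function name, the dictionary layout, and the "OL" return label are hypothetical, and real genotyping software also restricts the comparison by dye color and locus size range. The allele 22 ladder size of 258.75 bp and the 257.84 bp sample peak are taken from the DYS635 microvariant example later in these slides.

```python
def call_allele(size_bp, ladder, bin_half_width=0.5):
    """Return the ladder allele whose +/- bin contains size_bp, else 'OL'.

    ladder: dict mapping allele label -> ladder peak size in bp.
    """
    for allele, ladder_size in ladder.items():
        if abs(size_bp - ladder_size) <= bin_half_width:
            return allele
    return "OL"  # off-ladder: no bin contains the peak


# Single-rung ladder for illustration only.
ladder = {"22": 258.75}
on_ladder_call = call_allele(258.73, ladder)   # 0.02 bp below the rung
off_ladder_call = call_allele(257.84, ladder)  # 0.41 bp outside the bin
```

A peak returned as "OL" is not automatically a true variant; it is a prompt for the analyst to check sizing precision and, if needed, re-run the sample.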

COfiler STR data (GeneScan view and Genotyper view): The allele call (repeat number) is determined by comparison of peak size (bp) to allelic ladder allele peak sizes run under the same electrophoretic conditions. Peak height is given in relative fluorescence units (RFUs). Figure 10.3

Common Artifacts in Electropherograms. [Figure 10.5 labels: blue, green, yellow, and red channels showing STR alleles, stutter, a dye blob, pull-up (bleed-through), and a spike; biological (PCR) artifacts shown are stutter products at D3S1358 (6.0% and 7.8%) and incomplete adenylation at D8S1179 (-A and +A peaks, inset).] John M. Butler (2009) Fundamentals of Forensic DNA Typing, Figure 10.5. Figure 10.5 Hypothetical electropherogram displaying several artifacts often observed with STR typing. Two primary biological artifacts of the PCR amplification process with STRs are stutter and incomplete adenylation that causes split peaks (inset).

STR Data Interpretation Involves Determining What is a True Allele (Peak). Peak detection threshold: signal (S) versus noise (N), with signal >3x sd of noise (or S/N >3). Stutter percentage: a stutter product is usually one repeat position shorter than the true allele and <15% of the true allele's peak height. Peak height ratio (PHR), or heterozygote peak balance between allele 1 and allele 2: PHRs consistent with a single source are typically above 60%. All of these issues impact mixture interpretation.
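The two ratios on this slide are simple peak-height calculations, sketched here in Python. The function names are hypothetical, and the 15% stutter and 60% PHR limits are the typical values quoted on the slide, not validated laboratory thresholds.

```python
def stutter_percent(stutter_height, allele_height):
    """Stutter peak height as a percentage of its parent allele height."""
    return 100.0 * stutter_height / allele_height


def peak_height_ratio(height_a, height_b):
    """Heterozygote balance: lower peak height divided by higher one."""
    return min(height_a, height_b) / max(height_a, height_b)


def consistent_with_stutter(candidate_height, parent_height, max_pct=15.0):
    """True if a peak one repeat shorter could be explained as stutter."""
    return stutter_percent(candidate_height, parent_height) < max_pct


def consistent_with_single_source(height_a, height_b, min_phr=0.60):
    """True if two allele peaks are balanced enough for one contributor."""
    return peak_height_ratio(height_a, height_b) > min_phr
```

For example, a 60 RFU peak one repeat below a 1000 RFU allele gives a 6% stutter percentage, which falls within the typical range; a 300 RFU peak in that position would not, and might indicate a second contributor.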

Stutter Products: Peaks that show up primarily one repeat less than the true allele as a result of strand slippage during DNA synthesis. Stutter is less pronounced with larger repeat unit sizes (dinucleotides > tri- > tetra- > penta-). Longer repeat regions generate more stutter. Each successive stutter product is less intense (allele > repeat-1 > repeat-2). Stutter peaks make mixture analysis more difficult.

STR Alleles with Stutter Products. [Figure 6.1 labels: D21S11, D18S51, and D8S1179 alleles plotted as relative fluorescence units versus DNA size (bp), with stutter products of 6.3%, 6.2%, and 5.4%.] Figure 6.1 STR alleles shown with stutter products (indicated by arrows). Only the stutter percentage for the first allele from each locus is noted. Figure 6.1, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press

Measured Stutter Percentages Variable by Allele Length and Composition. TH01 9.3 allele: [TCAT]4 -CAT [TCAT]5. FIG. 6: Stutter percentages for the AmpFlSTR Green I STR loci, TH01 (min = 0.6%, max = 2.9%, SD = 0.4%, n = 47), TPOX (min = 0.5%, max = 3.8%, SD = 0.4%, n = 70), and CSF1PO (min = 0.9%, max = 6.1%, SD = 0.4%, n = 83) measured in single source DNA samples. Default stutter filters in AmpFlSTR Genotyper templates are 7% for TH01 and TPOX and 9.1% for CSF1PO (32). The X- and Y-axes indicate allele number and stutter percent, respectively. Min = minimum measured value, max = maximum measured value, SD = average standard deviation across all alleles measured, n = number of observations. Holt CL, Buoncristiani M, Wallin JM, Nguyen T, Lazaruk KD, Walsh PS. TWGDAM validation of AmpFlSTR PCR amplification kits for forensic DNA casework. J Forensic Sci 2002; 47(1): 66-96.

[Figure 10.7 labels: stutter percentage by allele for the loci TH01, TPOX, CSF1PO, D7S820, D5S818, VWA, D3S1358, D13S317, D8S1179, FGA, and D16S539.] Figure 10.7 Stutter Percentages for 13 CODIS STR Loci (data adapted from AmpFlSTR manuals). Alleles for each STR locus are shown from smallest to largest. Each locus has a different average stutter percentage but all loci show the trend of increasing stutter with larger alleles (longer number of repeats).

Primary mechanism for stutter product formation. [Figure 10.6 panels, each showing a GATA/CTAT repeat region with repeat units numbered 1 to 6: (A) normal replication; (B) insertion caused by backward slippage (extra repeat 2’ on the new strand); (C) deletion caused by forward slippage (repeat 4 bulged out of the template strand).] Figure 10.6 Illustration of slipped-strand mispairing process that is thought to give rise to stutter products. (A) During replication the two DNA strands can easily come apart in the repeat region and since each repeat unit is the same, the two strands can reanneal out of register such that the two strands are off-set by a single repeat unit. (B) If a repeat unit bulges out on the new synthesized strand during extension then an insertion results in the next round of amplification. (C) If on the other hand, the repeat unit bulge occurs in the template strand, then the resulting synthesized strand is one repeat unit shorter than the full length STR allele. The frequency at which this process occurs is related to the flanking sequence, the repeat unit, and the length of the allele being amplified. Generally for tetranucleotide STR loci, stutter occurs less than 15% of the time and is observed as a small peak one repeat shorter than the STR allele.

Stutter Product Formation: A repeat unit bulges out when strand breathing occurs during replication. n-4 stutter product: deletion caused by slippage on the copied (bottom) strand; typically 5-15% of the true allele in tetranucleotide repeat STR loci. n+4 stutter product: insertion caused by slippage of the copying (top) strand; occurs less frequently (typically <2%) and is often down in the “noise” depending on sensitivity. [Figure: true allele (tetranucleotide repeat) with GATA/CTAT repeat units numbered 1 to 6, shown with repeat 4 bulged out (deletion) and with an extra repeat 2’ (insertion).]

Stutter Peaks: Taq polymerase: 50-60 nt/second. Stoffel fragment: 5-10 nt/second; increased stutter. Strand slippage: a lower incorporation rate allows more opportunity for the DNA strands to breathe apart during PCR. This is the same mechanism that is responsible for mutation during replication. SOURCE: AFDIL training slides

Stutter Peaks: For tetranucleotide repeat loci, stutter is a minor product 4 bp shorter than the main allele peak. It is therefore the same size as an authentic allele and can complicate data interpretation. There is variability between loci: D3S1358 average stutter = 7%; TPOX average stutter = 3%. The proportion of stutter to main allele is generally reproducible within a locus and even for a given allele length, and can be characterized to aid in interpretation of mixed specimen profiles. [Figure labels: TPOX main alleles with stutter (2.4%) and no stutter; D3S1358 main alleles with stutter (6.7%) and stutter (6.8%).] SOURCE: AFDIL training slides

Stutter: What is it and why do we care? It is a minor product 4 bp shorter than the main allele peak. It has the same size as an allele and can complicate interpretation. [Figure labels: main allele peaks with stutter (6.7%) and (6.8%).] SOURCE: AFDIL training slides

What do we know about stutter peaks? There is variability between loci: vWA average stutter = 7%; TPOX average stutter = 3%. The proportion of stutter to main allele is generally reproducible within a locus and even for a given allele length, and can be characterized to aid in interpretation of mixed specimen profiles. SOURCE: AFDIL training slides

[Figure labels: TPOX with stutter (2.4%) and no stutter; D3S1358 with stutter (6.7%) and stutter (6.8%).] SOURCE: AFDIL training slides

Stutter Peaks: Percent stutter tends to increase with allele length. [Figure labels: vWA allele 16 and vWA allele 20, with stutter (6%) and stutter (9%).] SOURCE: AFDIL training slides

Stutter Peaks: Alleles with longer core repeat unit regions generally have an increased proportion of stutter. Alleles with interrupted core repeat regions tend to have a lower proportion of stutter. [Figure labels: typical stutter with an uninterrupted TCTA core repeat region; reduced stutter with a core repeat region interrupted by TCTG and TCCA repeats.] SOURCE: AFDIL training slides

Non-Template Addition: Taq polymerase will often add an extra nucleotide to the end of a PCR product, most often an “A” (termed “adenylation”). This is dependent on the 5’-end of the reverse primer; a “G” can be put at the end of a primer to promote non-template addition. It can be enhanced with an extension soak at the end of the PCR cycle (e.g., 15-45 min at 60 or 72 °C) to give the polymerase more time. Excess amounts of DNA template in the PCR reaction can result in incomplete adenylation (not enough polymerase to go around). It is best if there is NOT a mixture of “+/- A” peaks (full adenylation is desirable to avoid split peaks). [Figure labels: incomplete adenylation at D8S1179, showing -A and +A peaks.]

[Figure 10.8 labels: (A) forward and reverse primers with polymerase extension producing either the full-length allele (n, -A form) or the allele + 1 base (n+1, +A form); (B) measurement result with the dye-labeled DNA strand, showing a shoulder peak or a split peak (-A and +A); example of incomplete adenylation at D8S1179.] Figure 10.8 Schematic of non-template nucleotide addition shown (A) with illustrated measurement result (B). DNA polymerases add an extra nucleotide beyond the 3’-end of the target sequence extension product. The amount of non-template addition is dependent on the sequence of the 5’-end of the opposing primer. In the case of dye labeled PCR products where the fluorescent dye is on the forward primer, the reverse primer sequence is the critical one.

Impact of the 5’ Nucleotide on Non-Template Addition: Last base for the primer opposite the dye label: 5’-CCAAG… versus 5’-ACAAG… (PCR conditions are the same for these two samples). Promega includes an ATT sequence on the 5’-end of many of their unlabeled PP16 primers to promote adenylation; see Krenke et al. (2002) J. Forensic Sci. 47(4): 773-785. http://www.cstl.nist.gov/biotech/strbase/PP16primers.htm

Higher Levels of DNA Lead to Incomplete Adenylation. [Figure 6.5 labels: D3S1358, VWA, and FGA alleles plotted as relative fluorescence (RFUs) versus DNA size (bp); top panel, 10 ng template (overloaded), with -A and +A peaks and off-scale data; bottom panel, 2 ng template (suggested level).] Figure 6.5 Incomplete non-template addition with high levels of DNA template. In the top panel, partial adenylation (both –A and +A forms of each allele) is seen because the polymerase is overwhelmed due to an abundance of DNA template. Note also that the peaks in the top panel are off-scale and flat-topped in the case of the smaller FGA allele. When the suggested level of DNA template is used, all alleles are fully adenylated (bottom panel). Figure 6.5, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press

Impact of DNA Amount into PCR: The reason that DNA quantitation is important prior to multiplex amplification. Generally 0.5 – 2.0 ng DNA template is best for STR kits. Too much DNA: off-scale peaks; split peaks (+/-A); locus-to-locus imbalance. Too little DNA: heterozygote peak imbalance; allele drop-out; locus-to-locus imbalance. [Figure labels: D3S1358 with 10 ng template (overloaded, -A and +A peaks) versus 2 ng template (suggested level); 100 pg template versus 5 pg template, plotted as relative fluorescence (RFUs) versus DNA size (bp).] The stochastic effect when amplifying low levels of DNA produces allele dropout.

Identifiler – Rapid PCR (36 min total time) with 1 min 60 °C adenylation soak (using different polymerases). Result from Peter Vallone (NIST)

Rapid PCR Work and Adenylation: Poor adenylation (presence of –A peaks) is locus-specific and impacted by the number of loci amplified. COfiler amplicons are fully adenylated with a 1 min soak. Result from Peter Vallone (NIST)

Three Types of “Off-Ladder” Alleles. [Figure labels: allelic ladder compared with samples showing (a) a below ladder allele, (b) an above ladder allele, and (c) a “variant” between ladder alleles.] John M. Butler (2009) Fundamentals of Forensic DNA Typing, D.N.A. Box 10.1. D.N.A. Box 10.1 (figure) Three types of ‘off-ladder’ or ‘variant’ alleles: (a) below ladder alleles, (b) above ladder alleles, and (c) between ladder alleles.

Penta D 10, Variant Allele 19: Allele 10 has 10 AAAGA repeats; allele 19 has 19 AAAGA repeats. All sequenced bases align before and after the repeat region. The 19 allele has been previously reported in STRBase. The Penta D ladder has alleles 2.2, 3.2, 5, and 7 – 17 represented.

Microvariant “Off-Ladder” Alleles: Defined as alleles that are not exact multiples of the basic repeat motif or sequence variants of the repeat motif or both. Alleles with partial repeat units are designated by the number of full repeats and then a decimal point followed by the number of bases in the partial repeat (Bar et al. Int. J. Legal Med. 1994, 107:159-160). Example: TH01 9.3 allele: [TCAT]4 -CAT [TCAT]5 (deletion of T).
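The designation rule above (full repeats, decimal point, bases in the partial repeat) amounts to integer division with a remainder. The Python sketch below illustrates it with the TH01 9.3 repeat structure from the slide; the function name is hypothetical, and counting only the bases of the repeat region is a simplification that ignores flanking-sequence changes.

```python
def allele_designation(repeat_region_bases, repeat_unit_length):
    """Name an allele as 'full' or 'full.partial' from its repeat-region length."""
    full, partial = divmod(repeat_region_bases, repeat_unit_length)
    return str(full) if partial == 0 else f"{full}.{partial}"


# TH01 9.3 repeat region from the slide: [TCAT]4 CAT [TCAT]5
th01_9_3 = "TCAT" * 4 + "CAT" + "TCAT" * 5
th01_name = allele_designation(len(th01_9_3), 4)   # "9.3"
```

Here the region is 39 bases long, which is 9 full four-base repeats plus 3 bases, so the allele is named 9.3 rather than 10.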

FGA Variant Allele 28.1. John M. Butler (2009) Fundamentals of Forensic DNA Typing, D.N.A. Box 10.1. D.N.A. Box 10.1 (figure) Detection of a microvariant allele at the STR locus FGA. The sample in the bottom panel is compared to the allelic ladder shown in the top panel using Genotyper software. Peaks are labeled with the allele category and the calculated fragment sizes using the internal sizing standard GS500-ROX. δ1 = S25 - L25 = 244.34 - 244.46 = -0.12 bp; δ2 = SOL - L28 = 257.51 - 256.64 = +0.87 bp; c = |δ1 - δ2| = |-0.12 - 0.87| = 0.99 bp, so the off-ladder peak is designated 28.1.

An Example of an “Off-Ladder” Microvariant at the Yfiler Locus DYS635: Allele 22 bin: 258.75 +/- 0.5 = 258.25 to 259.25 bp. Allele 21.3: 257.84 bp (-0.41 from bin). Repeat structure: [TCTA]4(TGTA)2[TCTA]2(TGTA)2[TCTA]2(TGTA)2 [TCTA]5 TC-A [TCTA]2, with the missing T in the TC-A unit.

SNPs within the D8S1179 repeat: The repeat is TCTA. Three NIST samples have genotypes 13,13. Analysis by Mass Spec indicates the presence of SNPs (Tom Hall, IBIS). Confirmation of the Mass Spec result by sequencing at NIST indicates that there are 4 different 13 alleles in these 3 samples: [TCTA]13; TCTA TCTG [TCTA]11; TCTA TCTG TGTA [TCTA]10; [TCTA]2 TCTG [TCTA]10. [Figure base labels: G A C A; G G C A; A C G.]

http://www.cstl.nist.gov/biotech/strbase STRBase has a summary of alleles that have been submitted and sequenced, if the submitting agency agrees to share the information. We require a minimum of 10 ng for the sequencing. We request copies of the electropherograms demonstrating the variant allele. The more information we have up front, the better. Please have patience; we will get to your samples!

Sample Submissions: For those that desire more assurances of confidentiality, we can have MOUs signed. We generally re-type the samples at NIST prior to starting sequencing. We may run a monoplex assay (single locus). We return results as PowerPoint slides. We thank all of those agencies that have used this free service (thanks to NIJ)! Contact Margaret Kline: margaret.kline@nist.gov

Variant Alleles Cataloged in STRBase: http://www.cstl.nist.gov/biotech/strbase/var_tab.htm Off-ladder alleles: currently 439 at 13/13 CODIS loci + F13A01, FES/FPS, Penta D, Penta E, D2S1338, D19S433. Tri-allelic patterns: currently 170 at 13/13 CODIS loci + FES/FPS, Penta D, Penta E, D2S1338, D19S433.

Characterizing a Variant Allele That Occurs Between Two Loci: Use a different multiplex STR kit with different locus combinations. Test singleplex for each putative locus. Example: Identifiler D16S539 and D2S1338. Butler, J.M. (2006) Genetics and genomics of core STR loci used in human identity testing. J. Forensic Sci. 51(2): 253-265

Steps to Detection of Which Locus an Out-of-Range Allele Belongs With: Consider locus heterozygosities; the heterozygote is likely from the locus with higher heterozygosity (e.g., D16 = 0.766 while D2 = 0.882). Remember that tri-allelic patterns and homozygotes are less common than heterozygotes; thus two heterozygotes are more likely than a homozygote next to a tri-allelic pattern. Check STRBase for variant alleles reported previously by other labs (e.g., D16 has no >16 alleles while D2 has several <15 alleles). Consider genotype frequencies observed for the various possible combinations (e.g., D16 11,11 = 10.7% while D2 20,20 = 0.92%).

A state lab submitted to STRBase a new tri-allele: D16S539 10, 12, 14.2 (Identifiler). D16S539 “14.2” = 291 bp. D2S1338 alleles: 11 = 291 bp; 12 = 295 bp; 13 = 299 bp; 14 = 303 bp; 15 = 307 bp. The extra peak is likely a D2S1338 allele 11.

Types of Tri-Allelic Patterns. [Figure labels: (a) Type 1, peaks 1, 2, 3 with 1+2≈3; (b) Type 2, peaks 1, 2, 3 with 1≈2≈3.] John M. Butler (2009) Fundamentals of Forensic DNA Typing, D.N.A. Box 10.2. D.N.A. Box 10.2 (figure) Tri-allelic patterns are sometimes observed at a single locus in a multiplex STR profile. They may be classified into one of two different groups based on relative peak heights: (a) ‘Type 1’ where the sum of two peak heights is almost equal to the third (1+2≈3) or (b) ‘Type 2’ where fairly balanced peak heights are observed (1≈2≈3).
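The peak-height classification in D.N.A. Box 10.2 can be sketched in Python. This is only an illustration: the function name, the "unclassified" label, and the 20% tolerance used to decide what counts as "almost equal" or "fairly balanced" are assumptions, not values from the source.

```python
def classify_tri_allelic(heights, tolerance=0.2):
    """Classify three peak heights as 'Type 1', 'Type 2', or 'unclassified'.

    tolerance is a hypothetical relative tolerance, as a fraction of the
    tallest peak.
    """
    small, middle, large = sorted(heights)
    # Type 2: all three peaks fairly balanced (1 ≈ 2 ≈ 3).
    if (large - small) / large <= tolerance:
        return "Type 2"
    # Type 1: the two smaller peaks sum to about the largest (1 + 2 ≈ 3).
    if abs((small + middle) - large) / large <= tolerance:
        return "Type 1"
    return "unclassified"
```

The balanced case is tested first because three similar peaks can never satisfy the Type 1 condition; a pattern that fits neither description is left unclassified for analyst review.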

Three-Peak Patterns: “Type 1”: the sum of the heights of two of the peaks is equal to the third; most common in D18S51 and ….. “Type 2”: balanced peak heights; most common in TPOX and D21S11. [Figure labels: TPOX, D21S11, D18S51.] Clayton et al. (2004) A genetic basis for anomalous band patterns encountered during DNA STR profiling. J Forensic Sci. 49(6):1207-1214

Three Banded Patterns: FGA 20, 25, 26 Alleles. 20 repeats: [TTTC]3 TTTT TTCT [CTTT]12 CTCC [TTCC]2. 25 repeats: [TTTC]3 TTTT TTCT [CTTT]17 CTCC [TTCC]2. 26 repeats: [TTTC]3 TTTT TTCT [CTTT]18 CTCC [TTCC]2. This particular tri-allelic pattern has not been reported in STRBase.

[Figure 10.11 labels: (A) TPOX; (B) AMEL, D8S1179, D21S11, D18S51.] Figure 10.11 Tri-allelic patterns observed at (A) TPOX and (B) D18S51 from different samples. Allele calls are listed underneath each peak. (A) The TPOX result was obtained with the Identifiler STR kit and run on the ABI 3100. (B) This DNA sample was amplified with the Profiler Plus STR kit, separated on the ABI Prism 310 Genetic Analyzer and viewed with Genotyper software. Only the green dye-labeled PCR products are shown here for simplicity’s sake.

TPOX Tri-Allelic Patterns (FSI Genetics 2008; 2(2): 134–137): Approximately 2.4% of indigenous South Africans have three rather than two TPOX alleles. Data collected during routine paternity testing revealed that the extra allele is almost always allele 10 and that it segregates independently of those at the main TPOX locus. Approximately twice as many females as males have tri-allelic genotypes, which suggested that the extra allele is on an X chromosome.

TPOX Tri-Allelic Patterns Reported on STRBase
http://www.cstl.nist.gov/biotech/strbase/var_TPOX.htm#Tri
Patterns (number of observations): 6,8,10 (4x); 6,9,10 (5x); 6,10,11 (4x); 6,10,12 (1x); 7,8,10 (2x); 7,9,10 (1x); 7,10,11 (2x); 8,9,10 (14x); 8,9,11 (1x); 8,10,11 (19x); 8,10,12 (4x); 8,11,12 (3x); 9,10,11 (11x); 9,10,12 (2x); 10,10,11 (1x); 10,11,12 (4x)
TPOX allele 10 frequency in NIST U.S. population samples: African American 8.9%, Caucasian 5.6%, Hispanic 3.2%
In 78 observations of 16 different TPOX tri-allelic patterns, only 4 times (5%) is allele “10” not present.
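The summary figures on this slide can be re-derived from the listed patterns. The tally below simply transcribes the pattern counts shown above; the variable names are illustrative.

```python
# Tri-allelic TPOX patterns and observation counts, transcribed from the slide.
observations = {
    (6, 8, 10): 4, (6, 9, 10): 5, (6, 10, 11): 4, (6, 10, 12): 1,
    (7, 8, 10): 2, (7, 9, 10): 1, (7, 10, 11): 2, (8, 9, 10): 14,
    (8, 9, 11): 1, (8, 10, 11): 19, (8, 10, 12): 4, (8, 11, 12): 3,
    (9, 10, 11): 11, (9, 10, 12): 2, (10, 10, 11): 1, (10, 11, 12): 4,
}

total = sum(observations.values())
# Observations whose pattern does not include allele 10.
without_10 = sum(n for pattern, n in observations.items() if 10 not in pattern)

print(len(observations), "patterns,", total, "observations")
print(f"allele 10 absent in {without_10} ({100 * without_10 / total:.0f}%)")
```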

Smoothing Options
None: Select this option if your data has very sharp, narrow peaks of interest.
Light: Generally provides the best results for normal data.
Heavy: Might be appropriate for data from slower runs that have very broad peaks. It might reduce peak size or eliminate narrow peaks.
[Figure: the same data with each smoothing option; labels in order: Heavy, Light, None; values in order: 73%, 38.7%, 35.9%]
SOURCE: AFDIL training slides

Common Problems with Gel Electrophoresis
Lane leakage
SOURCE: AFDIL training slides

Null Alleles
Allele is present in the DNA sample but fails to be amplified due to a nucleotide change in a primer binding site.
Allele dropout is a problem because a heterozygous sample appears falsely as a homozygote.
Two PCR primer sets can yield different results on samples originating from the same source.
This phenomenon impacts DNA databases.
Large concordance studies are typically performed prior to use of new STR kits.
For more information, see J.M. Butler (2005) Forensic DNA Typing, 2nd Edition, pp. 133-138

Impact of Primer Binding Site Mutations
[Figure: D.N.A. Box 10.3, John M. Butler (2009) Fundamentals of Forensic DNA Typing. Panels: (a) no mutation, heterozygous alleles 6 and 8 are well balanced; (b) mutation (*) in middle of primer binding site, imbalance in allele peak heights; (c) mutation (*) at 3’-end of primer binding site (allele dropout), allele 6 amplicon has ‘dropped out’.]
D.N.A. Box 10.3 Impact of a sequence polymorphism in the primer binding site illustrated with a hypothetical heterozygous individual possessing a ‘6,8’ genotype. Arrows represent PCR primers in different positions around the STR repeat region. Heterozygous allele peaks may be (a) well-balanced, (b) imbalanced, or (c) exhibit allele dropout. A ‘null allele’, such as shown in (c), can be detected through use of different PCR primers.

Impact of DNA Sequence Variation in the PCR Primer Binding Site
[Figure panels: no mutation, heterozygous alleles 6 and 8 are well balanced; mutation (*) in middle of primer binding site, imbalance in allele peak heights; mutation (*) at 3’-end of primer binding site (allele dropout), allele 6 amplicon has “dropped out” and only allele 8 is observed]
Butler, J.M. (2005) Forensic DNA Typing, 2nd Edition, Figure 6.9, ©Elsevier Academic Press

Concordance between STR primer sets is important for DNA databases
[Figure: the same sample typed with PowerPlex 16 and with Profiler Plus, where allele dropout occurs (e.g., VWA)]
A database search then results in a false negative (misses samples that should match).
Reduced match stringency is a common solution.
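A toy comparison shows why reduced stringency rescues a match after allele dropout. The two rules below are simplified assumptions written for illustration (exact allele-set equality versus one allele set being contained in the other); they are not the search logic of any particular database software, and the genotypes are hypothetical.

```python
def high_stringency(alleles_a, alleles_b):
    """Simplified rule: both genotypes must show exactly the same alleles."""
    return set(alleles_a) == set(alleles_b)


def moderate_stringency(alleles_a, alleles_b):
    """Simplified rule: one genotype's alleles must all appear in the other."""
    a, b = set(alleles_a), set(alleles_b)
    return a <= b or b <= a


def profiles_match(profile_a, profile_b, locus_rule):
    """Apply locus_rule at every locus the two profiles share."""
    shared = profile_a.keys() & profile_b.keys()
    return bool(shared) and all(locus_rule(profile_a[l], profile_b[l]) for l in shared)


# Hypothetical genotypes for one person typed with two kits; allele 19 at VWA
# drops out with the second primer set, so the sample looks homozygous there.
kit_one = {"VWA": (16, 19), "D8S1179": (12, 14)}
kit_two = {"VWA": (16, 16), "D8S1179": (12, 14)}

print(profiles_match(kit_one, kit_two, high_stringency))      # False: a missed match
print(profiles_match(kit_one, kit_two, moderate_stringency))  # True
```

The trade-off is that the looser rule also admits more coincidental candidate matches, which then need follow-up.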

vWA Primer Position Comparisons
Promega STR kit PowerPlex® 16: polymorphism lies outside of the forward PP16 primer. Krenke et al. (2002) J. Forensic Sci. 47:773-785
ABI STR kit Profiler Plus™: polymorphism impacts the 2nd base from the 3’ end of the ProPlus primer. Walsh, P.S. (1998) J. Forensic Sci. 43: 1103-1104
Observed in 2 out of 1,483 individuals tested = 0.067% (the percentage corresponds to 2 of 2,966 chromosomes; 2 of 1,483 individuals is about 0.13%). Lazaruk et al. (2001) Forensic Sci Int. 119:1-10
[Figure: primer positions for both kits relative to the GenBank allele (18 repeats); dye labels TMR and FAM; dimension labels in source order: 155 bp, 33 nt, 11 bp, 9 bp, 30 nt, 184 bp, 50 bp, 11 bp]

D18S51 Null Allele from Kuwait Samples with ABI Primers
Allele 18 drops out with the ABI primers (Identifiler); PowerPlex 16 primers are not impacted.
Mutation: C to T in the reverse sequence (G to A), 172 bp downstream of the STR repeat and 10 nucleotides from the 3’ end of the ABI D18-R primer.
[Figure: PowerPlex 16 and Identifiler results with sequence traces labeled normal and mutation]
Clayton et al. (2004) Primer binding site mutations affecting the typing of STR loci contained within the AMPFlSTR SGM Plus kit. Forensic Sci Int. 139(2-3): 255-259

D13S317 Flanking Region Deletion
A 4 bp deletion outside the miniSTR primers causes the allele produced by the commercial kit to appear one repeat smaller.
Sequence analysis identified two regions where 4 bp deletions occur to cause this one-repeat variation.
[Figure: D13S317 results for African American sample ZT79305, NIST Identifiler data compared with Ohio U miniSTR data]
Drabek, J., Chung, D.T., Butler, J.M., McCord, B.R. (2004) Concordance study between miniplex STR assays and a commercial STR typing kit, J. Forensic Sci. 49(4): 859-860.

Possible Sequence Variation In and Around the STR Repeat Regions
[Figure: STR amplicon showing the forward primer binding region, 5’ flanking region, repeat region, 3’ flanking region, and reverse primer binding region; an asterisk marks the variation in situations A), B), and C)]
Examples:
A) TH01 9.3 allele (-A in 7th repeat)
B) D18S51 13.2 allele (+AG in 3’-flanking region)
C) Rare VWA allele amplified with AmpFlSTR primers (A-to-T in 2nd base from 3’ end of forward primer)
Possible sequence variation in or around STR repeat regions and the impact on PCR amplification. The asterisk symbolizes a DNA difference (base change, insertion, or deletion of a nucleotide) from a typical allele for an STR locus. In situation (A), the variation occurs within the repeat region and should have no impact on primer binding or the subsequent PCR amplification (although the overall amplicon size may vary slightly). In situation (B), the sequence variation occurs just outside the repeat in the flanking region but interior to the primer annealing sites. Again, PCR should not be affected, although the size of the PCR product may vary slightly. However, in situation (C) the PCR can fail due to a disruption in the annealing of a primer, because the primer no longer perfectly matches the DNA template sequence.
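The three situations reduce to where a variant sits relative to the primers and the repeat. A minimal sketch, assuming all positions are given on one strand as inclusive (start, end) intervals with the forward primer first; the function name and every coordinate are hypothetical.

```python
def classify_variant(pos, fwd_primer, repeat, rev_primer):
    """Return 'A', 'B', 'C', or 'outside amplicon' for a variant position.

    Each region is an inclusive (start, end) interval on the same strand,
    ordered fwd_primer < repeat < rev_primer.
    """
    def inside(region):
        return region[0] <= pos <= region[1]

    if inside(fwd_primer) or inside(rev_primer):
        return "C"  # primer binding site: amplification may fail (null allele)
    if inside(repeat):
        return "A"  # within the repeat: amplicon size may shift, PCR unaffected
    if fwd_primer[1] < pos < rev_primer[0]:
        return "B"  # flanking region between the primers: size may shift
    return "outside amplicon"  # not seen by this primer pair


# Hypothetical coordinates for one primer pair.
fwd, rep, rev = (1, 20), (51, 90), (131, 150)
for position in (60, 100, 19, 200):
    print(position, classify_variant(position, fwd, rep, rev))
```

The last category is why two assays can disagree on allele size: a flanking deletion can fall inside one kit's amplicon but outside a shorter assay's amplicon.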

Different Genetic Tests Can Give Different Results Based on PCR Primer Positions
[Figure panels: PCR primers in different positions around the STR repeat region; mutations in the DNA sequence (*) impact PCR primer annealing. Heterozygous alleles 6 and 8 are well balanced; imbalance in allele peak heights; “null” allele from allele dropout, where the allele 6 amplicon has “dropped out” and only allele 8 is observed.]
Figure 6.9 Impact of a sequence polymorphism in the primer binding site illustrated with a hypothetical heterozygous individual. Heterozygous allele peaks may be well-balanced (A), imbalanced (B), or exhibit allele dropout (C).
Figure 6.9, J.M. Butler (2005) Forensic DNA Typing, 2nd Edition © 2005 Elsevier Academic Press

SRM 2391b Genomic 8 with D16S539
All allele calls with MiniFiler for CSF1PO, D7S820, D13S317, D18S51, D21S11, FGA, and D16S539 (with the exception noted below) match previously certified values.
[Figure: D16S539 results with Identifiler, PowerPlex 16, and MiniFiler; annotations: slight imbalance with allele 11; allele dropout*]
*Due to primer binding site mutation

Apparent Null Alleles Observed During Concordance Studies
10/13 CODIS loci affected so far
Locus | STR Kits/Assays Compared | Results | Reference
VWA | PP1.1 vs ProPlus | Loss of allele 19 with ProPlus; fine with PP1.1 | Kline et al. (1998)
D5S818 | PP16 vs ProPlus | Loss of alleles 10 and 11 with PP16; fine with ProPlus | Alves et al. (2003)
D13S317 | Identifiler vs miniplexes | Shift of alleles 10 and 11 due to deletion outside of miniplex assay | Butler et al. (2003), Drabek et al. (2004)
D16S539 | PP1.1 vs PP16 vs COfiler | Loss of alleles with PP1.1; fine with PP16 and COfiler | Nelson et al. (2002)
D8S1179 | PP16 vs ProPlus | Loss of alleles 15, 16, 17, and 18 with ProPlus; fine with PP16 | Budowle et al. (2001)
FGA | PP16 vs ProPlus | Loss of allele 22 with ProPlus; fine with PP16 | Budowle and Sprecher (2001)
D18S51 | SGM vs SGM Plus | Loss of alleles 17, 18, 19, and 20 with SGM Plus; fine with SGM | Clayton et al. (2004)
CSF1PO | PP16 vs COfiler | Loss of allele 14 with COfiler; fine with PP16 | (not listed)
TH01 | PP16 vs COfiler | Loss of allele 9 with COfiler; fine with PP16 | (not listed)
D21S11 | PP16 vs ProPlus | Loss of allele 32.2 with PP16; fine with ProPlus | (not listed)
New Section of STRBase (launched to track MiniFiler discordance and allele dropout frequency): http://www.cstl.nist.gov/biotech/strbase/NullAlleles.htm
From Table 6.2 in J.M. Butler (2005) Forensic DNA Typing, 2nd Edition, p. 136

STR Typing Measurement Issues
STR genotypes are generated using PCR amplification and electrophoretic sizing that involves an internal size standard with each sample.
The forensic DNA community almost exclusively uses STR typing kits to obtain results (different kits are available that examine the same common markers).
PCR amplification is expected to generate consistent genotypes as long as primer positions are not changed between kits. Primer changes can result in allele dropout due to primer site mutations.
Occasionally new commercial kits are created with additional loci.
General STR repeat nomenclature rules have been established, but they leave some room for subjectivity, permitting possible differences in how STR alleles are named.

Two Different Independent Methods Used
Size Analysis/Genotyping: Electrophoretic separation and sizing of PCR product compared to an internal size standard, followed by comparison to the sizes of one or more sequenced alleles (could be a commercially available allelic ladder) run in-house with the same conditions, instrument, and internal size standard.
DNA Sequence Analysis: Isolation of each individual allele, then DNA sequence analysis followed by direct counting of the number of repeats (and correlation to size variation observed during STR typing).
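The size-analysis route amounts to matching a measured fragment size against ladder allele sizes. A minimal sketch, assuming a ±0.5 bp calling window (a commonly used convention, not stated in these slides) and entirely hypothetical ladder sizes; `call_allele` is an illustrative name, not a function of any genotyping software.

```python
def call_allele(size_bp, ladder, window=0.5):
    """Return the ladder allele whose size is within `window` bp, else 'OL' (off-ladder)."""
    # Find the ladder allele closest in size to the measured fragment.
    nearest = min(ladder, key=lambda allele: abs(ladder[allele] - size_bp))
    if abs(ladder[nearest] - size_bp) <= window:
        return nearest
    return "OL"


# Hypothetical ladder for a tetranucleotide locus: allele name -> size in bp.
ladder = {"6": 100.0, "7": 104.0, "8": 108.0}

print(call_allele(104.2, ladder))  # falls in the allele 7 bin
print(call_allele(106.0, ladder))  # between bins, flagged off-ladder
```

An off-ladder result is the cue for the second method: sequencing the allele and counting its repeats directly.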

Comparing DNA Sequencing Information to STR Typing Data
SRM 2395 Component A (DYS385 a/b alleles)
Gel separation, then excision of bands
Sequenced alleles: 12 GAAA repeats and 15 GAAA repeats
Typing results, obtained using a commercial allelic ladder, matched the sequenced alleles

[Figure: example electropherogram panels, labels in source order. Incomplete adenylation (-A and +A peaks): D8S1179. Stutter products (6.0%, 7.8%): D3S1358. Tri-allelic pattern: TPOX. Variant alleles: D7S820, TH01. DEGRADED DNA: D5S818, D13S317, D7S820, D16S539, CSF1PO, Penta D. MIXTURE: D3S1358, TH01, D13S317, D16S539, D2S1338.]

Chapter 10 – Points for Discussion
Is it better to prevent or promote non-template addition? Explain your reasons.
How is a degenerate primer used in STR typing assays? Discuss some advantages and disadvantages to using a degenerate primer.
What observations indicate that a peak in an electropherogram is a spike as opposed to DNA?
What steps are typically taken to confirm the presence of an off-ladder allele?
Why are mutations and mutation rates a concern with parentage and kinship analysis but not with forensic DNA testing?