
1 Reasonable Safeguards against Contamination in mtDNA Testing, and Some Database Issues
Dr. Frederika Kaestle (Depts of Anthropology and Biology, IU Bloomington) and Dr. Jason Eshleman (Trace Genetics)

2 Ancient DNA and Forensic DNA Analysis
Similarities:
- Template DNA unknown
- Often limited template
- Template often highly degraded
- Far from ideal sources for biological research

3 Ancient DNA and Forensic DNA Analysis
Differences:
- Ancient DNA still largely limited to mtDNA analysis.
- Ancient DNA is rarely searching for exact match.
- Audience (justice system vs. academic community) familiarity with genetics differs significantly.
- Level of skepticism remains high with ancient DNA.
- Hypothesis testing is overt

4 Limitations of Ancient DNA
- Minimal template.
- PCR inhibitors co-extracted.
- Sources have often been heavily handled.
- The property of mtDNA (high copy number) that makes it possible to analyze likewise makes contamination a more significant problem.

5 Contamination Sources
Lab-ware and reagents come pre-contaminated from the manufacturer
- PCR tubes are particularly notorious (Schmidt et al. [1995] estimate a high rate of mtDNA contamination in lab disposables)
- Taq (the enzyme that catalyzes PCR) has also been shown to be contaminated with mtDNA
- Other reagents shown to be contaminated include nucleotides, buffer, primers

6 Contamination Sources
Sample surface
- Contamination may have existed prior to sample (e.g. bone, hair) being deposited
- Contamination may have occurred after deposition but before collection
- Contamination may have occurred during the collection of the sample
- Contamination may have occurred during the storage and transport of the sample
- Contamination may have occurred at the laboratory after sample delivery

7 Contamination Sources
Carryover PCR
- Contamination may occur due to residues from previous PCR amplifications in the lab remaining on lab ware, lab surfaces, lab clothing, lab equipment, lab air
- This is particularly problematic due to the high copy number of the PCR amplicons

8 Contamination Sources
People
- With or without access to facilities
- Lab personnel shed DNA throughout the day
- Lab personnel carry DNA from others into the lab every day on the surfaces of their clothing and body

9 Identifying contamination (or how to have your paper rejected upon submission)
- Does sample match lab personnel?
  - But you can't rule out everyone. Identifying contamination requires some serendipity.
- Understand negative controls are at times insufficient
  - Low level in background can be mistaken for sample.
  - Neg. control might not be contaminated, but sample is (e.g. PCR tube)
- The sequence doesn't make sense.
  - But now we run the risk of only finding what we're looking for.

10 What IS contamination?
- Unwanted DNA
- Why is it unwanted?
  - Analyst did not intend to extract it
  - DNA is DNA
  - (It makes for a messy story? The study would be so much nicer if we didn't find it?)
- Contamination does not answer the question at hand.
  - But what's the question?

11 What Questions Can We Answer?
- Are two samples different?
  - What can we infer from this?
  - Not from the same source (but remember heteroplasmy, and possibility of contamination)
- Are two samples the same?
  - What can we infer from this?
  - MIGHT be from the same source (identity by descent, possibility of contamination)
- Note that anthropologists often are asking questions about FREQUENCIES of mtDNA types, not about single samples
  - Allows us to make population-level inferences

12 What IS contamination?
- Thus, one way to view contamination is as DNA that leads to a false inference *if* we do not know our data are compromised.

13 So How Do We Detect Contamination?
- Protocols designed to detect
- Negative controls
  - Reagent/Extraction blank
  - Amp Negative/No Template Control
  - Controls alert us to the presence of DNA in a tube
  - We infer what this means vis-a-vis our samples
- Comparison to sequences of probable contaminating individuals
  - Lab personnel
  - Excavators (evidence collection team)
  - Museum Curators and Researchers (staff with access to evidence storage)
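
The comparison step can be sketched in a few lines. This is a minimal illustration, assuming each haplotype is stored as the set of positions where it differs from the reference sequence; the staff profiles and the name `find_matches` are hypothetical and not taken from the presentation.

```python
# Hypothetical elimination database (all values invented for illustration).
# Each haplotype is the set of positions that differ from the reference.
ELIMINATION_DB = {
    "lab_director": frozenset({16126, 16294, 16296}),
    "excavator_1": frozenset({16223, 16290, 16319}),
    "curator": frozenset({16069, 16126}),
}


def find_matches(sample, database):
    """Return the names of people whose haplotype is identical to the sample."""
    sample = frozenset(sample)
    return sorted(name for name, hap in database.items() if hap == sample)


sample_result = {16223, 16290, 16319}
print(find_matches(sample_result, ELIMINATION_DB))  # ['excavator_1']
```

A hit flags probable contamination. An empty result shows only that none of the people on file match; it does not show the sequence is authentic, since not everyone who handled the sample can be ruled out.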

14 Case Study: contamination and multiple inferences
- bones and teeth found in the desert
- bones identified as teenage boy, missing in
- mtDNA identifies match between bones and boy's mother
- NEGATIVE CONTROLS CLEAN THROUGHOUT

15 Initial mtDNA results
- Mother: 16224, 16287,
- Bone: 16224T/C, 16287C/T, 16311T/C
- Conclusion: mixture, possible match
- It is *possible* that the bone was really from the missing boy
- If this is a match there is also contamination
- This is still merely a suggestive result, as there are at least four 2-source mixtures that can produce this result
- Result requires confirmation
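
The count of four can be checked by enumeration. The sketch below assumes each contributor carries exactly one base at each of the three mixed positions reported for the bone (no heteroplasmy within a contributor); the function name `two_source_mixtures` is hypothetical.

```python
from itertools import product

# Mixed positions reported for the bone: position -> the two bases observed.
MIXED = {16224: ("T", "C"), 16287: ("C", "T"), 16311: ("T", "C")}


def two_source_mixtures(mixed):
    """Enumerate unordered pairs of single-source profiles that together
    would show both bases at every mixed position."""
    positions = sorted(mixed)
    pairs = set()
    for choice in product((0, 1), repeat=len(positions)):
        # Contributor A takes one base at each position, B takes the other.
        a = tuple(mixed[p][i] for p, i in zip(positions, choice))
        b = tuple(mixed[p][1 - i] for p, i in zip(positions, choice))
        pairs.add(frozenset((a, b)))
    return positions, sorted(tuple(sorted(pair)) for pair in pairs)


positions, mixtures = two_source_mixtures(MIXED)
print(len(mixtures))  # 4
for a, b in mixtures:
    print(dict(zip(positions, a)), "+", dict(zip(positions, b)))
```

Three mixed positions give 2^3 = 8 ordered assignments, which collapse to 4 unordered pairs. Only one of those pairs combines a profile carrying the variant at all three positions with a profile matching the reference at all three, so the mixed result on its own cannot single out the mother's type.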

16 Subsequent mtDNA Results
- 2nd extraction: ND (extraction failed)
- 3rd extraction: 16085, 16111, 16223, 16257, 16261, 16286,
- 4th extraction: 16082A/C, 16183A/C, 16189T/C, 16217T/C, 16223, 16290, 16291, 16319,
- 5th extraction: 16183A/c, 16223, 16319G/a, 16325C/t, 16362T/C
- 6th extraction: 16183, 16187, 16189,
- 7th + 8th extractions: 16224, 16287,
- Mother: 16224, 16287, 16311

17 mtDNA Results
- Mother: 16224, 16287,
- Bone: 16224T/C, 16287C/T, 16311T/C
- 7th + 8th extractions: 16224, 16287,
- Match confirmed!

18 Timeline
- 2nd extraction (no result): 7/19/02
- 3rd extraction (no match): 8/6/02
- 4th extraction (no match): 8/13/02
- 5th extraction (no match): 8/22/02
- 6th extraction (no match): 9/3/02
- 7th extraction (matching): 9/10/02
- 8th extraction (matching): 9/28/02

19 Timeline
- 2nd extraction (no result): 7/19/02
- 3rd extraction (no match): 8/6/02
- 4th extraction (no match): 8/13/02
- 5th extraction (no match): 8/22/02
- 6th extraction (no match): 9/3/02
- 7th extraction (matching): 9/10/02
- 8th extraction (matching): 9/28/02
- New references: 9/10/02
- Bench notes reveal 9/10 references and extractions performed at same time by same analyst.
- Extractions 6, 7 and 8 performed on one bone sample.
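
The pattern in the timeline can be made explicit by comparing each extraction date with the date the new reference samples were handled. This is a minimal sketch using the dates from the slide (two-digit years read as 2002); the function name `relative_to_reference` is hypothetical.

```python
from datetime import date

# Extraction number -> (date, outcome), as listed on the timeline slide.
EXTRACTIONS = {
    2: (date(2002, 7, 19), "no result"),
    3: (date(2002, 8, 6), "no match"),
    4: (date(2002, 8, 13), "no match"),
    5: (date(2002, 8, 22), "no match"),
    6: (date(2002, 9, 3), "no match"),
    7: (date(2002, 9, 10), "matching"),
    8: (date(2002, 9, 28), "matching"),
}
REFERENCE_DATE = date(2002, 9, 10)  # new reference samples, per the bench notes


def relative_to_reference(extractions, reference_date):
    """Label each extraction as before, same day as, or after reference handling."""
    labels = {}
    for number, (when, outcome) in extractions.items():
        if when < reference_date:
            timing = "before"
        elif when == reference_date:
            timing = "same day"
        else:
            timing = "after"
        labels[number] = (timing, outcome)
    return labels


labels = relative_to_reference(EXTRACTIONS, REFERENCE_DATE)
for number, (timing, outcome) in sorted(labels.items()):
    print(f"extraction {number}: {outcome}, {timing} reference handling")

# Every matching result falls on or after the day the references were handled.
matching_timings = {t for t, outcome in labels.values() if outcome == "matching"}
print(matching_timings <= {"same day", "after"})  # True
```

The check does not prove contamination. It shows that no matching result predates the arrival of the reference samples in the lab, which is the pattern the bench notes make suspicious.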

20 Inferences?
- Possible that bones really are from missing child.
- Possible that errant handling contaminated bone with reference sample.
- CERTAIN that improper lab practices were used.
- Contamination existed and was ignored when it did not fit with the desired result.

21 Contamination Control: Tools of the aDNA Trade
Keep it Clean
- Surface decontamination of bone helps (Bouwman et al 2006)
- DNase I (Eshleman and Smith 2001) cleanup of reagents and tubes
- Positive Pressure HEPA-filtered air in the lab
- Regular UV-irradiation of surfaces
- Controlled and limited access to the lab
- Dedicated and disposable laboratory clothing and shoes
Prevent Carryover
- Uni-directional travel between extraction and PCR laboratories
- Use of dUTP (Uracil) and pre-digestion of subsequent PCR reactions
(and ideally)
- Independent confirmation (in temporally separate extraction/amplification procedures and possibly at another laboratory)
- Split the samples first! Why analyze contamination twice?

22 Kennewick Man Case Study
- 3 independent ancient DNA laboratories, utilizing standard contamination controls
- 3 independent samples of ancient Native American individual discovered near Kennewick, WA
- Results after multiple extraction and amplification attempts (all negative controls clean):
  - Lab 1: multiple failures at amplification, followed by sequence identical to lab director
  - Lab 2: multiple failures at amplification, followed by sequence identical to student who had not been in the ancient DNA laboratory (or in town) for approximately 2 years
  - Lab 3: multiple failures at amplification, followed by sequence identical to lab manager who had never entered the ancient DNA laboratory

23 Neanderthal Case Study
- Neanderthal remains had been curated in European museum for several years
- Ancient DNA laboratory personnel extract DNA from tooth of a Neanderthal (39,900 years old), using standard precautions
- 2 independent extractions performed
- Both sequences identical to each other
- Neanderthal?
- Sequences compared to potential contaminating individuals
- Sequence identical to paleontologist who had studied remains extensively

24 Reality Check
There is no magic bullet
- Contamination is a reality in DNA work.
- Negative controls are an indicator, not a solution, not a guarantee.
- Thinking clearly, asking the right questions and posing the alternatives is essential.

25 mtDNA Database
- Assuming you have mtDNA results, what do they mean?
- Previous presenters discussed many issues associated with the mtDNA database, but I would like to concentrate on an issue that has emerged from the majority of anthropological genetic research on mtDNA

26 Relevant mtDNA Basics
- mtDNA is inherited maternally and does not recombine, so the distribution of sequences is determined by the movement of females
- mtDNA has a very fast mutation rate compared to nuclear DNA, so new mutations crop up all the time
- This creates a situation in which new mtDNA lineages are very rare, and generally of limited distribution, and lineages in general are not randomly distributed

27 Relevant Anthropological Basics
People do not move randomly on the landscape
- Tend not to move large distances
- Tend to follow family and friends, members of their religion/language group/caste etc.
- Tend to follow paths of least resistance (valleys, rivers)
- Tend to follow jobs/game/other economically important resources
- Are occasionally moved against their will

28 African American Example
- Africans have the highest level of mtDNA variation in the world, and the highest level of rare mtDNA sequences
- Slaves were transported non-randomly to the Americas
  - South Carolina plantation owners grew mostly rice, so preferred slaves from West Africa who knew how to grow rice
  - Virginia plantation owners had to deal with malaria from mosquitoes that thrived in surrounding swamps, so preferred slaves from the Gold Coast of Africa who were resistant to malaria
  - Louisiana slave owners tended to purchase slaves from Portuguese and French traders, who took slaves from Angola

29 African American Example
- During the Great Migration ( ), large numbers of African Americans moved out of the rural south (for better jobs in light of WWI and a boll weevil crop infestation)
  - Those from Mississippi, Alabama, Louisiana followed the Mississippi River north to large cities of the Midwest
  - Those from Carolinas and Virginia followed the coastline to D.C., Philly, NYC.
- But the majority of African Americans remain in the SE even today.

30 African American Example
- In addition to actual population movement, the genetic make-up of African American groups across the US varies significantly in the level of admixture with non-Africans
  - Level of admixture with European Americans is higher in the West, and in large northern cities (likely due to different social mores)
  - Level of admixture with Native Americans is higher in the West and Southwest (probably due to a combination of social mores and the higher number of Native Americans resident in the West)

31 mtDNA Database
- Does the database take into account this regional variation in African American mtDNA sources?
- NO
  - Of ~1148 samples, approximately 800 are from Houston
  - The samples overall are convenience samples from blood banks, paternity-testing laboratories, laboratory personnel, clients in genetic-counseling centers, law-enforcement officers, and people charged with crimes (NRC II (1996) supra note 5, at 30), and are thus in no way randomized with regard to geographic origin or census data on the distribution of African Americans in the US.
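
A small calculation shows why the Houston-heavy composition matters. The database counts below come from the slide; the haplotype frequencies and Houston's share of the real population are invented for illustration only, and the function names are hypothetical.

```python
# Database composition from the slide: ~1148 samples, ~800 from Houston.
DB_HOUSTON = 800
DB_OTHER = 1148 - DB_HOUSTON

# Hypothetical values, invented for illustration only.
FREQ_HOUSTON = 0.02        # frequency of some haplotype among Houston donors
FREQ_OTHER = 0.005         # frequency of the same haplotype everywhere else
POP_SHARE_HOUSTON = 0.05   # Houston's share of the population of interest


def pooled_estimate(n_a, f_a, n_b, f_b):
    """Expected frequency estimate when samples are pooled as collected."""
    return (n_a * f_a + n_b * f_b) / (n_a + n_b)


def population_frequency(share_a, f_a, f_b):
    """Frequency in the real population, weighting regions by true size."""
    return share_a * f_a + (1 - share_a) * f_b


db_estimate = pooled_estimate(DB_HOUSTON, FREQ_HOUSTON, DB_OTHER, FREQ_OTHER)
true_freq = population_frequency(POP_SHARE_HOUSTON, FREQ_HOUSTON, FREQ_OTHER)

print(f"database estimate: {db_estimate:.4f}")
print(f"population value:  {true_freq:.4f}")
print(f"ratio:             {db_estimate / true_freq:.1f}x")
```

With these made-up inputs the pooled database overstates the haplotype's frequency; with a haplotype that is rarer in Houston than elsewhere it would understate it. Either way the error comes from the sampling design, not from sample size.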

32 mtDNA database
- The other subsets of the database suffer from the same problem of non-random sampling
- E.g. all of the Native American samples are Apache and Navajo. You could not pick a more non-representative tribal distribution if you tried, other than one composed entirely of Eskimo.

33 Native American mtDNA variation

34 mtDNA database
- Asian subset is also not representative of the geographic origin of Asian Americans, based on the 2000 Census

35 But I thought they validated it?
- For the most part, validation of the various subsets of the mtDNA database involved confirming that the major haplogroups of mtDNA variation present in the source population were also present in their American relatives, sometimes in somewhat similar frequencies
- Ummmm. What's a haplogroup?

36 Haplogroups?
- mtDNA lineages can be divided into major subgroups (haplogroups) based on shared ancestry resulting in shared sets of mutations
- Within each haplogroup are individual haplotypes of unique combinations of mutations (this is what is utilized in the inclusion statistics)

37 Haplogroups?

38 Analogy
- Haplogroups are like last names
- Validating the database is basically like asserting that there are Smiths in Europe, and there are Smiths here too, so we've got a representative sample
- BUT the FREQUENCY of Smiths will vary across the US, as well as across Europe, just as the frequencies of haplogroups do. So just asserting that there are Smiths in both places, and even showing that their AVERAGE frequencies are the same across the continents, doesn't tell you much about how representative your sample is
- To identify a person more uniquely, you need their whole name (their haplotype)
- The frequencies of Wilmut G. Smith and Jesus A. Smith are going to be significantly different across the US (and across Europe)
- Because the haplotype is what the inclusion statistic is based on, this is really the information we need
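
The surname analogy can be shown with toy data. In this sketch two hypothetical regions have identical haplogroup frequencies but very different haplotype frequencies; all sample labels are invented, and `frequencies` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical samples as (haplogroup, haplotype) pairs, i.e. (surname, full name).
REGION_A = [("H", "H-1")] * 6 + [("H", "H-2")] * 2 + [("U", "U-1")] * 2
REGION_B = [("H", "H-2")] * 1 + [("H", "H-3")] * 7 + [("U", "U-1")] * 2


def frequencies(samples, level):
    """Relative frequency of each haplogroup (level=0) or haplotype (level=1)."""
    counts = Counter(item[level] for item in samples)
    total = len(samples)
    return {name: count / total for name, count in counts.items()}


# Same haplogroup ("surname") frequencies in both regions...
print(frequencies(REGION_A, 0) == frequencies(REGION_B, 0))  # True

# ...but the haplotype ("full name") H-1 is common in A and absent in B.
print(frequencies(REGION_A, 1).get("H-1", 0.0))  # 0.6
print(frequencies(REGION_B, 1).get("H-1", 0.0))  # 0.0
```

A validation that compares only the first level would pass both regions as equivalent, even though the quantity used in the inclusion statistic differs completely between them.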

39 But doesn't the 95% confidence interval take all this into account?
- NO!
- The calculation of a 95% CI ASSUMES that you have a random sample of the population, and that the population is not subdivided
- It is TOTALLY INVALID if your sample is not random, and/or the population is subdivided
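
To see where the assumption enters, consider one commonly used form of the upper 95% bound on a haplotype frequency: a normal approximation when the haplotype has been observed, and 1 - 0.05^(1/n) when it has not. Whether this exact form was the one applied to the database discussed here is an assumption; the function name `upper_bound_95` is hypothetical.

```python
import math


def upper_bound_95(count, n):
    """Upper 95% bound on a haplotype frequency, assuming the n database
    profiles are a simple random sample from one undivided population."""
    if count == 0:
        # Haplotype never observed in the database.
        return 1 - 0.05 ** (1 / n)
    p = count / n
    # Normal approximation to the binomial.
    return p + 1.96 * math.sqrt(p * (1 - p) / n)


print(f"{upper_bound_95(0, 1148):.4f}")  # haplotype unseen in ~1148 profiles
print(f"{upper_bound_95(5, 1148):.4f}")  # haplotype seen 5 times
```

The function takes only a count and a sample size. It has no input describing where the profiles came from, so a database drawn mostly from one city produces the same bound as a properly randomized one. The interval reflects sampling noise only and cannot correct for non-random sampling or population subdivision.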

40 Han Chinese Example (Yao et al. 2002)
- 263 unrelated individuals from 13 locations were typed for mtDNA haplotype, and sorted into haplogroups

41 Han Chinese Example (Yao et al. 2002)

42 Han Chinese Example (Yao et al. 2002)
"The comparison of the regional Han mtDNA samples revealed an obvious geographic differentiation in the Han Chinese.... Hence, the grouping of different Han populations into just Southern Han and Northern Han... or the use of one or two Han regional populations to stand for all Han Chinese... does not appropriately reflect the genetic structure of the Han. Intriguingly, despite numerous historically recorded migrations and substantial gene flow across Chinese from the Bronze Age to the present time... differences between geographic regions have been maintained" (p. 649).

43 mtDNA Database
- We simply don't have the data to examine the significance of phylogeographic substructure within the US populations at this point
- However, it is reasonable to assume that the same kind of substructure that exists in all studies of other populations that investigate haplotype phylogeography, and many that investigate haplogroup phylogeography, is also present in the US
- Thus, given the small sample size and non-random sampling strategy of the mtDNA database, it is unreasonable to assume it can provide meaningful estimates of sequence frequencies for the calculation of inclusion statistics.
- For more info on these issues, see Kaestle, FA, RA Kittles, AL Roth & EJ Ungvarsky (2006) Database Limitations on the Evidentiary Value of Forensic Mitochondrial DNA Evidence. Amer. Criminal Law Rev. 43:53-88.

44 Conclusions?
- Requirements for preventing and detecting contamination within ancient DNA research are generally more strict than in forensic applications
- Even with these requirements, contamination slips through
- The federal mtDNA database is currently inadequate for use in inclusion statistics calculations

