Assigning individuals to ethnic groups based on 13 STR loci X. Fosella 1, F. Marroni 1, S. Manzoni 2, A. Verzeletti 2, F. De Ferrari 2, N. Cerri 2, S.

Slides:



Advertisements
Similar presentations
A quantitative trait locus not associated with cognitive ability in children: a failure to replicate Hill, L. et al.
Advertisements

Lab 3 : Exact tests and Measuring of Genetic Variation.
Lab 3 : Exact tests and Measuring Genetic Variation.
Forensic DNA Analysis (Part II)
Welcome to the World of Investigative Tasks
. Learning – EM in ABO locus Tutorial #08 © Ydo Wexler & Dan Geiger.
Estimating the penetrances of breast and ovarian cancer in the carriers of BRCA1/2 mutations Silvano Presciuttini University of Pisa, Italy.
Fundamentals of Forensic DNA Typing Slides prepared by John M. Butler June 2009 Appendix 3 Probability and Statistics.
Statistical Significance What is Statistical Significance? What is Statistical Significance? How Do We Know Whether a Result is Statistically Significant?
EPIDEMIOLOGY AND BIOSTATISTICS DEPT Esimating Population Value with Hypothesis Testing.
IGES 2003 How many markers are necessary to infer correct familial relationships in follow-up studies? Silvano Presciuttini 1,3, Chiara Toni 2, Fabio Marroni.
Statistical Significance What is Statistical Significance? How Do We Know Whether a Result is Statistically Significant? How Do We Know Whether a Result.
Software Quality Control Methods. Introduction Quality control methods have received a world wide surge of interest within the past couple of decades.
Efficient Estimation of Emission Probabilities in profile HMM By Virpi Ahola et al Reviewed By Alok Datar.
THE POWER OF AN IBS-BASED METHOD TO INFER RELATIONSHIPS BETWEEN PAIRS OF INDIVIDUALS Silvano Presciuttini 1, Chiara Toni 1, Simonetta Verdiani 2, Lucia.
Today Concepts underlying inferential statistics
Quantitative Genetics
Statistical hypothesis testing – Inferential statistics I.
Inferential Statistics
INFERENTIAL STATISTICS – Samples are only estimates of the population – Sample statistics will be slightly off from the true values of its population’s.
Center of Statistical Genetics University of Pisa Traceability of cattle breeds by DNA analysis Silvano Presciuttini Limoges, 29 juin 2007.
Statistics & Biology Shelly’s Super Happy Fun Times February 7, 2012 Will Herrick.
Introduction to Statistical Inference Chapter 11 Announcement: Read chapter 12 to page 299.
Lecture 12 Statistical Inference (Estimation) Point and Interval estimation By Aziza Munir.
1 Comparison of Several Multivariate Means Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute.
Chapter 7 Population Genetics. Introduction Genes act on individuals and flow through families. The forces that determine gene frequencies act at the.
A Statistical Analysis of Seedlings Planted in the Encampment Forest Association By: Tony Nixon.
Human Identity Testing Purpose: Match a person to a DNA sample. Examples: Paternity Test Genetic History Historical (Thomas Jefferson, Sally Hemings) Genealogical.
1 Projecting Race and Hispanic Origin for the U.S. Population and an Examination of the Impact of Net International Migration David G. Waddington Victoria.
Objective 1 - Describe how restriction enzymes are used to manipulate DNA. Run the virtual gel electrophoresis at this web site
11 Comparison of Several Multivariate Means Shyh-Kang Jeng Department of Electrical Engineering/ Graduate Institute of Communication/ Graduate Institute.
Bioinformatics Expression profiling and functional genomics Part II: Differential expression Ad 27/11/2006.
Copyright © Cengage Learning. All rights reserved. 8 Introduction to Statistical Inferences.
Quantitative Genetics. Continuous phenotypic variation within populations- not discrete characters Phenotypic variation due to both genetic and environmental.
Quantitative Genetics
Speaker: Bin-Shenq Ho Dec. 19, 2011
Chapter 9 Three Tests of Significance Winston Jackson and Norine Verberg Methods: Doing Social Research, 4e.
Chapter 3 should describe what will be done to answer the research question(s), describe how it will be done and justify the research design, and explain.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 10 Comparing Two Populations or Groups 10.1.
The Practice of Statistics, 5th Edition Starnes, Tabor, Yates, Moore Bedford Freeman Worth Publishers CHAPTER 10 Comparing Two Populations or Groups 10.1.
Lecture 14: Population Assignment and Individual Identity October 8, 2015.
Organization of statistical research. The role of Biostatisticians Biostatisticians play essential roles in designing studies, analyzing data and.
Hypothesis Testing Introduction to Statistics Chapter 8 Feb 24-26, 2009 Classes #12-13.
Course Overview Collecting Data Exploring Data Probability Intro. Inference Comparing Variables Relationships between Variables Means/Variances Proportions.
Individual Identity and Population Assignment Lab. 8 Date: 10/17/2012.
CHAPTER 10 Comparing Two Populations or Groups
Chapter 9: Inferences Involving One Population
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 10 Comparing Two Populations or Groups
Migrant Studies Migrant Studies: vary environment, keep genetics constant: Evaluate incidence of disorder among ethnically-similar individuals living.
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 11 Inference for Distributions of Categorical Data
Statistical Analysis and Design of Experiments for Large Data Sets
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 11 Inference for Distributions of Categorical Data
Linkage Analysis Problems
Genotyping Results Each person was typed for 3 unlinked Short Tandem Repeat loci (STR) vWFII – chromosome 12, intron 40 of the vWF gene UT 2203 – chromosome.
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 10 Comparing Two Populations or Groups
CHAPTER 11 Inference for Distributions of Categorical Data
CHAPTER 10 Comparing Two Populations or Groups
Presentation transcript:

Assigning individuals to ethnic groups based on 13 STR loci X. Fosella 1, F. Marroni 1, S. Manzoni 2, A. Verzeletti 2, F. De Ferrari 2, N. Cerri 2, S. Presciuttini 1 1 Center of Statistical Genetics, University of Pisa, Pisa, Italy 2 Institute of Legal Medicine, University of Brescia, Brescia, Italy

Introduction  Inferring individual’s ethnic origin by means of short tandem repeat (STR) profiles has recently received considerable attention in several fields of applied genetics  Gene frequency variation among human populations has been extensively documented, and it has been suggested that the differences in allele proportions between ethnic groups could form the basis of an inferential system  We selected several groups of ethnically defined immigrants in a province of Northern Italy and assessed the power of 13 STR of tracing back their origin

Backgrounds of this study  Brescia ranks among the highest towns in Italy for the proportion of censused immigrants from non-EU countries (about 10% of the local resident population)  Most blood crimes in this area happens within ethnically defined groups  A test that could assign a biological stain to a subject of a particular population would be highly welcomed

Population samples Unrelated individuals from the historical series of the Institute of Legal Medicine of the University of Brescia, including a large number of immigrants from a variety of countries, were selected for the present analysis Unrelated individuals from the historical series of the Institute of Legal Medicine of the University of Brescia, including a large number of immigrants from a variety of countries, were selected for the present analysis In addition, blood samples from subjects of known ethnicity were collected from the local hospital, following IRB approval In addition, blood samples from subjects of known ethnicity were collected from the local hospital, following IRB approval The composition of the final sample is shown below The composition of the final sample is shown below Subjects were typed with the Profiler Plus TM and SGM Plus TM kits (totaling 13 loci), following standard protocols Subjects were typed with the Profiler Plus TM and SGM Plus TM kits (totaling 13 loci), following standard protocols

Population assignment test (1) Theoretically, assigning individuals to a given population based on a multilocus genotype is surprisingly easy.

Population assignment test (2) A computational trouble (a division by zero) arises when a particular allele is absent from a population sample; in this case, an arbitrary frequency must be given to that allele in that population.

A popular program in genetic data analysis The software Arlequin ( arlequin/) includes a routine for computing the likelihood of each multilocus genotype of a defined population sample in each of a number of specified samples. We first applied Arlequin’s calculation to all our data; however, the assumption made by the program about the “missing” alleles is not clear. Therefore, we analyzed in more detail the comparison Blacks/Italians using an explicit and conservative assumption.

Arlequin results These charts show the individual log- likelihoods of four samples of immigrants plotted against the log- likelihood of Italians being falsely attributed to each of these samples

A closer look at the comparison Blacks vs Italians The analysis was repeated for these two samples with our algorithm, using the allele frequencies estimated from all available data (about 100 subjects typed for each locus in both samples), and assigning a value of 1/200 (=0.005) to the frequency of the missing alleles

Simulated samples In order to estimate the statistical power of the assignment test with higher accuracy, we simulated 10,000 individuals for both the Italian and the Black samples

Statistical power of discriminating between the two simulated samples (Blacks vs Italians)

Conclusions (1) The 13 STR loci included in two commercial kits provided a limited but significant power to infer the ethnicity of immigrant groups. The 13 STR loci included in two commercial kits provided a limited but significant power to infer the ethnicity of immigrant groups. Not surprisingly, the highest level of discrimination was achieved by contrasting Blacks with resident Whites. Not surprisingly, the highest level of discrimination was achieved by contrasting Blacks with resident Whites. When two alternative hypotheses about the ethnic origin of a sample can be formulated with confidence, a population assignment test can already be applied to real cases When two alternative hypotheses about the ethnic origin of a sample can be formulated with confidence, a population assignment test can already be applied to real cases

Conclusions (2) The present work suggests that a sequential test can be highly efficient in assigning individuals to their ethnic group; given a fixed pre-defined significance level, investigators can start typing a small number of markers, and typing additional marker sets only for the subjects that failed to be classified at the previous step. The present work suggests that a sequential test can be highly efficient in assigning individuals to their ethnic group; given a fixed pre-defined significance level, investigators can start typing a small number of markers, and typing additional marker sets only for the subjects that failed to be classified at the previous step.