
1 Combinatorial Reconstruction of Sibling Relationships in Absence of Parental Data
Tanya Y Berger-Wolf (DIMACS and UIC CS)
Bhaskar DasGupta (UIC CS)
Wanpracha Chaovalitwongse (DIMACS and Rutgers IE)
Mary Ashley (UIC Biology)

2 The Problem
Genotypes are given as allele1/allele2.

Animal   Locus 1    Locus 2
1        149/167    243/255
2        149/155    245/267
3        149/177    245/283
4        155/155    253/253
5        149/155    245/267
6        149/155    245/277
7        149/151    251/255
8        149/173    255/255

Sibling Groups:
2, 3, 4, 5
2, 3, 4, 6
1, 7, 8

3 Why Reconstruct Sibling Relationships?
Used in: conservation biology, animal management, molecular ecology, genetic epidemiology.
Necessary for: estimating heritability of quantitative characters, characterizing mating systems and fitness.
But: it is hard to sample parent/offspring pairs. Sampling cohorts of juveniles is easier.

4 Previous Work:
Statistical estimate of pairwise distance and maximum likelihood clustering into family groups (Blouin et al. 1996; Thomas and Hill 2002; Painter 1997; Smith et al. 2001; Wang 2004).
Graph clustering algorithms to form groups from a pairwise likelihood distance graph (Beyer and May, 2003).
Use the 4-allele Mendelian constraint and find, by brute force, (non-optimal) groups that satisfy it (Almudevar and Field, 1999).

5 Our Approach: Mendelian Constraints
4-allele rule: a group of siblings can have no more than 4 different alleles at any given locus (two parents carry at most 4 alleles between them).
Example: 155/155, 149/155, 149/151, 149/173
2-allele rule: let a be the number of distinct alleles present at a given locus, and let R be the number of distinct alleles that either appear with three different alleles at this locus or are homozygous. Then a group of siblings must satisfy a + R ≤ 4.
Example: 155/155, 149/155, 149/151
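The two rules can be checked mechanically for one locus. The sketch below is an illustration only, not the authors' code: the function names are hypothetical, genotypes are assumed to be (allele1, allele2) tuples, and "appear with three different alleles" is read as "is paired with at least three distinct other alleles".

```python
from itertools import chain

def alleles_at_locus(genotypes):
    """Distinct alleles among (allele1, allele2) genotypes at one locus."""
    return set(chain.from_iterable(genotypes))

def satisfies_4_allele(genotypes):
    """4-allele rule: at most 4 distinct alleles at the locus."""
    return len(alleles_at_locus(genotypes)) <= 4

def satisfies_2_allele(genotypes):
    """2-allele rule: a + R <= 4 (assumed reading of the slide's definition)."""
    alleles = alleles_at_locus(genotypes)
    partners = {a: set() for a in alleles}  # other alleles each allele is paired with
    homozygous = set()                      # alleles seen in a homozygous genotype
    for x, y in genotypes:
        if x == y:
            homozygous.add(x)
        else:
            partners[x].add(y)
            partners[y].add(x)
    r = sum(1 for a in alleles if a in homozygous or len(partners[a]) >= 3)
    return len(alleles) + r <= 4

# The two example groups from the slide, at a single locus.
group_a = [(155, 155), (149, 155), (149, 151), (149, 173)]
group_b = [(155, 155), (149, 155), (149, 151)]
print(satisfies_4_allele(group_a), satisfies_2_allele(group_a))
print(satisfies_4_allele(group_b), satisfies_2_allele(group_b))
```

Under this reading, the first group has four distinct alleles and so passes the 4-allele rule, but allele 155 is homozygous and allele 149 is paired with three others, giving a + R = 6. The second group has a = 3 and R = 1, so it passes both rules.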

6 Our Algorithm (Template):
1. Construct possible sets S_1, S_2, …, S_m that satisfy the 2-allele (or the weaker 4-allele) rule.
2. For each individual x, find its set S_j.
3. Find a minimum set cover of all the individuals from the sets S_1, S_2, …, S_m. Return the sets in the cover as sibling groups.

7 Aside: Minimum Set Cover
Given: a universe U = {1, 2, …, n} and a collection of sets S = {S_1, S_2, …, S_m}, where each S_i is a subset of U.
Find: the smallest number of sets in S whose union is the universe U.
Minimum Set Cover is NP-hard and (1 + ln n)-approximable (this bound is sharp).
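For intuition, the standard greedy heuristic attains an approximation factor of at most 1 + ln n: repeatedly pick the set that covers the most still-uncovered elements. The sketch below is a generic illustration with hypothetical example data; it is not the exact solver used in the evaluation (slide 11 reports solving Min Set Cover optimally with a MIP solver).

```python
def greedy_set_cover(universe, sets):
    """Return indices of sets chosen greedily until the universe is covered."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the set covering the most uncovered elements (first one on ties).
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("the given sets do not cover the universe")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# Hypothetical candidate groups over individuals 1..5.
candidates = [{1, 2, 3}, {3, 4}, {4, 5}, {1, 2}, {5}]
cover = greedy_set_cover(range(1, 6), candidates)
print([sorted(candidates[i]) for i in cover])
```

Greedy is fast but may return more sets than the optimum, which matters here because the number of sets in the cover is the estimated number of sibling groups.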

8 Our Algorithm (2-allele):
1. Construct possible sets S_1, S_2, …, S_m that satisfy the 2-allele rule: for each locus independently, create all sets that satisfy a + R ≤ 4, then combine loci.
2. (All the individuals are already assigned to sets in step 1.)
3. Find a minimum set cover of all the individuals from the sets S_1, S_2, …, S_m. Return the sets in the cover as sibling groups.

9 Our Algorithm (4-allele):
1. Construct possible sets S_1, S_2, …, S_m that satisfy the 4-allele rule (such sets must exist, since each pair of individuals forms a valid set).

        loc1    loc2
ind1    1/1     2/3
ind2    1/4     5/6
set(1,2) = {1,4} {2,3,5,6}

2. Add each individual x to S_j only if its alleles at each locus are in the set of alleles for that locus in S_j.
3. Find a minimum set cover of all the individuals from the sets S_1, S_2, …, S_m. Return the sets in the cover as sibling groups.
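Steps 1 and 2 can be sketched as follows. This is a minimal illustration under assumptions, not the authors' implementation: every pair of individuals seeds one candidate set, individuals are stored as a list of (allele1, allele2) tuples per locus, and all function names are hypothetical. ind1 and ind2 come from the slide; ind3 and ind4 are made up.

```python
from itertools import combinations

def pair_allele_sets(ind_a, ind_b):
    """Per-locus union of the alleles of two individuals (at most 4 per locus)."""
    return [set(ga) | set(gb) for ga, gb in zip(ind_a, ind_b)]

def fits(ind, allele_sets):
    """True if, at every locus, both alleles of ind lie in that locus's allele set."""
    return all(set(g) <= s for g, s in zip(ind, allele_sets))

def candidate_sets(individuals):
    """individuals: dict name -> list of (allele1, allele2), one tuple per locus."""
    groups = set()
    for a, b in combinations(individuals, 2):
        allele_sets = pair_allele_sets(individuals[a], individuals[b])
        # Everyone whose alleles fit joins the set seeded by this pair.
        groups.add(frozenset(n for n, ind in individuals.items()
                             if fits(ind, allele_sets)))
    return groups

individuals = {
    "ind1": [(1, 1), (2, 3)],  # from the slide
    "ind2": [(1, 4), (5, 6)],  # from the slide
    "ind3": [(4, 4), (2, 6)],  # hypothetical
    "ind4": [(1, 7), (2, 3)],  # hypothetical
}
print(pair_allele_sets(individuals["ind1"], individuals["ind2"]))
for group in sorted(candidate_sets(individuals), key=sorted):
    print(sorted(group))
```

Because members are only added when their alleles already lie in the pair's allele sets, every candidate set keeps at most four alleles per locus and therefore satisfies the 4-allele rule by construction.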

10 Experimental Protocol:
Create females and males, randomly pair them into couples, and produce offspring, giving each juvenile one randomly chosen allele from each parent at each locus.
The parameter ranges for the study:
Number of adult females F = 10, males M = 10
Number of loci sampled l = 2, 4, 6, 10
Number of alleles per locus a = 2, 5, 10, 20
Number of juveniles as a multiple of the number of females j = 1, 2, 5, 10
Maximum number of offspring per couple o = 2, 5, 10, 30, 50
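A simulation along these lines might look like the sketch below. Several details are assumptions not stated on the slide: alleles are drawn uniformly at random, each couple produces a uniformly random number of offspring between 1 and the maximum, and the juvenile-sample factor j is not modeled. The function and parameter names are hypothetical.

```python
import random

def simulate(n_females=10, n_males=10, n_loci=4, n_alleles=5,
             max_offspring=5, seed=0):
    """Return (juveniles, family): genotypes and the true family id of each juvenile."""
    rng = random.Random(seed)

    def adult():
        # One (allele1, allele2) genotype per locus, alleles drawn uniformly (assumption).
        return [(rng.randrange(n_alleles), rng.randrange(n_alleles))
                for _ in range(n_loci)]

    females = [adult() for _ in range(n_females)]
    males = [adult() for _ in range(n_males)]
    rng.shuffle(males)  # random pairing into couples

    juveniles, family = [], []
    for fam_id, (mother, father) in enumerate(zip(females, males)):
        for _ in range(rng.randint(1, max_offspring)):
            # At each locus the child gets one random allele from each parent.
            child = [(rng.choice(m), rng.choice(f))
                     for m, f in zip(mother, father)]
            juveniles.append(child)
            family.append(fam_id)
    return juveniles, family

juveniles, family = simulate(seed=1)
print(len(juveniles), "juveniles in", len(set(family)), "families")
```

Keeping the true family labels alongside the genotypes is what allows the reconstructed groups to be scored against the known sibling groups afterwards.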

11 Algorithm Evaluation:
1. Use the 4-allele algorithm on the simulated juvenile population (using the CPLEX 9.0 MIP solver to solve Min Set Cover optimally).
2. Compare the results to the true known sibling groups.
3. Evaluate accuracy using a generalization of Gusfield's partition distance (Information Proc. Letters, 2002).
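As a point of reference, the basic partition distance between two partitions of the same elements is the minimum number of elements that must be removed to make the partitions identical. It equals n minus the best one-to-one matching of blocks by overlap. The sketch below computes that basic distance by brute force for small inputs; it is not the generalization used in the evaluation, and the function name and example data are hypothetical.

```python
from itertools import permutations

def partition_distance(p, q):
    """Minimum number of elements to delete so partitions p and q coincide.

    Brute force over block matchings; factorial time, for small inputs only.
    """
    p = [set(b) for b in p]
    q = [set(b) for b in q]
    n = sum(len(b) for b in p)
    if len(p) > len(q):
        p, q = q, p  # match every block of the smaller partition
    best = 0
    for perm in permutations(range(len(q)), len(p)):
        overlap = sum(len(p[i] & q[j]) for i, j in enumerate(perm))
        best = max(best, overlap)
    return n - best

true_groups = [{1, 2, 3}, {4, 5}]
found_groups = [{1, 2}, {3, 4, 5}]
print(partition_distance(true_groups, found_groups))
```

In this example only individual 3 is placed differently, so removing it makes the two partitions agree. For larger inputs the matching step would normally be solved as an assignment problem rather than by enumeration.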

12 Results As expected, the error increases as the number of juveniles increases

13 Results Surprisingly, and unlike statistical and likelihood methods, the error does not depend on the number of loci or on allele frequency

14 Results The error decreases as the number of true siblings increases. (When there are few siblings, we underestimate the number of sibling groups.)

15 Conclusions
Ours is a fully combinatorial method. It uses simple Mendelian constraints, with no statistical estimates or a priori knowledge about the data.
Even the very weak 4-allele constraint shows good trends (no dependence on the number of loci sampled or on allele frequency).
We still need to evaluate the 2-allele algorithm on simulated and real data and compare it to other sibship reconstruction algorithms.

