What is necessary (and unnecessary) for analyses of offender databases
  Forensic Bioinformatics
  Jason R. Gilder
  August 16, 2008
Offender databases
  Originally designed for convicted offenders
    – CODIS: Convicted Offender DNA Index System
  Expanded
    – Unsolved crime samples
    – Arrestees
    – Elimination profiles
CODIS
  COmbined DNA Index System
    – National: NDIS
    – State: SDIS - fewer restrictions
    – Local: LDIS - fewest restrictions
  Convicted Offender Profiles in NDIS: 6,031,000
  Forensic Profiles in NDIS: 225,400
  More than 71,800 cold hits
Why analyze a database?
  Questions remain regarding the weight of a DNA database match
    – Random Match Probability (RMP)
    – Database Match Probability (DMP)
    – Balding & Donnelly likelihood ratio (LR)
    – Other
  Composition of the database may affect the chance of a coincidental match
    – Presence of relatives
Structure of a DNA database
  Collection of records
  Structured Query Language (SQL) format

  ID#    Fname  Lname  Pop  SSN  Date    D3      vWA     FGA     …  D7
  AC937  John   Doe    CAU       /2/02   13, 15  16, 16  21, 23     11, 14
  BQ384  Jane   Doe    HIS       /23/03  12, 17  15, 19  25, 25     10, 10
  BZ927  Frank  Smith  AA        /9/06   13, 15  14, 15  24, 26     12, 16
Examples of possible issues with the use of DNA databases
  Michigan v. Gary Leiterman
    – Evidence: blood found on victim's hand
    – Cold hit to a 4-year-old boy
  R v. Sean Hoey
    – Evidence: explosive device
    – Cold hit to a 14-year-old boy
  Jaidyn Leskie inquest (Australia)
    – Evidence: clothing from deceased
    – Cold hit to a rape victim
Lab error and false cold hits
How a database can be analyzed
  Perform all pairwise profile comparisons
    – the Arizona Search
    P1 with P2, P1 with P3, P1 with P4, …, P1 with Pn
    P2 with P3, P2 with P4, P2 with P5, …, P2 with Pn
  Analyze profile similarity
    – Count number of matching loci and alleles
    – Perform kinship analyses
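The all-pairs comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three toy profiles reuse the genotypes from the example records, the four-locus layout stands in for a full 13-locus profile, and the function names (`matching_loci`, `shared_alleles`, `pairwise_summary`) are hypothetical, not part of CODIS or any lab software. Kinship analysis is not covered here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy profiles: locus name -> unordered pair of alleles.
profiles = {
    "P1": {"D3": (13, 15), "vWA": (16, 16), "FGA": (21, 23), "D7": (11, 14)},
    "P2": {"D3": (12, 17), "vWA": (15, 19), "FGA": (25, 25), "D7": (10, 10)},
    "P3": {"D3": (13, 15), "vWA": (14, 15), "FGA": (24, 26), "D7": (12, 16)},
}


def matching_loci(a, b):
    """Number of loci at which both alleles agree, ignoring allele order."""
    return sum(sorted(a[locus]) == sorted(b[locus]) for locus in a if locus in b)


def shared_alleles(a, b):
    """Number of alleles shared across all loci (0, 1 or 2 per locus)."""
    total = 0
    for locus in a:
        if locus in b:
            overlap = Counter(a[locus]) & Counter(b[locus])  # multiset intersection
            total += sum(overlap.values())
    return total


def pairwise_summary(profiles):
    """Compare every profile with every later one: P1-P2, P1-P3, ..., P2-P3, ..."""
    loci_hist, allele_hist = Counter(), Counter()
    for (_, a), (_, b) in combinations(profiles.items(), 2):
        loci_hist[matching_loci(a, b)] += 1
        allele_hist[shared_alleles(a, b)] += 1
    return loci_hist, allele_hist


loci_hist, allele_hist = pairwise_summary(profiles)
print("pairs by matching loci:", dict(loci_hist))
print("pairs by shared alleles:", dict(allele_hist))
```

The two histograms are the raw material for the kind of summary shown for Arizona (pairs by matching loci) and Victoria (pairs by shared alleles).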
Arizona Match Data
  65,493 Profiles
    – 122 pairs matched at 9 of 13 loci
    – 20 pairs matched at 10 of 13
    – 1 pair matched at 11 of 13
    – 1 pair matched at 12 of 13
  [Table with columns Loci, Ave, Std Dev, p-value; values not recoverable]
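To put the Arizona counts in context, it helps to know how many comparisons an all-pairs search of 65,493 profiles involves. The short calculation below uses only the figures on the slide; it does not say whether the observed counts are more or fewer than any population-genetic model would predict.

```python
from math import comb

n_profiles = 65_493            # database size from the slide
n_pairs = comb(n_profiles, 2)  # n * (n - 1) / 2 distinct pairs

# Partially matching pairs reported on the slide (loci matched out of 13).
observed = {9: 122, 10: 20, 11: 1, 12: 1}

print(f"{n_pairs:,} pairwise comparisons")  # prints 2,144,633,778 pairwise comparisons
for loci, pairs in observed.items():
    print(f"{loci} of 13 loci: {pairs} pairs, about 1 in {n_pairs / pairs:,.0f} comparisons")
```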
Review of Victoria State Database
  Krane/Paoletti analysis: >11,000 profiles, each compared to all others across 9 loci
  [Table: Shared alleles vs. Observed occurrences; values not recoverable]
  Aussie Bump
[Figure: # Observed plotted against # Matching Alleles]
Issues with the release or analysis of a DNA database
  Privacy concerns
    – Names, social security numbers, DNA profiles, addresses, etc.
  Issues with analysis
    – Duplicate profiles, multiple databases, presence of relatives, processing time, CODIS requirements
  Legal issues
    – California Proposition 69
Issue 1: Privacy concerns
  Database contains private information that should not be released
  Answer: provide anonymous profiles only
  Accomplished through one command
    SELECT D3, vWA, FGA, …, D7 FROM CODIS_DB
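A minimal sketch of that export, run against an in-memory SQLite table from Python. The table layout is an assumption made for illustration: only four locus columns are included, the genotypes are stored as text, and the SSN and date columns are omitted. A real CODIS schema will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, cut-down stand-in for the offender table.
conn.execute(
    "CREATE TABLE CODIS_DB (ID TEXT, Fname TEXT, Lname TEXT, Pop TEXT, "
    "D3 TEXT, vWA TEXT, FGA TEXT, D7 TEXT)"
)
conn.executemany(
    "INSERT INTO CODIS_DB VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("AC937", "John", "Doe", "CAU", "13,15", "16,16", "21,23", "11,14"),
        ("BQ384", "Jane", "Doe", "HIS", "12,17", "15,19", "25,25", "10,10"),
        ("BZ927", "Frank", "Smith", "AA", "13,15", "14,15", "24,26", "12,16"),
    ],
)

# Select only the locus columns; IDs and names never leave the database.
cursor = conn.execute("SELECT D3, vWA, FGA, D7 FROM CODIS_DB")
exported_columns = [column[0] for column in cursor.description]
anonymous_profiles = cursor.fetchall()

print(exported_columns)
for profile in anonymous_profiles:
    print(profile)
```

Rows come back in storage order here. If record order could itself identify individuals, shuffling the export (for example with `ORDER BY RANDOM()` in SQLite) is one option.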
Issue 2: Duplicate profiles
  Many databases contain at least 10-15% duplicate profiles
  Answer: ignore duplicates in analysis
  A fairly thorough database analysis can take place with duplicates removed
    – Also identify potential mistyping rate
  The lab may be able to cull out duplicates from the same individual with additional information (e.g. SSN)
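One way to handle duplicates on anonymous data is sketched below, assuming each record is a tuple of allele pairs in a fixed locus order. The records and the one-locus threshold are hypothetical. Two caveats apply: dropping exact duplicates also merges identical twins, and a near-duplicate pair may be relatives or a coincidence rather than a typing error, so flagged pairs are candidates for review, not a measured mistyping rate.

```python
from itertools import combinations

# Hypothetical anonymous profiles: one tuple of allele pairs per record.
records = [
    ((13, 15), (16, 16), (21, 23), (11, 14)),
    ((15, 13), (16, 16), (23, 21), (14, 11)),  # same profile, alleles in another order
    ((12, 17), (15, 19), (25, 25), (10, 10)),
    ((13, 15), (16, 16), (21, 23), (11, 15)),  # differs from the first at one locus
]


def canonical(profile):
    """Sort the alleles within each locus so equal genotypes compare equal."""
    return tuple(tuple(sorted(locus)) for locus in profile)


def drop_exact_duplicates(records):
    """Keep the first occurrence of each distinct profile."""
    seen, unique = set(), []
    for profile in map(canonical, records):
        if profile not in seen:
            seen.add(profile)
            unique.append(profile)
    return unique


def near_duplicates(unique, max_mismatched_loci=1):
    """Pairs that differ at only a few loci: candidates for mistyping review."""
    flagged = []
    for a, b in combinations(unique, 2):
        mismatches = sum(x != y for x, y in zip(a, b))
        if 0 < mismatches <= max_mismatched_loci:
            flagged.append((a, b))
    return flagged


unique = drop_exact_duplicates(records)
duplicate_rate = 1 - len(unique) / len(records)
suspects = near_duplicates(unique)
print(f"{len(unique)} unique profiles, duplicate rate {duplicate_rate:.0%}")
print(f"{len(suspects)} near-duplicate pair(s) flagged for review")
```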
Issue 2b: Multiple databases
  California DOJ keeps information in two databases that can be cross-referenced to remove duplicates
    – Login DB: contains unique CII ID and accession numbers of all samples for that individual
    – SDIS: contains accession number and profile
  Answer: JOIN the data with one command
    – Only select the first accession number profile
    SELECT D3, vWA, FGA, …, D7 FROM SDIS JOIN LOGIN_DB ON LOGIN_DB.ACCESSION1 = SDIS.ACCESSION
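The join can be tried on toy data with SQLite from Python. Everything about the schema here is an assumption made for illustration: the `CII` and `ACCESSION2` column names, the accession values, and the four locus columns are hypothetical; only `ACCESSION1` and `SDIS.ACCESSION` come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: one LOGIN_DB row per person, one SDIS row per sample.
conn.executescript("""
    CREATE TABLE LOGIN_DB (CII TEXT, ACCESSION1 TEXT, ACCESSION2 TEXT);
    CREATE TABLE SDIS (ACCESSION TEXT, D3 TEXT, vWA TEXT, FGA TEXT, D7 TEXT);
    INSERT INTO LOGIN_DB VALUES ('C001', 'A100', 'A205'), ('C002', 'A101', NULL);
    INSERT INTO SDIS VALUES
        ('A100', '13,15', '16,16', '21,23', '11,14'),
        ('A205', '13,15', '16,16', '21,23', '11,14'),
        ('A101', '12,17', '15,19', '25,25', '10,10');
""")

# Joining on the first accession number keeps one profile per individual,
# so the repeat sample A205 from person C001 is left out.
rows = conn.execute(
    "SELECT D3, vWA, FGA, D7 FROM SDIS "
    "JOIN LOGIN_DB ON LOGIN_DB.ACCESSION1 = SDIS.ACCESSION "
    "ORDER BY SDIS.ACCESSION"
).fetchall()

for row in rows:
    print(row)
```

Three samples go in and two profiles come out, one per CII ID, with no identifying columns in the result.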
Issue 3: Presence of relatives
  It is difficult to identify the presence of relatives by hand by simply looking at the CODIS records
  "There are a significant, but unknown number, of such related individuals in California's offender database." – Kenneth Konzack
  Answer: Exactly!
Issue 4: Processing time
  Performing an internal search of the database will take too long (a week or more) and will not allow for CODIS searches during that time
  Answer: perform an analysis on a separate computer or computers
  Pairwise database search is embarrassingly parallel
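The search is embarrassingly parallel because each comparison depends only on the two profiles involved, so the rows of the comparison triangle can be dealt out to independent workers and the partial results added up afterwards. The sketch below shows the partitioning and the merge on toy data. The profiles and function names are hypothetical, and the work units are run one after another here, whereas in practice each would go to a separate process or machine.

```python
from collections import Counter


def matching_loci(a, b):
    """Number of loci at which both alleles agree, ignoring allele order."""
    return sum(sorted(x) == sorted(y) for x, y in zip(a, b))


def row_blocks(n, workers):
    """Deal row indices round-robin; early rows have more comparisons,
    so interleaving gives each worker a similar load."""
    return [list(range(w, n, workers)) for w in range(workers)]


def histogram_for_rows(profiles, rows):
    """Work unit: compare each assigned row i with every later profile j > i."""
    hist = Counter()
    for i in rows:
        for j in range(i + 1, len(profiles)):
            hist[matching_loci(profiles[i], profiles[j])] += 1
    return hist


# Hypothetical toy database: tuples of allele pairs in a fixed locus order.
profiles = [
    ((13, 15), (16, 16), (21, 23)),
    ((12, 17), (15, 19), (25, 25)),
    ((13, 15), (14, 15), (24, 26)),
    ((13, 15), (16, 16), (22, 23)),
    ((12, 17), (15, 19), (25, 25)),
]

blocks = row_blocks(len(profiles), workers=2)
# Each call below is independent and could run on a different computer.
partials = [histogram_for_rows(profiles, rows) for rows in blocks]
merged = sum(partials, Counter())

serial = histogram_for_rows(profiles, range(len(profiles)))
print(dict(merged), merged == serial)
```

On a single multi-core machine, the list comprehension that builds `partials` could be replaced by `concurrent.futures.ProcessPoolExecutor`. Across several machines, each would receive a copy of the anonymous profiles and its own block of rows.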
Issue 5: Legal issues
  Legal statutes (e.g., California Proposition 69) prohibit release of database to citizens
  Answer: 38 state statutes (including CA) allow for an outside review of their database for statistical analysis
    – Many require the removal of identifying information
Questions?