Efficiently Querying Contradictory and Uncertain Genealogical Data Lars E. Olson and David W. Embley DEG Lab BYU Computer Science Dept. Supported by National.

Slides:



Advertisements
Similar presentations
Query Optimization of Frequent Itemset Mining on Multiple Databases Mining on Multiple Databases David Fuhry Department of Computer Science Kent State.
Advertisements

Enumerated data type & typedef. Enumerated Data Type An enumeration consists of a set of named integer constants. An enumeration type declaration gives.
ENUMERATED, typedef. ENUMERATED DATA TYPES An enumeration consists of a set of named integer constants. An enumeration type declaration gives the name.
Trig Equations Lesson Find the Angle for the Ratio  Given the equation  We seek the angle (the value of x) for which the cosine gives the ratio.
Line Efficiency     Percentage Month Today’s Date
Unit Number Oct 2011 Nov 2011 Dec 2011 Jan 2012 Feb 2012 Mar 2012 Apr 2012 May 2012 Jun 2012 Jul 2012 Aug 2012 Sep (3/4 Unit) 7 8 Units.
1 Automating the Extraction of Genealogical Information from the Web GeneTIQS Troy Walker & David W. Embley Family History Technology Conference March.
Querying Disjunctive Databases in Polynomial Time Lars Olson Masters Thesis Brigham Young University Supported by NSF Grant #
1 Converting Disjunctive Data to Disjunctive Graphs Lars Olson Data Extraction Group Funded by NSF.
Jyh-haw Yeh Dept. of Computer Science Boise State University
Allowing and Retrieving Constraint Violations in Data Storage Lars Olson DEG Lab BYU Computer Science Dept.
Results of Using an Efficient Algorithm to Query Disjunctive Genealogical Data Lars E. Olson and David W. Embley DEG Lab BYU Computer Science Dept. Supported.
Permitting Inconsistent Data in Data Storage Lars Olson DEG Lab BYU Computer Science Dept. Supported by National Science Foundation Grant #
1 Components of A Successful Data Warehouse Chris Wheaton, Co-Founder, Client Advocate.
Introduction to Solving Business Problems with MDX Robert Zare and Tom Conlon Program Managers Microsoft.
1 Introduction to databases concepts CCIS – IS department Level 4.
Time-Series Forecast Models EXAMPLE Monthly Sales ( in units ) Jan Feb Mar Apr May Jun Jul Aug Sept Oct Nov Dec Data Point or (observation) MGMT E-5070.
1 CPS216: Advanced Database Systems Notes 04: Operators for Data Access Shivnath Babu.
1 5 Normalization. 2 5 Database Design Give some body of data to be represented in a database, how do we decide on a suitable logical structure for that.
 Identify What You Know  Begin with personal records :  Gather information, using family group sheets and pedigree charts to organize what is known.
MA/CSSE 473 Day 28 Dynamic Programming Binomial Coefficients Warshall's algorithm Student questions?
Login Username: Password:. Welcome to office postal services! Summary Location1 Location2 Location3 Location4 Location5 Location6 Location7 Transit Time.
FlexTable: Using a Dynamic Relation Model to Store RDF Data IDS Lab. Seungseok Kang.
Chapter 6 Questions Quick Quiz
Class 3 Data and Business MIS 2000 Updated: Jan
MA/CSSE 473 Day 30 B Trees Dynamic Programming Binomial Coefficients Warshall's algorithm No in-class quiz today Student questions?
 What Qualifies Individuals for Temple Work :  Deceased for 1 year and 1 day  A date and a place for a vital event (birth, marriage, death, burial,
The Birthday Paradox July Definition 2 Birthday attacks are a class of brute-force techniques that target the cryptographic hash functions. The.
Distributed Control and Autonomous Systems Lab. Sang-Hyuk Yun and Hyo-Sung Ahn Distributed Control and Autonomous Systems Laboratory (DCASL ) Department.
Trig Equations Lesson Find the Angle for the Ratio  Given the equation  We seek the angle (the value of x) for which the cosine gives the ratio.
The 6 steps of data collection: 1. Make predictions 2. Write a questionnaire 3. Collect data (Data Collection Sheet) 4. Make results tables 5. Draw graphs.
Biology , at . the same time.
Jan 2016 Solar Lunar Data.
CPS216: Data-intensive Computing Systems
The 6 steps of data collection:
Components of A Successful Data Warehouse
Computing Full Disjunctions
A way to detect a collision…
Q1 Jan Feb Mar ENTER TEXT HERE Notes
Graphing in Science.
CIS 336 Competitive Success/snaptutorial.com
Average Monthly Temperature and Rainfall
Database.
2016 FIT FOR PERFORMANCE Weight Control Program
2017 Jan Sun Mon Tue Wed Thu Fri Sat
Gantt Chart Enter Year Here Activities Jan Feb Mar Apr May Jun Jul Aug
Q1 Q2 Q3 Q4 PRODUCT ROADMAP TITLE Roadmap Tagline MILESTONE MILESTONE

Step 3 Step 2 Step 1 Put your text here Put your text here
MONTH CYCLE BEGINS CYCLE ENDS DUE TO FINANCE JUL /2/2015
Jan Sun Mon Tue Wed Thu Fri Sat
The Birthday Paradox June 2012.
©G Dear 2008 – Not to be sold/Free to use
Electricity Cost and Use – FY 2016 and FY 2017
Independent Variables
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Q1 Q2 Q3 Q4 PRODUCT ROADMAP TITLE Roadmap Tagline MILESTONE MILESTONE
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Objective - To make a line graph.
Tables, Graphs, and Connections
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Text for section 1 1 Text for section 2 2 Text for section 3 3
Follow the instructions to make a picture.
Q1 Q2 Q3 Q4 PRODUCT ROADMAP TITLE Roadmap Tagline MILESTONE MILESTONE
Presentation transcript:

Efficiently Querying Contradictory and Uncertain Genealogical Data Lars E. Olson and David W. Embley DEG Lab BYU Computer Science Dept. Supported by National Science Foundation Grant #

2 Introduction Integrating data from multiple sources Some data just doesn’t fit the data model –Multiple data sources conflicting data –Uncertain or imprecise data –Data that violates constraints Sometimes it’s not possible to resolve the data PAF / Gedcom

3 Disjunctive Databases “OR-tables,” Imielinski and Vadaparty, 1989 NameBirth DateMarriage DateDeath Date James IDec Feb Feb Feb Feb Joseph Harrison 26 Jan Jan Jul Dec Apr

4 Shortcomings of “OR-tables” Can’t correlate between possible values Answering queries in general is CoNP- complete (Imielinski & Vadaparty) First NameSurnameBirth Place Priscilla Purcell Loveridge Cambridge Oxford

5 Sub-relation Data Construct Solution: store the correlated data in its own relation First NameSurnameBirth Place Priscilla SurnameBirth Place PurcellCambridge LoveridgeOxford

6 Disjunctive Database Problems How do we avoid the CoNP-completeness problem and answer queries efficiently? If more than one value is possible, which one is the most likely? Other questions to be solved: –Where are the constraint violations? –How do we map sub-relations to physical storage? –How do we efficiently update the database?

7 Transitive Closure of Disjunctive Graphs Solving the CoNP-completeness problem [LYY95] Disjunctive graphPossible interpretation Transitive closure of a: {a, d, e} a b c d e f a b c d e f

8 Using Disjunctive Graphs to Answer Queries ID#NameBirth Date Birth Place ID# (references Table Place) Marriage Date 1John Doe 12 Mar or 12 Mar or 2 15 Jun or 16 Jun Table Person: ID#CityState 1 Commerce or Nauvoo Illinois 2QuincyIllinois Table Place:

9 Using Disjunctive Graphs to Answer Queries Person Place John Doe 12 Mar Mar Jun Jun Nauvoo Commerce Illinois Quincy City State City Birth Place Birth Date Marriage Date Name ID#  State (  ID=1 Person Place)

10 Using Disjunctive Graphs to Answer Queries  City,State (  ID=1 Person Place) City Birth Place Person Place Nauvoo Commerce Illinois Quincy State City ID# –Definitely known? –All possible values? –Most likely value? …meaning what? ID# Birth Place City

11 Using Disjunctive Graphs to Answer Queries Birth Place City –Definitely known? –All possible values? –Most likely value? Person Place Nauvoo Commerce Illinois Quincy State City ID# …meaning what? Greedy Algorithm solution  City,State (  ID=1 Person Place)

12 Using Disjunctive Graphs to Answer Queries  P1.Name, P2.Name (Person P1 P1.BirthDate = P2.BirthDate Person P2) Person P1 ID #1 John Doe 12 Mar Mar 1840 ID #2 James Doe Person P2 12 Mar 1841

13 Limiting the Search Space In genealogy, most disjunctions are mutually independent Disjunctions that aren’t independent are limited to immediate family relations Build a relation containing all immediate family members (Person P1 P1.parent = P2.ID Person P2 P2.ID = P3.parent Person P3)

14 Limiting the Search Space Example constraints: –Each parent should be born before their children –Each child should be born at least 9 months apart (except multiple births) Person P1 ID #1 ID #2 ID #3 ID #4 Person P2 ID #1 ID #2 Person P3 ID #3 ID #4 ID #3 ID #4 1.0 ID #1 ID #2 parentchild = parent

15 Conclusions Genealogical data can be stored in a disjunctive database format. Many common queries can be computed in polynomial time. We can detect intractable queries and limit the search space required, usually enough to get polynomial time.