The Rules of Time: Data Quality Issues for Time Varying Databases DAMA National Capital Region – Mar 2002 Dr. Jerry Rosenbaum ConcentrX, LLC 410-764-1843.

1 The Rules of Time: Data Quality Issues for Time Varying Databases
DAMA National Capital Region – Mar 2002
Dr. Jerry Rosenbaum, ConcentrX, LLC

2 2 Sept 2001 Concentrx, LLC
Outline
Perspective
Example
Aspects of Time
Example (with LDM)
Queries
Design Guidelines

3 Perspective
Designing, building, and using a time-dependent database looks simple
– Just add dates or date ranges to some tables
– Rows are only logically deleted (to maintain history)
– Make sure the SQL includes date logic
BUT...

4 Perspective continued
There are often many issues and a lot of complexity lurking under the covers
– You must understand the requirements
– You must understand the uses of the data
– You must be prepared to help the ad hoc customer obtain valid results
Ferret out what the customer forgot to tell you
Understand what they are really saying
Transaction Path Analysis is very useful for physical design

5 Perspective continued
Primary keys often have a time factor
Queries must take into account the (multiple) times and/or time ranges
Relationships between entities tend to become more complex
The notion of referential integrity may need to change
Training customers is difficult
Training developers is no easier

6 Simple Questions
How will we represent dates
– ymd, mdy, dmy, yd, day count since a start date
Which calendar
– Julian, Gregorian, Hebrew, Chinese, Muslim, Hindu
When does a day begin
– Just after midnight (local time)
– At sunset (local time)
How about am/pm vs. 24 hour clock
How does daylight saving time fit in
What are the transformation rules between them
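
The transformation rules between date representations can be sketched in a few lines. This is only an illustration for the Gregorian calendar: the function names and the 1900-01-01 reference date are assumptions, not part of the presentation.

```python
from datetime import date, timedelta

# Hypothetical reference date for the "day count since a start date" form.
EPOCH = date(1900, 1, 1)

def to_ymd(d: date) -> str:
    return d.strftime("%Y-%m-%d")

def to_mdy(d: date) -> str:
    return d.strftime("%m/%d/%Y")

def to_yd(d: date) -> str:
    # Year plus day of year (001-366).
    return d.strftime("%Y-%j")

def to_day_count(d: date) -> int:
    return (d - EPOCH).days

def from_day_count(n: int) -> date:
    return EPOCH + timedelta(days=n)

d = date(2002, 3, 1)
print(to_ymd(d), to_mdy(d), to_yd(d), to_day_count(d))
```

Whichever form is stored, converting back and forth should lose nothing; from_day_count(to_day_count(d)) returns the original date.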

7 Example - Person's Residence
Track every residence a person has lived in and when they resided at each place
Basic table design includes
– Name
– Address
– Start date
– End date

8 Some Issues
However, we are not yet done
– We must understand the business purpose for tracking the data
– We must understand how the data may be used
– We must uncover and handle possible quirks in the data
– Are other attributes needed
– How should we handle the primary key

9 Example of Business Issue
How do we plan to use the address
– General mailings
– Bills
– Time sensitive material (e.g. auction catalog)
– Visit the person
– Call the person
– Aggregate reporting
– Etc.

10 Questions
Is day sufficiently granular
What if the person lives in Bombay, India and the user lives in NYC
– What do we do about the roughly 10 hour time zone difference, especially if it bridges days
– For this type of application we can probably ignore the time zone (unless we wish to call the person)

11 More Questions
Can there be a time with no residence
Can a person have more than one residence at one time
– Is one residence primary and the other secondary
– Can we have a temporary overlap of times as the person moves residences
– How about winter and summer residences with each primary in its season
– Should a temporary residence be included
– Can one buy two residences on one day
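
Whatever the business decides about gaps and overlaps, it helps to be able to detect them in the data. The sketch below assumes each residence is an inclusive (start, end) pair of dates. The function names and sample periods are hypothetical.

```python
from datetime import date, timedelta

def find_overlaps(periods):
    """Return pairs of periods that share at least one day."""
    ordered = sorted(periods)
    overlaps = []
    for i, a in enumerate(ordered):
        for b in ordered[i + 1:]:
            if b[0] > a[1]:
                break  # sorted by start date, so no later period can overlap a
            overlaps.append((a, b))
    return overlaps

def find_gaps(periods):
    """Return (first_day, last_day) for each stretch with no residence."""
    ordered = sorted(periods)
    gaps = []
    if not ordered:
        return gaps
    covered_until = ordered[0][1]
    one_day = timedelta(days=1)
    for start, end in ordered[1:]:
        if start > covered_until + one_day:
            gaps.append((covered_until + one_day, start - one_day))
        covered_until = max(covered_until, end)
    return gaps

# Hypothetical residence history for one person.
residences = [
    (date(1990, 1, 1), date(1995, 6, 30)),
    (date(1995, 6, 15), date(1998, 12, 31)),  # overlaps the previous one during a move
    (date(1999, 3, 1), date(2001, 12, 31)),   # starts after two months with no residence
]
overlaps = find_overlaps(residences)
gaps = find_gaps(residences)
```

Whether an overlap or a gap is an error or a legitimate fact (a move in progress, a seasonal home) is a business rule, not something the code can decide.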

12 Primary Key Questions
Does it make sense to use
– Name + date + address sequence number
– Name + address sequence number
– Surrogate key
If a surrogate key is used, what is the underlying business key
What effect does this have on foreign keys

13 Possible Design
So far we are led to the possible design below
– Surrogate Key
– Name
– Address
– Address Type
– Start Date
– End Date

14 Yet More Questions
Do we have to track
– When we knew about a new address
– When we knew that an address is to end
Note that these two dates can be
– Before the person moved to an address
– During the time a person is at an address
– After a person leaves an address
This data would add two more dates to the table design
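
One way to picture the design with the two extra dates is the SQLite sketch below. Column names, the 9999-12-31 default for an unknown end, and the sample row are all assumptions made for illustration.

```python
import sqlite3

OPEN_END = "9999-12-31"  # assumed default meaning "end not yet known"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE residence (
        residence_id     INTEGER PRIMARY KEY,  -- surrogate key
        name             TEXT NOT NULL,
        address          TEXT NOT NULL,
        address_type     TEXT NOT NULL,
        start_date       TEXT NOT NULL,        -- when the person moved in
        end_date         TEXT NOT NULL,        -- when the person moved out
        start_known_date TEXT NOT NULL,        -- when we learned of the move in
        end_known_date   TEXT                  -- when we learned of the move out
    )
""")
# Hypothetical row: the person moved in on June 1 but we found out on August 15.
conn.execute(
    "INSERT INTO residence (name, address, address_type, start_date, end_date,"
    " start_known_date, end_known_date) VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("Pat Doe", "12 Elm St", "primary", "2001-06-01", OPEN_END, "2001-08-15", None),
)

def addresses_as_known(conn, name, as_of, known_on):
    """Addresses valid on as_of, using only what we had learned by known_on."""
    rows = conn.execute(
        """SELECT address FROM residence
           WHERE name = ?
             AND ? BETWEEN start_date AND end_date
             AND start_known_date <= ?""",
        (name, as_of, known_on),
    ).fetchall()
    return [r[0] for r in rows]

before = addresses_as_known(conn, "Pat Doe", "2001-07-01", "2001-07-01")
after = addresses_as_known(conn, "Pat Doe", "2001-07-01", "2001-09-01")
```

The same question about July 1 gives different answers depending on when it is asked, which is exactly why the "when we knew" dates may need to be stored. The sketch does not yet use end_known_date; a fuller version would.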

15 One Last Thought
Alternative physical design could be 2 tables
Table 1
– Person Id
– Name
Table 2
– Person Id
– Address Seq Number
– Rest of the attributes

16 Key Points
The basics of tracking time varying data appear easy
The details cannot be ignored because they will cause changes in both the design and use of the database
One must understand the business
One must understand the customers
Rules are subject to change

17 Aspects of Time
Degree of time dependency can vary from table to table and attribute to attribute
Some data has no time dependency (or we don't care about the time dependency)
Some data is time annotated
Other data is valid only for a specified time or time period (i.e. time period dependent)

18 Time Data Types
Time Points
Time Periods
Time Period Categories
Bounded Time Periods

19 Events and Time
Time by itself is rarely of interest
Events and Things are important and we may need to track time in relation to them
An event or thing may have one or more time factors associated with it that are relevant to the business
Time factors may be interdependent

20 Time Points
Refers to a single moment in time
Examples
– The time that an event happened
– The time we found out that the event happened
– The time the data about the event was entered into the system
Any single event may have multiple point-in-time dimensions

21 Picking a Point in Time
Suppose a widget is imported by ship
What is the import date
– Date widget is loaded onto the ship
– Date ship arrives in U.S. port
– Date container is taken off ship
– Date customs inspector gets manifest
– Date customs inspector verifies manifest
– Etc.
If widgets are subject to a quota this is very important

22 Time Periods
Has a duration - beginning time point and end time point
Examples
– U.S. government fiscal year 1999 (Oct 1, 1998 to Sept 30, 1999)
– Effective and expiration dates of an insurance policy
An event may have multiple time periods associated with it
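
The fiscal year example can be expressed as a pair of small functions. This is a sketch that hard-codes the October 1 start shown in the example above; the function names are hypothetical.

```python
from datetime import date

def us_fiscal_year(d: date) -> int:
    """Fiscal year containing d, assuming the year starts on Oct 1."""
    return d.year + 1 if d.month >= 10 else d.year

def fiscal_year_period(fy: int):
    """Inclusive (start, end) dates of fiscal year fy."""
    return date(fy - 1, 10, 1), date(fy, 9, 30)

start, end = fiscal_year_period(1999)
print(start, end, us_fiscal_year(date(1998, 12, 25)))
```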

23 Time Point Categories
Generalization of a Time Point
Examples
– Last day of month (Jan 31, Feb 28, etc.)
– New Moon
– Mondays
Categories must be well defined and data may be entered or calculated for each entry
Example of use - service customer first Monday of each month
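
Two of these categories can be calculated rather than entered. The sketch below uses only the Python standard library; the function names are hypothetical.

```python
import calendar
from datetime import date

def last_day_of_month(year: int, month: int) -> date:
    # monthrange returns (weekday of the 1st, number of days in the month).
    return date(year, month, calendar.monthrange(year, month)[1])

def first_monday(year: int, month: int) -> date:
    first_weekday = date(year, month, 1).weekday()  # Monday == 0
    offset = (7 - first_weekday) % 7
    return date(year, month, 1 + offset)

print(last_day_of_month(2000, 2), first_monday(2002, 3))
```

Note that a calculated rule handles leap years (Feb 29, 2000) that a hand-entered list of month ends could easily miss.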

24 Time Period Categories
Generalization of Time Periods
Examples
– Fiscal Year
– Accounting Months
– Sales weeks for a retailer (often Mon - Sun and numbered sequentially from the first full week in January)
Example of use - comparing retail sales from last year and this year
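
A retailer's sales week number might be derived as follows. The sketch assumes Monday to Sunday weeks, and it assumes that the days in early January before the first full week belong to the last sales week of the previous year; a real retailer's rule may differ.

```python
from datetime import date, timedelta

def first_full_week_start(year: int) -> date:
    """First Monday on or after Jan 1 of the given year."""
    jan1 = date(year, 1, 1)
    return jan1 + timedelta(days=(7 - jan1.weekday()) % 7)

def sales_week(d: date):
    """Return (sales_year, week_number) for the date d."""
    year = d.year
    start = first_full_week_start(year)
    if d < start:
        # Assumption: early January days close out the previous sales year.
        year -= 1
        start = first_full_week_start(year)
    return year, (d - start).days // 7 + 1

print(sales_week(date(2002, 1, 7)), sales_week(date(2002, 1, 6)))
```

Comparing this year with last year then means matching on week number, not on calendar date.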

25 Bounded Time Periods
Similar to time periods, but the span of time is not predefined
Examples
– The period when a person works for a company (or department)
– Car ownership - day you acquire a car until the day you dispose of it

26 Tense
Time factors can be
– Past
– Present
– Future
There are often business rules about recording past and future information as well as rules for changing that data

27 The Global Aspect
Many companies operate in multiple time zones (including global operations)
To correlate time factors between different time zones, one generally sets up
– Reference time zone
– Rules for recording local time zones (or location)
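
As an illustration, UTC can serve as the reference time zone while each event keeps its local offset. The fixed offsets below (India at UTC+5:30, U.S. Eastern standard time at UTC-5) are simplifying assumptions that ignore daylight saving time; the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets; a production system would use a time zone database.
IST = timezone(timedelta(hours=5, minutes=30), "IST")
EST = timezone(timedelta(hours=-5), "EST")

def to_reference(local: datetime) -> datetime:
    """Convert a zone-aware local time to the reference zone (UTC)."""
    if local.tzinfo is None:
        raise ValueError("local time must carry its time zone")
    return local.astimezone(timezone.utc)

# A call placed from New York on the evening of Dec 31, 1999.
call_ny = datetime(1999, 12, 31, 21, 0, tzinfo=EST)
call_utc = to_reference(call_ny)
call_bombay = call_utc.astimezone(IST)
print(call_ny.date(), call_utc.date(), call_bombay.date())
```

The same moment falls on different calendar days (and years) at the two ends, so a bare date with no zone or location is ambiguous.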

28 Example – Tracking Employees
We need to track some HR data and maintain history
– Employees
    Hours worked each day
    Salary
    Paychecks
– Departments
    Departmental Manager
    Employees working in the department

29 Business Question
Determine the number of hours a person worked during the week of January 1, 1998 (Thursday)
If a work day includes midnight, we attribute all hours to the day in which the work period began
– Note: Midnight is the beginning of the next day
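
The attribution rule can be sketched as follows. The shift data is made up, and the choice of Monday, Dec 29, 1997 as the start of the week is an assumption; the function names are hypothetical.

```python
from datetime import date, datetime, timedelta

def hours_by_start_day(shifts):
    """Attribute each shift's full duration to the day on which it began."""
    totals = {}
    for start, end in shifts:
        hours = (end - start).total_seconds() / 3600
        totals[start.date()] = totals.get(start.date(), 0.0) + hours
    return totals

def hours_in_week(shifts, week_start):
    """Sum hours for the seven days beginning at week_start."""
    week_end = week_start + timedelta(days=7)
    by_day = hours_by_start_day(shifts)
    return sum(h for day, h in by_day.items() if week_start <= day < week_end)

# Hypothetical shifts for one employee.
shifts = [
    (datetime(1997, 12, 31, 22, 0), datetime(1998, 1, 1, 6, 0)),  # counted on Dec 31
    (datetime(1998, 1, 1, 9, 0), datetime(1998, 1, 1, 17, 0)),
    (datetime(1998, 1, 4, 22, 0), datetime(1998, 1, 5, 6, 0)),    # counted on Jan 4
    (datetime(1998, 1, 5, 22, 0), datetime(1998, 1, 6, 6, 0)),    # falls in the next week
]
WEEK_START = date(1997, 12, 29)  # assumed: Monday of the week containing Jan 1, 1998
total = hours_in_week(shifts, WEEK_START)
```

Changing WEEK_START to a different weekday changes the answer, so the definition of the work week must be settled before the query is written.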

30 Additional Questions
When does a work week start: Friday, Saturday, Sunday or Monday
The week of Jan 1 goes across a calendar year boundary; do we split the week into two
Are there two types of weeks: tax weeks and work weeks
– We use the tax week for the IRS and the work week to calculate payroll
Payroll withholding rules change every year

31 Logical Design
Building the logical data model
Include time independent and time annotated items
Temporarily ignore time dependencies and treat the model as if you were looking at the business at a specific point in time
Add time dependencies as a second step

32 LDM Without Time Dependencies

33 Add Some Time Dependencies
Employees
– Have hire and termination dates
– Change salary
– Change departments
Departments
– Are created and eliminated
– Have changes in management

34 LDM With Time Dependencies

35 Notes
Primary keys for tables (except PayCheck) include a start time
All time periods include both a start time and an end time
If we do not know the end time, should we
– Use a standard default value (preferred)
– Use a null
This is the normalized logical model
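
One reason a default end value is easier to live with than a null shows up as soon as a query uses BETWEEN. The SQLite sketch below uses hypothetical table names and assumes 9999-12-31 as the standard default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dept_default (dept TEXT, start_dt TEXT, end_dt TEXT NOT NULL);
    CREATE TABLE dept_null    (dept TEXT, start_dt TEXT, end_dt TEXT);
    INSERT INTO dept_default VALUES ('Finance', '1995-01-01', '9999-12-31');
    INSERT INTO dept_null    VALUES ('Finance', '1995-01-01', NULL);
""")

as_of = "1999-12-31"

# With a default end value the plain date test works.
with_default = conn.execute(
    "SELECT dept FROM dept_default WHERE ? BETWEEN start_dt AND end_dt", (as_of,)
).fetchall()

# With a null end the same test silently drops the open-ended row.
naive_null = conn.execute(
    "SELECT dept FROM dept_null WHERE ? BETWEEN start_dt AND end_dt", (as_of,)
).fetchall()

# Every query against the null design needs the extra IS NULL branch.
fixed_null = conn.execute(
    "SELECT dept FROM dept_null"
    " WHERE ? >= start_dt AND (end_dt IS NULL OR ? <= end_dt)",
    (as_of, as_of),
).fetchall()
```

Here naive_null comes back empty while the other two return the Finance row, which is the kind of quiet error an ad hoc user is unlikely to notice.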

36 More Notes
The LDM maintains RI
– Physical model will generally not have RI
The business rules for integrity of the data (similar to RI) are critical
– The basic business key portions must match
– The time period of the referenced table must include the time period of the referencing table
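
Since the database engine will not enforce this rule, it has to be checked by a query or by application code. The SQLite sketch below finds membership rows whose period is not contained in a matching department period. Table names, column names and data are hypothetical, and the check assumes one department row must cover the whole membership period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept TEXT, start_dt TEXT, end_dt TEXT);
    CREATE TABLE member_of  (emp_id TEXT, dept TEXT, from_dt TEXT, to_dt TEXT);
    INSERT INTO department VALUES
        ('Acct',    '1990-01-01', '1998-12-31'),
        ('Finance', '1995-01-01', '9999-12-31');
    INSERT INTO member_of VALUES
        ('001', 'Acct',    '1997-01-01', '1998-06-30'),
        ('001', 'Finance', '1998-07-01', '9999-12-31'),
        ('002', 'Acct',    '1998-01-01', '1999-06-30');  -- outlives the department
""")

# Business integrity rule: the business key must match AND the referenced
# period must include the referencing period.
violations = conn.execute("""
    SELECT m.emp_id, m.dept
    FROM member_of m
    WHERE NOT EXISTS (
        SELECT 1 FROM department d
        WHERE d.dept = m.dept
          AND d.start_dt <= m.from_dt
          AND d.end_dt   >= m.to_dt
    )
""").fetchall()
```

Employee 002 is reported because the membership runs past the date on which the Acct department was eliminated, even though the department key itself matches.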

37 Still More Notes
PayCheck is still the same except there is an important business rule
The attribute salary has become a separate table with a 1:M relationship
The 1:1 "manages a dept" relationship became an M:M relationship
The 1:M "member of a dept" relationship became an M:M relationship

38 Looking At Queries
Consider the following tables
Employee: Emp Id | Name | Start Dt | End Dt
    001 Smith Jones
Member Of: Emp Id | Dept | From Dt | To Dt
    001 Acct Finance Finance
Salary: Emp Id | Salary | From Dt | To Dt

39 Average at a Point in Time
Average salary for Finance at the end of 1999
    SELECT AVG(T3.Salary)
    FROM Member_Of T2, Salary T3
    WHERE T2.Dept = 'Finance'
      AND T2.Emp_Id = T3.Emp_Id
      AND '1999-12-31' BETWEEN T2.From_Dt AND T2.To_Dt
      AND '1999-12-31' BETWEEN T3.From_Dt AND T3.To_Dt
We have a similar query for the average 1998 salary
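
The point-in-time query can be tried against a small SQLite database. All rows below are invented for illustration, the open end date 9999-12-31 is an assumed default, and avg_salary is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member_of (emp_id TEXT, dept TEXT, from_dt TEXT, to_dt TEXT);
    CREATE TABLE salary    (emp_id TEXT, salary INTEGER, from_dt TEXT, to_dt TEXT);
    INSERT INTO member_of VALUES
        ('001', 'Acct',    '1997-01-01', '1999-06-30'),
        ('001', 'Finance', '1999-07-01', '9999-12-31'),
        ('002', 'Finance', '1997-01-01', '9999-12-31');
    INSERT INTO salary VALUES
        ('001', 30000, '1997-01-01', '1998-12-31'),
        ('001', 32000, '1999-01-01', '9999-12-31'),
        ('002', 60000, '1997-01-01', '1998-12-31'),
        ('002', 62000, '1999-01-01', '9999-12-31');
""")

def avg_salary(conn, dept, as_of):
    """Average salary of those who were in dept on the date as_of."""
    row = conn.execute(
        """SELECT AVG(s.salary)
           FROM member_of m JOIN salary s ON s.emp_id = m.emp_id
           WHERE m.dept = ?
             AND ? BETWEEN m.from_dt AND m.to_dt
             AND ? BETWEEN s.from_dt AND s.to_dt""",
        (dept, as_of, as_of),
    ).fetchone()
    return row[0]

avg_1998 = avg_salary(conn, "Finance", "1998-12-31")
avg_1999 = avg_salary(conn, "Finance", "1999-12-31")
```

With this data the Finance average falls from 60000 to 47000 between the two year ends, although both employees received a raise; employee 001 simply transferred in from Acct.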

40 Are We Comparing the Right Averages
People have changed departments
Average salary at the end of 1998 and 1999 reflects those people who just happened to be in Finance at those points in time
The average salary in Finance dropped because we transferred in a low salary employee
We must create views that take into account the organizational changes

41 Yesterday's Salary with Today's Glasses
What is the average salary for Finance at the end of 1998 based on those in Finance at the end of 1999
    SELECT AVG(T3.Salary)
    FROM Member_Of T2, Salary T3
    WHERE T2.Dept = 'Finance'
      AND T2.Emp_Id = T3.Emp_Id
      AND '1999-12-31' BETWEEN T2.From_Dt AND T2.To_Dt
      AND '1998-12-31' BETWEEN T3.From_Dt AND T3.To_Dt
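
The two-date version of the query separates the date that defines who is in the department from the date that picks the salary. The SQLite sketch below uses invented rows and a hypothetical helper, avg_salary_for_cohort.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member_of (emp_id TEXT, dept TEXT, from_dt TEXT, to_dt TEXT);
    CREATE TABLE salary    (emp_id TEXT, salary INTEGER, from_dt TEXT, to_dt TEXT);
    INSERT INTO member_of VALUES
        ('001', 'Acct',    '1997-01-01', '1999-06-30'),
        ('001', 'Finance', '1999-07-01', '9999-12-31'),
        ('002', 'Finance', '1997-01-01', '9999-12-31');
    INSERT INTO salary VALUES
        ('001', 30000, '1997-01-01', '1998-12-31'),
        ('001', 32000, '1999-01-01', '9999-12-31'),
        ('002', 60000, '1997-01-01', '1998-12-31'),
        ('002', 62000, '1999-01-01', '9999-12-31');
""")

def avg_salary_for_cohort(conn, dept, cohort_as_of, salary_as_of):
    """Average salary on salary_as_of for those in dept on cohort_as_of."""
    row = conn.execute(
        """SELECT AVG(s.salary)
           FROM member_of m JOIN salary s ON s.emp_id = m.emp_id
           WHERE m.dept = ?
             AND ? BETWEEN m.from_dt AND m.to_dt
             AND ? BETWEEN s.from_dt AND s.to_dt""",
        (dept, cohort_as_of, salary_as_of),
    ).fetchone()
    return row[0]

# Same people (Finance as of end of 1999), salaries one year apart.
then = avg_salary_for_cohort(conn, "Finance", "1999-12-31", "1998-12-31")
now = avg_salary_for_cohort(conn, "Finance", "1999-12-31", "1999-12-31")
```

Holding the group of people fixed, the average rises from 45000 to 47000, the opposite of what two independent point-in-time averages over the same data would suggest.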

42 Query Notes
Queries that involve one time point (or period) are usually straightforward
Queries involving two time points (or periods) can cause significant confusion
Watch out for a set of two (or more) queries which involve more than one time point (or period)
– They look deceptively simple

43 Design Guidelines
First build the logical data model
– Include time independent and time annotated items (e.g. date of birth)
– Temporarily ignore time dependencies and treat the model as if you were looking at the business at a specific point in time
– Make notes about all time dependent attributes, entities and relationships
Gather potential queries, query sets and reports

44 Design - 2
The LDM should be in Third Normal Form
The primary keys in this model will be the basic business keys for future integrity rules
Do not combine entities in 1:1 relationships unless they truly represent the same thing or concept
In general column vectors are preferred to row vectors
Delay design changes for physical considerations until later

45 Adding in the Time Factors
Individual Attributes
Groups of Attributes
1:1 Relationships
1:M Relationships
M:M Relationships
N-ary Relationships
Integrity Rules
Multiple Time Factor Case

46 Integrity Rules
Referential Integrity often does not hold in the physical database design
– There is no exact matching of primary and foreign keys
A business integrity rule usually replaces RI
– An exact match of the business key (like RI)
– Rule for how the time factors must relate to each other

47 Hard Problem
The design of the database is the easy problem
Training customers to properly understand and use a time varying database is hard, and you should not underestimate the task

48 Thank you for your patience
Questions
Dr. Jerry Rosenbaum, ConcentrX, LLC
