Dimensional Modeling
- Dimensional modeling: a logical design technique for structuring data
  - Intuitive to business users
  - Easy to understand
  - Fast query performance
- Primary constructs of a dimensional model
  - Fact tables
  - Dimension tables

Star Schema
- One fact table
- Multiple dimension tables
- Example: assume the schema belongs to a retail chain. The fact is revenue (money). Each way in which you want to view the data is a dimension.

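A minimal sketch of the retail example, using Python's built-in sqlite3 module. The table names, column names, and sample rows (sales_fact, product_dim, store_dim, and so on) are illustrative assumptions, not part of the slides.

```python
import sqlite3

# Hypothetical retail star schema: one fact table, two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE store_dim   (store_key   INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE sales_fact (
    product_key INTEGER NOT NULL REFERENCES product_dim(product_key),
    store_key   INTEGER NOT NULL REFERENCES store_dim(store_key),
    revenue     REAL NOT NULL
);
INSERT INTO product_dim VALUES (1, 'Coffee'), (2, 'Tea');
INSERT INTO store_dim   VALUES (1, 'East'), (2, 'West');
INSERT INTO sales_fact  VALUES (1, 1, 10.0), (2, 1, 5.0), (1, 2, 7.5);
""")

# "Revenue by region": the fact is summed, the dimension supplies the label.
revenue_by_region = conn.execute("""
    SELECT s.region, SUM(f.revenue)
    FROM sales_fact AS f
    JOIN store_dim AS s ON s.store_key = f.store_key
    GROUP BY s.region
    ORDER BY s.region
""").fetchall()
print(revenue_by_region)
```

Grouping by a product_dim column instead would give "revenue by product" from the same fact table.
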
Facts
- Facts
  - Measurements
  - Numeric
- Additive
  - Critical: BI applications do not retrieve a single fact table row; data is summarized
- Semi-additive
  - Cannot be summed across time periods
  - Examples: account balances, inventory levels
- Non-additive
  - Cannot be summed across any dimension
  - Are stored in dimension tables

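The semi-additive case can be sketched in a few lines of Python. The balances below are made-up sample data; averaging over time is shown as one common alternative to summing, not as the only valid treatment.

```python
from collections import defaultdict

# Hypothetical end-of-day balances: (day, account, balance).
balances = [
    ("2024-01-01", "A", 100.0), ("2024-01-01", "B", 50.0),
    ("2024-01-02", "A", 120.0), ("2024-01-02", "B", 30.0),
]

# Summing across accounts within one day is meaningful.
total_by_day = defaultdict(float)
for day, _account, balance in balances:
    total_by_day[day] += balance

# Summing the same balances across days is not; average the daily totals instead.
average_daily_total = sum(total_by_day.values()) / len(total_by_day)
print(dict(total_by_day), average_daily_total)
```
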
Fact Tables
- Fact tables
  - Store numeric additive facts
- Conformed facts
  - Facts with identical definitions
  - May have the same standardized name in separate tables
- For non-conformed facts
  - Different interpretations must be given different names

Fact Tables
- Fact table keys
  - Composite key that consists of foreign keys from the intersecting dimension tables
  - Every foreign key must match a unique primary key in the corresponding dimension table
  - Foreign keys should not be null
  - Special keys such as “unknown”, “N/A”, etc. should be used instead

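A small sketch of the "no null foreign keys" rule. The reserved key value 0, the customer_dim contents, and the helper name customer_key_for are assumptions made for illustration.

```python
UNKNOWN_CUSTOMER_KEY = 0  # assumed reserved surrogate key for the "Unknown" row

customer_dim = {
    UNKNOWN_CUSTOMER_KEY: {"customer_id": None, "name": "Unknown"},
    1: {"customer_id": "C-100", "name": "Ada"},
}

# Lookup from natural key to surrogate key, built once per load.
key_by_customer_id = {
    row["customer_id"]: key
    for key, row in customer_dim.items()
    if row["customer_id"] is not None
}

def customer_key_for(customer_id):
    """Return the surrogate key, falling back to the Unknown row, never None."""
    return key_by_customer_id.get(customer_id, UNKNOWN_CUSTOMER_KEY)

# Source rows with a known customer, a missing customer, and an unmatched customer.
source_rows = [("C-100", 20.0), (None, 5.0), ("C-999", 7.0)]
fact_rows = [
    {"customer_key": customer_key_for(customer_id), "amount": amount}
    for customer_id, amount in source_rows
]
print(fact_rows)
```

Every fact row then joins to exactly one dimension row, so queries do not silently drop facts with missing customers.
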
Fact Tables
- Fact table granularity
  - Data should be at the lowest, most detailed atomic grain captured by a business process
  - Flexibility in querying/reporting
  - Scalability

Dimension Tables
- Dimension tables
  - Consist of highly correlated groups of attributes that represent key objects in the business, such as products, customers, employees, facilities
  - Store attributes for
    - Query constraining/filtering
    - Query result labeling
- Dimensions
  - Can be easily identified when business users use the word “by”
  - Example: by year, by product, by region, etc.

Dimension Tables
- Dimension attributes
  - Textual fields
  - Numeric values that behave like text
  - Non-additive
- Requirements
  - Labels consist of full words
  - Descriptive
  - No missing values
  - Discretely valued (contain only one value for each row in the dimension table)
  - Quality assured (no misspellings, no obsolete or orphaned values, no different versions of the same attribute)

Dimension Tables
- Dimension tables are small with regard to the number of rows
- Storing descriptions for each attribute is critical
  - Easy to use for business users
- Rows are uniquely identified by a single key, usually a sequential surrogate key

Dimension Tables
Advantages of using surrogate keys
- Performance
  - Efficient joins
  - Smaller indexes
  - More rows per block
- Data integrity
  - When the keys in operational systems are reused
  - Discontinued products, deceased customers, etc.
- Mapping when integrating data from different sources
  - Keys from different sources may be different
  - Mapping table of the surrogate key and the keys from the different sources

Dimension Tables
Advantages of using surrogate keys (cont.)
- Handling unknown or N/A values
  - It is easy to assign a surrogate key value to rows with these values
- Tracking changes in dimensional attribute values
  - Creating new rows for the changed attribute values and assigning the next available surrogate key

Dimension Tables
Disadvantages of using surrogate keys
- Assigning and managing surrogate keys, and substituting them for natural keys, is extra load for the ETL system
  - Many ETL tools have built-in capabilities to support surrogate key processing
  - Once the process is developed, it can be easily reused for other dimensions

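The surrogate key processing discussed on the last three slides can be sketched as a small mapping table kept by the ETL process. The class name SurrogateKeyMap, the source system names, and the natural keys are hypothetical.

```python
from itertools import count

class SurrogateKeyMap:
    """Maps (source system, natural key) pairs to warehouse surrogate keys."""

    def __init__(self):
        self._next_key = count(1)  # sequential surrogate keys starting at 1
        self._keys = {}

    def key_for(self, source, natural_key):
        """Return the existing surrogate key, or assign the next available one."""
        pair = (source, natural_key)
        if pair not in self._keys:
            self._keys[pair] = next(self._next_key)
        return self._keys[pair]

    def link(self, source, natural_key, existing_key):
        """Record that another source's natural key refers to an existing row."""
        self._keys[(source, natural_key)] = existing_key

products = SurrogateKeyMap()
erp_key = products.key_for("erp", "P-17")    # first sighting: new key
products.link("web", "sku-0017", erp_key)    # same product, different source key
web_key = products.key_for("web", "sku-0017")
other_key = products.key_for("erp", "P-18")
print(erp_key, web_key, other_key)
```

Because the map is independent of any one dimension, the same process can be reused for other dimensions, as the slide notes.
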
Conformed Dimensions
- Also known as master or common reference dimensions
- Shared across the DW environment, joining to multiple fact tables that represent various business processes
- Two types
  - Identical dimensions
  - One dimension being a subset of a more detailed dimension

Conformed Dimensions
- Identical dimensions
  - Same content, interpretation, and presentation regardless of the business process involved
  - Same keys, attribute names, attribute definitions, and domain values regardless of the fact table they join to
  - Example: the product dimension referenced by orders and the one referenced by inventory are identical
- One dimension being a perfect subset of a more detailed, granular dimension table
  - Same attribute names, definitions, and domain values
  - Example: sales is linked to a dimension table at the individual product level; sales forecast is linked at the brand level

Conformed Dimensions
Diagram: product and brand dimensions joined to two fact tables
- Product Dimension: Product key (PK), Product description, SKU number, Brand description, Sub class description, Class description, Department description, Color, Size, Display type
- Sales Fact Table: Date key (FK), Product key (FK), … other FKeys …, Sales quantity, Sales amount
- Sales Forecast Fact Table: Month key (FK), Brand key (FK), … other FKeys …, Forecast quantity, Forecast amount
- Brand Dimension: Brand key (PK), Brand description, Sub class description, Class description, Department description, Display type

Conformed Dimensions
Benefits
- Consistency
  - Every fact table is filtered consistently and results are labeled consistently
- Integration
  - Users can create queries that drill across fact tables: each business process is queried individually, and the result sets are then joined on common dimension attributes
- Reduced development time to market
  - Once created, conformed dimensions are reused

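Drilling across can be sketched as two separately aggregated result sets merged on a shared dimension attribute. The brand names and amounts are invented, and drill_across is a hypothetical helper, not a feature of any BI tool.

```python
# Each result set comes from its own fact table, already grouped by the
# conformed attribute "brand description".
actual_by_brand = {"Acme": 1200.0, "Zenith": 800.0}
forecast_by_brand = {"Acme": 1000.0, "Zenith": 900.0, "Nova": 300.0}

def drill_across(*result_sets):
    """Merge per-label aggregates; a missing value in any set becomes None."""
    labels = sorted(set().union(*result_sets))
    return [
        (label, *(result_set.get(label) for result_set in result_sets))
        for label in labels
    ]

report = drill_across(actual_by_brand, forecast_by_brand)
print(report)
```

The merge only lines up because both fact tables label brands with the same conformed values.
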
Dimensional Design Process
- Based on business requirements and data realities
- Step 1: choose the business process
- Step 2: declare the grain
- Step 3: identify dimensions
- Step 4: identify facts

Enterprise Bus Architecture
- Requirements are gathered and represented in the form of an Enterprise Data Warehouse Bus Matrix
  - Each row corresponds to a business process
  - Each column corresponds to a dimension of the business
  - Each column is a conformed dimension
- The Enterprise Data Warehouse Bus Matrix documents the overall data architecture for the DW/BI system

Enterprise Bus Architecture Matrix
Possible problems:
- Level of detail for each column and row in the matrix
- Row-related
  - Listing departments or imitating the organizational chart instead of business processes
  - Listing reports and analytics related to a business process instead of the business process itself
  - Example: the shipping orders business process supports various analytics such as customer ranking, sales rep performance, and product movement analyses

Enterprise Bus Architecture Matrix
Possible problems (cont.):
- Column-related
  - Generalized columns/dimensions
    - Example: an “Entity” column is too general, as it includes employees, suppliers, contractors, vendors, customers
  - Too many columns related to the same dimension
    - Worst case: each attribute is listed separately
    - Example: Product, Product Group, and LOB are all related to the Product dimension and should be listed as one

Date/Time Dimensions
- Standard date dimension table at a daily grain
  - Rationale: remove association with the calendar from BI applications
- Use numeric surrogate keys for date dimension tables
- Date Dimension: Date key (PK), Calendar Date, Calendar Month, Calendar Day, Calendar Quarter, Calendar Half year, Calendar Year, Fiscal Quarter, Fiscal Year, …

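A sketch of how such a daily-grain date dimension might be generated. Several choices here are assumptions rather than slide content: the YYYYMMDD integer key, the fiscal_year_start_month parameter, and labeling a fiscal year by the calendar year in which it starts.

```python
from datetime import date, timedelta

def build_date_dimension(start, end, fiscal_year_start_month=1):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    current = start
    while current <= end:
        # Months elapsed since the start of the fiscal year (0..11).
        fiscal_month_index = (current.month - fiscal_year_start_month) % 12
        fiscal_year = (
            current.year
            if current.month >= fiscal_year_start_month
            else current.year - 1
        )
        rows.append({
            "date_key": current.year * 10000 + current.month * 100 + current.day,
            "calendar_date": current.isoformat(),
            "calendar_month": current.month,
            "calendar_day": current.day,
            "calendar_quarter": (current.month - 1) // 3 + 1,
            "calendar_half_year": 1 if current.month <= 6 else 2,
            "calendar_year": current.year,
            "fiscal_quarter": fiscal_month_index // 3 + 1,
            "fiscal_year": fiscal_year,
        })
        current += timedelta(days=1)
    return rows

date_dim = build_date_dimension(
    date(2024, 1, 1), date(2024, 12, 31), fiscal_year_start_month=7
)
print(len(date_dim), date_dim[0])
```

With the calendar logic stored as columns, BI queries can filter on fiscal_quarter directly instead of recomputing it.
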
Date/Time Dimensions
- Time of day should be treated as a dimension only if there are meaningful textual descriptions for periods within the day
  - Example: lunch hour, rush hours, etc.
- Otherwise, time of day should be represented as a simple non-additive fact or a date/timestamp

Date/Timestamp
- Used in the fact table to support precise time interval calculations across fact rows
- Calculations are to be performed by the ETL system
- Example: elapsed time between the original claim date and the first payment date

Multiple Time Zones
- Express time in coordinated universal time (UTC)
- Additionally, time may be expressed in local time
- Other option: use a single time zone (for example, ET) and express all times in that zone
- Diagram: Call Center Activity Fact
  - Local call date key (FK), to the local call date dimension
  - UTC call date key (FK), to the UTC call date dimension
  - Local call time of day (FK), to the local call time of day dimension
  - UTC call time of day (FK), to the UTC call time of day dimension
  - …

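A sketch of deriving the four keys for one call. The key formats (YYYYMMDD for dates, HHMM for time of day) and the fixed UTC offset are simplifying assumptions; a real load would also need daylight saving rules.

```python
from datetime import datetime, timedelta, timezone

def _date_key(moment):
    return moment.year * 10000 + moment.month * 100 + moment.day

def _time_of_day_key(moment):
    return moment.hour * 100 + moment.minute

def call_time_keys(utc_moment, local_utc_offset_hours):
    """Return the date and time-of-day keys for one call, in UTC and local time."""
    local_zone = timezone(timedelta(hours=local_utc_offset_hours))
    local_moment = utc_moment.astimezone(local_zone)
    return {
        "utc_call_date_key": _date_key(utc_moment),
        "utc_call_time_of_day_key": _time_of_day_key(utc_moment),
        "local_call_date_key": _date_key(local_moment),
        "local_call_time_of_day_key": _time_of_day_key(local_moment),
    }

# A call at 02:30 UTC falls on the previous evening five hours west of UTC.
keys = call_time_keys(datetime(2024, 3, 1, 2, 30, tzinfo=timezone.utc), -5)
print(keys)
```
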
Degenerate Dimensions
- Occur in transaction fact tables that have a natural parent-child structure
- The key remains the only attribute left after the other attributes have been separated into dimensions
- The key should be the actual transaction number
- Stored in the fact table; do not create a corresponding dimension table

Degenerate Dimensions
Example:
- ORDERS TRANSACTIONS: order#, customer id, customer lname, customer fname, shipto street address, shipto city, shipto state, shipto zip, order total amount, discount amount, net order amount, payment amount, order date
- ORDERS FACTS: customer key, shipto address key, order date key, order total amount, discount amount, net order amount, payment amount, order# (degenerate dimension)
- DIM CUSTOMER: customer key, customer id, customer lname, customer fname
- DIM SHIPTO ADDRESS: shipto address key, shipto street address, shipto city, shipto state, shipto zip
- DIM ORDER DATE: order date key, calendar date, calendar month, …

Slowly Changing Dimensions
- Dimension table attributes change infrequently
- Mini-dimensions
  - More frequently changing attributes are separated into their own dimension table, a.k.a. a mini-dimension
- Three ways of handling slowly changing dimensions
  - Overwrite the dimension attribute
  - Add a new dimension row
  - Add a new dimension attribute

Slowly Changing Dimensions - Overwrite the dimension attribute
- New values overwrite old ones
- No history is kept
- Problems occur if data was previously aggregated based on old values
  - The stored aggregations will not match ad hoc aggregations based on new values
  - Previous aggregations need to be updated to keep aggregated data in sync

Slowly Changing Dimensions - Add a new dimension row
- Most popular technique
- A new row with a new surrogate PK is inserted into the dimension table to reflect the new attribute values
- Both old and new values are stored, along with effective and expiration dates and the current row indicator

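A minimal sketch of the add-a-new-row technique. The column names, the 9999-12-31 sentinel, and the convention that the old row expires on the change date are assumptions; the slides specify only that effective and expiration dates and a current row indicator are kept.

```python
from datetime import date

HIGH_DATE = date(9999, 12, 31)  # assumed "not yet expired" sentinel

def apply_type2_change(dimension_rows, sku, new_attributes, change_date):
    """Expire the current row for sku and append a new current row."""
    next_key = max((row["product_key"] for row in dimension_rows), default=0) + 1
    for row in dimension_rows:
        if row["sku"] == sku and row["is_current"]:
            row["expiration_date"] = change_date
            row["is_current"] = False
    dimension_rows.append({
        "product_key": next_key,  # new surrogate key, same natural key
        "sku": sku,
        **new_attributes,
        "effective_date": change_date,
        "expiration_date": HIGH_DATE,
        "is_current": True,
    })
    return next_key

product_dim = [{
    "product_key": 1, "sku": "SKU-1", "department": "Education",
    "effective_date": date(2023, 1, 1), "expiration_date": HIGH_DATE,
    "is_current": True,
}]
new_key = apply_type2_change(
    product_dim, "SKU-1", {"department": "Strategy"}, date(2024, 6, 1)
)
print(new_key, product_dim)
```

Facts loaded before the change keep pointing at product_key 1, so history is reported under the old department.
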
Slowly Changing Dimensions - Add a new dimension attribute
- Used infrequently
- A new column is added to the dimension table
  - The old value is recorded in a “prior” attribute column
  - The new value is recorded in the existing column
- All BI applications transparently use the new attribute
- Queries can be written to access values stored in the “prior” attribute column

Role-playing Dimensions
- The same physical dimension table plays different logical roles in a dimensional model
- Example: multiple date dimensions
  - Order Date Dimension: Order date key (PK), Order date, Order date day of week, Order date month, …
  - Order Transaction Fact: Order date key (FK), Ship date key (FK), Product key (FK), Order amount, …
  - Ship Date Dimension: Ship date key (PK), Ship date, Ship date day of week, Ship date month, …

Role-playing Dimensions
- Other examples:
  - Customer (ship to, bill to, sold to)
  - Facility or port (origin, destination)
  - Provider (referring, performing)
- Stored in the same physical table but presented in separately labeled views
- Implemented using views or aliases, depending on the database platform

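A sketch of the view-based approach using Python's built-in sqlite3 module. One physical date_dim table is exposed through two separately labeled views; all table, view, and column names and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE date_dim (
    date_key INTEGER PRIMARY KEY, calendar_date TEXT, day_of_week TEXT
);
CREATE TABLE order_fact (
    order_date_key INTEGER, ship_date_key INTEGER, order_amount REAL
);
-- Two roles over the same physical table, each with role-specific labels.
CREATE VIEW order_date_dim AS
    SELECT date_key AS order_date_key, calendar_date AS order_date,
           day_of_week AS order_date_day_of_week
    FROM date_dim;
CREATE VIEW ship_date_dim AS
    SELECT date_key AS ship_date_key, calendar_date AS ship_date,
           day_of_week AS ship_date_day_of_week
    FROM date_dim;
INSERT INTO date_dim VALUES
    (20240105, '2024-01-05', 'Friday'), (20240108, '2024-01-08', 'Monday');
INSERT INTO order_fact VALUES (20240105, 20240108, 99.0);
""")

rows = conn.execute("""
    SELECT o.order_date_day_of_week, s.ship_date_day_of_week, f.order_amount
    FROM order_fact AS f
    JOIN order_date_dim AS o ON o.order_date_key = f.order_date_key
    JOIN ship_date_dim  AS s ON s.ship_date_key  = f.ship_date_key
""").fetchall()
print(rows)
```

On platforms where views are not preferred, joining date_dim twice under two table aliases gives the same result.
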
“Junk” Dimensions
- Miscellaneous flags and text attributes that cannot be placed into one of the existing dimension tables
  - Store them in a “junk” dimension
  - Store them as unique combinations
- Data profiling is useful in identifying junk dimension candidates

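A sketch of storing flags as unique combinations. Building the rows from the combinations actually observed in the source data is one possible approach and an assumption here; the flag names and sample transactions are invented.

```python
def build_junk_dimension(transactions, flag_names):
    """Assign one surrogate key to each distinct combination of flag values."""
    junk_keys = {}
    for transaction in transactions:
        combination = tuple(transaction[name] for name in flag_names)
        # First sighting of a combination gets the next sequential key.
        junk_keys.setdefault(combination, len(junk_keys) + 1)
    return junk_keys

transactions = [
    {"payment_type": "cash", "gift_wrap": True},
    {"payment_type": "card", "gift_wrap": False},
    {"payment_type": "cash", "gift_wrap": True},
]
junk_dim = build_junk_dimension(transactions, ["payment_type", "gift_wrap"])
print(junk_dim)
```

The fact table then carries a single junk key instead of one column per flag.
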
Snowflaking
- Occurs when dimension tables are normalized
  - Increases complexity for users
  - Decreases performance
- Diagram:
  - Product Dimension: Product key (PK), Product descr, SKU number, Brand key (FK), Package type key (FK)
  - Brand dimension: Brand key (PK), Brand description, Subcategory key (FK)
  - Subcategory dimension: Subcategory key (PK), Subcategory description
  - Package type dimension: Package type key (PK), Package type descr

Outrigger Dimensions
- Look like the beginning of a snowflake
  - Large number of attributes
  - Different grain
  - Different update frequency
- Example:
  - Customer dimension: Customer key (PK), Fname, Lname, Address, County, County demographics, …
  - County demographics outrigger dimension: County Demogr key, Total population, Males, Female, Under 18, …
  - Fact table: Customer key (FK), …

Bridge Tables
- Used to implement variable-depth hierarchies
- Should be used only when absolutely necessary
  - Negatively affect usability
  - Decrease performance
- Example: reporting revenue for customers that have subsidiary relationships
  - Customer hierarchy bridge: Parent Customer key, Subsid. Customer key, # levels from parent, Bottom flag, Top flag
  - Fact table: date key (FK), Customer key F…
  - Customer dimension: Customer key (PK), …

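A sketch of populating the customer hierarchy bridge from a simple parent lookup. It assumes an acyclic hierarchy in which every parent is itself listed, includes a zero-level row linking each customer to itself, and interprets the bottom flag as "subsidiary has no children" and the top flag as "parent has no parent". All of these are assumptions, as are the customer keys and revenue figures.

```python
def build_hierarchy_bridge(parent_of):
    """parent_of maps each customer key to its parent key (None at the top)."""
    has_children = {parent for parent in parent_of.values() if parent is not None}
    bridge = []
    for customer in parent_of:
        ancestor, levels = customer, 0
        # Walk upward, emitting one row per (ancestor, customer) pair.
        while ancestor is not None:
            bridge.append({
                "parent_customer_key": ancestor,
                "subsidiary_customer_key": customer,
                "levels_from_parent": levels,
                "bottom_flag": customer not in has_children,
                "top_flag": parent_of[ancestor] is None,
            })
            ancestor, levels = parent_of[ancestor], levels + 1
    return bridge

# Customer 1 owns customer 2, which owns customer 3.
bridge = build_hierarchy_bridge({1: None, 2: 1, 3: 2})
revenue_by_customer = {1: 100.0, 2: 50.0, 3: 25.0}

def rolled_up_revenue(parent_key):
    """Revenue of a customer plus all of its subsidiaries, via the bridge."""
    return sum(
        revenue_by_customer[row["subsidiary_customer_key"]]
        for row in bridge
        if row["parent_customer_key"] == parent_key
    )

print(rolled_up_revenue(1), rolled_up_revenue(2))
```
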
3 Fundamental Fact Table Grains
- Transaction
  - One row per transaction or per transaction line
  - Rows are inserted into the fact table only when transaction activity occurs

3 Fundamental Fact Table Grains
- Periodic snapshot
  - At predetermined intervals, snapshots at the same level of detail are taken and stacked consecutively in the fact table
  - Example: most financial reports, bank account value
  - Complements detailed transaction facts but does not replace them
  - Shares the same conformed dimensions but has fewer dimensions

3 Fundamental Fact Table Grains
- Accumulating snapshot
  - Less frequently used
  - Has multiple date FKs, one for each milestone in the workflow
  - Many N/A or Unknown fields when a row is originally inserted
  - Requires a special row in the date dimension table, as discussed earlier

Facts of Different Granularity
- A single fact table cannot have facts with different granularity
  - All measurements must be at the same level of detail
- Example:
  - Measurements are captured for each order line, except for the shipping charge, which applies to the entire order
- Solutions:
  - Allocate the higher-level facts to the lower granularity
  - Create two separate fact tables

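A sketch of the first solution, allocating an order-level shipping charge down to the order lines. Allocating in proportion to line amount is an assumed business rule; the slides do not say how the allocation should be done. Amounts are integer cents, and the total of the line amounts is assumed to be non-zero.

```python
def allocate_order_charge(line_amounts_cents, charge_cents):
    """Spread an order-level charge over lines in proportion to line amount."""
    total = sum(line_amounts_cents)
    allocated = [charge_cents * amount // total for amount in line_amounts_cents]
    # Give the rounding remainder to the last line so the lines sum to the charge.
    allocated[-1] += charge_cents - sum(allocated)
    return allocated

# Three order lines of 50.00, 30.00 and 20.00 sharing a 9.99 shipping charge.
line_shipping = allocate_order_charge([5000, 3000, 2000], 999)
print(line_shipping)
```

Whatever rule is chosen, the allocated values must add back up to the order-level charge so that the fact stays additive.
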
Multiple Currencies and Units of Measure
- Measurements are provided in the local currency
- Measurements are also converted to a standardized currency, or the conversion rates must be stored
- Similarly, when there are multiple units of measure, conversions to all the different units of measure are provided

Factless Fact Tables
- Record business processes that do not generate quantifiable measurements
  - Example: student attendance
- Can be easily converted into traditional fact tables by adding a Count attribute, which is always equal to 1
  - Helps to perform aggregations
- Diagram: Student attendance event facts
  - Date key, to the date dimension
  - Student key, to the student dimension
  - Facility key, to the facility dimension
  - Faculty key, to the faculty dimension
  - Course/section key, to the course/section dimension

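A sketch of the count-of-1 convention. The keys and the column name attendance_count are invented; each row records only that an attendance event happened.

```python
from collections import Counter

# Each row is one attendance event; the only "measure" is a constant 1.
attendance_facts = [
    {"date_key": 20240105, "student_key": 1, "course_key": 10, "attendance_count": 1},
    {"date_key": 20240105, "student_key": 2, "course_key": 10, "attendance_count": 1},
    {"date_key": 20240105, "student_key": 1, "course_key": 20, "attendance_count": 1},
]

# With the count column, attendance aggregates like any other additive fact.
attendance_by_course = Counter()
for row in attendance_facts:
    attendance_by_course[row["course_key"]] += row["attendance_count"]
print(dict(attendance_by_course))
```
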
Consolidated Fact Tables
- Fact tables populated from different sources may potentially be consolidated into a single one
  - The level of granularity must be the same
  - Measurements are listed side by side
- Example: by combining forecast and actual sales amounts, a forecast/actual sales variance amount can be easily calculated and stored

Recommendations to Avoid Common Misconceptions about Dimensional Modeling
- Do not take a “report-centric” approach
  - Do not create a new dimensional model for each slightly different report
  - Do not create a new dimensional model for each department for data from the same source
- Create dimensional models with the finest level of granularity (atomic data)
  - Flexible and independent of a specific business question/report
  - Scalable
- Use conformed dimensions
  - Ease integration efforts
  - Make the ETL process structured
  - Avoid chaos when integrating multiple data marts

E-R Diagram
- Customer: #Cust No, F Name, L Name, Ads1, Ads2, City, State, Zip, Tel No, CC No, Expire
- Rental: #Rental No, Date, Clerk No, Pay Type, CC No, Expire, CC Approval
- Line: #Line No, Due Date, Return Date, OD charge, Pay type
- Title: #Title No, Name, Vendor No, Cost
- Video: #Video No, One-day fee, Extra days, Weekend
- Relationship labels: Requestor of, Owner of, Holder of, Name for

Dimensional Model
- Customer: CustID, Cust No, F Name, L Name
- Rental: RentalID, Rental No, Clerk No, Store, Pay Type
- Line: LineID, OD Charge, OneDayCharge, ExtraDaysCharge, WeekendCharge, DaysReserved, DaysOverdue, AddressID, RentalId, VideoID, TitleID, RentalDateID, DueDateID, ReturnDateID
- Video: Video No
- Title: TitleNo, Name, Cost, Vendor Name
- Rental Date, Due Date, Return Date: SQLDate, Day, Week, Quarter, Holiday
- Address: Address1, Address2, City, State, Zip, AreaCode, Phone

4 Steps of Dimensional Modeling
- Choose a business process
- Declare the grain
- Identify dimensions
- Identify facts

High-level Model Diagram
- Is a data model at the entity level
- Shows the specific fact and dimension tables applicable to a specific business process
- Great communication and training tool
- Diagram: Orders, surrounded by Currency, Date (Order, Due), Product, Promotion, Order junk, Channel, Customer, Sales person

Derived Facts
- Additive calculation using other facts in the same table
  - Can be calculated using a view
  - Example: net sales, obtained by subtracting the commission amount from gross sales
- Non-additive calculation that is expressed at a different level of detail than the fact table itself
  - Can be calculated by BI tools at query time
  - Example: year-to-date sales

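Both kinds of derived fact can be sketched on a few made-up monthly rows. Net sales is computed row by row from facts in the same row, while year-to-date is a running total computed at query time; the column names and amounts are assumptions.

```python
from itertools import accumulate

monthly_rows = [
    {"month": 1, "gross_sales": 100.0, "commission": 10.0},
    {"month": 2, "gross_sales": 200.0, "commission": 20.0},
    {"month": 3, "gross_sales": 150.0, "commission": 30.0},
]

# Additive derived fact: computed per row, then summable like any other fact.
net_sales = [row["gross_sales"] - row["commission"] for row in monthly_rows]

# Non-additive derived fact: a running total that depends on the query's period.
year_to_date = list(accumulate(net_sales))
print(net_sales, year_to_date)
```

Summing the year_to_date values would double count earlier months, which is why that figure is left to the BI tool instead of being stored as an additive fact.
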
Design Document
- Brief description of the business processes included in the design
- High-level discussion of the business requirements to be supported, pointing back to the detailed requirements document
- High-level data model diagram
- Detailed dimensional design worksheet for each fact and dimension table
- Open issues list highlighting the unresolved issues
- Discussion of any known limitations of the design to support the project scope and business requirements
- Other items of interest, such as design compromises or source data concerns