Clayton Groom Covenant Technology Partners Intro  Clayton Groom  Founding Partner of CTP   Twitter: cgroom  BI professional for.


2 Clayton Groom Covenant Technology Partners

3 Intro  Clayton Groom  Founding Partner of CTP  cgroom@mailctp.com  Twitter: cgroom  BI professional for 15 years  MCP BI, MS Visual Technology Specialist (VTS)

4 Agenda  Technical architecture  Fundamentals  Dimensional modeling rules  Conformed dimensions  ETL considerations  Questions

5 Credits & Thanks  Jim Ronan – for getting me started  Ralph Kimball – for making the way clear  Joy Mundy - from whom I’ve shamelessly copied slides…

6 Operational vs. Analytic systems
 | Operational system | Analytic system
Purpose | Execution of a business process | Measurement of a business process
Primary interaction style | Insert, Update, Query, Delete | Query
Scope of interaction | Individual transactions | Aggregated transactions
Query patterns | Predictable and stable | Unpredictable and changing
Temporal focus | Current | Current and historic
Design optimization | Update concurrency | High-performance query
Design principle | Entity-Relationship (ER) design in third normal form (3NF) | Dimensional design (Star Schema or Cube)
Also known as | Transaction system; On Line Transaction Processing (OLTP) system; Source system | Data Warehouse system; Data Mart

7 The DW/BI Technical Architecture
[Diagram. Labels: Source Systems; ETL (Business/Extract Rules, Data Quality, Dimensionalization); Presentation Servers (RDBMS, OLAP); BI Portal (Standard Reports, Ad Hoc Queries, Analytic Apps incl. Data Mining, Operational BI); Metadata; Business Users]
MSFT DW/BI Components: Integration Services, RDBMS, Analysis Services, Data Mining, Reporting Services, Report Builder, 2007 Microsoft Office system, SharePoint, PerformancePoint, Visual Studio

8 Kimball Method Fundamentals  Focus on the business  Build a dimensional data warehouse / business intelligence system  Dimension tables such as Customer contain descriptive information  Fact tables such as Sales contain detailed transactions, and link to the dimensions  Dimension attribute changes are managed and conformed across the enterprise  Excellent user experience is paramount

9 Kimball Business Dimensional Lifecycle

10 Relational Dimensional Model
[Star schema diagram: the Sales Fact table in the center, linked to the surrounding dimension tables]
Sales Fact: Product Key, Customer Key, Date Key, … other keys, Sales Amount, Sales Quantity, … other measures
Product: Product Key, Product SKU, Product Name, Product Brand, Product Family, … other attributes
Date: Date Key, Date, Month Name, Year Month, Calendar Qtr, … other attributes
Customer: Customer Key, Account Number, Customer Name, Customer Zip, Customer City, … other attributes
Promotion: Promotion Key, Promo Name, Promo Contact, Promo City
Sales Rep: Sales Rep Key, Sales Rep Name, Sales Rep Contact, Sales Rep City
Store: Store Key, Store Name, Store Contact, Store City
Other dims…
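A minimal sketch of the star schema above, using Python's built-in sqlite3 as a stand-in for the RDBMS. Only three of the dimensions are shown, and the snake_case table and column names and the sample rows are illustrative assumptions, not part of the original slides.

```python
import sqlite3

# In-memory database as a stand-in for the warehouse RDBMS (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    product_sku   TEXT,
    product_name  TEXT,
    product_brand TEXT
);
CREATE TABLE dim_date (
    date_key   INTEGER PRIMARY KEY,
    date       TEXT,
    month_name TEXT
);
CREATE TABLE dim_customer (
    customer_key   INTEGER PRIMARY KEY,
    account_number TEXT,
    customer_name  TEXT
);
-- The fact table holds only dimension keys and numeric measures.
CREATE TABLE fact_sales (
    product_key    INTEGER REFERENCES dim_product(product_key),
    customer_key   INTEGER REFERENCES dim_customer(customer_key),
    date_key       INTEGER REFERENCES dim_date(date_key),
    sales_amount   REAL,
    sales_quantity INTEGER
);
""")

# Hypothetical sample rows.
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)",
                 [(1, "SKU-1", "Widget", "Acme"), (2, "SKU-2", "Gadget", "Bolt")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240115, "2024-01-15", "January"),
                  (20240220, "2024-02-20", "February")])
conn.execute("INSERT INTO dim_customer VALUES (1, 'A-100', 'Ann')")
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 20240115, 10.0, 1),
                  (1, 1, 20240220, 20.0, 2),
                  (2, 1, 20240115, 5.0, 1)])

# Typical analytic query: join the fact to its dimensions and aggregate.
rows = conn.execute("""
    SELECT p.product_brand, d.month_name, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d    ON d.date_key = f.date_key
    GROUP BY p.product_brand, d.month_name
    ORDER BY p.product_brand, MIN(d.date_key)
""").fetchall()
```

Descriptive attributes live in the dimensions, so the same fact rows can be sliced by brand, month, or customer without changing the fact table.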

11 It’s not Rocket Science…

12 But there is a science to it…  The goal is to provide information to end users in a way that they don’t have to be rocket scientists to use it.

13 Kimball’s Ten Essential rules of Dimensional Modeling  Rule #1: Load detailed atomic data into dimensional Structures  Avoid summarizing data at all costs  Users will want the details…  You can always roll up, but can’t drill down if the detail is not there

14 Kimball’s Ten Essential rules of Dimensional Modeling  Rule #2: Structure Dimensional models around business processes  Business processes capture metrics associated with measured business events  Metrics translate into facts, stored in a process-specific atomic fact table  Accounts Payable  Accounts Receivable  General ledger  Orders  Shipments

15 Kimball’s Ten Essential rules of Dimensional Modeling  Rule #3: Every Fact table has a date dimension table associated with it  Measurement events will always have a date associated with them  Day grain at a minimum  Separate Time of Day dimension for time  Multiple date/time dimension aliases on any given fact table  Order Date  Promise Date  Ship date
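A day-grain date dimension is usually generated rather than extracted. The sketch below is one way to do it in Python; the yyyymmdd integer key and the column names (borrowed from the Date table on the star-schema slide) are assumptions for illustration.

```python
from datetime import date, timedelta

# Hard-coded names avoid depending on the machine's locale settings.
MONTH_NAMES = ("January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December")

def build_date_dimension(start, end):
    """Return one row per calendar day from start to end, inclusive."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            # yyyymmdd integer key: a common convention, assumed here.
            "date_key": day.year * 10000 + day.month * 100 + day.day,
            "date": day.isoformat(),
            "month_name": MONTH_NAMES[day.month - 1],
            "year_month": f"{day.year}-{day.month:02d}",
            "calendar_qtr": f"Q{(day.month - 1) // 3 + 1}",
        })
        day += timedelta(days=1)
    return rows

date_rows = build_date_dimension(date(2024, 1, 1), date(2024, 12, 31))
```

A fact table can then carry several keys into this one table (for example order, promise, and ship date keys), each exposed to users under its own alias.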

16 Kimball’s Ten Essential rules of Dimensional Modeling  Rule #4: Ensure that all facts in a single fact table are at the same grain or level of detail  Possible grain types:  Transactional (GL, Orders, etc.)  Periodic Snapshot (Bank Balance, Inventory levels)  Accumulating Snapshot (“To date” statuses, Fact table updated)

17 Kimball’s Ten Essential rules of Dimensional Modeling  Rule #5: Resolve many-to-many relationships in fact tables  Use M:M Bridge tables when appropriate  Multiple diagnoses associated with a single event

18 Kimball’s Ten Essential rules of Dimensional Modeling  Rule #6: Resolve many-to-one relationships in dimension tables  Resist the urge to “snowflake”  Know what exceptions are allowable  Large product catalog: Size + Color + Style = SKU

19 Kimball’s Ten Essential rules of Dimensional Modeling  Rule #7: Store report labels and filter domain values in dimension tables  Keep the end user in mind…  Proper case attributes so they don’t SHOUT  Combine code/description attributes  Division Code = ‘001’  Division Name = ‘Springfield’  Division Label = ‘Springfield (001)’ or ‘(001) Springfield’
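The label attribute can be derived once in ETL and stored in the dimension. A small sketch, assuming a hypothetical helper named division_label:

```python
def division_label(code, name, code_first=False):
    """Build a report label that combines a code with its description."""
    # str.title() gives simple proper casing; names such as "McDonald"
    # or "O'Brien" would need special handling that is not shown here.
    proper_name = name.strip().title()
    if code_first:
        return f"({code}) {proper_name}"
    return f"{proper_name} ({code})"

label = division_label("001", "SPRINGFIELD")
label_code_first = division_label("001", "SPRINGFIELD", code_first=True)
```

Storing the result as a column means every report and filter shows the same label without each report author rebuilding it.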

20 Kimball’s Ten Essential rules of Dimensional Modeling  Rule #8: Make sure dimension tables use a surrogate key  Don’t forget to have alternate keys on business key column(s)!  Required to support slowly changing dimensions (SCDs)  You cannot rely on source systems for business keys.

21 Kimball’s Ten Essential rules of Dimensional Modeling  Rule #9: Create conformed dimensions to integrate data across the enterprise  Common reference dimensions are key to:  Integrating data across multiple source systems  Cross-drilling from one fact table to another  Enterprise DW Bus Matrix – the blueprint for your EDW  Reuse enables rapid development  Requires a commitment to data stewardship

22 DW Bus Matrix Sample

23 Kimball’s Ten Essential rules of Dimensional Modeling  Rule #10: Continuously balance the requirements and realities to deliver a DW/BI solution that’s accepted by the business users and supports their decision making.  Don’t forget your audience  Deliver incrementally

24 Surrogate Keys  Dimension PKs should be surrogate (meaningless) keys  Managed by the DW  Usually an integer type  Usually populated via IDENTITY keyword in dimension table definition  Why?  Small (int) keys are vital for performance  The source system WILL screw you if you don’t manage the keys yourself  Enables dimension attribute change tracking

25 Surrogate Keys and ETL  Dimensions  Carry source system key(s) as non-key attributes in the dimension  New rows automatically get a new surrogate key  Facts  Fact table usually does not contain source system keys  Final step of fact processing is to exchange the source system keys for DW surrogate keys  Lookup to dimension tables based on source key, returning surrogate key
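The key exchange described above can be sketched with Python's sqlite3. AUTOINCREMENT stands in for SQL Server's IDENTITY, and the table names, column names, and sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (
    -- Surrogate key assigned by the warehouse, not by the source system.
    customer_key   INTEGER PRIMARY KEY AUTOINCREMENT,
    -- Source system key carried as a non-key attribute with an alternate key.
    account_number TEXT NOT NULL UNIQUE,
    customer_name  TEXT
);
CREATE TABLE stage_sales (account_number TEXT, sales_amount REAL);
CREATE TABLE fact_sales (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    sales_amount REAL
);
""")

# Dimensions are processed first; new rows get surrogate keys automatically.
conn.executemany(
    "INSERT INTO dim_customer (account_number, customer_name) VALUES (?, ?)",
    [("A-100", "Ann"), ("B-200", "Bob")])

# Staged facts still carry the source system key.
conn.executemany("INSERT INTO stage_sales VALUES (?, ?)",
                 [("B-200", 25.0), ("A-100", 10.0)])

# Final step: exchange the source key for the surrogate key via a lookup join.
conn.execute("""
    INSERT INTO fact_sales (customer_key, sales_amount)
    SELECT d.customer_key, s.sales_amount
    FROM stage_sales s
    JOIN dim_customer d ON d.account_number = s.account_number
""")

fact_rows = conn.execute(
    "SELECT customer_key, sales_amount FROM fact_sales ORDER BY customer_key"
).fetchall()
```

The inner join silently drops staged rows with no matching dimension row, so a real pipeline would also need to route unmatched keys to an error flow or placeholder row.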

26 Conformed Dimensions  There is one master dimension table that all fact tables subscribe to  Get agreement organization-wide on:  What the dimensions are called  Which hierarchies you have  Similar-but-different attributes and hierarchies have different names  Which attributes are managed by restating history and which by tracking history  Create two sets of attributes if you need it both ways  Why?  Single version of the truth  Flexibility of basic design

27 Dimensional Modeling Myths and Misconceptions  Dimensional means summary  Dimensional models are built to support specific applications (or departments)  The dimensional model is less flexible than a third normal form model in DW/BI systems  The dimensional approach is not Enterprise oriented

28 Gather Business Requirements  Gather detailed requirements from the business users  This means you have to talk to them  Document requirements, or you’ll have to do it again  Talk to IT (data experts) too  Multi-step process  [Sometimes] Overview of the landscape (where should we begin?) and of data realities  Detailed requirements and data realities

29 Profile the Data  Early and often  Does the data exist to support the required analysis?  Where are the problems affecting ETL design?  Primary keys  Referential integrity  NULL values  Junk values  The dreaded “Notes” field  Hijacked fields (My Favorite!)

30 Design the Dimensional Model  Dimensions are the key UI element to your organization’s information  Dimension grain − What does a row in the dimension table represent?  Dimension hierarchies − Real vs. navigational  Dimension attribute changes − Type 1: Restate history − Type 2: Track history

31 Changing Attributes  Each attribute (column) in the dimension table is subject to change − What do the business users want? − Sometimes they want it both ways  Type 1 (Update) − Easiest for base ETL: simply update the column in place − Potentially problematic for aggregate management  Type 2 (Track history) − Add a new row for each new set of attributes − Facts tie in to the set of attributes in effect at transaction time − Track the row effective date range, whether the row is current, and potentially the row change reason
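A minimal sketch of Type 2 handling in Python with sqlite3. The column names, the '9999-12-31' open-ended end date, and the convention that a closed row's end date equals the change date are all assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    customer_key       INTEGER PRIMARY KEY AUTOINCREMENT,
    account_number     TEXT NOT NULL,
    customer_city      TEXT,
    row_effective_date TEXT NOT NULL,
    row_end_date       TEXT NOT NULL,
    is_current         INTEGER NOT NULL
)
""")

def apply_type2_change(conn, account_number, new_city, change_date):
    """Track history: expire the current row and add a new one on change."""
    current = conn.execute(
        "SELECT customer_key, customer_city FROM dim_customer "
        "WHERE account_number = ? AND is_current = 1",
        (account_number,)).fetchone()
    if current is not None and current[1] == new_city:
        return current[0]  # nothing changed, keep the existing row
    if current is not None:
        # Close out the row that was in effect until now.
        conn.execute(
            "UPDATE dim_customer SET row_end_date = ?, is_current = 0 "
            "WHERE customer_key = ?",
            (change_date, current[0]))
    cur = conn.execute(
        "INSERT INTO dim_customer (account_number, customer_city, "
        "row_effective_date, row_end_date, is_current) "
        "VALUES (?, ?, ?, '9999-12-31', 1)",
        (account_number, new_city, change_date))
    return cur.lastrowid

first_key = apply_type2_change(conn, "A-100", "Austin", "2024-01-01")
same_key = apply_type2_change(conn, "A-100", "Austin", "2024-03-01")
moved_key = apply_type2_change(conn, "A-100", "Boston", "2024-06-01")

history = conn.execute(
    "SELECT customer_key, customer_city, row_effective_date, "
    "row_end_date, is_current FROM dim_customer ORDER BY customer_key"
).fetchall()
```

Facts loaded before the move keep pointing at the Austin row and later facts pick up the Boston row, which is what ties each fact to the attributes in effect at transaction time. A Type 1 change would instead be a plain UPDATE of customer_city on the existing row.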

32 Data Extraction Best Practices  Minimize impact on the source system − Keep source queries simple − Often, stage to file or table rather than ETL in stream − Avoid transforming data upon extract  Save a copy of the untransformed data − For archival purposes (Internal Audit will love you, insofar as they are capable of emotion) − Design packages with a restart point here  Often makes sense to separate E and TL into separate packages

33 Extracting Data  Relational sources  Usually pull from sources (use a query)  Keep the source query simple  Implement source query in the SSIS Data Flow task, Data Reader Source  Non-relational sources  Usually pushed from source system  Flat files common  Third-party connectors can allow pull from within SSIS

34 Compute Data Validity Measures  Compute and check measures of data validity as early in the process as possible  Rowcounts  “Reasonableness” counts  Income=Expense  Sold more than N products to X customers  Store data validity measures in processing metadata  Evaluate data validity before moving forward
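The checks above can be computed right after extract and stored with the batch. This Python sketch assumes a hypothetical batch of ledger rows with amounts held as integer cents, and a caller-supplied minimum row count.

```python
def compute_validity_measures(rows, expected_min_rows):
    """Compute simple validity measures for one extracted batch."""
    # Amounts are integer cents (assumption), so equality checks are exact.
    income = sum(r["amount"] for r in rows if r["type"] == "income")
    expense = sum(r["amount"] for r in rows if r["type"] == "expense")
    return {
        "row_count": len(rows),
        "row_count_ok": len(rows) >= expected_min_rows,
        "income_total": income,
        "expense_total": expense,
        "income_equals_expense": income == expense,
    }

def batch_is_valid(measures):
    """Gate the load: every boolean check must pass before moving forward."""
    return measures["row_count_ok"] and measures["income_equals_expense"]

batch = [
    {"type": "income", "amount": 15000},
    {"type": "expense", "amount": 9000},
    {"type": "expense", "amount": 6000},
]
measures = compute_validity_measures(batch, expected_min_rows=3)
ok_to_load = batch_is_valid(measures)
```

The returned dictionary is the piece that would be written to processing metadata, so a failed load can be diagnosed later from the stored measures.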

35 Dimension Hierarchy Management  Get religious about hierarchies  A true hierarchy has referential integrity between levels  Anything else is simply a reporting relationship  Do you need (or want) a key for hierarchy levels?  The payoff comes to users, especially in SSAS  The ETL system is not the right place to manage true hierarchies  People need to be involved  Do it before ETL  This is a job for Master Data Management or correctly designed source systems

36 Dimension Loading during Fact Processing  Normal case  Process dimensions first, then facts  Fact table transformation ends with surrogate key pipeline  Look up each source system key for its DW surrogate key  Join to populated dimension tables on source system key  Return surrogate key  Fairly common exception case  Load dimension row as observed in fact stream  Early arriving facts  Many-to-many dimensions
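For the early-arriving-fact exception, one common approach is to insert a placeholder dimension row when the lookup misses. The sketch below uses Python's sqlite3; the is_inferred flag, the 'Unknown' default, and the function name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE dim_customer (
    customer_key   INTEGER PRIMARY KEY AUTOINCREMENT,
    account_number TEXT NOT NULL UNIQUE,
    customer_name  TEXT NOT NULL,
    -- 1 marks a placeholder row created from the fact stream.
    is_inferred    INTEGER NOT NULL DEFAULT 0
)
""")
conn.execute(
    "INSERT INTO dim_customer (account_number, customer_name) "
    "VALUES ('A-100', 'Ann')")

def get_or_create_customer_key(conn, account_number):
    """Return the surrogate key, adding a placeholder row if none exists."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE account_number = ?",
        (account_number,)).fetchone()
    if row is not None:
        return row[0]
    # Fact arrived before its dimension row: load the member as observed.
    cur = conn.execute(
        "INSERT INTO dim_customer (account_number, customer_name, is_inferred) "
        "VALUES (?, 'Unknown', 1)",
        (account_number,))
    return cur.lastrowid

known_key = get_or_create_customer_key(conn, "A-100")
early_key = get_or_create_customer_key(conn, "Z-999")
repeat_key = get_or_create_customer_key(conn, "Z-999")
inferred_flag = conn.execute(
    "SELECT is_inferred FROM dim_customer WHERE customer_key = ?",
    (early_key,)).fetchone()[0]
```

When the real dimension row arrives later, the regular dimension load can overwrite the placeholder's attributes and clear the flag, and the facts already loaded keep their surrogate key.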

37 Data Loading Best Practices  SQL Destination vs. OLEDB  Always use Error Flow on load  Add a Data Viewer for the first time you try to load data  Fast load restrictions limit incremental fast loads to partitioned tables in most scenarios  Updates and deletes:  Good design usually forbids actual deletes, instead set flag  Ledger transaction-grain facts, even if your source system does not  OLEDB Command transform  MERGE SQL statement  Set-based UPDATE SQL statement

38 Overall Design Considerations  ETL application must understand Analysis Services’ requirements  The only way to process a Fact’s update or delete in SSAS is to fully process the partition containing that fact  ETL needs to know which SSAS partitions to process  ETL developers aren’t allowed to compromise business user requirements without a cost/benefit analysis and signoff

39 Conclusion  ETL system is harder to design and develop than most people think  SQL Server Integration Services contains functionality you need to build an enterprise-class ETL system  Must add some scripting  It’s not hard!  Some problems are intractably difficult, and no tool will make them magically go away  Don’t skimp on ETL! Great ETL makes Analysis Services & Reporting easy.

40 Additional Resources  Books  The Microsoft Data Warehouse Toolkit, J. Mundy and W. Thornthwaite, Wiley (2006)  The Data Warehouse ETL Toolkit, R. Kimball and J. Caserta, Wiley (2005)  The Data Warehouse Toolkit, 2nd Edition, R. Kimball and M. Ross, Wiley (2002)

41 Even More Resources  Websites  All sample packages are at www.MsftDWToolkit.com  See fourteen years of free articles at www.KimballGroup.com  Classes  Microsoft Data Warehouse in Depth, a 4-day class in the SQL Server product set at http://www.kimballgroup.com/html/kucourseMDWD.html

42 Questions?

