Presentation on theme: "Data Warehousing Denis Manley Enterprise Systems FT228/3."— Presentation transcript:
1 Data Warehousing. Denis Manley, Enterprise Systems, FT228/3
2 Data Warehousing. A data warehouse is a repository for all of an organisation's data. It is designed to present data to end-users in very user-friendly formats. Why is data warehousing necessary? Think of a typical company that uses computers for order processing, invoicing, etc. Data is gathered for every transaction that occurs, and different departments probably use different systems, each designed for a specific task.
3 Data Warehousing (cont’d). There will be lots of different database tables, in different formats, on different machines. There will be tie-ups between data items, but those tie-ups may not be obvious outside the applications that manipulate the tables. Also, the amount of data is constantly increasing: some estimates state that the size of a typical data processing system doubles every two years. Both standard and ad-hoc reports must be catered for, and they must run in a reasonable time-frame.
4 Data Warehousing (cont’d). Some solutions to the issues listed can be implemented in the applications that manage specific data. E.g. if you want to ask “How much does this customer owe?”, then the original package is probably the one to use. But if you want to ask “Was this ad campaign more successful than that one?”, you require data from more disparate sources, and one application may not provide all of it.
5 Data Warehousing (cont’d). The new alternative is a data warehouse: a D.W. is a way of organising data from a wide range of formats, possibly from different locations, so it is all available in a single location. Once this stage is complete, the collection of data is then usually replicated across multiple locations. This means users have a local copy of the data they need to inspect, which improves query run-times and reduces communications overheads.
6 Data Warehousing (cont’d). The data is not only collected and joined, however. What the data means is described in a way that end-users can easily understand, e.g. using the name “Customer Account Number” instead of “order>cacno” for a field. The data must also be organised so that users can find what they need quickly and easily.
7 Data Warehousing process. Operational data in legacy systems (e.g. OLTP applications) flows through: data fusion (assembles diverse data); data cleansing (“fixing” data); metadata (records the transformations of data, origins of data, etc.); data migration (loads data and metadata into the warehouse periodically); the data warehouse itself; and finally decision-support analyst queries.
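The pipeline above can be sketched in a few lines of Python. This is only an illustrative toy, not a real ETL tool: the source layouts, field names (`cacno`, `amt`, etc.) and figures are all invented.

```python
import sqlite3

# Two operational sources using different names for the same data
# (all layouts and values here are invented for illustration).
orders_system = [{"cacno": "C001", "amt": "120.50"},
                 {"cacno": "C002", "amt": "75.00"}]
invoicing_system = [{"customer": "C003", "total": 200.0}]

def fuse_and_cleanse(orders, invoices):
    """Data fusion + cleansing: map both layouts onto common names, fix types."""
    rows = []
    for r in orders:
        rows.append(("orders", r["cacno"], float(r["amt"])))
    for r in invoices:
        rows.append(("invoicing", r["customer"], float(r["total"])))
    return rows

# Data migration: load the fused rows into the warehouse store.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales "
                  "(source TEXT, customer_account_number TEXT, amount REAL)")
warehouse.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                      fuse_and_cleanse(orders_system, invoicing_system))

# A decision-support query against the single, unified store.
total = warehouse.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

The point is that once the disparate sources are fused under common names, one query answers a question that no single operational system could.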
8 Data Warehousing Architecture. A data warehouse is an architectural construct of an information system that provides users with current and historical decision-support information that is hard to access or present in traditional operational data stores. It comprises a number of components, illustrated in figure 1.
9 Data Warehousing Architecture. Figure 1: A data warehouse architecture
10 Data Warehousing Architecture. Data warehouse database: this is the cornerstone of the data warehouse (item 2 in figure 1). It is almost always implemented on a relational database management system. However, very large databases, ad-hoc processing and the need for flexible user views (e.g. aggregates and drill-downs) are calling for different technological approaches: parallel relational database designs requiring the use of symmetric multiprocessors, massively parallel processors and clusters.
11 Data Warehousing Architecture. Speeding up the traditional RDBMS: the use of multidimensional databases (MDDBs), which are tightly coupled to on-line analytical processing.
12 Data Warehousing Architecture. Sourcing, acquisition, cleanup and transformation of data: implementing data warehouses involves extracting data from operational systems, including legacy systems, and putting it into a suitable format. The various tools are illustrated as item 1 in figure 1. These tools perform all the conversions, summarisations, key changes, structural changes and condensations needed to transform disparate data into information that can be used by decision-support tools.
13 Data Warehousing Architecture. This tooling produces the programs and control statements required to move data into the data warehouse from multiple operational systems. It also: maintains the metadata; removes unwanted data; converts to common data names and definitions; calculates summaries; establishes defaults for missing data; and keeps track of source data definition changes.
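Three of the transformation tasks listed above (converting to common data names, establishing defaults for missing data, calculating summaries) can be sketched as follows; every field name, mapping and value here is hypothetical:

```python
# Hypothetical legacy rows; 'rgn' is sometimes missing.
legacy_rows = [
    {"cust_no": "C1", "rgn": "EU", "amt": 10.0},
    {"cust_no": "C2", "rgn": None, "amt": 20.0},
    {"cust_no": "C1", "rgn": "EU", "amt": 5.0},
]

COMMON_NAMES = {"cust_no": "customer_id", "rgn": "region", "amt": "amount"}
DEFAULTS = {"region": "UNKNOWN"}

def transform(row):
    """Convert to common data names, then establish defaults for missing data."""
    out = {COMMON_NAMES[k]: v for k, v in row.items()}
    for field, default in DEFAULTS.items():
        if out.get(field) is None:
            out[field] = default
    return out

clean = [transform(r) for r in legacy_rows]

# Calculate a summary: total amount per customer.
summary = {}
for r in clean:
    summary[r["customer_id"]] = summary.get(r["customer_id"], 0.0) + r["amount"]
```

Real warehouse tooling generates this kind of logic from the metadata rather than hand-coding it, but the operations are the same.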
14 Data Warehousing Architecture. The tools have to deal with: database heterogeneity (DBMSs can be very different in data models, data access languages, etc.) and data heterogeneity (differences in the way data is defined and used, e.g. synonyms and different attributes for the same entity).
15 Data Warehousing Architecture. Metadata (item 3, figure 1): data about data that describes the data warehouse. Technical metadata contains information for warehouse designers and administrators: information about data sources; transformation descriptions; rules used to perform data clean-up; access authorisation, information delivery history, data acquisition history, data access, etc. Business metadata is information that gives users an understanding of the information stored in the data warehouse: queries, reports, images; data warehouse operational information, e.g. data history and ownership.
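As a rough illustration of the two metadata categories above, each warehouse field might carry records like these; every field name and value is invented:

```python
# Technical metadata: for warehouse designers and administrators.
technical_metadata = {
    "field": "customer_account_number",
    "source": "orders.cacno",                  # information about data sources
    "transformation": "upper-cased, trimmed",  # transformation description
    "cleanup_rule": "reject if empty",         # data clean-up rule
    "acquired": "2024-01-15T02:00:00",         # data acquisition history
}

# Business metadata: helps end-users understand the stored information.
business_metadata = {
    "field": "customer_account_number",
    "definition": "Unique account number assigned to each customer",
    "owner": "Sales department",               # ownership
    "history": "renamed from 'cacno' in 2023", # data history
}
```

In practice both kinds of record live in the information directory, keyed by field, so designers and end-users query the same catalogue from different angles.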
16 Meta Data Versioning. In the operational environment, there tends to be a single instance of data and meta data at any one moment in time. In the data warehouse environment, there are multiple definitions of data and meta data over an historically long period of time. Therefore, versioning of data warehouse meta data is crucial to the success of the data warehouse vis-a-vis the end users' ability to access and understand the data in the data warehouse.
17 Guidelines for Metadata Management. Develop an information directory that integrates technical and business metadata. Keep the metadata current and accurate! Maintain the time-variant history of the metadata! Provide meaningful descriptions and definitions (use business definitions, not technical definitions, where possible). The end users must be educated about what metadata is, how to access it, how to use it, etc.
18 Meta Data Answers Questions for Users of the Data Warehouse. How do I find the data I need? What is the original source of the data? How was this summarization created? What queries are available to access the data? How have business definitions and terms changed over time? How do product lines vary across organizations? What business assumptions have been made?
19 The Role of Meta Data in the Data Warehouse Architecture. Meta data enables data to become information, because with it you know what data you have, and you can trust it!
20 Data marts (item 4, figure 1). A data mart is a data store that is subsidiary to a data warehouse of integrated data. The data mart is directed at a partition of data (subject area) that is created for the use of a dedicated group of users, and is sometimes termed a “subject warehouse”. The data mart might be a set of denormalised, summarised or aggregated data that can be placed on the data warehouse database or, more often, on a separate physical store. Data marts are “dependent data marts” when the data is sourced from the data warehouse. Independent data marts represent fragmented solutions to a range of business problems in the enterprise; however, such a concept should not be deployed, as it lacks the “data integration” concept that is associated with data warehouses.
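A dependent data mart can be sketched with SQLite: a summarised partition of warehouse data prepared for one group of users. The table names, the Marketing department and all figures here are made up:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE warehouse_sales (dept TEXT, product TEXT, amount REAL)")
db.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?)", [
    ("Marketing", "Widget", 100.0),
    ("Marketing", "Gadget", 40.0),
    ("Finance",   "Widget", 60.0),
])

# The dependent mart: a summarised partition of the warehouse data,
# sourced from the warehouse itself (not from operational systems).
db.execute("""
    CREATE TABLE mart_marketing AS
    SELECT product, SUM(amount) AS total
    FROM warehouse_sales
    WHERE dept = 'Marketing'
    GROUP BY product
""")
mart = dict(db.execute("SELECT product, total FROM mart_marketing"))
```

Because the mart is derived from the warehouse, its figures stay consistent with the atomic data; an independent mart built straight from operational sources would have no such guarantee.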
21 Independent Data Marts. Each independent mart is built directly from the systems of record by its own extract, transform, clean, integrate and summarize process...
22 Independent Data Marts. With three independent marts, that extract, transform, clean, integrate and summarize work is done three times!
23 Independent Data Marts. Significant and expensive duplication of effort and data.
24 Independent Data Marts. Maintenance of proliferating unarchitected marts is expensive and cumbersome.
25 Unarchitected Data Marts. There may be metadata for some marts, but what about consistency and history?
26 Contrast: Architected Data Warehouse. Dependent (architected) departmental “marts”, each with the appropriate subset of metadata, are populated from the atomic level of the data warehouse, which is in turn fed from the systems of record; an information directory describes the whole, and data access goes through the departmental marts.
27 Independent Data Marts vs. The Real Thing. Unarchitected data marts: easy to do, but... are the extracts, transformations, integrations and loads consistent? Is the redundancy managed? What is the impact on the sources? The data warehouse (atomic plus departmental levels): architected to meet organizational as well as departmental requirements; data and results are consistent; redundancy is managed; detailed history is available for drill-down; metadata is consistent!
28 Independent Data marts. However, such marts are not necessarily all bad. They are often a valid solution to a pressing business problem: extremely urgent user requirements; the absence of a budget for a full data warehouse; the decentralisation of business units.
29 Data Warehousing Architecture. Access tools (item 5, figure 1): the principal purpose of the data warehouse is to provide information for strategic decision making. The main tools used to achieve this objective are: data query and reporting tools; executive information system tools; on-line analytical processing tools; and data mining tools.
30 Data Warehousing Architecture. Query and reporting tools. Reporting tools include production reporting tools (e.g. to generate operational reports) and report writers (inexpensive desktop tools). Managed query tools shield users from the complexities of SQL and database structures by inserting a metalayer between the users and the database.
31 A Few Definitions. OLAP (On-Line Analytical Processing): a set of functionality that attempts to facilitate multidimensional analysis. Multidimensional analysis: the ability to manipulate information by a variety of relevant categories or “dimensions” to facilitate analysis and an understanding of that data. It has also been called “drill-down”, “drill-across” and “slice and dice”.
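Drill-down and slice-and-dice can be sketched with ordinary SQL aggregation over a small in-memory table; the years, regions and sales figures are invented:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (year INT, region TEXT, product TEXT, amount REAL)")
db.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    (2023, "North", "Widget", 100.0),
    (2023, "North", "Gadget", 50.0),
    (2023, "South", "Widget", 70.0),
    (2024, "North", "Widget", 120.0),
])

# Summarise along one dimension (year)...
by_year = dict(db.execute("SELECT year, SUM(amount) FROM sales GROUP BY year"))

# ...then "drill down": slice to 2023 and add the region dimension.
drill = dict(db.execute(
    "SELECT region, SUM(amount) FROM sales WHERE year = 2023 GROUP BY region"))
```

OLAP tools generate exactly this kind of regrouping interactively; each choice of `GROUP BY` columns is one "slice" of the cube.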
32 A Few Definitions. Hypercube: a means of visually representing multidimensional data. Star schema: a means of aggregating data based on a set of known dimensions, attempting to store data multidimensionally in a two-dimensional RDBMS. Snowflake schema: an extension of the star schema, made by applying additional dimensions to the dimensions of a star schema in a relational environment.
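A minimal star schema along the lines defined above can be built in SQLite; every table and column name here is illustrative, not taken from any real warehouse:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
-- Dimension tables describe the data...
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, month TEXT,
                          quarter TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                          product_name TEXT, category TEXT);
-- ...and the central fact table quantifies it, keyed by the dimensions.
CREATE TABLE fact_sales (
    date_key      INTEGER REFERENCES dim_date(date_key),
    product_key   INTEGER REFERENCES dim_product(product_key),
    qty_sold      INTEGER,
    total_dollars REAL,
    PRIMARY KEY (date_key, product_key)
);
""")
db.execute("INSERT INTO dim_date VALUES (1, 'Jan', 'Q1', 2024)")
db.execute("INSERT INTO dim_product VALUES (10, 'Widget', 'Hardware')")
db.execute("INSERT INTO fact_sales VALUES (1, 10, 5, 50.0)")

row = db.execute("""
    SELECT d.year, p.category, f.total_dollars
    FROM fact_sales f
    JOIN dim_date d    ON f.date_key = d.date_key
    JOIN dim_product p ON f.product_key = p.product_key
""").fetchone()
```

The "star" is the join pattern: one fact table in the middle, each dimension joined by its key. A snowflake would further normalise `category` out of `dim_product` into its own table.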
33 A Few Definitions. Multidimensional database (also known as MDDB or MDDBS): a class of proprietary, non-relational database management tools that store and manage data in a multidimensional manner, as opposed to the two dimensions associated with traditional relational database management systems. OLAP tools: a set of software products that attempt to facilitate multidimensional analysis; they can incorporate data acquisition, data access, data manipulation, or any combination thereof.
34 A Few Definitions. ROLAP (Relational OLAP): using an RDBMS to implement an OLAP environment. It typically involves a star schema to provide the multidimensional capabilities; the OLAP tool manipulates RDBMS star schema data.
35 A Few Definitions. MOLAP (1) (Multidimensional OLAP): using an MDDBS to store and access data. The MDDBS “directly” manages data multidimensionally, and usually requires proprietary (non-SQL) access tools. MOLAP (2): the OLAP tool facilitates multidimensional capabilities without the need for a star schema. It often utilizes a 3-tier environment, where a middle-tier server preprocesses data from an RDBMS; some OLAP tools access an RDBMS directly and build “cubes” as a “fat” client.
36 How Can OLAP Be Accomplished? Use the data warehouse as the architected foundation for the organization's informational processing requirements. Use the appropriate design techniques to ensure that the data required is at the appropriate degree of granularity at the atomic level, and the appropriate degree of summarization at the departmental level. Use the appropriate tools to either access relational data in a multidimensional manner, or manage multidimensional data.
37 Data mining. A critical success factor (CSF) for any business is its ability to use information effectively. This strategic use can come from discovering hidden, undetected and frequently valuable information about customers, suppliers, retailers, etc. This information can be used to formulate effective business, marketing and sales strategies. A relatively new technology that can be used to achieve this strategic advantage is “data mining”.
38 Data visualisation. Data visualisation is a method of presenting the output of the previously mentioned methods in such a way that the problem or solution is clearly visible to domain experts and even casual observers. It goes way beyond simple bar and pie charts: it is a collection of complex techniques that focus on determining the best way to display complex patterns on a 2-D computer monitor. Such techniques involve experimenting with various colours, shapes, 3-D images, sound and virtual reality to help users really see and feel the problem and solution.
39 DW administration and management (item 6, figure 1). Managing data warehouses involves: security and priority management; monitoring updates from multiple sources; data quality checks; managing and updating metadata; replicating, subsetting and distributing data; backup and recovery; and data warehouse storage management, e.g. capacity planning, hierarchical storage management and purging of aged data.
40 Information delivery system (item 7, figure 1). The IDS enables the process of subscribing for data warehouse information and having it delivered to one or more destinations of choice according to some user-specified scheduling algorithm. Delivery may be based on time of day or on completion of an external event. The IDS can be implemented with a client/server architecture, and now via the Internet/intranet and the World Wide Web.
42 10. Pre-selecting Your Technical Environment. This is a very common trap in which many organizations find themselves. It is traditional to select the hardware, software and other physical, technical components of a system as one of the earliest activities. However, a data warehouse is an unstable environment from a sizing perspective. How do you know the hardware/RDBMS/end user tool is appropriate for your data warehouse before conducting even the first round of analysis? If at all possible, wait to select your technical environment until after you have analyzed the business requirements for information, data, and potential systems of record.
43 9. Allowing the Access Tool to Determine the Data Architecture. This is an extension of #10, but is important enough to list by itself. If you select an end user tool before developing your data architecture, it is very likely that the architecture will suffer at the hands of design requirements imposed by the tool. If you have to sacrifice design requirements in order to meet the functional requirements of a tool, it is probably time to put that tool aside and select another one.
44 8. Unarchitected Data Marts. OK. Data marts are good; they are an essential part of the data warehouse architecture. But to build only a data mart and ignore the rest of the data warehouse (specifically the atomic level data and centralized meta data) will lead you down a path that is more expensive and delivers lower-quality data than the alternative. The alternative is to architect and build the data warehouse incrementally and iteratively. Include data marts as departmental instances of the architecture, and populate them from the atomic level data. This will ensure accuracy across the architecture, and reduce costs by eliminating unnecessary population of stand-alone data marts.
45 7. Boiling the Ocean. It is more efficient to implement the data warehouse in small, achievable and palatable chunks than to try to implement it all at once. When I say “boil the ocean”, I mean trying to do too many things for too many people all at the same time. There is an old adage: “You can't have everything; where would you put it?” The same holds true for a data warehouse. If you try to design, develop and implement a data warehouse that is all-encompassing as your first iteration, how will the users be able to use all that you delivered? And in the meantime, while you have been trying to meet all of their needs, you have failed to meet any needs. Users won't forget that for a long time.
46 6. “If you build it they will come”. If you design, develop and implement an operational system, such as an order processing system, that new system is typically going to replace an existing system. In other words, the old system goes away and the users have no choice but to use the new one. Not so with the data warehouse. “If you build it...” implies an analysis that includes only bottom-up activities. It is crucial to the success of a data warehouse that a top-down analysis of user requirements for information be conducted. After that, users must be tutored, mentored and otherwise have their hands held as part of the implementation of the data warehouse. Existence does not guarantee utilization and, therefore, value.
47 5. Putting ROI before RFI (Requirements for Information). It is very difficult to quantify the intangible benefits that a data warehouse can provide to an organization. How can you put a price on increased customer loyalty? Somewhere, sometime, someone has probably made this calculation. In most cases, however, the determination of how beneficial the data warehouse will be is based on criteria that were developed for operational systems. Just as you cannot use operational data to meet your strategic informational requirements, it is difficult to use operational criteria to calculate the return on investment (ROI) of a data warehouse. In terms of benefits to the organization, it is more appropriate to concentrate on how well the data warehouse addresses the target users' requirements for information.
48 4. No Committed User Involvement. Write this down: the success of any data warehouse is directly proportional to the amount of end user participation! A data warehouse cannot be successful without active participation on the part of the target users. Period. If you do not have user participation, you will find yourself in a situation where you build it and hope that they will come. If there is no serious user participation in a data warehouse project, you have to seriously question whether or not the organization truly needs a data warehouse.
49 3. No Dedicated DBA. In many situations the lack of a dedicated database administrator (DBA) has prevented a data warehouse project from being completed (1) on time, or (2) successfully. “Borrowing” a DBA from the operational “pool” will only result in questions about the nature of the data warehouse data models and database design: “it's too flat”, “it's not normalized properly”, “there's too much redundancy”. These criticisms are well suited to an operational system's database design, but not to a data warehouse. Considering that “data” is the first word in “data warehouse”, be sure you have a dedicated database administration resource committed to this important project.
50 2. No Meta Data. Meta data is like documentation and training: everyone knows it is necessary, but it usually gets dropped somewhere along the route to implementation. For the data warehouse, meta data is more important than just your typical documentation. Remember, in order to turn data into information you have to have the data, know that you have it, be able to access it, and trust it. Meta data is the means by which the users will be able to understand and trust the data. A time-variant record of where data came from, what happened to it along the way, where it is in the data warehouse, and what vehicles exist to access it will spell the difference between success and frustration.
51 1. Analysis Paralysis. The inability to proceed past a sticking question. Wanting to “boil the ocean” and model/design everything before proceeding with development. Having to resolve political issues surrounding a “standard” or “common” definition. All of these things (and more!) will result in analysis paralysis. The 80/20 rule is very applicable to the development of a data warehouse: execute 20% effort to get 80% of the total outcome, then move on to the next set of challenges and opportunities. Many data warehouse failures started when the development team stopped. Get your hands around an idea, understand what the users' requirements for information are, and build something that produces something that can be evaluated. Don't just stand there... do something!
52 Parallel Data Management. A topic that is closely linked to data warehousing is that of parallel data management. The argument goes: if your main problem is that your queries run too slowly, use more than one machine at a time to make them run faster (parallel processing). Oracle uses this strategy in its warehousing products. There are two types of parallel processing: Symmetric Multiprocessing (SMP) and Massively Parallel Processing (MPP).
53 Parallel Data Management (cont’d). SMP means the O.S. runs and schedules tasks on more than one processor without distinction; in other words, all processors are treated equally in an effort to get the list of jobs done. MPP is more varied in its design, but essentially consists of multiple processors, each running its own program; the problem with MPP is harnessing all these processors to solve a single problem.
54 Parallel Data Management (cont’d). Regardless of the architecture used, there are still alternatives regarding the use of the parallel processing capabilities. In normal transaction processing, each transaction runs on a separate processor, because transactions are small units of work that run in a reasonable time-span. However, the type of analysis carried out in data warehouse applications isn't like that. Typically you want to run a query that looks at all the data in a set of tables; the problem is splitting that into chunks that can be assigned to the multiple processors.
55 Parallel Data Management (cont’d). There are two possible solutions to this problem: static and dynamic partitioning. In static partitioning you break the data up into a number of sections. Each section is placed on a different processor with its own data storage and memory. The query is then run on each of the processors, and the results are combined at the end to give the entire picture. This is like joining a queue in a supermarket: you stay with it until you reach the check-out.
56 Parallel Data Management (cont’d). The main problem with static partitioning is that you can't tell how much processing the various sections need. If most of the relevant data is processed by one processor, you could end up waiting almost as long as if you hadn't used parallel processing at all. In dynamic partitioning the data is stored in one place, and the data server takes care of splitting the query into multiple tasks, which are allocated to processors as they become available. This is like the single queue in a bank: as a counter position becomes free, the person at the head of the queue takes that position.
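The difference between the two strategies can be shown with a toy model: each query chunk has a known cost, and the finishing time is that of the busiest "processor". The costs and processor count are invented, and real servers schedule far more cleverly than this:

```python
# One expensive chunk of work and five cheap ones, on two "processors".
task_costs = [9, 1, 1, 1, 1, 1]
N_PROCS = 2

# Static partitioning: the data is split into fixed sections up-front;
# whichever processor gets the expensive chunk determines the total time.
static_load = [sum(task_costs[i::N_PROCS]) for i in range(N_PROCS)]
static_time = max(static_load)

# Dynamic partitioning: tasks are handed out as processors become free
# (a greedy least-loaded assignment simulates the bank-style single queue).
dynamic_load = [0] * N_PROCS
for cost in sorted(task_costs, reverse=True):
    dynamic_load[dynamic_load.index(min(dynamic_load))] += cost
dynamic_time = max(dynamic_load)
```

With these contrived costs the static split leaves one processor with the expensive chunk plus two others, while dynamic assignment keeps both processors busy, so the query finishes sooner.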
57 Parallel Data Management (cont’d). With dynamic partitioning the performance improvement can be dramatic, but the partitioning is out of the user's hands.
58 2 Tiered Architecture. Tier 1 (Enterprise Server): shared global data storage, shared application logic. Tier 2 (Client): end user functionality, data display, personal data storage, personal application logic. This slide defines the fundamentals of a two-tiered architecture.
59 2 Tiered Architecture in the DW Environment. Tier 1 (Enterprise Server): atomic data acquisition, organizational level data; secondary data acquisition, departmental level data. Tier 2 (Client): data access processing, individual level data, data manipulation. This slide depicts where the various components of the data warehouse architecture would fall in a two-tiered technical architecture.
60 2 Tiered Architecture in the DW Environment. This is a graphical representation of the two-tiered data warehouse architecture. The atomic level, metadata (information directory) and departmental levels would reside on the Enterprise Server, while any individual level data would be stored on the client. In this scenario the client is a “thin” one, with little processing beyond query and simple data manipulation.
61 2 Tiered Architecture in the DW Environment. In this variation, a “fat” client at the second tier facilitates a considerable amount of data manipulation, freeing up capacity on the Enterprise Server for other, more global activities.
62 2 Tiered Architecture in the DW Environment. In this variation the second tier is a “thin” client, and volumes of data are pre-processed (multidimensional preprocessing) on the Enterprise Server. This enables access to sophisticated analyses by more than one “thin” client.
63 3 Tiered Architecture. Tier 1 (Enterprise Server): shared global data storage, shared application logic. Tier 2 (Departmental Server): shared local data, shared application logic. Tier 3 (Client): end user functionality, personal data storage, personal application logic. In a three-tiered technical environment a “middle” tier has been added for shared data and application logic.
64 3 Tiered Architecture in the DW Environment. Tier 1 (Enterprise Server): atomic data acquisition, organizational level data. Tier 2 (Departmental Server): secondary data acquisition, departmental level data. Tier 3 (Client): data access processing, individual data, data manipulation. In the data warehouse, this middle tier can take on the responsibility for departmental level (data mart) processing. There is little or no impact on the third tier, while the first tier is relieved of some mid-level functionality.
65 3 Tiered Architecture in the DW Environment. In this graphical representation of the three-tiered technical architecture, the departmental data (with local metadata) has been moved to the middle tier. In reality, departmental instances can still exist on the first tier. In cases where the middle tier is geographically distant from the first tier, it is advisable to provide a subset of local metadata for the users of the middle tier's data.
66 3 Tiered Architecture in the DW Environment. This scenario also supports a “fat” client at the third tier, where the individual level data is maintained and manipulated.
67 3 Tiered OLAP Arch. in the DW Environment. One of the biggest benefits of a three-tiered technical architecture for a data warehouse is the ability to facilitate tremendous amounts of pre-processing of data on the middle tier. Tier 1 (Enterprise Server): atomic data acquisition, organizational level data; secondary data acquisition, departmental level data. Tier 2 (Departmental Server): shared data access processing. Tier 3 (“Thin” Client): data access, individual data, data manipulation.
68 3 Tiered OLAP Arch. in the DW Environment. In this case, the middle tier pre-processes data (multidimensional preprocessing) for use by many “thin” clients, rather than requiring the individual levels on the third tier to repetitively process the same data with complex algorithms.
69 The Atomic Schema. Example atomic tables: Customer (Customer ID, Status Date, Cust Addr State, Cust ZIP Code, Customer Type, Customer Status, ...); Cust Purchases (Customer ID, Activity Date, Product Code, Product Name, Sales Rep ID, Qty Purchased, Total Dollars, Promotion Flag); Product Ref (Product Code, ProdRef Eff. Date, ProdRef End Date, Product Name, Unit Price, Product Category, Product Type, Product Sub Type); Cust Averages (Customer ID, Cust Average Date, Cust Avg. End Date, Cust Avg. Rev., Cust Longevity); Outlet Reference (Store ID, Store Name, Store Location, Distribution Channel); Sales Rep Ref (Sales Rep ID, Sales Person Name, Store ID). Atomic level data is structured to support a wide variety of informational requirements across the organization. As a result, atomic data is too normalized to be easily accessed or understood by most end users. Data consistently needs to be aggregated into the same categories (dimensions). Multidimensional processing capabilities provide users with tremendous flexibility for most of their analysis requirements.
71 Dimension Table. Example: Dimension Table 1 (Dimension Key 1, Description 1, Aggregation Level 1.1, Aggregation Level 1.2, ..., Aggregation Level 1.n). A dimension table describes the data that has been organized in the fact table. The key should be the most detailed aggregation level necessary (e.g. country vs. county), if possible; otherwise surrogate keys may be necessary, but they will decrease the natural value of the key. Keep the number of aggregation levels manageable.
72 Fact Table. Example: Fact Table (Dimension Key 1, Dimension Key 2, Dimension Key 3, Dimension Key 4, Fact 1, Fact 2, Fact 3, Fact 4, ..., Fact n). The fact table quantifies the data that has been described by the dimension tables. Its key is made up of a unique combination of values of the dimension keys, and it ALWAYS contains a date or date dimension. Fact values should be additive: aggregations of quantities or amounts from the atomic level, with no percentages or ratios. The table may also contain non-additive, time-variant data.
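The rule that fact values should be additive, with no percentages or ratios, can be demonstrated with a tiny example; the quantities and dollar amounts are invented:

```python
# Atomic-level rows: (quantity purchased, total dollars).
rows = [(2, 10.0), (8, 20.0)]

# Additive facts roll up correctly across any grouping:
total_qty = sum(q for q, _ in rows)        # 10
total_dollars = sum(d for _, d in rows)    # 30.0
correct_avg_price = total_dollars / total_qty

# A stored per-row ratio (unit price) does NOT aggregate correctly:
# averaging the ratios weights each row equally, regardless of quantity.
wrong_avg_price = sum(d / q for q, d in rows) / len(rows)
```

Storing the additive components (quantity and dollars) lets any ratio be derived correctly at any aggregation level; storing the ratio itself does not.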
73 For Example: Purchases 1. Fact table Purchases 1 (key: Sales Rep ID, Product Code, Cust ZIP Code, Customer Type, Week Ending Date; facts: Days of Activity, Unit Price, Total Quantity, Total Dollars, Returned Qty, Returned Dollars, Promotion Qty). Dimension tables: Customer Location (Cust ZIP Code, City, State/Province, Country); Selling Responsibility (Sales Rep ID, Sales Rep Name, Store ID, Store Name, Store Location, Sales Channel); Customer Type (Customer Type, Cust Type Desc); Product (Product Code, Product Name, Prod. Category, Product Type, Prod Sub Type); Date Information (Week Ending Date, Month, Quarter, Year).
75 Caution!: Overly Complex Dimension. The number of aggregation levels within the dimension becomes unmanageable, or there is a logically or functionally incorrect combination of aggregation levels within a dimension.
76 Answer #1: Split Dimension. Split an overly complex dimension into several logical, more manageable dimensions, based on the business function of the aggregation levels.
77 Answer #2: The Snowflake. Identify hierarchies of aggregation levels and “dimensionalize” the primary dimension; the secondary dimensions are descriptive of the primary dimension.
78 Answer: Distinct Time Period Fact Tables. Different time periods (weekly, monthly, accounting period, billing cycle) are required for different analysis purposes. Create separate fact tables to account for the different time periods, e.g. a Weekly fact table (Date, D1, D2, D3, D4) and a Monthly fact table (Date, D1, D2, D3, D4). The date is still part of each fact table key, and the same dimension tables are used by both fact tables. This improves overall performance (loading and accessing) for each time period, and will not increase the amount of managed redundancy.