Basic guidelines for the creation of a DW Create corporate sponsors and plan thoroughly Determine a scalable architectural framework for the DW Identify and document all assumptions, conflicts and issues at the start of the project Choose methodology and tools which are compatible with the organisation Take the continuous nature of the DW life cycle into account Ensure a thorough data analysis and resolve all data conflicts Learn from your experience and above all, learn from your mistakes
The data warehouse life cycle Investigation Analysis of current environment Identify requirements Identify architecture Data warehouse design Development Implementation On-going data administration
Creating a strategy for a data warehousing Identification of current information strategy (usually an implicit strategy) Internal appraisal of the strategy External appraisal of the strategy Perform a strategic gap analysis Enumerate strategic alternatives which can be achieved through the application of a DW Choose among alternatives on the basis of the organisations mission, objectives and resources Formulate a strategic statement which specifies the organisation’s utilisation of its information resources and the role the DW is to play in this utilisation
Building a business case for a data warehouse Describe the AS-IS situation Identify business goals that the DW will help to fulfil Identify business problems that the DW can be used to solve Describe the TO-BE situation Explain how the data warehouse will be used to evolve the organisation from the AS-IS to the TO-BE situation Calculate the expected ROI for the data warehouse identify all costs associated with the data warehouse quantify all the benefits between the AS-IS and TO-BE situation
Creating a project plan What is the project scope? determine the scope of the data determine the role of technology determine any temporal considerations What is the business reason? identify the essential purpose of the project identify business drivers What are the critical success factors for the project? identify critical objectives identify critical tasks and activities What are the resource constraints on the project? Determine the need for resources determine the resource availability
Output from the planning phase A set of activities to be performed and a set of requirements on these activities (performance metrics) Documented business drivers Definition of scope of data Defined temporal scope Business reasons The overall approach Participants and their roles Assumptions and constraints Project management strategy
Data warehouse development activities Architecture definition Data modelling Planning and project initiation Decision maker needs Subject area analysis Source system analysis Transform design Physical database design Warehouse development End-user access design End-user access definition End-user access development Warehouse populate and implement
Project delivery tactics First things first Market the project Adopt a customer-focused orientation Deliver everything well Note: the project must always be able to show progress and have the ability to deliver business value
Focus areas for the management of a data warehouse Monitor and manage data warehouse activity Monitor and manage data warehouse data Monitor and manage security in the data warehouse Monitor and manage the data warehouse data model Monitoring and managing data warehouse metadata Monitoring and managing the integration and transformation interface Monitoring and managing the demands of the data warehouse’s business environment
Staffing requirements for initial data warehouse development and subsequent DW management Data warehouse management and maintenance –Data warehouse administrator –Data warehouse organisational change manager –Database administrator –Data warehouse maintenance developers –Metadata Manager Analysis and design –Business requirements analysts –User groups –Data warehouse architect Data procurement –Data quality analyst –Data acquisition developer –Data access developer IS executive sponsor
Data warehouse end-user roles and responsibilities Support roles Iteration sponsors Subject matter experts User support technician User types Unlimited ad hoc user access users Limited ad hoc access users Predefined application users Data warehouse initial and ongoing end-user staffing Subject matter experts User support technician
Data Quality (1) Data should be accurate Data should be stored according to data type Data should have high integrity Data should be consistent Databases should be well designed Data should not be redundant Data should follow business rules Data should correspond to established domains
Data quality (2) Data should be timely Data should be well understood Data should be integrated Data should satisfy the needs of the business Users should be satisfied with the data and the information derived from the data Data should be complete There should be no duplicate records There should be no data anomalies
A four-phase process to achieve high data quality Data investigation –parsing –lexical analysis –pattern investigation –data typing Data conditioning and standardisation Data integration Data Survivorship and formatting
Fundamental types of data transformation Simple transformation data type conversion date/time format conversions field decoding Cleansing and scrubbing valid values complex reformatting Integration simple field level mappings complex integration Aggregation and summarisation
Other transformations Operating system conversions Hardware architecture conversions affects the structure of data affects the structure of programs running against the data affects the computer operations needed for each environment the available software which makes the different environments run Application conversions
Types of source system extracts Point-in-time snapshots scheduled at specific points in time efficient method for users to pinpoint specific points in time or ranges of time unfortunately requires nearly a complete read of all operational sources of data Significant business events non-predetermined events drive the capture of data captured as a snapshot of relevant data entities triggered when a completion event is performed Delta data see next slide...
Types of source system extracts - Delta data Delta data is both new and changed data, it represents changes from one point in time to the next Delta data can be captured in a number of ways: –Operational events –Changed data capture –date last modified –Point-in-time comparisons
Types of data warehouse updates Insert Full replace Partial replace Update Update plus insert Insert with update Replace and insert
Copy management What to extract? When to extract? How to extract? Transformation requirements? What to transform? How to transform? What to update? When to update? How to update? How to generate the necessary metadata?
Major questions affecting the choice of the technical solution Connectivity and interoperability? Need for parallel processing ability? Scalability? Standards? Single vendor/multi-vendor Vendor stability and service? In house competency? Compatibility with existing systems architecture? Compatability with IT strategy? Functionality vs Cost?