Presentation on theme: "Best Practice Model Customisation and ETL for Sybase IWS – Instant IWS"— Presentation transcript:
Best Practice Model Customisation and ETL for Sybase IWS – Instant IWS 5/3/

Introduction
Performing source to target mappings, customising the Sybase IWS model and then writing the ETL to load the IWS model still requires a significant amount of time and effort.
Over the last 4 years Peter Nolan has been investigating ways and means to make the process of implementing Sybase IWS faster and cheaper. This paper briefly documents these ways and means.

ETL Templates
In 2002 Peter investigated the possibility of creating automatically customisable templates in Informatica and DataStage.
The findings were as follows:
It is possible to write an application that could read a source to target mapping document, such as a spreadsheet, and then generate the XML required to create a customised job for DS/INFA to load the table defined in the source to target mapping.
However, the time and effort involved in writing such an application would be very significant as the XML is very complicated. There would be little value in creating such complicated code because, if it were successful, the ETL vendors could simply copy the idea and sell the product, leaving no revenue for such a tool.
As a result of this Peter decided to give these templates to Sybase on the basis that they would be provided ‘as is’ free of charge to any Sybase customer Sybase chose to give them to. There are approximately 160 pages of documentation as well as the templates themselves. They implement current best practice methods of Informatica and DataStage processing when loading IWS.

SeETL for IWS
SeETL is a platform independent ETL tool which is 10x more productive than DataStage or Informatica.
It is so productive that it is possible to prototype an IWS implementation using SeETL for IWS, then re-write the tested prototype code, and still cut work months off the overall implementation!
On top of SeETL we have developed modules for IWS for Telco and Finance.
These modules mean that it is now possible to load the IWS model ‘as is’ with no extra effort today. All that is required is to make the source data available via an ODBC driver. The source data could even be in files.
However, IWS is never implemented ‘as is’. It is always customised. The remainder of this paper is about customisation of the IWS model.

A White Paper by Instant Business Intelligence

Developing IWS Mappings and Customisation
Today the ‘Best Practice’ for developing IWS mappings is as follows:
Acquire the free SeETLRT Utilities Package from Instant BI. This package includes a ‘Data Transfer Utility’ (DTU) that can be used to load the prototype staging area.
Develop a staging area in the target RDBMS which contains staging tables for all data that will flow into the IWS instance. (The one restriction is that where the volume is too large to be run through a table, the staging area for those very large files should simply be files. However, today, most data flowing into the IWS should be staged in an RDBMS.)
Use the DTU to populate the staging area with test data. The DTU contains features to allow you to default fields such as ‘valid row ind’ and ‘sent to IWS ind’.
Once every source that should move through the staging area has a staging table created and some data placed into it, no matter how small an amount, begin the mapping process.
To do this we use a spreadsheet (a copy is available from Sybase or direct from Peter Nolan). The spreadsheet is set up in a specific format.
You load the table definitions of the entire staging area into the left hand side of the spreadsheet.
You then type the target IWS table and column into the right hand side of the spreadsheet, making detailed notes on any transformation required to the column on the way to the IWS database.
In the past we have been mapping at the physical level but we have discovered recently that we would be better off mapping at a ‘logical’ level.
The spreadsheet is drillable and mappings can be printed on an input table or target view basis.
Review each mapping as it is completed.
When a mapping is complete the changes that it requires to the target IWS model can be applied.
When all the mappings are complete and all the changes required have been applied to the IWS model then generation of ETL can begin.
If you are using SeETL for IWS the current ‘Best Practice’ is to cut/paste the mappings from the spreadsheet on a target table by target table basis and create the input/update views by hand in a text editor like TextPad.
You should expect to be able to map even a large IWS implementation in 2-3 weeks.
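
To make the mapping-to-view step concrete, the following is a minimal sketch in Python of how rows taken from a mapping spreadsheet could be turned into an input view over a staging table. It is not SeETL's actual view format or generator, which this paper does not show. The table, column and view names, the `valid_row_ind` filter value and the helper `build_view_sql` are all hypothetical, and the sketch assumes each target view reads from a single staging table.

```python
import sqlite3
from collections import defaultdict

# Hypothetical mapping rows, as they might be copied out of the spreadsheet:
# (staging table, staging column, target view, target column, SQL expression
# for the transformation, or None when the column is moved unchanged).
MAPPINGS = [
    ("stg_customer", "cust_no", "customer_input_v", "customer_id", None),
    ("stg_customer", "cust_nm", "customer_input_v", "customer_name",
     "UPPER(TRIM(cust_nm))"),
    ("stg_customer", "open_dt", "customer_input_v", "open_date", None),
]


def build_view_sql(mappings):
    """Return {view name: CREATE VIEW statement}, one statement per target view."""
    columns = defaultdict(list)
    sources = {}
    for src_table, src_col, view, tgt_col, expr in mappings:
        # Assumption of this sketch: one staging table per target view.
        if sources.setdefault(view, src_table) != src_table:
            raise ValueError(f"{view} maps from more than one staging table")
        columns[view].append(f"{expr or src_col} AS {tgt_col}")

    statements = {}
    for view, cols in columns.items():
        column_list = ",\n    ".join(cols)
        statements[view] = (
            f"CREATE VIEW {view} AS\n"
            f"SELECT\n    {column_list}\n"
            f"FROM {sources[view]}\n"
            f"WHERE valid_row_ind = 'Y'"
        )
    return statements


def demo():
    """Create a tiny staging table, build the view, and read through it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stg_customer ("
        "cust_no INTEGER, cust_nm TEXT, open_dt TEXT, "
        "valid_row_ind TEXT DEFAULT 'Y')"
    )
    # The first row picks up the default indicator; the second is marked invalid.
    conn.execute(
        "INSERT INTO stg_customer (cust_no, cust_nm, open_dt) "
        "VALUES (1, '  acme ltd ', '2004-01-31')"
    )
    conn.execute("INSERT INTO stg_customer VALUES (2, 'bad row', NULL, 'N')")
    for sql in build_view_sql(MAPPINGS).values():
        conn.execute(sql)
    return conn.execute(
        "SELECT customer_id, customer_name, open_date FROM customer_input_v"
    ).fetchall()


if __name__ == "__main__":
    print(build_view_sql(MAPPINGS)["customer_input_v"])
    print(demo())
```

Running the views against even a handful of staged rows, as here, is the kind of early test that exposes a column mapped to the wrong target before any production ETL is written.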

We are working on another tool that will take the mapping spreadsheet and generate all the SeETL for IWS views and control files required at the push of a button, but we are not there yet.
Having built the entire SeETL for IWS ETL you can run the ETL for the data in the staging area to see how well the mappings that have been defined work. We recommend that significant testing is performed to find errors in data understanding that have led to mappings being defined in error. SeETL for IWS is very handy for finding improperly mapped data because it actually puts the data into the IWS.
We now recommend that ‘Presentation Views’ are used for ALL tables to insulate the underlying IWS database from being accessed directly by any tools, including SeETL for IWS.
In a future version of the spreadsheet we plan to include the presentation views and we also plan to generate them.
Once all ETL is tested, what you do next depends on what you have chosen to do on the project.

If you have chosen to go live with SeETL for IWS you need to move the prototype to the real target platform (it is assumed you plan to prototype on Windows 2000). SeETL is supported on AIX, Solaris and Windows 2000. HP-UX will be added according to demand. There are no plans to support Linux.
You will then start to scale up your testing by loading larger volumes into the staging area as well as larger volumes into the IWS database itself. SeETL does not cost more money for more processors, so the speed of the batch is really determined by how much the client is willing to spend on processors. You may also choose to use the scheduler provided.

If you plan to use DataStage as your ETL tool we recommend you do the following.
Develop naming standards for parameters passed to jobs.
Develop your mechanism for running jobs. The mechanism used by Instant BI is proprietary and is only available to projects where IBI staff are heavily involved in DataStage job development. These tools were written by an IBI partner and this is a condition of their continued use by IBI.
As a ‘public’ and ‘open source’ solution IBI provides a full scale scheduler and a DataStage job submission facility that sets parameters for DataStage jobs from a view held inside the IWS database. It works perfectly well.
Develop your templates for loading each type of table based on the DataStage templates provided.
Test your templates extensively as any errors made in the template will be propagated to all jobs.
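
The idea of setting job parameters from a view held in the database, mentioned above, can be sketched as follows. This is an illustration in Python only, not the IBI job submission facility, whose design is not described in this paper. The table `etl_job_parameters`, the view `etl_job_parameters_v`, the `active_ind` column, the `p`-prefixed parameter names and the job name are all hypothetical, and how the resulting name/value pairs are handed to DataStage is deliberately left out.

```python
import sqlite3


def load_job_parameters(conn, job_name):
    """Return {parameter name: value} for one job, read from the view."""
    rows = conn.execute(
        "SELECT param_name, param_value "
        "FROM etl_job_parameters_v "
        "WHERE job_name = ? "
        "ORDER BY param_name",
        (job_name,),
    ).fetchall()
    return dict(rows)


def demo():
    """Build a hypothetical parameter table and view, then read one job's values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE etl_job_parameters (
            job_name    TEXT,
            param_name  TEXT,
            param_value TEXT,
            active_ind  TEXT
        );
        INSERT INTO etl_job_parameters VALUES
            ('load_customer_dim', 'pBatchDate',    '2004-01-31', 'Y'),
            ('load_customer_dim', 'pSourceSchema', 'staging',    'Y'),
            ('load_customer_dim', 'pDebug',        '1',          'N');
        -- The view exposes only the parameters that are currently active,
        -- so jobs never read the base table directly.
        CREATE VIEW etl_job_parameters_v AS
            SELECT job_name, param_name, param_value
            FROM etl_job_parameters
            WHERE active_ind = 'Y';
        """
    )
    return load_job_parameters(conn, "load_customer_dim")


if __name__ == "__main__":
    print(demo())
```

Keeping the parameters behind a view means a naming standard (here the assumed `p` prefix) is enforced in one place, and values can be changed per run without editing any job.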

For each final job that is required you can unload the template into XML, edit the template, and then reload it into DataStage for further editing.
Of course, you must do large volumes of testing.

Using these techniques we have been able to drastically reduce the work days for implementation. The following stats come from a recent project.
Source System: Oracle Applications 11
Number of tables extracted: 100+
Number of fields extracted: 9,000+
Tables/fields in Staging Area: 100+/9,000
Number of fields moved to IWS: 3,100
Number Logical Dimension tables: 55+
Number Logical Fact tables: 30+
Mapping and IS model customisation: 8 work weeks
SeETL for IWS implementation: 2 work weeks
Testing prototype ETL: 1 work week
Setting up DS environment: 2 work weeks
Customising DS ETL Templates: 4 work weeks
Writing DS ETL for the 85+ logical views: 4 work weeks
Testing for productionisation: 2 work weeks
These numbers are more than twice as fast as Peter's previous best effort and more than 4 times as fast as the effort before that. Names of the clients are available from Jonathan Simmons.

A word of warning. Just having these tools does not mean the staff on the project can implement the back end of a complex IWS implementation this quickly. Peter has been doing ETL based work on a regular basis for the last 14 years and worked on large batch systems prior to that. Writing ETL quickly and being able to test it quickly is a skill that gets better with time.
However, even staff with modest skills in tools like DataStage will be much more productive by using the templates.
Further, there are no specific extra skills required to use SeETL above basic DBA skills. SeETL is simply executable code sitting on top of views. Any DBA/IT person who can read a manual and create tables and views can build ETL in SeETL.

Summary
IWS is a sophisticated model. This sophistication is ‘paid for’ by increased complexity of ETL.
Instant BI has developed tools and techniques to reduce the amount of effort required to implement IWS on any database on Windows 2000, AIX or Solaris. To the best of our knowledge, the speed with which these tools can be used to deploy Sybase IWS represents current ‘Best Practice’.