Presentation on theme: "Configuring Global Payroll for Optimal Performance"— Presentation transcript:
1. Configuring Global Payroll for Optimal Performance
2. Abbey key facts 1: Sixth largest bank by assets in the UK; founded in 1944; currently approximately 18m customers; 741 branches across the UK.
3. Abbey key facts 2: Abbey's main offices are in London, Milton Keynes, Bradford, Glasgow and Belfast. We have around 26,000 people (full time equivalent). We have about 1.8 million shareholders. Assets at 30 June: £171 billion. Personal Financial Services trading profit before tax for the 6 months to 30 June: £340 million.
4. History of PeopleSoft at Abbey: PeopleSoft HRMS acquired for recruitment in 1994; implemented PeopleSoft HRMS in 1997 (Recruitment, Personnel & Training; Paylink used to send data from HRMS to payroll); workflow and self-service with v7.5 in 2000 (Java/HTML clients); PeopleSoft HRMS upgraded to 8 SP1 in 2001; implemented PeopleSoft Payroll in August 2003; project initiated to upgrade to HCM 8.8.
5. Current Platform: AppServer runs on SUN E4500; database runs on SUN E10000; both boxes are shared with other applications; Tier 1 mirrored disks; Oracle.
6. Why PeopleSoft Payroll? Integrated HRMS; common infrastructure; web enabled; automate administrative functions; Manager Self-Service (absence and maternity input); Employee Self-Service (overtime input, on-line payslips); real-time data input; increase system availability.
7. PeopleSoft Payroll Implementation: Development commenced in January 2002; in-house IT project team; project delays due to re-scoping and internal re-structure; streamed payroll during parallel run tests; went live with payroll and absence in August 2003 (30,000 staff and 7,000 pensioners); 12 streams introduced in February 2004; hash partitioning introduced in July 2004 due to increased run times; identify and calculate taking 2.5 hours. But we had to tune it.
8. Resources: If you can't hear me, say so now. Please feel free to ask questions as we go along. The presentation will be available from Customer Connection -> Conference Website.
9. Who am I? Independent consultant (Abbey, DoD, Unilever, UBS…); System Performance Tuning: Oracle Databases, Unix, Tuxedo, PeopleSoft Apps (inc. Global Payroll); Book.
10. Who are You? Technical? DBA; Developer; familiar with PeopleSoft infrastructure. Non-Technical? HR Functional; HR/P Administrator; Project Manager.
11. Configuring Global Payroll: Physical database considerations: parallel processing; increase concurrency; reduce contention; reduce I/O; permit CPU usage; some Oracle specific. GP changes: efficient GP 'rules'; reduce CPU consumption of the rules engine. Data migration.
This presentation will discuss the physical implications of using an Oracle database.
The principle of read consistency is fundamental to a great deal of what Oracle does behind the scenes. Read consistency means that the data returned by a query is constant during the life of that query. If data to be returned by a long running query is updated AND committed after that query starts, but before it is fetched by the query, then the value returned will be the value as at the point in time when the query began. Physically, the before-update value is reconstructed from the rollback segment. All this is done without locking the entire table, or the entire block of data. This reconstruction process is slow, and CPU as well as disk intensive. It is to be avoided.
GP can process very significant quantities of data. It is important that the SQL in the identify stage executes efficiently.
On the GP side it is important that the rules are written efficiently in order to minimise accesses to the PIN manager.
12. This is not theory! This has been done for real: UBS, 32,000 payees; Abbey, 36,000 payees*; DoD, 640,000 payees (benchmark); Unilever, 12,500 payees (weekly & monthly); 3 other installations in UK, France & Japan. And it works! The figures in this presentation come (with permission) from Abbey National.
13. Overview: Payroll is calculated by a Cobol program, GPPDPRUN, a single non-threaded process. Four stages: Cancel; Identify; (Re-)Calculate; Finalise.
The payroll calculation is performed by a Cobol process. It is a single process and can only execute on one CPU at any one time. If you have 10 CPUs, only one will ever be consumed by a single Cobol process. The Oracle shadow process will not be active at the same time as the Cobol. To utilise more than one CPU you need to run more than one Cobol process in parallel. This is termed 'streaming'.
The cancel phase essentially deletes rows from the result tables that were inserted by previous calculations.
The identify phase determines which employees have to be calculated. This populates two result tables, GP_PYE_SEG_STAT and GP_PYE_PRC_STAT. GP_PYE_SEG_STAT has one row per employee per period per process type (calc, absence).
The calculate phase is the CPU intensive part, when the rule engine performs the calculation.
The finalise phase closes off a pay calendar.
14. Three stages with different behaviours. Cancellation: monolithic SQL to delete results. Identify: populating temporary work tables; database intensive; SQL set processing; ~10-20 minutes. Calculation: opening cursors; load data into memory; evaluation of rules (Cobol only); batch insert of results into database; Cobol (CPU) intensive; ~6500 segments/hour/stream (was 400).
The identify stage populates some temporary tables and opens a number of cursors that feed data into the calculate phase. The calculate phase takes this information, caching some of it in memory as it goes. The results are inserted into the result tables in batches, by default every 500 rows (but this is configurable).
The identify phase is basically a series of SQL insert/update statements and is very database intensive.
Calculation initially ran at 400 segments/hour/stream; tuning got it to 6500. That was mainly achieved by eliminating contention.
You will hear me use this concept of segments per stream per hour. Each employee has a segment to calculate. If there are changes effective in a prior pay period he will have extra segments. We generally find that the average rate of calculation is very linear if we look at processing time per segment.
15. What is Streaming? Employees are split into groups defined by ranges of employee ID. Each group/range can be processed by a different instance/stream of GPPDPRUN. The streams can then be run in parallel. Streaming is vanilla functionality delivered by PeopleSoft; it is not a customisation.
16. Why is Streaming Necessary? GPPDPRUN is a standard Cobol program. It is a single-threaded process: one Cobol process can only run on one CPU at any one time. 36,000 payees at 2,700 payees/stream/hour; 97,000 segments at 7,350 segments/stream/hour; 49m to 1h11m with 12 streams; 13h12m if run in one stream. On a multi-processor server, streaming enables consumption of extra CPU.
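The elapsed-time figures on this slide follow from simple division. The sketch below reproduces them in Python; the function names are illustrative, and the 12-stream figure assumes the work divides perfectly evenly, which the observed 49m to 1h11m spread shows it does not.

```python
def elapsed_hours(segments: int, rate_per_stream_hour: float, streams: int = 1) -> float:
    """Ideal elapsed time, assuming segments are spread evenly over the streams."""
    return segments / rate_per_stream_hour / streams


def as_h_m(hours: float) -> str:
    """Format fractional hours as e.g. '13h12m', rounded to the nearest minute."""
    total_minutes = round(hours * 60)
    return f"{total_minutes // 60}h{total_minutes % 60:02d}m"


# Figures from the slide: 97,000 segments at 7,350 segments/stream/hour.
print(as_h_m(elapsed_hours(97_000, 7_350)))              # one stream
print(as_h_m(elapsed_hours(97_000, 7_350, streams=12)))  # twelve perfectly balanced streams
```

The gap between the ideal twelve-stream time and the slowest observed stream is the cost of imbalance, which is why the stream ranges need to be calculated carefully.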
17. Calculation of Stream Definitions: The objective is roughly equal processing time for all streams. PS_GP_PYE_SEG_STAT indicates the work to be done by payroll. Calculate ranges of roughly equal numbers of rows for this table, using a script with Oracle's analytic functions that directly populates PS_GP_STRM. Equal processing time does NOT correspond to equal volumes of result data.
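The idea of cutting EMPLID ranges so that each holds a roughly equal number of PS_GP_PYE_SEG_STAT rows can be sketched in Python. This is not the Oracle analytic-function script the slide refers to; it is an illustration of the same principle, with a hypothetical function name and made-up employee IDs, and it keeps every row for one employee inside a single range.

```python
from collections import Counter


def stream_ranges(segment_emplids, n_streams):
    """Return (low, high) EMPLID ranges holding roughly equal numbers of rows.

    segment_emplids: one EMPLID per row of work (an employee with three
    segments appears three times), standing in for PS_GP_PYE_SEG_STAT.
    """
    counts = sorted(Counter(segment_emplids).items())  # (emplid, rows) in EMPLID order
    total = sum(rows for _, rows in counts)
    ranges, start, running, stream = [], None, 0, 1
    for emplid, rows in counts:
        if start is None:
            start = emplid
        running += rows
        # Close the range once this stream has reached its share of the total work.
        if running * n_streams >= stream * total and stream < n_streams:
            ranges.append((start, emplid))
            start = None
            stream += 1
    if start is not None:
        ranges.append((start, counts[-1][0]))  # last stream takes whatever remains
    return ranges


# Hypothetical data: six payees with one segment each, two with three each.
ids = [f"E{i:03d}" for i in range(1, 7)] + ["E007"] * 3 + ["E008"] * 3
print(stream_ranges(ids, 2))
```

Balancing on rows rather than headcount is the point: the two ranges above contain six and two payees respectively but the same number of segments.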
19. Employee Distribution Creep: As new employees are hired, EMPLIDs are allocated into the same stream. That stream starts to run longer. Effective execution time is the maximum execution time of all streams. Need to periodically recalculate stream ranges, and to reflect this in physical changes. There are a number of implications of using streams.
20. Employee Distribution Creep: Company merger/divestment; pensioners. Abbey: 30,000 employees, averaging 3.03 segments per employee; 6,000 pensioners, 1 segment per pensioner; 12 streams. Employee IDs are allocated sequentially, so earlier streams are richer in pensioners and later streams richer in employees.
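A rough calculation shows why headcount is a poor basis for stream ranges. The Python sketch below uses the Abbey figures from this slide but makes two simplifying assumptions that are not in the source: every pensioner has a lower EMPLID than every employee, and streams are cut at equal headcount. The function name is illustrative.

```python
def segments_per_stream(groups, n_streams):
    """Segments falling into each stream when streams are cut at equal headcount.

    groups: (headcount, segments_per_payee) pairs listed in EMPLID order.
    Assumes the total headcount divides evenly by n_streams.
    """
    per_payee = [segs for headcount, segs in groups for _ in range(headcount)]
    size = len(per_payee) // n_streams
    return [round(sum(per_payee[i * size:(i + 1) * size])) for i in range(n_streams)]


# 6,000 pensioners at 1 segment each, then 30,000 employees averaging 3.03 segments.
loads = segments_per_stream([(6_000, 1), (30_000, 3.03)], 12)
print(loads)
print(max(loads), sum(loads) / len(loads))  # slowest stream versus a perfectly balanced one
```

Because elapsed time is set by the slowest stream, the employee-heavy streams determine the run time while the pensioner-heavy streams finish early and leave CPUs idle.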
21. Database Contention: Rollback contention; Snapshot Too Old; insert contention. I/O volume: datafile I/O; redo/archive log activity. It is not only possible, but highly likely, that… Read consistency means…
22. Rollback Contention: Working storage tables are shared by all streams; rows are inserted/deleted during the run. Different streams never create locks that block each other, but they do update different rows in the same block during processing: 1 interested transaction per stream in many blocks. There is an additional rollback overhead of 16 bytes per row if two rows are in the same block versus different blocks, on updates of ~<100 bytes/row.
23. Read Consistency: Oracle guarantees that data is consistent throughout the life of a query. If a block has been updated by another transaction since a long running query started, it must be possible to reconstruct the state of that block at the time the query started, using the rollback segment. If that information cannot be found in the rollback segment, the long running query fails with ORA-1555.
24. ORA-1555 Snapshot Too Old: Rollback segments are not extended for read consistency. Additional rollback overhead can cause rollback segments to spin and wrap. The error message is also described as 'rollback segments too small'. In this case, simply extending the segments is the wrong response. There is also a CPU overhead to navigate the rollback segment header chain.
25. Insert Contention: During the calculation phase results are written to the result tables. A number of streams can simultaneously insert into the same result tables. This increases the chance that one block will contain rows relating to more than one stream. This in turn causes rollback problems during the cancel in the next calculation.
26. Another cause of ORA-1555: If a calendar is not being processed for the first time, previous results are cancelled: result tables are deleted, with monolithic deletes from each table. If streams start together, they tend to delete from the same table at the same time. A long running delete is also a query for the purposes of read consistency. It is necessary to reconstruct a block as at the time the long running delete started in order to delete a row from it. Reconstruction occurs during 'consistent read'. Deletes are by primary key columns, so Oracle tends to look up each row by index; thus index reads are also 'consistent'.
27. Datafile and Log Physical I/O Activity: During the identify phase data is shuffled from table to table. This generates datafile and redo log I/O. Rollback activity is also written to disk, and undo information is also written to the redo log. All the data placed in the temporary working tables by a stream is of no use to any other instance of the calculation process. It will be deleted by a future process. Dirty blocks are written to disk before the rollback segment wraps.
28. High Water Marks: The working storage tables tend to be used to drive processing. Thus, the SQL tends to use full table scans. In Oracle, the High Water Mark is the highest block that has ever contained data. Full scans read the table up to the high water mark. Temporary tables contain data for ALL streams, so every stream can have to scan data for all streams.
29. How to avoid inter-stream contention? Keep rows from different streams in different blocks: each block should contain rows for one and only one stream. This needs two Oracle features: Partitioning and Global Temporary Tables.
30. What is Partitioning? Logically, a partitioned table is still a single table. Physically, each partition is a separate table. In a partitioned table, the partition in which a row is placed is determined by the value of one or more columns. A local index is partitioned on the same logical basis as the table.
31. What is Partitioning? Typically used in DSS, but can also be effective in OLTP. (From Oracle documentation)
32. What sort of Partitioning? Range: streams are defined in terms of ranges; queries specify a range of employees; fits well with range partitioning; ensures partition elimination; range partition on EMPLID. Hash: pseudo-random hash function; same input always gets same output; good for single value look up; single pay period (calendar group ID); hash partition on CAL_RUN_ID.
33. How should Range Partitioning be used in GP? The largest result tables are each range partitioned on EMPLID to match GP streaming: 1 stream to 1 partition. Thus each stream references one partition in each result table, and there is only 1 interested transaction per block. Indexes are 'locally' partitioned. Partitioning was really designed for DSS systems and is most efficient for large tables: GP_RSLT_ACUM, GP_RSLT_ERN_DED, GP_RSLT_PIN, GP_RSLT_PI_DATA. It is effective on smaller ones too: GP_PYE_PRC_STAT, GP_PYE_SEG_STAT.
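To make the one-stream-to-one-partition idea concrete, the sketch below generates an Oracle range partitioning clause from a list of stream boundaries. It is written in Python purely as a text generator; the function name, partition names and EMPLID boundary values are hypothetical, and the clause would still have to be combined with the real table definition.

```python
def range_partition_clause(column, upper_bounds):
    """Build a PARTITION BY RANGE clause with one partition per stream.

    upper_bounds: exclusive upper bound on `column` for every stream except
    the last; the final partition catches everything else with MAXVALUE.
    """
    parts = [
        f"  PARTITION strm{n:02d} VALUES LESS THAN ('{bound}')"
        for n, bound in enumerate(upper_bounds, start=1)
    ]
    parts.append(
        f"  PARTITION strm{len(upper_bounds) + 1:02d} VALUES LESS THAN (MAXVALUE)"
    )
    return f"PARTITION BY RANGE ({column})\n(\n" + ",\n".join(parts) + "\n)"


# Hypothetical boundaries for three streams.
print(range_partition_clause("EMPLID", ["K0100000", "K0200000"]))
```

The partition boundaries have to be kept in step with the stream definitions in PS_GP_STRM, which is why recalculating stream ranges also implies physical changes.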
34. How should Hash Partitioning be used in GP? Partition by CAL_RUN_ID because the SQL contains CAL_RUN_ID = … Only worthwhile on the very largest tables: GP_RSLT_ACUM, GP_RSLT_ERN_DED, GP_RSLT_PIN. Adjust CAL_RUN_IDs to control which partition is used, in order to balance hash partition volumes.
35. Predicting Hash Values: Use the Oracle PL/SQL function: SELECT sys.dbms_utility.get_hash_value(CAL_RUN_ID,1,16). The number of partitions should be a power of 2, due to the mathematics of the hash function: 16, 32, 64, not 12, 53, 61, 106, 118. Abbey use 32: they want to hold 18 months of data, 18>16, so 32.
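The "18>16, so 32" reasoning is just rounding the number of pay periods to be retained up to the next power of two. A minimal Python sketch, with an illustrative function name:

```python
def partitions_needed(periods_to_keep: int) -> int:
    """Smallest power of two that is at least periods_to_keep."""
    n = 1
    while n < periods_to_keep:
        n *= 2
    return n


print(partitions_needed(18))  # Abbey's case: 18 months of data
```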
36. Calendar Group ID Suffixes: The original Calendar Group ID AN2004/10 has hash value 15, but partition 15 is already used and 14 is the least full. AN2004/10E has hash value 14. Putting data into the hash partition with the least data improves performance. If only monthly payroll is run then you could arrange for one month per partition. That would make archiving easier later!
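The suffix search can be sketched as: try the plain ID and each suffixed variant, hash each one, and keep the candidate whose partition currently holds the least data. In the Python sketch below, `stand_in_hash` is an assumption used only so the example runs; it does NOT reproduce the values returned by Oracle's `dbms_utility.get_hash_value`, and in practice the candidates would be hashed in the database. The function names and the single-letter suffix scheme are also illustrative.

```python
import zlib
from string import ascii_uppercase


def stand_in_hash(cal_run_id, base=1, size=16):
    """Placeholder hash in the range base..base+size-1 (NOT Oracle's algorithm)."""
    return base + zlib.crc32(cal_run_id.encode()) % size


def pick_cal_run_id(base_id, rows_per_partition, hash_fn=stand_in_hash):
    """Return base_id, or base_id plus one letter, landing in the emptiest partition.

    rows_per_partition: mapping of partition number to current row count;
    partitions missing from the mapping are treated as empty.
    """
    candidates = [base_id] + [base_id + suffix for suffix in ascii_uppercase]
    return min(candidates, key=lambda c: rows_per_partition.get(hash_fn(c), 0))


# Hypothetical row counts; every other partition is assumed to be empty.
print(pick_cal_run_id("AN2004/10", {15: 2_000_000, 14: 0}))
```

Passing the hash function as a parameter keeps the selection logic separate from the hashing, so hash values obtained from the database can be substituted for the placeholder.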
37. Calendar Group ID Suffixes (i): table of CAL_RUN_ID against HASHVALUE for the twelve AN2004/ monthly calendar groups, before and after suffixing (hash values and unsuffixed month numbers not legible). Suffixed IDs shown: AN2004/05B, AN2004/06A, AN2004/07B, AN2004/08B, AN2004/09A, AN2004/10E, AN2004/11B, AN2004/12D.
38. Calendar Group ID Suffixes (ii): table of CAL_RUN_ID against HASHVALUE for the same twelve calendar groups (hash values not legible). Suffixed IDs shown: AN2004/01BE, AN2004/02AL, AN2004/03AT, AN2004/04AJ, AN2004/05AF, AN2004/06AC, AN2004/07BC, AN2004/08AI, AN2004/09BJ, AN2004/10AW, AN2004/11BR, AN2004/12EB.
39. Partitioning on other platforms: DB2 does range partitioning; the latest version will do multi-dimensional range partitioning. Only Oracle does range partitioning with hash sub-partitioning. Multi-dimensional range partitioning could be more effective.
40. Global Temporary Tables: An Oracle-specific feature that is appearing in other DB platforms. The definition is held permanently in the database catalogue. The table is physically created on demand by the database in the temporary tablespace for the duration of the session/transaction, then dropped. Each session has its own copy of each referenced GT table, so each physical instance of each GT table only contains data for one stream. Working storage tables PS_GP_%_WRK are converted to GT tables.
41. Global Temporary Tables. Advantages: not recoverable, therefore no redo/archive logging (some undo information), giving improved performance and reduced rollback; no High Water Mark problems; smaller object to scan; no permanent tablespace overhead. Disadvantages: does consume temporary tablespace, but only during payroll; can't Analyze in Oracle 8i (there are workarounds; can in Oracle 9i); can hamper debugging; new in Oracle 8.1, with some bugs, but GP is not affected.
42. How many streams should be run? Cobol run on the database server: either Cobol is active or the database is active; no more than one stream per CPU, perhaps CPUs - 1; be careful not to starve the database of CPU; run the process scheduler at lower OS priority. Cobol and database on different servers: Cobol is active for 2/3 of execution time; up to 1.5 streams per CPU on the Cobol server; up to 3 streams per CPU on the database server. Hotsos Profiler.
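For the two-server case, the rule of thumb can be turned into a small calculation: if each stream keeps a Cobol CPU busy for 2/3 of the time and a database CPU busy for the remaining 1/3, the stream count is capped by whichever server saturates first. The Python sketch below assumes exactly that split and uses an illustrative function name; real workloads should be measured rather than assumed.

```python
from fractions import Fraction

COBOL_SHARE = Fraction(2, 3)  # fraction of elapsed time a stream spends in Cobol


def max_streams(cobol_cpus: int, db_cpus: int, cobol_share: Fraction = COBOL_SHARE) -> int:
    """Upper bound on streams when Cobol and database run on separate servers."""
    by_cobol = cobol_cpus / cobol_share    # 1.5 streams per Cobol-server CPU
    by_db = db_cpus / (1 - cobol_share)    # 3 streams per database-server CPU
    return int(min(by_cobol, by_db))


print(max_streams(8, 12))   # Abbey: 8 application server CPUs, 12 database CPUs
print(max_streams(20, 20))  # UBS: 20 CPUs on each node
```

This is a ceiling, not a target: the slides also advise leaving some database CPU free, and shared servers have less headroom than the raw CPU count suggests.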
43. Other Streamable Processes: Application Engine: GP_PMT_PREP (OK in CH; bug in UK extensions); GP_GL_PREP; GPGB_PSLIP (bug fixed). Additional partitioned and GT tables required.
44. Abbey Production Payroll Configuration: 2 nodes. Database node: 12 CPUs, shared with other services. Application Server/Process Scheduler node: 8 CPUs. 12 streams: 2/3 of 12 is 8, so all 8 application server node CPUs are active during the calculate phase; 'nice' the Cobol processes (by nicing the process scheduler). 1/3 of 12 is 4, so 4 of 12 DB CPUs are active; it is important to leave some free CPU for the database, otherwise spins are escalated to sleeps, generating latch contention.
45. Unilever Production Payroll Configuration: 1 node; 4 CPUs, dedicated to GP only; 4 streams, 1 per CPU; monthly payroll only – payees; weekly payroll not streamed.
46. UBS Production Payroll Configuration: 2 nodes: database node and Application Server/Process Scheduler node, 20 CPUs each, dedicated to HR & GP. 30 streams: 2/3 of 30 is 20, so all 20 application server node CPUs are active during the calculate phase; 1/3 of 30 is 10, so 10 of 20 database CPUs are active.
48. GP Development Goals: How to create and test efficient rules that work without adversely affecting performance. How best to identify problems, particularly in distinguishing system setup/data problems from a problem in a rule or underlying program. How to use GP payroll debugging tools.
49. Efficient Rules: The process involves detailed functional and technical analysis of the definition of the payroll rules. While this is responsible for two thirds of the execution time, and so could produce the greatest saving, it will also require the greatest effort.
The tuning of rules can be as simple as using literals instead of variable elements and as complex as redesigning them. The process ideally starts during the design stage when various implementation schemes are analysed, intermediate tests are performed and the most efficient scheme is chosen.
Likewise, all aspects of Global Payroll must be considered, since creating rules to simplify calculation can adversely affect reporting or other online and batch areas and vice versa.
This is an on-going process that does not stop with the rule's implementation, since changes in the size of the employee population, the number of records in the underlying tables, etc. can produce an unexpectedly substandard result from an initially efficient rule.
50. Efficient Rules: The process ideally starts during the design stage when various implementation schemes are analysed, intermediate tests are performed and the most efficient scheme is chosen. All aspects of Global Payroll must be considered, since creating rules to simplify calculation can adversely affect reporting or other online and batch areas and vice versa.
The rules can be broken down into two groups: PeopleSoft delivered rules, and customer developments. So far, most of the tuning effort has focused on the rules delivered by PeopleSoft.
The choice of rules to be examined has been determined by running payroll for a small subset of employees with auditing enabled. From this we can determine how much time has been spent in each rule. Then we examine the rules that take the greatest time.
In principle it should be perfectly possible for PeopleSoft to tune the rules that they have delivered. However, different sets of data will exercise the rules to different degrees. Thus, if PeopleSoft use their own data set they may choose to tune different rules.
51. Efficient Rules: Arrays; Re-calculate?; Store / Don't store; Formulas; Proration and Count; Historical rules; Generation control versus conditional section.
Re-calculation = Yes or No? It's important to be careful when you are using this functionality. Each time you use an element with "re-calc" = Yes, the process will call the program to resolve it. Set this switch to 'No' unless you are sure that you want a recalculation.
Store / Do not store? You only need to store elements if you want to use them in a historical rule, or if you need them for retro, reporting or auditing purposes. If you need certain supporting elements for reporting or audit, it might be better to create a Write Array that writes a row with all of the necessary results.
Store if zero? If you decide to store an element, do you want to store it if its value is zero or blank? Definitely do not store accumulators if they are zero.
Formulas: 1. Use literals like 'Y' or 'N' instead of variables. For 56 employees and 10 formulas, the difference in processing with variables vs. literals was close to 700%. 2. Use Exit in nested IF. 3. When you have multiple conditions, put the most 'popular' at the top, followed by the second most 'popular', etc. 4. Use Min/Max.
Arrays: The most important thing is to reduce the number of times you call the lookup formula.
Proration and Count: When you need multiple proration rules, such as calendar days, workdays and work hours, for the same slice periods, it's better to have one count element to "resolve" all proration rules. The goal is to minimise the "reading" of the work schedule.
Generation control versus conditional section: If a conditional formula resolves to 0, all elements in that section are skipped. That means that some Positive Input records and adjustments may remain unprocessed. However, it's much better for performance to use a conditional section.
52. Efficient Rules: Keyed by Employee: 1 select, multiple fetches, small result set to search. User Defined: 1 select, multiple fetches, all searches in memory. User Defined with the Reload Option: multiple selects, multiple fetches, small result set to search.
62. Migration/Customization: PI v. Array. PI can be used during identification. PI has special considerations during eligibility checking. PI allows easy override of components on the element definition such as Unit, Rate, Percent or Base. The Array cannot handle multiple instances of an earning/deduction.
Using Arrays to drive payroll calculations is a very complicated process. Some functionality that is available to PI cannot be duplicated any other way, including by using arrays. The following are some of the major advantages of using PI (in no particular order):
1) PI can be used during identification. Since arrays are available in the calculation step only, the customer will have to come up with a User Exit to add appropriate payees to the process. The User Exit must be smart enough to do Cancel logic as well (or another User Exit must be created) in order to cancel employees out of Calculation if data is removed from the table. PeopleSoft does not encourage the use of user exits. They should be considered the last resort.
2) PI has special considerations during eligibility checking. It supersedes any Payee Override information. If there is a PI for an element that is not in the eligibility group but is in the process list, it will be processed if the PI override switch on the Pay Entity is on. This functionality is completely outside of Array abilities.
3) PI allows easy override of components on the element definition such as Unit, Rate, Percent or Base. In the absence of PI, the default process ensues. In other words, a unit can be defined as a formula, amount, bracket, etc. If there is no PI, the process will resolve that formula, bracket, etc. If there is a PI, it overrides the calculation for that payee and calendar. The arrays can only populate variables, so either an earning/deduction component that is fed by the array must be defined as a variable or the whole logic must be duplicated in some formulae. Most likely such formulae will have to be created for each earning/deduction component. So let's see… the number of elements * the number of components.
4) The Array cannot handle multiple instances of an earning/deduction. The table that is read by the Array must have the sum of all instances for a payee/element/period.
5) There is no way to override the prorate option on an earning/deduction definition using Arrays.
6) PI also overrides Generation control. So if generation control says not to process an earning or deduction but PI exists, the earning or deduction will be processed if the PI override checkbox on the Pay Entity is on. This cannot be done with an Array.
7) PI is automatically directed to the proper segment/slice based upon the begin/end PI dates. A special non-trivial process (?) must be devised to enable use of Arrays during a segmentation/slicing event. I have to spend some time thinking about this process. At this time, I can't imagine what it might look like.
8) Using an Array precludes the customer from using an element on multiple calendars without an additional non-GP procedure (SQR?) to mark processed instances.
9) The RATE AS OF DATE will have to be somehow controlled for every Rate Code element, since it may be different for various earnings and deductions. PI provides an easy way of doing so for each instance of an element.
10) PI allows the customer to override rate code, rounding rule, currency, etc. for each instance of earnings/deductions. There is no easy way to duplicate this using Arrays.
11) Provision must be made to resolve a conflict between a PI instance generated from Absence calculation and data read by an Array for the same element. Which one wins? It can't be both. This is not a problem when using the PI approach.
12) GP automatically keeps track of all the changes to PI over time. This not only allows for a proper calculation but makes researching the changes a snap. The customer will have to create a process to duplicate this functionality.
13) Using PI allows overriding a GL cost center for a specific instance of an element. During GL calculation, the process looks through PI tables to get these overrides. This functionality cannot be duplicated by the Array approach. The customer will have to create a process to duplicate this functionality.
14) This same concept also applies to User Keys on Accumulators. If PI SOVRs are used, each instance of PI will update the appropriate Accumulator Instance (based on User Keys).
15) The issue of Retro, Segmentation or Iterative triggers is the same for either approach, but for PI it can be solved with the use of a Component Interface.
63. Debugging Tools: Audit Trace: Trace All; Trace Errors; large number of records, potential rollback segment size problems; view on-line; query with SQL. Hotsos Profiler.
70. Acknowledgements: Efficient rule section written by Gene Pirogovsky, Omnia Solutions Inc.
71. Conclusion: Use of Partitioning and Global Temporary Tables almost completely eliminates inter-stream contention. Almost 100% scalability, until the I/O subsystem becomes the bottleneck. This permits use of streaming to utilise all available CPUs. GP will always be a CPU bound process. Rule tuning will reduce CPU overhead. It is an on-going process.