Download presentation
Presentation is loading. Please wait.
Published byJohn Lucas Modified over 9 years ago
1
Turning DBQL Data into Actionable Information Greg Hamilton PS Sr. Consultant TD - Teradata Performance and Workload Management Center of Expertise March 19th, 2009
2
2 Database Query Log (DBQL) Basics DBQL Recommendations DBQL and TD 12 & 13 Actionable DBQL >Application Performance >Workload Management Bonus Queries Overview
3
DBQL Basics
4
4 DBQL - Tables DBQLogTbl For individual queries: query origination start, stop and other timings error codes SQL (truncated) CPU AND I/O values DBQLStepTbl query step timings, row counts, CPU and I/O, step actions, etc DBQLObjTbl Tracks usage of database objects (tables, columns, indexes, database,etc) DBQLSqlTbl Holds full SQL text DBQLSummaryTbl Summary of (short-running) queries: high-volume queries: query origination response time, cpu and io sums, query counts DBQLExplainTbl Holds full EXPLAIN text
5
5 Detail Table - DBQLogTbl NumResultRows TotalIOCount AMPCPUTime ParserCPUTime UtilityByteCount UtilityRowCount ErrorCode ErrorText WarningOnly DelayTime AbortFlag CacheFlag StatementType QueryText NumOfActiveAMPs MaxAMPCPUTime MaxCPUAmpNumber MinAmpCPUTime MaxAmpIO MaxIOAmpNumber MinAmpIO SpoolUsage WDID OpEnvID SysConID LSN INTEGER NoClassification ProcID CollectTimeStamp QueryID UserID AcctString ExpandAcctString SessionID LogicalHostID RequestNum InternalRequestNum LogonDateTime AcctStringTime AcctStringHour AcctStringDate LogonSource AppID ClientID ClientAddr QueryBand ProfileID StartTime FirstStepTime FirstRespTime LastStateChange NumSteps NumStepswPar MaxStepsInPar WDOverride SLGMet ExceptionValue FinalWDID TDWMEstMaxRows TDWMEstLastRows TDWMEstTotalTime TDWMAllAmpFlag TDWMConfLevelUsed TDWMRuleID UserName DefaultDatabase AMPCPUTimeNorm ParserCPUTimeNorm MaxAMPCPUTimeNorm MaxCPUAmpNumberNorm MinAmpCPUTimeNorm EstResultRows EstProcTime EstMaxRowCount ExtraField1 ExtraField2 ExtraField3 ExtraField4 ExtraField5 ExtraField6 ExtraField7 Procid and Queryid: Ties all DBQL tables together except summary
6
6 One Row or more are written to DBC.DBQLSQLTBL for each SQL statement Large statements can cause multiple rows to be written in order to log the full query text. SQL Table - DBQLSqlTbl BEGIN QUERY LOGGING WITH SQL, OBJECTS, STEPinfo, EXPLAIN limit SQLTEXT=0 ON ALL ACCOUNT = ‘$H1$WG1R&S&D&H’; Select * From DBC.DBQLSQLTBL;
7
7 Object Table - DBQLObjTbl DBQL logs one row in DBQLObjTbl for each physical data object used by the optimizer to satisfy the query. An object can be: BEGIN QUERY LOGGING WITH SQL, OBJECTS, STEPinfo, EXPLAIN limit SQLTEXT=0 ON ALL ACCOUNT = ‘$H1$WG1R&S&D&H’;
8
8 One row per step is written to DBC.DBQLStepTbl Info: Step number, step name, start time, end time, row count Diagnostic in nature, shouldn’t be left on StepInfo Table - DBQLStepTbl BEGIN QUERY LOGGING WITH OBJECTS, SQL, StepInfo, Explain ON DBC; END QUERY LOGGING ON DBC; Select * from DBC.DBQLStepTbl; * ExtraField1 = RowsWComprColumns Available in V2R6.1 *
9
9 One row (or more) is written to DBC.DBQLExplainTbl Large explains can cause multiple rows to be written in order to log the full explain text. Diagnostic in nature, shouldn’t be left on Explain Table - DBQLExplainTbl BEGIN QUERY LOGGING WITH OBJECTS, SQL, StepInfo, Explain ON DBC; END QUERY LOGGING ON DBC; Select * from DBQLExplainTbl;
10
Recommendations for Logging and Maintenance
11
11 Logging Recommendations ALL usage should be logged in DBQL 24 X 7 Flush the DBQL Caches at default rate of 10 minutes Recommend logging usage in DBQL at the AccountString level >This may mean some work needs to be done to the AccountStrings to ensure the appropriate level of logging is done for each type of usage >Our recommendation for AccountString setup: $H1$WXYZ&S&D&H Where… –$H1$ = The Performance Group –WXYZ = The Work Group Identifier –&S&D&H = ASE Variables – Session, Date, and Hour
12
12 Logging Recommendations Three types of logging Classified by usage >Type 1 – Long running (Adhoc, Report Apps, Batch Loads) –Example: BEGIN QUERY LOGGING WITH SQL, OBJECTS LIMIT SQLTEXT=0 ON ALL ACCOUNT = ‘acctname’; –TD13 default log all queries as Type 1 –Example: BEGIN QUERY LOGGING WITH SQL, OBJECTS LIMIT SQLTEXT=0 ON ALL; >Type 2 - ONLY short running (subsecond) KNOWN work (TPumps, Apps w/10000s of Tactical Queries ) –Example: BEGIN QUERY LOGGING LIMIT SUMMARY=10,50,100 CPUTIME ON ALL ACCOUNT = ‘acctname’; (V2R6) –TD13 Type2 logged same as before >Type 3 - Mostly short (subsecond) known work with occasional long running unknown work (Tactical Apps with Some Larger or Unknown Queries) –Example: BEGIN QUERY LOGGING LIMIT THRESHOLD=100 CPUTIME AND SQLTEXT=10000 ON ALL ACCOUNT = ‘acctname’; (V2R6) –TD13 Type3 logged same as before
13
13 Recommendations for Historic DBQL Data Our recommendation is to remove the DBQL data from the DBC database nightly If space is a concern consider changing retention Recommendation for DDL on Historical DBQL >Create new date column populated from Starttime on all tables >Put PPI on this date to provide better performing analysis >Join between DBQL tables on ProcId, QueryId and the Date >End logging on at least one user to dump the cache to ensure all DBQL history is captured >Begin logging on the user again >Retain 13 months Detail & Summary, 3 months SQL & Object >Keep history tables similar to DBC DBQL tables for other tools like Teradata Workload Analyzer NOTE: The standard maintenance process (for V2R6 and TD12) is now available on Teradata.com
14
DBQL TD 12 & 13
15
15 What’s in V12 Column Name Changes in DBQLogTBL >Extrafield1 to InternalRequestNum >TotalCPUTime to AmpCPUTime (Linux is recorded in Milliseconds, Unix is recorded in hundredths) >Extrafield5 to StatementType >HotAmp1CPU to MaxAmpCPUTime >HotCPUAmpNumber to MaxCPUAmpNumber >LowAmp1CPU to MinAmpCPUTime >HotAmp1IO to MaxAmpIO >HotIOAmpNumber to MaxIOAmpNumber >LowAmp1IO to MinAmpIO >WDPeriodID to OPEnvID >FirstRespTime will not be null and LastRespTime column is removed. New Columns in DBQLOGTBL >LogonSource >LastStateChange >ParserCPUTime
16
16 What’s in V12 New Columns in DBQLogTBL (Contd) >UtilityByteCount (not populated yet) >UtilityRowCount (only populated on the End Loading Statement entry) >SysConId >TDWMRuleID >AmpCPUTimeNorm >ParserCPUTimeNorm >MaxAmpCPUTimeNorm >MaxCPUAmpNumberNorm >MinAmpCPUTimeNorm >EstResultRows >EstProcTime >EstMaxRowCount
17
17 What’s in V12 Column Name Changes in DBQLStepTBL >HotAmp1CPU to MaxAmpCPUTime >HotCPUAmpNumber to MaxCPUAmpNumber >LowAmp1CPU to MinAmpCPUTime >HotAmp1IO to MaxdAmpIO >HotIOAmpNumber to MaxIOAmpNumber >LowAmp1IO to MinAmpIO New Columns in DBQLStepTBL >EstCPUCost >RowCount2 >SpoolUsage >MaxAmpSpool >MaxSpoolAmpNumber >MinAmpSpool >StepWD >LSN
18
18 What’s in V12 New Columns in DBQLStepTBL >UtilityTableId >EstIOCost >EstNetCost >EstHrCost >CPUTimeNorm >MaxAmpCPUTimeNorm >MaxCPUAmpNumberNorm >MinAmpCPUTimeNorm Columns Changes in DBQLSummaryTBL (Contd) >TotalCPUTimeAmpCPUTime New Columns in DBQLSummaryTBL (Contd) >ParserCPUTime >AmpCPUTimeNorm >ParserCPUTimeNorm
19
19 TD13 DBQL Enhancement Other types of logging allowed when logging all turned on New SQL to add a WITH NONE rule to inhibit applying lower hierarchy rules for a user, user/account or group of users Log by application name or disable logging for an application. Rules have a hierarchy with a “Best Fit Rule” policy. Rule Type - Hierarchy 1A rule based on application name 2A rule for this user and the specific account 3A rule for this user and any account 4A rule for all users and the specific account 5A rule for all users and any account
20
Actionable DBQL
21
21 Application Performance Use DBQLogTBL to Identify Problem Queries >Fields to focus on –CPU –IO –Spool Usage (Combined with user spool info to proactively identify users close to spool limit) –ErrorCode –Response Time (Identify possible blocking or contention on resources) –Returned rows Response Time Calculation: EXTRACT (SECOND FROM lastresptime) + (EXTRACT (MINUTE FROM lastresptime) * 60 ) + (EXTRACT (HOUR FROM lastresptime) *60*60 ) + (86400 * (CAST ( lastresptime AS DATE) - CAST ( starttime AS DATE) ) ) - (EXTRACT (SECOND FROM starttime) + (EXTRACT (MINUTE FROM starttime) * 60 ) + (EXTRACT (HOUR FROM starttime) *60*60 ) )
22
22 Application Performance Use DBQLogTBL (Contd) >Suspect Query Report –Ranked on Impact CPU “HotAMP1CPU * (HASHAMP()+1)” –Calculates CPU & IO skew, Product Join Indicator and Unnecessary IO Indicator –Filters out queries that don’t have a certain amount of skew, PJI or UII –UII & PJI – minimum depends on hardware version and/or operating system –Large PJI indicates possible product join –Large UII indicates opportunity for reducing IO PJI Calculation: CASE WHEN totalcputime < 1 OR TotalIOCount = 0 THEN 0 ELSE (a.TotalCPUTime *1000)/a.TotalIOCount UII Calculation:CASE WHEN totalcputime < 1 OR TotalCPUTime = 0 THEN 0 ELSE a.TotalIOCount/(a.TotalCPUTime *1000)
23
23 Sample Suspect Query Report
24
24 Suspect Query Report by Process Suspect query report can be modified for different purposes >Application >Time of day >Process
25
25 Suspect Query Rpt Post Fixes Validate success Iterative process
26
26 Application Performance Use DBQLObjectTBL >Use object table before dropping or altering any DDL to identify use, users and/or times >Use to correlate suspect queries to tables involved >Object tables and index use report –Can be used after creating an index or as a recurring report –Save space and CPU used to maintain under utilized indexes and tables, resulting in reduced batch times –Exclude collect stats and maintenance requests on base table Notes : Prior to V2R6.1 – If the object is an index, its number is logged rather than a name. In V2R6.1 - There will be one row in the table for each column that makes up the index, and the ColumnNames will be logged (one row for each component column)
27
27 Sample Suspect Tables Report
28
28 Application Performance Use DBQLStepTBL >Use to help diagnose problem queries –Each step contains start and finish times, resources used and skewing if present for each step –Look at differences between estimated row count and actual for each step –Example: Can show blocking (not what was blocked) –Example: All amp operation taking long time even though not using much resources may indicate AWT shortage –Example: Query taking long in aggregation step may indicate aggregate join index will help >Use to identify plan changes –Can be done by # of steps or type of steps –Should be used in conjunction with DBQLExplainTBL
29
29 Application Performance Use DBQLExplainTBL >Use to identify plan changes –Good for any changes to system (upgrades, DDL changes, etc…) –Can scan text for specific types of joins –Example: Site turns on merge joins, with explain turned on site is able to identify queries that were effected –When used with DBQLStepTBL can provide more in depth analysis
30
30 Workload Management Workload Management Example
31
31 Workload Management Workload Management Example (Contd) >Previous graph shows large increase in number of queries running longer than one second at 3 AM >Additional analysis required –Schmon, Resusage or DBQL can identify CPU, IO or skewing issues –If collecting Amp Worker Task information concurrency issues could be determined –DBQL can show if non-tactical work is being done by the workload that is missing its SLAs
32
32 Workload Management Workload Management Example (Contd) >Assume Previous example due to AWT shortage >Need to limit work, great opportunity to use DBQL concurrency report
33
33 Workload Management Determine Success After Changes to Workload management >In TASM rule changes should be quantifiable and DBQL is just the data to do it >Graph below shows example of Workload before and after changes to system throttle
34
34 Workload Management Use DBQL with DBC.TDWMexceptionLog >Identify queries that are exceptions and possible reasons >Identify time periods that might require Workload adjustments
35
35 CPU Histogram – Bucket Query Shows number of queries, by account string or WDID, based on the amount of CPU used >97.9% use < 1/2 CPU second >99.6% use < 100 CPU seconds >100% use < 200 CPU seconds >10% of “Tactical” queries used > 1 CPU second >TAC used > 64% of CPU used Useful to help set up TASM >Indicates what the customer thinks is tactical, isn’t always >Giving a very high priority for queries estimated to use < 1/2 second should provide the majority of queries a better opportunity for a good response time >Medium priority < 100 CPU seconds >Queries > 300 seconds are –Candidates for being terminated as “bad queries” –Exceptions
36
36 Example CPU Histogram Chart
37
37 Response Time Histogram Average response time for each account string and/or workload >Broken down by the amount of CPU used >Classify by CPU buckets that are site appropriate Queries that use < 1 CPU second have unacceptable response times >Evident in other areas also Useful information to help set up and analyze TASM >Giving a very high priority for Queries estimated to use less than 1 second should provide the majority of queries a better opportunity for a good response time
38
38 Example Response Time Histogram Red areas represent potential areas for response time improvement >Higher priority >Reclassification >Other action
39
39 gh185035@teradata.com ®
40
Bonus Queries
41
41 DBC Suspect Query Macro REPLACE MACRO PMCPINFO.SuspQuerysDBC_Rpt ( topN INT) AS ( /* This Macro can be run anytime to list Poor Performing Querys */ /* That are in the DBC DBQL tables. It assumes that only the current */ /* day is in the DBC DBQL tables. The parameter requires the number */ /* of queries you want to list. If you put in 50, you will get the worst 50. */ SELECT RANK(ImpactCPU) AS CPURank,UserName, AcctString, ExpandAcctString, appid, clientid, DefaultDatabase, SpoolUsage, StartTime, ParseBlockTime, QryRespTime, NumOfActiveAmps, NumResultRows, SumCPU, SumIO, CpuSkw, IOSkw, PJI, UII, ImpactCPU,CASE WHEN s.sqltextinfo IS NOT NULL OR s.sqltextinfo = ' ' THEN s.SQLTextInfo ||';' ELSE a.querytext||';' END AS QryText FROM ( SELECT a.UserName, a.ProcId, a.QueryId, a.expandacctstring, appid, clientid, CAST(EXTRACT(HOUR FROM ((a.firststeptime - a.StartTime) HOUR(2) TO SECOND(2) ) ) * 3600 + EXTRACT(MINUTE FROM ((a.firststeptime - a.StartTime) HOUR(2) TO SECOND(2) ) ) * 60 + EXTRACT(SECOND FROM ((a.firststeptime - a.StartTime) HOUR(2) TO SECOND(2) ) ) AS INTEGER) AS ParseBlockTime,((lastresptime-Starttime) DAY(4) TO SECOND) AS QryRespTime, NumofActiveAmps, NumResultRows, a.TotalCPUTime AS SumCPU, a.TotalIOCount AS SumIO, CASE WHEN totalcputime < 1 OR (TotalCPUTime / (HASHAMP()+1)) =0 THEN 0 ELSE HotAmp1CPU/(TotalCPUTime / (HASHAMP()+1)) END (DEC(8,2)) AS CPUSKW, CASE WHEN totalcputime < 1 OR (TotalIOCount / (HASHAMP()+1)) =0 THEN 0 ELSE HotAmp1IO/(TotalIOCount / (HASHAMP()+1)) END (DEC(8,2)) AS IOSKW, CASE WHEN totalcputime < 1 OR TotalIOCount = 0 THEN 0 ELSE (a.TotalCPUTime *1000)/a.TotalIOCount END AS PJI, CASE WHEN totalcputime < 1 OR TotalCPUTime = 0 THEN 0 ELSE a.TotalIOCount/(a.TotalCPUTime *1000) END AS UII, a.HotAMP1CPU * (HASHAMP()+1) /*no. of Amps*/ AS ImpactCPU, a.acctstring, a.Starttime, a.querytext, a.logdate, a.defaultdatabase, a.SpoolUsage FROM PMCPINFO.DBQLogTbl_DBC a WHERE ( CpuSkw > 1.25 /*2.0*/ OR IOSkw > 1.25 /*2.0*/ /* Start with 2.0 as a threshold for CoExistence systems */ OR UII > 6 /* UnNecessary IO Indicator - may offer an opportunity for Indexing */ OR pji > 6 /* Could use a lower threshold (3) for Older Node systems */ ) GROUP BY 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23 ) AS a LEFT OUTER JOIN (SELECT Procid, Queryid, SUBSTR(sqltextinfo,1,15000) AS sqltextinfo FROM PMCPINFO.DBQLSQLTbl_DBC s WHERE sqlrowno = 1) AS s ON s.ProcID = a.ProcID AND s.QueryID = a.QueryID QUALIFY RANK(ImpactCPU) <= :TOPN ORDER BY 1; );
42
42 TDWM Exceptions joined to DBQL SQL SELECT DL.logdate,DL.starttime, DL.queryid, DL.Username, DL.expandacctstring, EX.WDName, EX.WDPeriod, EX.ClassificationTime, EX.ExceptionTime, EX.NewWDName, SumCPU, SumIO, CpuSkw, IOSkw, PJI, UII, ImpactCPU, trim(EX.ErrorText) as ExceptionText, Case When dl.sqltextinfo is not null or dl.sqltextinfo = ' ' Then dl.SQLTextInfo||';' Else dl.querytext||';' End as QryText FROM (select cast (a.starttime as date) as logdate, a.procid, a.queryid, a.Username, a.AcctString, a.expandacctstring, ((firstresptime-starttime) day(4) to second) as QryRespTime, NumofActiveAmps, NumResultRows, a.totalcputime as SumCPU, a.TotalIOCount as SumIO, Case when (totalcputime / (hashamp()+1) ) =0 then 0 else HotAmp1CPU /(TotalCPUTime / (hashamp()+1) ) End (dec(7,2)) as CpuSkw, Case when (TotalIOCount / (hashamp()+1) ) =0 then 0 else HotAmp1IO /(TotalIOCount / (hashamp()+1) ) End (dec(7,2)) as IOSkw, Case when a.TotalIOCount = 0 then 0 else (a.totalcputime *1000)/a.TotalIOCount end (dec(7,2)) as PJI, Case when a.totalcputime = 0 then 0 else a.TotalIOCount /(a.totalcputime *1000) end (dec(7,2)) as UII, a.HotAmp1CPU * (hashamp()+1) as ImpactCPU, a.acctstring, a.starttime,a.querytext, s.sqltextinfo, a.defaultdatabase, a.SpoolUsage from dbc.dbqlogtbl a left outer join (Select Procid, Queryid,cast(collecttimestamp as date) as logdate, substr(sqltextinfo,1,15000) as sqltextinfo from dbc.DBQLSQLTbl s WHERE cast (collecttimestamp as date) between date and date and sqlrowno = 1) as s on s.ProcID = a.ProcID and s.QueryID = a.QueryID and s.logdate = cast(a.starttime as date)) DL, (Select cast(Bufferts as date) as Logdate, ProcID, QueryId, BufferTS, UserName, el.WDID, wl.WLCName as WDName, wp.Description as WDPeriod, el.WDPeriodID, ClassificationTime, ExceptionTime, wd.WLCName as NewWDName, NewWDID, ErrorText, ExtraInfo from dbc.TDWMExceptionLog el left outer join tdwm.wlcdefs wl on wdid=wl.wlcid left outer join tdwm.wlcdefs wd on NewWDID=wd.wlcid left outer join tdwm.WlcPeriods wp on WDPeriodID=wp.PeriodId and WDID=wp.WLCID where Cast(BufferTS as Date) between date and date and extract(hour from bufferts) > 07 and wd.removedate=0 and wl.removedate=0 and wp.removedate=0) ex where ex.Logdate=dl.Logdate and ex.Procid=DL.procid and ex.queryid=DL.queryid and qrytext not like '%COLLECT%' order by 1, 2,5;
43
43 Suspect Table Report LOCKING ROW FOR ACCESS SEL o.ObjectDatabaseName,o.ObjectTableName,SUM(NumOfQrys) AS TotNumOfQrys,SUM(sumcpu) AS TotCPUConsumed,SUM(CASE WHEN (cpuskw > 1.25 /*2.0*/ OR ioskw > 1.25 /*2.0*/ OR PJI > 6 OR UII >6 ) THEN sumcpu ELSE 0 END) AS TotSkdPJLSQrys,SUM(CASE WHEN (cpuskw > 1.25 /*2.0*/ OR ioskw > 1.25 /*2.0*/ OR PJI > 6 OR UII >6 ) THEN sumcpu ELSE 0 END) AS TotSkdPJLSCPU,SUM(CASE WHEN (cpuskw > 1.25 /*2.0*/ OR ioskw > 1.25 /*2.0*/ ) THEN sumcpu ELSE 0 END) (FLOAT) AS SkewedCPU,SUM(CASE WHEN (cpuskw > 1.25 /*2.0*/ OR ioskw > 1.25 /*2.0*/ ) THEN 1 ELSE 0 END) (FLOAT) AS SkewedNumQrys,SUM(CASE WHEN (PJI > 6) THEN sumcpu ELSE 0 END) (FLOAT) AS ProdJnCPU,SUM(CASE WHEN (PJI > 6) THEN 1 ELSE 0 END) (FLOAT) AS ProdJnNumQrys,SUM(CASE WHEN (UII > 6) THEN sumcpu ELSE 0 END) (FLOAT) AS LgScansCPU,SUM(CASE WHEN (UII > 6) THEN 1 ELSE 0 END) (FLOAT) AS LgScansNumQrys FROM dba_tables.DBQLObjTbl_hst o, (SEL procid, queryid, logdate,ZEROIFNULL( TotalCPUTime ) (FLOAT) AS sumcpu,1 (FLOAT) AS NumOfQrys,CASE WHEN totalcputime < 1 OR (TotalCPUTime / (HASHAMP()+1)) =0 THEN 0 ELSE HotAmp1CPU/(TotalCPUTime / (HASHAMP()+1)) END (DEC(8,2)) AS CPUSKW,CASE WHEN totalcputime < 1 OR (TotalIOCount / (HASHAMP()+1)) =0 THEN 0 ELSE HotAmp1IO/(TotalIOCount / (HASHAMP()+1)) END (DEC(8,2)) AS IOSKW,CASE WHEN totalcputime < 1 OR TotalIOCount = 0 THEN 0 ELSE (TotalCPUTime *1000)/TotalIOCount END AS PJI,CASE WHEN totalcputime < 1 OR TotalCPUTime = 0 THEN 0 ELSE TotalIOCount/(TotalCPUTime *1000) END AS UII FROM dba_tables.DBQLogTbl_hst WHERE logdate BETWEEN '2007-10-22' AND '2007-10-28') AS a WHERE a.logdate BETWEEN '2007-10-22' AND '2007-10-28' AND o.objecttype = 'T' AND a.procid = o.procid AND a.queryid = o.queryid AND a.logdate = o.logdate GROUP BY 1,2 order by TotSkdPJLSCPU DESC
Similar presentations
© 2025 SlidePlayer.com Inc.
All rights reserved.