Windows Azure SQL Database (WASD) Troubleshooting
Bob Ward, Principal Architect Escalation Engineer

1 Windows Azure SQL Database (WASD) Troubleshooting
Bob Ward, Principal Architect Escalation Engineer
I will assume basic SQL Server knowledge

2 My Goals for You Today
- Prepare
- React
- Prevent

3 What Will We Cover Today
- The Azure Troubleshooting Challenge
- Troubleshooting Connectivity
- WASD Errors
- Query Performance
- Practical Advice and Tips

4 The Azure Troubleshooting Challenge
- WASD is a platform service (PaaS)
  - This is not a VM running the SQL Server “box” product (IaaS)
- Multi-tenant platform
  - You are sharing a SQL instance with other databases from other customers
  - You are abstracted from the SQL Server instance, Windows, and the physical server
  - Fewer admin tasks means lower TCO but also means less access
- You are isolated to a specific database
  - You have a logical server and a master database, but most things are done in your database
  - Most things are database scoped (Ex. DMVs)
- We make decisions to maximize availability for all databases
  - Application design may be required
- The service can be updated far quicker than the “box” product

5 WASD Connectivity Errors
- WASD-specific errors
  - Firewall blocked in Azure
  - Windows authentication not supported
  - Invalid login: invalid account or password
  - Denial of service: after a large number of login failures
- Network-related errors
  - “…Server not found”
  - Connection Timeout Expired
  - Msg 121 “.. The semaphore timeout period has expired”
- You could lose connectivity
  - Idle connections terminated after 30 minutes (Msg and 10054)
  - We may forcibly disconnect on failover, on some errors, or on a change to MAXSIZE
- Retries you need to take into account
- Use a minimum 30-second login timeout

6 Example Connectivity Errors
- 40XXX errors unique to WASD
- Be sure to give this to support
- May see this after deleting a server
- Network latency
- After getting dropped on idle connection

7 Troubleshooting Connectivity
- Configuration issues
  - WASD firewall and your firewall
  - Allow Windows Azure Service
- Is it our service or your internet?
  - Windows Azure Management Portal
  - Windows Azure Service Dashboard
  - Windows Azure SQL Database Connectivity Troubleshooting Guide
- General tools to use
  - ping.exe, telnet.exe, tracert.exe
  - SQL Server 2012 Management Studio: free with SQL Server 2012 Express
  - ostress.exe and sqlcmd.exe (username )
  - SQL Database Management Portal: https://.database.windows.net
- New system views (event tables) in the master database
  - sys.event_log
  - sys.database_connection_stats
  - History tables, not real time
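
The telnet.exe check on this slide amounts to asking whether a TCP connection to the server's SQL port can be opened at all. A minimal sketch of the same check in Python follows. The function name `can_reach` and its defaults are illustrative assumptions, and the default port 1433 is the standard SQL Server TCP port rather than something stated on the slide.

```python
import socket


def can_reach(host: str, port: int = 1433, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and attempts the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

A False result only says the port could not be reached from this machine. It does not distinguish a local firewall, an internet provider issue, or a service outage, so the portal and dashboard checks above still apply.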

8 Demo Tools for Connectivity

9 WASD Errors
- Failover
- Governance and Quota
- Throttling Limits
- Engine Throttling
- “Not supported”
- Database copy
- Federation
- These can result in connection termination and possible future rejection of work
- Many “box” errors still apply (Ex. deadlock)
- The Msg 40XXX range can be seen in sys.messages in SQL Server 2012 (the slide links to the full list)

10 Failover
- Your database, the instance, or the computer is “unhealthy”
- We may need to patch the instance and/or computer
- We may decide to “move you” to a replica of your database on another server
- What will you see?
  - Msg “..Server not available”
  - The partition is in transition and transactions are being terminated.
  - SHUTDOWN is in progress.
- Implement retry logic in your application
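
The advice to implement retry logic can be sketched as a small wrapper with exponential backoff. This is one possible shape, not the approach from the retry guidance the deck links to. `TransientError`, `run_with_retry`, and all default values are hypothetical names and numbers chosen for illustration, and deciding which driver errors count as transient is left to the caller.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical exception type: raise it for errors you consider retryable."""


def run_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Delay doubles each attempt, capped at max_delay, plus up to 10% jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

The operation passed in should open a fresh connection on each call, since a failover terminates the existing one. The `sleep` parameter exists only so the wrapper can be exercised without real waiting.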

11 Governance
- Max number of concurrent worker threads (currently 180) per database
  - Msg if you exceed the limit
  - Connection terminated. Retry when your concurrent work subsides
  - Check for blocking problems or inefficient queries
- Msg if the overall system has too many workers
  - You may get less than 180 max
  - Connection terminated. You can retry but it may take longer to stabilize
  - Still could be an application issue but a service issue could also be occurring
- Resource ID : 1 = worker threads

12 Quotas
- Quota errors for space used
  - Msg when you run out of space for your max size for your db
  - Only reads and DELETE/DROP allowed until you free up space
  - Use sys.dm_db_partition_stats to find what is consuming space
- Solutions
  - Increase max size
  - Delete data or drop tables/indexes
  - Partition out database
  - But…freeing up may not be immediately recognized
- Changing MAXSIZE disconnects all users
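
To see what is consuming space, the reserved page counts from sys.dm_db_partition_stats can be summed per object and converted to megabytes (SQL Server pages are 8 KB). The sketch below assumes the rows were already fetched as `(object_name, reserved_page_count)` pairs, for example by joining the DMV to sys.objects. That row shape and the function name are assumptions for illustration.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

PAGE_SIZE_KB = 8  # SQL Server data pages are 8 KB


def reserved_mb_by_object(rows: Iterable[Tuple[str, int]]) -> List[Tuple[str, float]]:
    """Sum reserved pages per object and return (name, MB) pairs, largest first."""
    pages = defaultdict(int)
    for object_name, reserved_page_count in rows:
        # The DMV has one row per partition of each index, so add them up.
        pages[object_name] += reserved_page_count
    sizes = [(name, count * PAGE_SIZE_KB / 1024) for name, count in pages.items()]
    return sorted(sizes, key=lambda item: item[1], reverse=True)
```

The largest entries are the first candidates for deleting data or dropping indexes.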

13 Throttling Limits
- We have a service called a “Watchdog Service” querying the instance for “conditions” to terminate connections to prevent resource problems. We also call these “Watchdog alerts”
- We monitor all databases and look for conditions to prevent problems
- We will kill the session with a “reason”. The “reason” is the error message you get
- Application gets an error message (high severity) and the connection is terminated (KILL/ROLLBACK status)
- Sometimes retry works but these usually require some change on your part
- throttling_long_transaction in sys.event_log

Error  Condition
40549  Session blocking system task for long period of time (20 secs)
40550  Session is consuming too many locks (1 million)
40551  Session is consuming too much tempdb space (5Gb)
40552  Transaction consuming too much log space or active transaction preventing log truncation
40553  Session consuming memory (16Mb) and there are memory waits (20secs)

- Rebuild index online
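
An application can keep the error numbers from this slide as a lookup so that a terminated connection is logged with the condition that caused it. The sketch below copies the numbers and conditions from the slide. The names `WATCHDOG_CONDITIONS` and `explain_termination` are hypothetical.

```python
# Error numbers and conditions copied from the slide's table.
WATCHDOG_CONDITIONS = {
    40549: "Session blocking system task for long period of time (20 secs)",
    40550: "Session is consuming too many locks (1 million)",
    40551: "Session is consuming too much tempdb space (5Gb)",
    40552: "Transaction consuming too much log space or active transaction "
           "preventing log truncation",
    40553: "Session consuming memory (16Mb) and there are memory waits (20secs)",
}


def explain_termination(error_number: int) -> str:
    """Return the watchdog condition for an error number, or a fallback note."""
    condition = WATCHDOG_CONDITIONS.get(error_number)
    if condition is None:
        return f"Msg {error_number}: not one of the watchdog errors listed here"
    return f"Msg {error_number}: {condition}"
```

Because these errors usually call for a change on the application side, logging the condition is more useful than retrying blindly.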

14 Engine Throttling
- This is more of a legacy monitoring method used to keep instances healthy
- Another external service monitors the health of the instance and computer
  - Soft throttling: we have detected a resource issue so pick specific databases
  - Hard throttling: entire instance at risk so all databases are affected
- How it works
  - Existing requests run to completion
  - New requests for existing connections and new connections may get Msg and connection terminated depending on type of request
  - Reason code in the error has more details on soft vs hard, what will be rejected, and why
  - throttling in sys.event_log
- Links on the slide: Decode reason codes; Another resource
- Example reason code: 0x8003, where x03 = RejectAll and x80 = Hard Throttling on I/O
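
The slide's one example, 0x8003, suggests that the low byte of a reason code carries the rejection mode and the bits above it carry the throttling type. The sketch below decodes a reason code on that assumption. It knows only the two values shown on the slide and reports anything else as raw hex, so the linked reason-code reference remains the place to look for the full tables. The function and table names are hypothetical.

```python
from typing import Tuple

# Only the values that appear on the slide; everything else is left undecoded.
KNOWN_MODES = {0x03: "RejectAll"}
KNOWN_TYPES = {0x80: "Hard Throttling on I/O"}


def decode_reason_code(code: int) -> Tuple[str, str]:
    """Split a reason code into (mode, type) descriptions.

    Assumption inferred from the 0x8003 example: the low byte is the mode,
    the remaining high bits are the throttling type.
    """
    mode = code & 0xFF
    throttle_type = code >> 8
    return (
        KNOWN_MODES.get(mode, f"unknown mode 0x{mode:02x}"),
        KNOWN_TYPES.get(throttle_type, f"unknown type 0x{throttle_type:02x}"),
    )
```

For example, `decode_reason_code(0x8003)` gives the pair ("RejectAll", "Hard Throttling on I/O"), matching the slide.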

15 “Not Supported” Errors
- USE not supported: specify the database when connecting
- ALTER DATABASE supported minimally (Ex. Name, Edition, MAXSIZE, READ_ONLY)
- No DBCC commands are supported except DBCC SHOW_STATISTICS
- Database scoped DMVs supported
- Feature Support for Windows Azure SQL Database
- Unsupported Transact-SQL Statements (Windows Azure SQL Database)
- Partially Supported Transact-SQL Statements (Windows Azure SQL Database)

16 Demo Using Event Tables to Troubleshoot WASD Errors

17 WASD and Query Performance
- Stick to the basics
  - Running or waiting? Blocking or CPU?
  - Is it your application, Windows Azure role, your computer, or queries? Is it network latency?
  - Differences from when “good”? Did the query plan change?
  - Proper indexes: avoid scans, large sorts, ….
  - Auto create and auto update stats are on by default
- There are methods to optimize performance specific to Azure
  - Windows Azure SQL Database and SQL Server -- Performance and Scalability Compared and Contrasted
  - Inevitably you may have to shard your data
  - “Chatty” applications don’t usually perform well
  - Avoid large result sets
- Application problems may show up earlier on this platform (Ex. transaction keeping the log from being truncated)

18 WASD Performance Scenarios
- Interesting performance scenarios
  - On-premises clients may see higher ASYNC_NETWORK_IO waits
  - Small transactions may result in WRITELOG and SE_REPL* waits
  - Deadlocks (Msg 1205) just like the “box”: use sys.event_log to debug
- Troubleshooting query timeouts
  - Could just be blocking
  - Trace your queries so you know which one timed out
  - Examine the query plan and tune the query/indexes

19 Dynamic Management Views (DMV) for Performance
- sys.dm_exec_requests: find out currently running requests in your database. Use this to detect blocking
- sys.dm_exec_query_stats: find out the performance of queries that have run in your database. Look here for worst performing queries
- sys.dm_exec_query_plan: display the query plan of a specific query
- sys.dm_db_wait_stats: aggregated history of waits, some new for WASD. Only shows wait types with count > 0
- “Missing index DMVs”: could indexes help query performance?
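
Detecting blocking with sys.dm_exec_requests comes down to reading two of its columns, session_id and blocking_session_id, and finding the sessions at the head of each blocking chain. The sketch below assumes those two columns were already fetched as `(session_id, blocking_session_id)` pairs. The function name is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def head_blockers(rows: Iterable[Tuple[int, Optional[int]]]) -> List[int]:
    """Return session ids that block others but are not blocked themselves."""
    blocked_by = {}
    for session_id, blocking_session_id in rows:
        # 0 or NULL means the request is not blocked; negative ids are special
        # values rather than real sessions, so they are skipped here.
        if blocking_session_id and blocking_session_id > 0:
            blocked_by[session_id] = blocking_session_id
    blockers = set(blocked_by.values())
    # A head blocker may have no row of its own (an idle session holding locks),
    # so work from the set of blocker ids, not from the request rows.
    return sorted(b for b in blockers if b not in blocked_by)
```

The sessions returned are the ones to investigate first, since everything queued behind them clears once they finish.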

20 A look at WASD Wait Types

21 Demo Troubleshooting Query Performance on WASD

22 Watch Out for These
- Keep database copies for “user error”
- Be careful dropping servers and databases in portal
- DML may fail if no clustered index (temp tables excluded)
- DMVs are database scoped
- Databases have RCSI on by default, so tables can be larger
- DATETIME in all data centers is stored as UTC time
- You may not have access to objects that appear in catalog views
- Non-supported or partially supported commands/features
- System views unique to WASD

23 Before you contact support
- Check the Azure forums: MSDN or Stack Overflow
- Check the service dashboard
- Is it Windows Azure? On-premise problem?
- Have exact error message(s) available
- Have TracingID available
- Do you know the query?
- Do you have application retry logic?
- Give us the date and time of issue with “observed” timezone
- Is this happening now or in the past?
- We can do RCA but…. It can take some time and we may not have enough history
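
Reporting the time of an issue with its observed timezone is easier if the application formats the timestamp with an explicit UTC offset and its UTC equivalent. A minimal sketch follows. The function name, the output format, and the date used in the usage note are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def describe_incident_time(local_time: datetime, utc_offset_hours: float) -> str:
    """Format a naive local timestamp with its offset and the UTC equivalent."""
    observed_zone = timezone(timedelta(hours=utc_offset_hours))
    # Attach the offset the issue was observed in, then convert to UTC.
    observed = local_time.replace(tzinfo=observed_zone)
    as_utc = observed.astimezone(timezone.utc)
    return f"observed {observed.isoformat()} = {as_utc.isoformat()} (UTC)"
```

For example, `describe_incident_time(datetime(2013, 10, 16, 14, 30), -7)` states both the local reading and the UTC time, which lines up with service-side data stored in UTC.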

24 References
- Retry Logic for Transient Failures in Windows Azure SQL Database
- Error Messages (Windows Azure SQL Database)
- Windows Azure SQL Database Performance and Elasticity Guide
- Windows Azure SQL Database Connection Management
- sys.event_log documentation
- CSS SQL Escalation Blog
- Troubleshoot and Optimize Queries with Windows Azure SQL Database

25 Questions? Thank you!

26 The Troubleshooting Checklist
- Does the Windows Azure Portal work and list your databases?
- Is there a dashboard posting for an outage in your region?
- Does the SQL Management Portal work?
- Does SQL Server Management Studio work?
- Is there an internet provider issue?
- Is your firewall configuration correct?
- Is the problem Windows Azure vs WASD?
- Is there blocking?
- Are your queries and indexes tuned?
- Is this really an application retry issue?
- Governance, quotas, limits, and throttling are “part of this platform”
- Have you looked at Event Tables?

