1 Meeting the Availability Challenge Don Vilen Program Manager SQL Server Microsoft Corporation

2 Causes of Data Unavailability  What part of your database system causes the most data unavailability?  Hardware?  SQL Server and Windows?  Application software?  Your people?

3 People, Planning, and Procedures  In several studies the major cause of data unavailability is people  Operators and DBAs  Users  SQL Server 2005 has many enhancements to improve availability, but not all problems can be solved by DBMS technology  Training your people, planning your operations, and documenting your procedures can have the greatest impact on availability

4 Availability Enhancements  Database Snapshots  Database Mirroring  Partial Database Availability  Table and Index Partitioning  Row-level Versioning  Snapshot Isolation  Read Committed Snapshot Isolation  Failover Clustering enhancements  Fast Database Recovery  Peer-to-Peer Replication  Scalable Shared Database  Checksum on Data Pages  Instant File Initialization  Backup/Restore  Checksum on Backups  Backup Media Mirroring  Restore Read-Only Filegroups without Log  Page-level Restore  Piecemeal Restore  Data Backups don’t block Log Backups  Dedicated Admin Connection (DAC)  Dynamic Configuration  CPU Affinity, AWE memory  Attach/Detach

5 Areas of Availability  Operator, DBA, and User Error  High Availability and the SQL Server process  Availability inside the SQL Server process

6 Operator, DBA, User Error People are the cause of most unavailability  Protect – Permissions: Restrict as necessary  Prevent / Avoid – Triggers on DDL and DML can roll back an operation even if the user has permission  Detect – Triggers and Events can be used to detect and record who, what, when  Delay – Log Shipping can delay the error from reaching the standby server  Recover – Database Snapshot can save a copy of a database at a point in time  Can RESTORE from the Snapshot to recover
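
A minimal sketch of the Prevent / Detect bullets above, assuming a database named mydb and a trigger name chosen purely for illustration:

-- Block accidental DROP TABLE in mydb, even for users who hold the permission (SQL Server 2005 DDL trigger)
USE mydb;
GO
CREATE TRIGGER ddl_block_drop_table
ON DATABASE
FOR DROP_TABLE
AS
BEGIN
    DECLARE @evt xml;
    SET @evt = EVENTDATA();   -- captures who issued the statement, when, and the exact T-SQL; could be written to an audit table
    RAISERROR ('DROP TABLE is disabled in this database; contact the DBA team.', 16, 1);
    ROLLBACK;                 -- undoes the DROP even though the user had permission
END;
GO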

7 Database Snapshot  Read-only Snapshot of an entire database at a point in time  Created instantly; no data is copied  Snapshot must be created before the error  Base database continues to change  Database Snapshot does not restrict the base database  Multiple Snapshots are allowed  Database Snapshots can exist forever  Constrained by resources

8 Database Snapshot Syntax Examples  To create a database snapshot CREATE DATABASE mydbSnap0600 ON ( ) AS SNAPSHOT OF mydb  To drop a database snapshot DROP DATABASE mydbSnap0600  To restore a database to a snapshot RESTORE DATABASE mydb FROM DATABASE_SNAPSHOT = 'mydbsnap0600'
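
On the slide the ON clause is left blank; it takes the logical name of each data file in mydb plus a path for the snapshot's sparse file. A minimal sketch, assuming a single data file with the logical name mydb_data and an illustrative snapshot path:

-- Create the snapshot (logical file name and file path are assumptions for illustration)
CREATE DATABASE mydbSnap0600
ON ( NAME = mydb_data,
     FILENAME = 'C:\Snapshots\mydbSnap0600.ss' )
AS SNAPSHOT OF mydb;
GO
-- Revert mydb to the snapshot; any other snapshots of mydb must be dropped first
RESTORE DATABASE mydb
FROM DATABASE_SNAPSHOT = 'mydbSnap0600';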

9 Database Snapshot How it really works  [Diagram: CREATE DATABASE mydbSnap AS SNAPSHOT OF mydb creates a read-only snapshot alongside mydb. When an UPDATE in mydb touches pages 4, 9, and 10, the original versions of those pages are copied into the snapshot file before they are changed. A SELECT against mydbSnap reading pages 4, 6, 9, 10, and 14 gets pages 4, 9, and 10 from the snapshot file and the unchanged pages 6 and 14 from the base database.]

10 Areas of Availability  Operator, DBA, and User Error  High Availability and the SQL Server process  Availability inside the SQL Server process

11 High Availability and the SQL Server Process  Instance Maintenance  Instance and Database Startup  Ensuring High Availability for the SQL Server Process

12 Instance Maintenance Install and Upgrade  More reliable Install / Setup process  Faster Upgrade  Minimal downtime during upgrade  System Stored Procedures are in the ‘resource database’  Read-only, invisible/transparent to users  Allows for fast, side-by-side upgrades  No scripts to run to upgrade system procs  master is upgraded during the Upgrade process  Other databases are upgraded as part of database startup – once master is complete  Upgrade also occurs:  When a down-level database is attached  When a down-level database is restored

13 Instance Maintenance Configuration  More dynamic configuration  No instance restart required to change:  AWE Memory configuration  Hot-add memory  CPU Affinity configuration  Surface Area Configuration Manager
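
A minimal sketch of re-configuring a running instance with sp_configure; the memory ceiling and affinity mask values are illustrative, and on SQL Server 2005 both options take effect at RECONFIGURE without an instance restart:

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max server memory (MB)', 4096;   -- on SQL Server 2005 this also governs AWE-mapped memory
EXEC sp_configure 'affinity mask', 15;              -- example mask: bind schedulers to CPUs 0-3
RECONFIGURE;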

14 Instance and Database Startup  Faster Instance Startup  Memory, even AWE memory, is allocated as needed  Faster Database Startup  Fast Recovery  User databases start up in parallel

15 Fast Recovery on Database Restart  SQL Server 2000  Database is available after Undo completes  SQL Server 2005, Enterprise Edition  Database is available when Undo begins  [Timeline: SQL Server 2000 – Redo, then Undo, then Available; SQL Server 2005 EE – Redo, then Available as soon as Undo begins]

16 Ensuring High Availability for the SQL Server Process  Failover Clustering  Database Mirroring  Peer-to-Peer Transactional Replication  Others  Log Shipping  Backup/Restore and Attach/Detach  3rd-Party Solutions  Geographically-Dispersed Clusters

17 Hot Standby Failover Solutions Failover Clustering and Database Mirroring  Both Provide  Automatic detection  Automatic, fast failover  Manual failover  Transparent client redirect  Zero work loss Database Mirroring Failover Cluster

18 Failover Clustering  Hot Standby – Automatic failover  Built on Microsoft Cluster Service (MSCS)  Multiple nodes provide availability, transparent to the client  Maximum of 2, 4, or 8 nodes depending on OS edition  Automatic detection and failover  Requires certified hardware; see Windows Catalog: Clustered  Supports many scenarios: Multiple Active Instances, N+1 (N active, 1 inactive instance), N+I (N active, I inactive instances)  [Diagrams: example cluster layouts for Multiple Active Instances, N+1, and N+I]

19 Failover Clustering Hot Standby Instance  Zero work loss, zero impact on throughput  Instance Failover – the entire instance fails over as a unit  Single copy of the instance databases  Available since SQL Server 7.0  Setup is performed at install time  Clients connect to a ‘virtual’ server name / IP

20 Failover Clustering SQL Server 2005  More nodes  Match operating system limits  Two-node Failover Clustering is available in SQL Server 2005 Standard Edition  Unattended setup  Support for mounted volumes (Mount Points)  Full support for Majority Node Set quorum  Full-Text Search and Analysis Services are now cluster-aware  Database Engine and Agent have been cluster-aware since 7.0  Analysis Services now supports multiple instances

21 Database Mirroring New for SQL Server 2005  Hot Standby Database  Database Failover  Very fast failover  Just seconds in most cases  Zero data loss  Automatic or manual failover  Automatic re-sync after failover  Automatic, transparent client redirect Database Mirroring

22 Database Mirroring  Hardware  Works with standard computers, storage, and networks  No shared storage components, virtually no distance limitations  Impact on transaction throughput  Zero to minimal, depending on environment / workload / network

23 Fault-Tolerant Virtual Database  [Diagram: Clients connect to the Principal; the database is mirrored to the Mirror server; a Witness monitors both partners]

24 Witness and Quorum  Sole purpose of the Witness is to provide automatic failover  To survive the loss of one server you must have at least three  Prevents “split brain”  Does a lost connection mean the partner is down or is the network down?  To become the Principal, a server must talk to at least one other server  The Witness does not direct the Mirror to become Principal  It simply answers the question “Who do you see?” Witness

25 Witness  Witness is an instance of SQL Server 2005  Perhaps even SQL Server Express  Can be witness for multiple sessions  Consumes very little resources  Not a single point of failure  Partners can form quorum on their own Witness

26 Database Mirroring How it works  [Diagram: (1) the application issues a commit to SQL Server on the Principal; (2) the log records are written to the Principal's log and sent to the Mirror; (3) the Mirror hardens the records to its own log; (4) the Mirror acknowledges the Principal; (5) the Principal acknowledges the commit to the application. The Mirror is always redoing the received log, so it remains current.]

27 Safety / Performance  There is a trade-off between performance and safety  Database Mirroring has two safety levels  FULL – transactions commit once logged on the Mirror  Allows automatic failover  No data loss  OFF – transactions commit once logged on the Principal  System does its best to keep up  Prevents automatic failover; to make the mirror available you must ‘force’ service or terminate the Database Mirroring session

28 Database Mirroring Modes  High-Availability Mode  Safety Full; Synchronous operation  Database is available whenever a quorum exists  Automatic failover  High-Protection Mode  Safety Full; Synchronous operation  No witness  Principal continues servicing the database even if it loses connection to mirror  Manual failover only; no automatic failover  A transition mode; should not be in this mode for long  High-Performance Mode  Safety Off; Asynchronous operation  Manual failover only  Supports only one form of role switching: forced service (with possible data loss)
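
A minimal sketch of controlling these modes with T-SQL, assuming a database named mydb and placeholder endpoint addresses; the mirror copy of mydb must already be restored WITH NORECOVERY, and the SET PARTNER step is run on the mirror before the principal:

-- Establish the session and (optionally) add a witness for automatic failover
ALTER DATABASE mydb SET PARTNER = 'TCP://mirrorhost:5022';
ALTER DATABASE mydb SET WITNESS = 'TCP://witnesshost:5022';

-- High-Availability / High-Protection modes: synchronous, no data loss
ALTER DATABASE mydb SET PARTNER SAFETY FULL;
-- High-Performance mode: asynchronous
ALTER DATABASE mydb SET PARTNER SAFETY OFF;

-- Manual failover (requires SAFETY FULL and a synchronized mirror); run on the principal
ALTER DATABASE mydb SET PARTNER FAILOVER;
-- Forced service with possible data loss (SAFETY OFF); run on the mirror
ALTER DATABASE mydb SET PARTNER FORCE_SERVICE_ALLOW_DATA_LOSS;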

29 Transparent Client Redirect  No changes to application code  Client automatically redirected if session is dropped  Client library is aware of Principal and Mirror servers  Upon initial connect to Principal, library caches Mirror name  When client attempts to reconnect  If Principal is available, connects  If not, client library automatically redirects connection to Mirror

30 Peer-to-Peer Transactional Replication  New in SQL Server 2005  Transactional replication but..  All participants are peers  Schema is identical on all sites  Publish the updates made on ‘their’ data  Subscribe to others to pick up their changes  No hierarchy as in ‘normal’ transactional replication  A given set of data can be updated at only one site at a time  Data ‘ownership’ is purely logical; does not prevent conflicts  SQL Server prevents a change from round-tripping  Enables load-balancing and high availability  Warm / hot standby  Small possibility of data loss on failure Peer-to-Peer Transactional Replication

31 Scalable Shared Database  New in SQL Server 2005 EE  Multiple instances sharing the same database files  Volume must be Read-only  Database access is Read-only  Enables load-balancing and scalability  Some availability increase, but it depends on the shared disk system  Data files can be mirror copies periodically split off from an updateable database  Third-party functionality

32 Warm Standby Solutions Replication and Log Shipping  Both Provide Multiple copies and Manual failover  Replication – since SQL Server 6.0  Primarily used where availability is required in conjunction with scale-out of read activity  Failover possible; a custom solution  Not limited to entire database; can define a subset of the source database or tables  Copy of database is continuously accessible for read activity  Latency between source and copy can be as low as seconds  Log Shipping  Basic idea – Backup, Copy, Restore Log – will always be supported  But no more investment in the scripts  Database scope  Database accessible but read-only  Users must exit for the next log to be applied  Data backups no longer block log backups
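
A minimal sketch of the Backup / Copy / Restore Log cycle behind log shipping; the database name, file paths, and undo file are illustrative:

-- On the primary server
BACKUP LOG mydb TO DISK = 'D:\LogShip\mydb_1300.trn';
-- ...copy the backup file to the standby server, then on the standby:
RESTORE LOG mydb
FROM DISK = 'E:\LogShip\mydb_1300.trn'
WITH STANDBY = 'E:\LogShip\mydb_undo.dat';   -- keeps the standby database readable between restores
-- Users must disconnect before the next log backup can be applied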

33 Cold Standby Solutions Backup / Restore and Detach / Copy / Attach  Both Provide  Manual detection and failover  Potential for some work loss  Whole-database scope  Standard servers  Limited reporting on standby  Duplicate copy of database  Client must know where to re-connect  Slowest failover – Most downtime  Backup / Restore  Smaller size – only used pages are copied  Log backups allow restore to point in time  Longer restore time  Detach / Copy / Attach  Copies entire files  No possibility of rolling forward subsequent logs
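
Minimal sketches of the two cold-standby approaches; the database name, file paths, and STOPAT time are illustrative:

-- Detach / Copy / Attach
EXEC sp_detach_db 'mydb';
-- ...copy the data and log files to the standby server, then:
CREATE DATABASE mydb
ON (FILENAME = 'E:\Data\mydb.mdf'),
   (FILENAME = 'E:\Data\mydb_log.ldf')
FOR ATTACH;

-- Backup / Restore with point-in-time recovery from log backups
RESTORE DATABASE mydb FROM DISK = 'E:\Backups\mydb_full.bak' WITH NORECOVERY, REPLACE;
RESTORE LOG mydb FROM DISK = 'E:\Backups\mydb_log1.trn'
WITH STOPAT = '2006-03-01 14:30:00', RECOVERY;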

34 Third-Party Availability Solutions  Many solutions by many companies with many different names  Most replicate data and log changes by one of these methods  Looking into SQL Server’s log files and translating log records into logical operations  Intercepting IOs and duplicating them, either to another disk set or simply recording them  Replicating the writes at the storage level  Local copies / mirrors for snapshot backups, etc.  Remote copies / mirrors for disaster recovery and Geographically-Dispersed Clusters

35 Geographically-Dispersed Clusters and Storage-level Replication  Solutions from many third party vendors replicate the local I/Os on a remote system  Hardware and software methods  Synchronous and asynchronous solutions  Solutions must meet Core I/O Requirements  Many solutions use MSCS to provide automatic failover  See Windows Catalog, Geographically-dispersed Clusters  Other solutions are primarily to duplicate the data at a remote site, often with an independent SQL Server  Similar to Log Shipping or Detach / Attach Geographical Failover Cluster

36 Complementary Technologies Technologies can be Combined  Replication – Maximize availability for:  Scale out  Offload primary data platform  Heavy reporting  Mobile/disconnected users  Autonomous business units that share data  Failover Solutions – Maximize availability of critical systems:  Designed for failover  Fast, automatic  Zero data loss  Transactionally current  Masks planned and unplanned downtime

37 Complementary Technologies Failover Solution + Replication  Example: fault-tolerant Publisher and Distributor  [Diagram: the Publisher and Distributor run on a failover solution and replicate to multiple Subscribers]

38 Combining HA Technologies  Principal server can be a Failover Cluster  Failover to the mirror will occur before failover within the cluster  So the Principal will come back up as the Mirror  Mirror can be a Failover Cluster as well

39 Comparing HA Technologies How Do You Compare the Alternatives? There are many dimensions to consider when evaluating an HA / business continuance solution  Detection: Automatic or Manual  Failover: Automatic or Manual  Time to Fail Over  Number of Failures it can survive  Data Currency / Loss  Cost of redundant system(s)  Additional hardware  Additional management  Granularity of Data Safety:  Instance, Database, Table, Row  Complexity  Data Consistency  Transparency to Clients  Privileges Required to Setup  Remote DR Site  Impact on Performance There is no “one size fits all” Everyone’s requirements are different

40 Areas of Availability  Operator, DBA, and User Error  High Availability and the SQL Server process  Availability inside the SQL Server process

41 Availability Inside the SQL Server Process  Memory and CPUs  I/O and Media/Disks  Concurrency, Avoiding Deadlocks  Diagnosing an Issue  Tuning and Fragmentation  DBCC Operations

42 Memory and CPUs  Memory  Dynamic Configuration applies to AWE memory as well as regular memory  AWE allocation is dynamic  Checkpoint – better behaved on large-memory systems  Sniffer – checksums periodically evaluated for pages in cache  CPUs  CPU Affinity is dynamically configurable  NUMA-awareness  Scheduling on a CPU is at the batch level rather than the session level

43 I/O and Media/Disks  I/O retries, diagnosis/transparency, and better messages  SQL Server’s basic I/O requirements  RAID and storage is usually reliable, but what happens if there are issues with:  Data Files  Log Files  Full-text Catalogs  Backup Files  tempdb

44 Microsoft SQL Server Core I/O Requirements I/O- and Storage-Level Replication  See whitepaper SQL Server 2000 I/O Basics on TechNet: http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/sqlIObasics.mspx  To ensure correct operation the underlying I/O and storage system must meet the Core I/O Requirements  Stable Media – once ‘write’ is acknowledged, must be guaranteed  Write Ordering – writes must preserve ordering across devices  Torn I/O Prevention – writes must be done as a unit; not split  Applies to both local storage and any I/O-level or storage-level replication scheme, synchronous or asynchronous  Guarantees that what appears at secondary site is an image that actually appeared at some point in time at the primary  Asynchronous replication allows potential loss of committed transactions

45 Data Files  Database Page Checksums  Detect disk I/O errors not reported by the hardware or operating system  DBCC Check* now use Database Snapshots for improved scalability  Partial Database Availability  Database is available if the Primary filegroup is available  Fine-Grained Online Repair  Online Restore – Database remains online; Only data being restored is offline  Piecemeal Restore – Online restore of filegroups by priority  Filegroup / File backups allowed on Simple recovery model databases  Page-level Restore – Can restore individual pages to repair errors found by page checksum or torn pages  Instant File Initialization  Skips file zeroing, fast DB create / restore  Restore read-only filegroups without applying logs
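
A minimal sketch of a page-level restore, assuming damaged page 1:57 was reported by a page-checksum error or listed in msdb..suspect_pages; backup paths are illustrative:

-- Restore only the damaged page from the last full backup
RESTORE DATABASE mydb PAGE = '1:57'
FROM DISK = 'E:\Backups\mydb_full.bak' WITH NORECOVERY;
-- Roll the page forward with existing log backups
RESTORE LOG mydb FROM DISK = 'E:\Backups\mydb_log1.trn' WITH NORECOVERY;
-- Back up and restore the tail of the log to bring the page fully current
BACKUP LOG mydb TO DISK = 'E:\Backups\mydb_tail.trn';
RESTORE LOG mydb FROM DISK = 'E:\Backups\mydb_tail.trn' WITH RECOVERY;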

46 Log Files and Full-text Catalogs  Log Files  Checksum  Full-text Catalogs  Backup / Restore includes Full-text data  Detach / Attach includes Full-text data

47 Backup Files  RESTORE VERIFYONLY now checks everything it can short of writing the data  Backup Media Mirroring  Extra copies for archival or disaster recovery  Backup and Database Page Checksums  RESTORE can detect disk I/O errors not reported by the hardware or operating system  Can continue past errors – repair later
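
A minimal sketch of these options together; paths are illustrative, and mirrored backup media is an Enterprise Edition feature:

-- Write the backup to two media sets and embed checksums (page checksums are verified as pages are read)
BACKUP DATABASE mydb
TO DISK = 'D:\Backups\mydb.bak'
MIRROR TO DISK = '\\drsite\backups\mydb.bak'
WITH FORMAT, CHECKSUM;

-- Verify the backup as thoroughly as possible without writing the data
RESTORE VERIFYONLY FROM DISK = 'D:\Backups\mydb.bak' WITH CHECKSUM;

-- Restore past damaged pages now and repair them later
RESTORE DATABASE mydb FROM DISK = 'D:\Backups\mydb.bak' WITH CONTINUE_AFTER_ERROR, REPLACE;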

48 tempdb Scalability  Scalability of tempdb has been enhanced  Caching of initial pages for temporary tables and table variables  Improved allocation page latching protocol  Reduced logging overhead for tempdb, so less I/O bandwidth is consumed by the tempdb log file  More efficient allocation algorithm for pages from ‘mixed’ extents in tempdb

49 tempdb Best Practices  We still recommend the following if you see latch contention on tempdb allocation or system catalog pages:  Avoid auto grow  Pre-allocate space for tempdb files  Create as many tempdb data files as you have CPUs  Account for any affinity mask settings  Make all files the same size
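
A minimal sketch for a 4-CPU server; the sizes, drive letter, and added logical file names are illustrative (tempdev is tempdb's default primary data file name):

-- Pre-allocate space instead of relying on auto grow
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, SIZE = 1024MB);
-- One data file per CPU, all the same size, so allocations spread evenly
ALTER DATABASE tempdb ADD FILE (NAME = tempdev2, FILENAME = 'T:\tempdb2.ndf', SIZE = 1024MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev3, FILENAME = 'T:\tempdb3.ndf', SIZE = 1024MB);
ALTER DATABASE tempdb ADD FILE (NAME = tempdev4, FILENAME = 'T:\tempdb4.ndf', SIZE = 1024MB);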

50 Increased Concurrency Avoiding Blocking and Deadlocks  Database Snapshots  Standby / Reporting server with  Replication  Log Shipping  Database Mirroring  Scalable Shared Databases – Read-only  Row-level versioning and new Snapshot isolation levels

51 Increased Concurrency Row-level Versioning  Row-level versioning keeps a copy of each row prior to an update  Rather than wait for an update to complete, you can see the data that was committed at the time:  Your transaction began: Snapshot Isolation  Your statement began: Read-Committed Snapshot Isolation  You only see committed data, and it is time-consistent and transactionally consistent  But it might not be the most recent, so not appropriate for some OLTP applications  If you can work with this snapshot data  Readers don’t wait for locks, reducing blocking and deadlocks  You have a view of the data at a single point in time  Triggers now use row-level versions rather than the log
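
A minimal sketch of enabling and using the new isolation levels, assuming a database named mydb and a hypothetical dbo.Orders table:

-- Enable versioning-based isolation at the database level
ALTER DATABASE mydb SET ALLOW_SNAPSHOT_ISOLATION ON;
ALTER DATABASE mydb SET READ_COMMITTED_SNAPSHOT ON;   -- switching this on requires exclusive access to mydb

-- A reader sees data as of the start of its transaction and is not blocked by concurrent writers
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRAN;
    SELECT COUNT(*) FROM dbo.Orders;
COMMIT;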

52 Diagnosing an Issue  Diagnosis, Runaway query  DMVs  Events  DDL Triggers  Deadlock graphs in XML  Dedicated Administration Connection – DAC
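
A minimal sketch: connect on the Dedicated Admin Connection with sqlcmd -A -S yourserver -E, then look for the runaway request; the session_id filter and the commented KILL are illustrative:

-- Requests currently executing, with wait and blocking information
SELECT session_id, status, command, wait_type, blocking_session_id, cpu_time, total_elapsed_time
FROM sys.dm_exec_requests
WHERE session_id > 50;     -- skip system sessions
-- KILL 53;                -- last resort once the runaway session is identified (53 is a hypothetical id)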

53 Tuning and Fragmentation  Database Tuning Advisor  Understands indexes and partitions  Online Index Operations  Indexes remain online during reorganize and rebuild  Table and Index Partitioning  Rows can be placed in filegroups depending on a column value  Partial availability, defragmentation can occur at the partition/filegroup level  Partitions can be moved among tables for fast load, data movement
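
A minimal sketch of table partitioning by date, with illustrative names, boundary values, and filegroups (fg2004, fg2005, fg2006 are assumed to exist):

-- Rows map to partitions by OrderDate, and each partition lives on its own filegroup
CREATE PARTITION FUNCTION pfOrderDate (datetime)
AS RANGE RIGHT FOR VALUES ('2005-01-01', '2006-01-01');

CREATE PARTITION SCHEME psOrderDate
AS PARTITION pfOrderDate TO (fg2004, fg2005, fg2006);

CREATE TABLE dbo.Orders
(
    OrderID   int      NOT NULL,
    OrderDate datetime NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (OrderID, OrderDate)
) ON psOrderDate (OrderDate);

Because the partitions sit on separate filegroups, maintenance and restores can target one partition/filegroup while the rest of the table stays available.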

54 DBCC Operations  Shrink – includes BLOBs  Defrag – new ALTER INDEX REORGANIZE | REBUILD  Check* uses Database Snapshots for improved scalability, performance  In all editions!  Progress Reporting for Shrink and Defrag
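
A minimal sketch of the new maintenance commands and of progress reporting, using mydb and a hypothetical (non-partitioned) dbo.Customers table:

-- ALTER INDEX replaces DBCC INDEXDEFRAG / DBCC DBREINDEX
ALTER INDEX ALL ON dbo.Customers REORGANIZE;                  -- always an online operation
ALTER INDEX ALL ON dbo.Customers REBUILD WITH (ONLINE = ON);  -- online rebuild (Enterprise Edition)

DBCC CHECKDB ('mydb');          -- reads an internal database snapshot rather than taking table locks
DBCC SHRINKDATABASE ('mydb');

-- Progress reporting for long-running operations such as shrink
SELECT session_id, command, percent_complete
FROM sys.dm_exec_requests
WHERE percent_complete > 0;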

55 Areas of Availability Summary  Operator, DBA, and User Error  High Availability and the SQL Server process  Availability inside the SQL Server process

56 For More Information  SQL Server Books Online  Whitepapers  Failover Clustering  Database Mirroring  Peer-to-Peer Replication  MSDN and TechNet webcasts

57 © 2006 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. Microsoft makes no warranties, express or implied, in this summary.

