Exchange Server 2010 & 2013: Disaster Recovery – Troubleshooter v.1.0.


1 Exchange Server 2010 & 2013: Disaster Recovery – Troubleshooter v.1.0

2 How to View This Presentation
Switch to Reading View: click View on the ribbon and select Reading View.
Use Page Up or Page Down to navigate.
Zoom in or out as needed.
In Reading View you can interact with the buttons and the presentation logic.

3 About this troubleshooter
This troubleshooter is proudly inspired by the Exchange 2010 Datacenter Switchover Troubleshooter*.
* Check it out for thorough coverage of datacenter switchover operations.

4 Instructions
How to use this tool?
1. Switch to “Slide Show”.
2. Select the kind of issue occurring.
3. Follow the instructions for each scenario.
This presentation was designed as a quick consulting guide for crisis (disaster) situations in Exchange Server 2010/2013 environments. It works only in Slide Show view (or Reading View). It is intended to cover the following common scenarios:
- Mailbox issues
- Database issues
- Server issues
“Datacenter” issues (“Datacenter Switchover”) are not covered, because a thorough public document already covers them. In fact, that document was the inspiration for this material.
The first slide presents the three main kinds of issues; from there, you are taken to troubleshooting suggestions.

5 Scope of issue: Mailbox | Database | Exchange Server | Exit

6 Mailbox level issues
Symptoms and common causes:
- Item count issues, or item sizes that do not match the mailbox size;
- Items “disappearing” (e.g., meetings, contacts, e-mails);
- Items being duplicated;
- OOF (Out-Of-Office) showing unusual behavior and/or errors;
- Outlook AND OWA showing errors during mailbox access or folder navigation;
- Corruption of “old” items (e.g., e-mails previously read that can no longer be opened);
- Pre-defined searches no longer working;
- Display name corruption for items/folders.
Symptoms match? YES NO

7 Mailbox level issues
Troubleshooting:
Using Exchange Management Shell (EMS):
    New-MailboxRepairRequest -Mailbox <Affected_Mailbox> -CorruptionType <SearchFolder,AggregateCounts,FolderView,ProvisionedFolder>
Expected results: on the Mailbox server holding the mailbox, the Application log in Event Viewer should show:
- EventID: – Source: MSExchangeIS Mailbox Store – the repair process is starting.
- EventID: – Source: MSExchangeIS Mailbox Store – the repair process has ended.
- EventID: – Source: MSExchangeIS Mailbox Store – the repair process has ended, with a report of the repaired objects.
Symptoms persist? YES NO
Related documentation: New-MailboxRepairRequest (2010); New-MailboxRepairRequest (2013)

8 Mailbox level issues
Troubleshooting:
Get the mailbox folder statistics through EMS:
    Get-MailboxFolderStatistics <Affected_Mailbox> | ft Identity,ItemsInFolder,FolderSize -AutoSize
Through EMS, try moving the mailbox to a different database, which forces logically corrupted items to be removed:
    New-MoveRequest -Identity <Affected_Mailbox> -BadItemLimit <0 to 50> -TargetDatabase <different_database>
Expected results: check the move request report (in the Exchange Management Console (EMC), go to Recipient Configuration > Move Request; or, via PowerShell, run Get-MoveRequest).
Warning: if logical corruption occurred, the affected data may be lost.
Note: the move history can be recovered even after the move report has been removed:
    $MoveReport = (Get-MailboxStatistics -Identity 'mailbox' -IncludeMoveReport).MoveHistory
    $MoveReport > path\history_file_name.txt
Symptoms persist? YES NO
Related documentation: New-MailboxRepairRequest (2010); New-MailboxRepairRequest (2013); Get-MailboxFolderStatistics (2010); Get-MailboxFolderStatistics (2013); Get-MoveRequest (2010); Get-MoveRequest (2013); Get-MailboxStatistics (2010); Get-MailboxStatistics (2013)

9 Database level issues
Symptoms and common causes:
- The same types of issues already listed at mailbox level, but affecting several (or all) mailboxes in a database (also called DB);
- Database won’t mount after an Information Store crash (possible logical corruption);
- Database dismounted, and won’t mount;
- Database reports “Dirty Shutdown” when the *.edb is checked with ESEUTIL /MH;
- Database reports “Dirty Shutdown” when the *.log files are checked with ESEUTIL /ML;
- Database reports “Dirty Shutdown” and logs are “disappearing” (check the antivirus);
- Database reports “Clean Shutdown” AND “Log Required” via ESEUTIL /MH (and vice versa).
Symptoms match? YES NO

10 Database level issues
Is the affected database protected by a DAG (Database Availability Group)? YES NO

11 Database level issues
Database protected by DAG:
The database copy may already be mounted on another DAG member, as long as that copy was in “Healthy” state. However, a DAG can suffer a failure that prevents databases from mounting, forcing administrators to rebuild the DAG copy through a restore or, in the worst cases, to force the copy still running on a healthy server to mount, accepting data loss. Literally dozens of factors can cause this kind of scenario, so our approach is to discuss the most common scenarios and how to fix each one.
Possible action plans:
- Rebuild copy
- Rebuild DB Index
- Force mounting
- Failback
- Dial-Tone

12 Database level issues
Dial-Tone Database:
1. Check the database and log paths through EMS:
       Get-MailboxDatabase <Affected_DB> | fl *path*
2. Check “EdbFilePath” and “LogFolderPath” and make sure no files remain in those locations (move the files to a safe location rather than deleting them).
3. Force the database to mount (via EMC or EMS): Mount-Database <Affected_DB>. Accept the creation of new log and EDB files.
4. When the original DB is recovered, swap the EDB files: dismount the current (dial-tone) database and move its EDB to a safe location, then put the recovered EDB in its place (or simply overwrite it using the backup tool, once the dial-tone EDB has been copied to a safe location).
5. Merge the data of the dial-tone and production EDBs:
   a. Create a recovery database on the same server (the name can be any meaningful one; -Recovery configures the new DB in “recovery mode”):
       New-MailboxDatabase -Name "Recovery_DB" -Server <Recover_Server> -EdbFilePath <"path+name.edb"> -LogFolderPath <"path_logs"> -Recovery
   b. Mount the DB configured in the previous step:
       Mount-Database <Recovery_DB>
   c. Configure the production DB to allow a restore: check “This database can be overwritten by a restore” on the “Maintenance” tab of the production database, through EMC or using EMS.
   d. Via EMS, execute (in Exchange 2010 SP1 and later, Restore-Mailbox is replaced by New-MailboxRestoreRequest):
       Get-MailboxStatistics -Database Recovery_DB | Restore-Mailbox -RecoveryDatabase Recovery_DB
After this cmdlet, check that Outlook is not showing “Maintenance” warnings and that it already presents all the data (the data recovered from backup as well as the dial-tone data). OWA shows no warning message, so it is best to test through Outlook to check whether the operation succeeded.
Keeping the dial-tone database mounted as the production database and merging in the data from the recovered database (e.g., restored from backup) will cause permanent Outlook pop-ups about “Maintenance mode”. If this approach is adopted, the only fix is to recreate the Outlook profile on each machine that displays the message.
Symptoms persist? YES NO
A dial-tone database is especially useful for the Exchange DR strategy called “service first, data after” (or any variation with the same purpose). Suppose the restore operation is going to take 12 hours, or ESEUTIL /P is the only option and the DB is about 1 terabyte. Instead of leaving users without mail for hours, we can restore mail service within minutes and gain time to work on data recovery and the restore process. When the Exchange Information Store service finds neither the EDB file nor the database logs, it asks the administrator whether it should create a new set of these files. If the administrator agrees, the Information Store creates the set and users can send and receive messages again, although OWA users and Outlook users in Online mode (no cache enabled) will have no history of previously received messages until the restore and merge operations are completed.
Related documentation: Exchange 2010 Prerequisites; Exchange 2013 Prerequisites; Exchange 2010 SP1 Windows 2008 R2 & SP2 Pre-requisites made easy; Recover an Exchange Server; Perform a Dial Tone Recovery (2010); Perform a Dial Tone Recovery (2013)

13 Database level issues
Standalone Database (no DAG):
This type of database cannot fail over. There are at least three ways to recover a standalone database that fails, ranging from log sequence verification to a restore from backup. We are going to discuss the most common procedures for recovering standalone databases.
Possible action plans:
- Troubleshoot Database Mount Problems
- Check logs and EDB
- Replay Logs
- ESEutil /P

14 Database level issues
Check logs and EDB:
1. Check the Windows Event Viewer, in the “Application” log, for events from the “ESE” source.
2. Check disk space on the paths used for the logs and the EDB.
3. If the checks above show nothing abnormal, it is time to check the EDB: open an elevated CMD, go to the folder containing the EDB, and run ESEUTIL /MH against the file.
4. Note down the values of the “State” and “Log Required” fields. “State” shows “Clean Shutdown” or “Dirty Shutdown”. “Log Required” shows any value from “0-0” (no log required) to a range of required logs.
Any database whose “State” is “Clean Shutdown” is technically ready to be mounted, even if all logs are lost. However, some serious kinds of physical corruption can leave a DB in “Clean Shutdown” state that still cannot be mounted and produces several errors.
After an “unexpected dismount”, the first step is to check the Application log in Event Viewer for events from the “ESE” source. If no reason for the dismount is found there, the second thing to check is disk space, although ESE events should have been logged about it. Then check the EDB status using the ESEUTIL utility. Once all of this has been done, you can define a strategy for bringing the database up and running again.
Related documentation: Eseutil.exe Examples*
* Although the link is for Exchange 2013, the examples are valid for any Exchange version.
Next
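The decision described above (read “State” and “Log Required”, then pick the next action) can be sketched in Python. This is only an illustration: the function names are hypothetical, the sample text is made up rather than captured from a server, and the sketch assumes the header prints the two fields as “State: ...” and “Log Required: first-last (...)” with decimal numbers first.

```python
import re

def parse_mh_header(text):
    """Pull the State and Log Required fields out of ESEUTIL /MH output text."""
    state = re.search(r"^[ \t]*State:[ \t]*(.+)$", text, re.MULTILINE)
    required = re.search(r"^[ \t]*Log Required:[ \t]*(\d+)-(\d+)", text, re.MULTILINE)
    return {
        "state": state.group(1).strip() if state else None,
        "log_required": (int(required.group(1)), int(required.group(2))) if required else None,
    }

def suggest_next_step(header):
    """Map the two fields to the action discussed in this troubleshooter."""
    state, required = header["state"], header["log_required"]
    if state is None or required is None:
        return "header not recognized: re-run ESEUTIL /MH and inspect it manually"
    if state == "Clean Shutdown":
        if required == (0, 0):
            return "database should be mountable"
        return "inconsistent header (clean state but logs required): investigate"
    if state == "Dirty Shutdown":
        return "verify the required logs with ESEUTIL /ML, then replay them"
    return "unexpected state: investigate"

# Illustrative sample only (assumed layout, not real server output).
sample = """
            State: Dirty Shutdown
     Log Required: 5-7 (0x5-0x7)
"""
print(suggest_next_step(parse_mh_header(sample)))
```

The sketch only classifies the header; it does not replace reading the full ESEUTIL output and the ESE events.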

15 Database level issues
Check logs & EDB:
Open an elevated CMD, go to the log folder, and run ESEutil /ML against the log generation prefix. Example:
    e:\Db1\Logs\> ESEutil /ML E00
(“E00” is the standard prefix for new DBs, although this value can change.)
The log sequence is listed together with the state of each log. The state can be “OK”, “Missing”, or “Error:”. Example:
    E log – OK
    E log – OK
    E log – OK
    E log – OK
    E log – OK
    E log – OK
    E log – OK
    E log – OK
    E log – OK
    E A.log – OK
    E B.log – OK
    (...)
Symptoms match? YES NO
Related documentation: Eseutil.exe Examples
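Before running ESEutil /ML, a quick look at the file names alone can reveal holes in the log sequence. The Python sketch below is an illustration under stated assumptions: log files are assumed to be named as the prefix followed by 8 hexadecimal digits and “.log” (verify this against the files in your own log folder), and `log_name` and `missing_generations` are hypothetical helper names. It checks names only, not log integrity, so it does not replace ESEutil /ML.

```python
import re

def log_name(generation, prefix="E00"):
    """Build a log file name, assuming prefix + 8 hex digits + .log."""
    return f"{prefix}{generation:08X}.log"

def missing_generations(filenames, prefix="E00"):
    """Return generation numbers absent between the lowest and highest log found."""
    pattern = re.compile(re.escape(prefix) + r"([0-9A-F]{8})\.log$", re.IGNORECASE)
    found = set()
    for name in filenames:
        match = pattern.match(name)
        if match:  # the current log (e.g. E00.log) has no number and is skipped
            found.add(int(match.group(1), 16))
    if not found:
        return []
    return [g for g in range(min(found), max(found) + 1) if g not in found]

# Hypothetical folder listing: generations 9, 10 and 12 plus the current log.
files = [log_name(g) for g in (9, 10, 12)] + ["E00.log"]
print([log_name(g) for g in missing_generations(files)])
```

In practice the list of names would come from a directory listing of the log folder, for example `os.listdir`.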

16 Database level issues
Check logs & EDB:
If “State” shows “Dirty Shutdown” and “Log Required” shows any value other than the expected “0-0”, the missing logs must be located.
Example: DB1, State “Dirty Shutdown”, Log Required “0x1 – 0x2”.
To identify the generation of a given log file, open an elevated CMD and execute: ESEutil /ML e04.log (example). The “lGeneration” field gives the generation number of that particular log, which corresponds to the “Log Required” field shown by the database check (ESEUTIL /MH). If every .log file listed in “Log Required” is present and healthy, we can proceed to the “Replay” process.
Symptoms persist? YES NO
Related documentation: Eseutil.exe Examples
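Mapping a “Log Required” range to the file names that should exist can be sketched as follows. This is a hedged Python illustration: it assumes the prefix plus 8 hexadecimal digits naming scheme, the helper names are hypothetical, and the highest required generation may still carry the current-log name (for example E00.log) instead of a numbered name, so a “missing” result for the last generation should be confirmed with ESEutil /ML.

```python
def required_log_names(first, last, prefix="E00"):
    """File names expected for generations first..last (assumed naming scheme)."""
    return [f"{prefix}{gen:08X}.log" for gen in range(first, last + 1)]

def missing_required_logs(first, last, present_files, prefix="E00"):
    """Required log names that do not appear in the folder listing."""
    present = {name.lower() for name in present_files}
    return [n for n in required_log_names(first, last, prefix) if n.lower() not in present]

# "Log Required: 0x1 - 0x2", with only generation 1 and the current log on disk.
on_disk = required_log_names(0x1, 0x1) + ["E00.log"]
print(missing_required_logs(0x1, 0x2, on_disk))
```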

17 Database level issues
Replay Logs process:
If the log sequence and the EDB were successfully validated, it is time to replay the logs:
1. In an elevated CMD, go to the log folder.
2. Execute “ESEutil /R E04” (as discussed before, this value can be different; check the prefix used in every log file name for a hint, or use ESEUTIL /MH to find out).
This command locates the EDB and applies the logs required by the DB, after checking log integrity and sequence once more. At the end, if no errors were detected, ESEUTIL /MH will show “State” = “Clean Shutdown” for the EDB. After this, the database is ready to mount, with no special parameters.
Symptoms persist? YES NO
Related documentation: Eseutil.exe Examples

18 Database level issues
ESEutil /P:
ALWAYS the last resort (recommended only after attempts to fix the issue with Microsoft Support representatives have failed). It implies loss of data.
1. Open an elevated CMD and go to the folder containing the EDB.
2. Always make a safe copy of the EDB before running /P.
3. Execute: ESEutil /P db1.edb (example).
After this process, the EDB will be in “Clean Shutdown” state. Yet, it is not logically consistent. As the “ISInteg” tool is now deprecated, we have to use an EMS cmdlet to fix this:
    New-MailboxRepairRequest -Database <name_of_DB_after_ESEutil_/P> -CorruptionType <SearchFolder,AggregateCounts,FolderView,ProvisionedFolder>
Symptoms persist? YES NO
Related documentation: Eseutil.exe Examples

19 Database level issues
Recreate copy:
On the server where the DB first crashed (and which now holds the passive copy):
    Suspend-MailboxDatabaseCopy -Identity <DB_Name\Server_Name>
Execute a full reseed, taking the healthy copy as the source:
    Update-MailboxDatabaseCopy -Identity <DB_Name\Server_Name> -SourceServer <Healthy_Copy_Server_Name> -DeleteExistingFiles
This process can take a long time, depending on the database size.
Symptoms persist? YES NO
A failed copy on a DAG member will commonly force a failover. However, the failed copy can suffer some type of corruption that leaves it no longer healthy and demands the administrator’s intervention. This is especially true if you have already waited a reasonable time to see whether Exchange could bring the copy back to “Healthy” by itself. (Exchange runs processes that try to fix the issues preventing the copy from becoming healthy again, so it is advisable to wait a while, about half an hour, before intervening manually on copies in “Failed” state.)
Related documentation: Suspend-MailboxDatabaseCopy (2010); Suspend-MailboxDatabaseCopy (2013); Update a Mailbox Database Copy (2010); Update a Mailbox Database Copy (2013)

20 Database level issues
Recreating Content Index for a DAG database:
On the server presenting the Content Index issue:
    Suspend-MailboxDatabaseCopy -Identity <DB_Name\Server_Name>
Regenerate the Content Index:
    Update-MailboxDatabaseCopy -Identity <DB_Name\Server_Name> -CatalogOnly
This process can take a long time, depending on the database size.
Symptoms persist? YES NO
Related documentation: Suspend-MailboxDatabaseCopy (2010); Suspend-MailboxDatabaseCopy (2013); Update a Mailbox Database Copy (2010); Update a Mailbox Database Copy (2013)

21 Database level issues
Forced mounting:
It is possible, though uncommon, to suffer loss of data.
On the affected server, where the forced mounting will be attempted:
    Move-ActiveMailboxDatabase -Identity <DB_Name> -ActivateOnServer <Server_Name> -MountDialOverride "BestEffort" -SkipActiveCopyChecks -SkipLagChecks -SkipClientExperienceChecks -SkipHealthChecks
By skipping essentially all the routines used to check DAG database integrity and health, this cmdlet attempts to mount the DB while accepting possible data loss. Several mechanisms are in place to reduce this risk, but it is impossible to guarantee “no risk” with this method.
Symptoms persist? YES NO
Related documentation: Activate a Mailbox Database Copy (2010); Activate a Mailbox Database Copy (2013); Move-ActiveMailboxDatabase (2010); Move-ActiveMailboxDatabase (2013)

22 Database level issues
Failback:
    Get-MailboxDatabaseCopyStatus DB_Name
Before executing the failback, check the “Status” and “ContentIndexState” columns in the output of the cmdlet above. Both should show “Healthy”; otherwise, the failback will fail. If any other status is shown, try the “DB Copy Rebuild” and/or “DB Catalog Rebuild” operations. The failback is then performed with Move-ActiveMailboxDatabase:
    Move-ActiveMailboxDatabase -Identity <"DB_Name"> -ActivateOnServer <"Server_Name">
Symptoms persist? YES NO
Related documentation: Get-MailboxDatabaseCopyStatus (2010); Get-MailboxDatabaseCopyStatus (2013); Move the Active Mailbox Database (2010); Move the Active Mailbox Database (2013)
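The pre-failback check can be expressed as a small decision helper. The Python sketch below is illustrative only: it assumes the copy status has already been exported into a dictionary carrying the “Status” and “ContentIndexState” values (how that export is done is left open), and `failback_blockers` is a hypothetical name.

```python
def failback_blockers(copy_status):
    """List the operations to try before failback; an empty list means ready."""
    blockers = []
    if copy_status.get("Status") != "Healthy":
        blockers.append("rebuild the database copy (reseed)")
    if copy_status.get("ContentIndexState") != "Healthy":
        blockers.append("rebuild the content index")
    return blockers

# Hypothetical status of the copy on the server we want to fail back to.
target_copy = {"Name": "DB1\\MBX1", "Status": "Healthy", "ContentIndexState": "Failed"}
blockers = failback_blockers(target_copy)
print("ready for failback" if not blockers else blockers)
```

Treating anything other than “Healthy” as a blocker follows the rule stated above; it is deliberately strict.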

23 Exchange Server level issues
Symptoms and common causes:
- The common causes and symptoms listed at “Database level”, but affecting all databases on a given server;
- Exchange Server services won’t start and log errors in Event Viewer;
- Windows Server is corrupted and the O.S. is lost;
- Hardware damaged beyond repair.
Symptoms match? YES NO

24 Exchange Server level issues
Exchange Role presenting issues:
- Mailbox Server
- Client Access Server / Hub Transport Server
- Dial-Tone Database*
Return

25 Exchange Server level issues
Mailbox Server:
1. Reset the computer account of the affected server through ADUC (Active Directory Users and Computers) or any other supported method.
2. Reinstall the operating system exactly as the server was configured before the crash, and give it the same FQDN (fully qualified domain name) as the lost server. It is not possible to recover a server using another server name or O.S. version.
3. Reconfigure all network adapters with the values of the lost server. Do not join the domain at this point.
Mailbox Server type: DAG | Standalone
Related documentation: Reset a Computer Account

26 Exchange Server level issues
Mailbox Server (DAG):
1. Install the O.S. and the Exchange Server pre-requisites, hotfixes, and so on.
   Tip: using an elevated CMD or EMS, go to the Exchange installation folder and execute:
       ServerManagerCmd -ip Exchange-Typical.xml
   (this script installs only the Exchange pre-requisites, for all the roles; there are other scripts in this folder).
2. Using the Exchange Management Shell of another server:
       Remove-MailboxDatabaseCopy DB_Name\Server_lost_Name
       Remove-DatabaseAvailabilityGroupServer -Identity <DAG_Name> -MailboxServer <Server_Lost_Name> -ConfigurationOnly
       cluster.exe /cluster:<DAG Name> Node <Server Name> /Evict
   (the last command forces the removal of the lost server from the cluster database).
3. Add the new server (with the same old name) to the Active Directory domain again.
4. On the new server, open an elevated command prompt. In the Exchange 2010 installation folder, execute:
       Setup /m:RecoverServer
5. If there are healthy database copies of this server on the other DAG members:
   a. Add-DatabaseAvailabilityGroupServer -Identity DAG_Name -MailboxServer <Server_Recovered_Name>
   b. Add-MailboxDatabaseCopy -Identity <DB_Name> -MailboxServer <Server_Recovered_Name>
   If something fails during this process, it may be solved with the “reseed” process described under “Database level issues”.
6. If no copies remain for this server on the DAG members, repeat step “a.” above and then recover the DBs from backup.
Symptoms persist? YES NO
Related documentation: Exchange 2010 Prerequisites; Exchange 2013 Prerequisites; Exchange 2010 SP1 Windows 2008 R2 & SP2 Pre-requisites made easy; Recover an Exchange Server

27 Exchange Server level issues
Mailbox Server (Standalone):
1. Install the O.S. and the Exchange Server pre-requisites, hotfixes, and so on.
   Tip: using an elevated CMD or EMS, go to the Exchange installation folder and execute:
       ServerManagerCmd -ip Exchange-Typical.xml
   (this script installs only the Exchange pre-requisites, for all the roles; there are other scripts in this folder).
2. Add the server to the Active Directory domain again, with the same FQDN (fully qualified domain name) and IP configuration.
3. Open an elevated CMD prompt on the recovered server. In the Exchange 2010 installation folder, execute:
       Setup /m:RecoverServer
As this is a standalone Mailbox server, there are no database copies on other servers, so a restore from backup is the only way to recover the database data.
Symptoms persist? YES NO
Related documentation: Exchange 2010 Prerequisites; Exchange 2013 Prerequisites; Exchange 2010 SP1 Windows 2008 R2 & SP2 Pre-requisites made easy; Recover an Exchange Server

28 Exchange Server level issues
Client Access Server/Hub Transport Server:
1. Install the O.S. and the Exchange Server pre-requisites, hotfixes, and so on.
   Tip: using an elevated CMD or EMS, go to the Exchange installation folder and execute:
       ServerManagerCmd -ip Exchange-Typical.xml
   (this script installs only the Exchange pre-requisites, for all the roles; there are other scripts in this folder).
2. Reset the computer account of the affected server in AD (for example, via AD Users and Computers).
3. Join the server to the Active Directory domain again.
4. On the server to be recovered, open an elevated CMD prompt. In the Exchange 2010 installation folder, execute:
       Setup /m:RecoverServer
5. Reconfigure NLB, the CAS Array, OWA customizations, SSL certificates, and so on, as needed.
Symptoms persist? YES NO
Related documentation: Exchange 2010 Prerequisites; Exchange 2013 Prerequisites; Exchange 2010 SP1 Windows 2008 R2 & SP2 Pre-requisites made easy; Recover an Exchange Server

29 Time to restore backup OR contact Microsoft Support
If you reached this page, the issue your Exchange environment is facing is not a “regular” one, or the knowledge presented in this document is not enough to deal with it.
Next steps:
- If the servers are OK and you just need the data, restore from your backup;
- Or call the Microsoft Support Team (PSS) to get help from a representative specialized in the affected product. See the link below for contact information: Using Microsoft Product Support Services
Return
