Database Corruption Be prepared, not scared.


1 Database Corruption Be prepared, not scared.
Richard Banville, Fellow, OpenEdge Development, Progress Software

2 Dealing With Database Corruption
Preparation: prepare for the worst, hope for the best.
Prevention: stopping corruption before it happens; avoiding foolish behavior.
Detection: identifying that you have a problem; pinpointing the cause.
Reaction: resolving corruption with the least impact.

3 Types Of Corruption
Corruption can be small or widespread.
User-based corruption
Internal system-based corruption
Physical: block-level corruption; hardware (bad disk, memory, etc.)
Logical: missing data; relational issues; data access; index issues

4 Be Prepared
Modern release (all facets of deployment)
Backups: perform regularly.
Back up the database AND the application.
Perform large backups with split mirrors.
Run online backup with -Bp.
TEST your backups with restore and access, or with a hot standby.
prorest <db> -vp: validates that data was written successfully (not that the proper data was written).
prorest <db> -vf: compares against the original, but who wants to be down that long?
Use offsite storage.
Run with AI enabled.
Put AI files on a separate disk and a separate controller.
The AI management tool makes AI management easy.
Note: We are adding an AI archive utility to help with the maintenance of running with AI.

5 Be Really Prepared
Keep a hot standby: continually roll forward AI files; OpenEdge Replication.
Have a comprehensive recovery strategy: audit changes; plan for natural disasters; plan for not-so-natural disasters.
Document and test your recovery strategy.
Educate at all levels of the organization.
Implement redundancy: failover clusters; have a duplicate remote site.

6 Database Consistency Checking
Seen these messages before?
Index name in customer for recid could not be deleted.
Wrong key in index 10 for record
Invalid size of an index entry.

7 Database Consistency Checking
Or how about these…
Invalid RM block for area 10
rmdoins: pbk->free went negative dbkey 4096
bkwrite: bktbl dbk 4096 not equal to bkbuf dbk
bkaddr called with negative blkaddr: -1234

8 Database Consistency Checking
Stop shared memory problems before they happen.
Memory overwrite protection: -MemCheck
Ensures block changes are written to the proper shm location.
[Diagram: Buffer 1 and Buffer 2, before and after "Insert new key entry". Oops! A miscalculation results in a memory stomp of the next block header.]
Note: Two new types of consistency checking. Object level for a single object: a hierarchy of checking.

9 Database Consistency Checking
Stop database corruption from becoming persistent.
Physical block consistency checking: -DbCheck
Validates record and index blocks after each update operation.
Object-level variants: -AreaCheck "area name", -IndexCheck "index name", -TableCheck "table name"
Corruption caught here is typically the result of a bug.
Available for OLTP and roll forward.
Note: Two new types of consistency checking. Object level for a single object: a hierarchy of checking.

10 Enabling Database Consistency Checking
Database startup parameter (-MemCheck, -DbCheck)
Managed via promon: R&D, 4. Admin Functions, 8. Block level consistency check
Current consistency check status:
  1. -MemCheck: enabled
  2. -DbCheck: enabled
  3. -AreaCheck: disabled
  4. -IndexCheck: disabled
  5. -TableCheck: disabled
Enter the option to enable/disable a consistency check:
Note: Explain a scenario where someone would want to run this during roll forward.

11 Database Consistency Checking Performance Impact
Memory checking: unnoticeable impact (< 1%)
Block level checking: still reasonable (~5%)
On error, get the .lg file to Progress Technical Support.
Current consistency check status:
  1. -MemCheck: enabled
  2. -DbCheck: enabled
  3. -AreaCheck: disabled
  4. -IndexCheck: disabled
  5. -TableCheck: disabled
Enter the option to enable/disable a consistency check:
Note: Explain a scenario where someone would want to run this during roll forward.

12 Identifying Problem Types and Reacting
There are many ways for data to get corrupted.
Identifying the corruption type: key word association can help direct the recovery effort; understanding the process can also help.
Quickest way to recovery: knowing the tools and which to use is key; practice recovery efforts before they are needed.
Let's examine a few.
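The key word association can be sketched in a few lines of Python. This is a heuristic only: it assumes the message begins with an internal routine name followed by a colon, and it uses the prefix groupings shown on the Index, Record, Block and Recovery Manager slides. The function name and the area labels are illustrative, not part of any Progress tool.

```python
# Module prefixes as grouped on the Index, Record, Block and Recovery slides.
PREFIX_TO_AREA = {
    "ix": "index", "cx": "index", "ky": "index",
    "rm": "record", "bf": "record", "rec": "record",
    "bkio": "block", "bk": "block", "bm": "block",
    "rl": "recovery", "ai": "recovery", "bi": "recovery", "tm": "recovery",
}


def classify_message(message):
    """Return the problem area hinted at by a leading routine name, or None."""
    head, sep, _ = message.partition(":")
    token = head.strip().lower()
    # Only treat the text before the first colon as a routine name
    # when it is a single word (assumption of this sketch).
    if not sep or not token or " " in token:
        return None
    # Try longer prefixes first so "bkio" wins over "bk".
    for prefix in sorted(PREFIX_TO_AREA, key=len, reverse=True):
        if token.startswith(prefix):
            return PREFIX_TO_AREA[prefix]
    return None
```

Messages without a leading routine name (for example "Wrong dbkey in block...") return None and still need to be looked up by message number or text.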

13 Index Issues
Index messaging (ix, cx, ky):
Index <i> in <t> for recid <r> could not be deleted. (1422)
  Logical corruption: missing entries or record not found
Index <i>, block <b>, element no. 1: bad compression size. (4423)
  Physical corruption: storage format of index is incorrect
How to proceed
[Diagram labels: Index, Root block, Key entry, B-tree, Cursor]

14 Index Validation Tools
proutil <db> -C idxcheck
Idxcheck online validation levels:
  Physical/block corruption: physical consistency
  Logical/key entry corruption: keys to records; records to keys
  Validate key order
Lock table option
New index rebuild may be faster!

15 Index Validation & Repair Tools
proutil <db> -C idxfix
Index Fix Utility
  1. Scan records for missing index entries.
  2. Scan indexes for invalid index entries.
  3. Both 1 and 2 above.
  4. Cross-reference check of multiple indexes for a table.
  5. Build indexes from existing indexes.
  6. Delete one record and it's index entries.
  7. Quit.
Select one of the following:
  All (a/A) - Fix all the indexes
  Some (s/S) - Fix only some of the indexes
  By Area (r/R) - Fix indexes in selected areas
  By Schema (c/C) - Fix indexes by schema owners
  By Table (t/T) - Fix indexes in selected tables
  By Activation (v/V) - Fix selected active or inactive indexes
Fix indexes on Scan. Is this correct? (y/n)

16 Index Validation & Repair Tools
proutil <db> -C idxfix
Index Fix Utility
  1. Scan records for missing index entries.
  2. Scan indexes for invalid index entries.
  3. Both 1 and 2 above.
  4. Cross-reference check of multiple indexes for a table.
  5. Build indexes from existing indexes.
  6. Delete one record and it's index entries.
  7. Quit.
Online operation
Transactions are relatively small
Does not fix physical block corruption
One concurrent idxfix process per table

17 Using Index Fix: Record but no index entry
OLTP (.lg and screen): Index name in customer for recid could not be deleted.
proutil <db> -C idxfix
1. Scan records for missing index entries: Index 12 (customer, name): couldn't find key <RICHB> recid
Option #1: Add key entry to index
  1. Scan records for missing index entries.
  Fix indexes on Scan. Yes
NOTE: "2. Scan indexes for invalid index entries." would NOT report an error!
[Diagram: records aaaa, bbbb and richb (each with Field2, Field3, Field4); recids 16689, 16690, 16691; indexes 10, 11, 12]

18 Using Index Fix: Record but no index entry
OLTP (.lg and screen): Index name in customer for recid could not be deleted.
proutil <db> -C idxfix
1. Scan records for missing index entries: Index 12 (customer, name): couldn't find key <RICHB> recid
Option #2: Delete record and its key entry in the table's other indexes
  6. Delete one record and it's index entries.
  Type the recid to delete: 16691
  Type the area (number) for the recid(s): 8
Look in the .st file to match area number and area name.
To inspect the record first: Find first cust where recid(cust) = display cust
[Diagram: records 16689 (aaaa), 16690 (bbbb) and 16691 (richb), each with Field2, Field3, Field4; indexes 10, 11, 12]

19 Using Index Fix: Index entry but no record
Often no runtime error is reported.
proutil <db> -C idxfix
2. Scan indexes for invalid index entries: Index 12 (customer, name): found invalid key <RICHB> recid
Only option: remove the invalid key entry
  2. Scan indexes for invalid index entries.
  Fix indexes on Scan. Yes
NOTE: "1. Scan records for missing index entries." would NOT report an error!
[Diagram: indexes 10, 11, 12]

20 Fixing Index Corruption (continued)
Missing key entries or record not found (logical corruption):
  Index fix
  Action based on record removal or index entry insert/delete
Index <i>, block <b>, element no. 1: bad compression size:
  Physical b-tree corruption
  Must rebuild index to recover
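The choice between index fix and index rebuild can be written as a small lookup. The sketch below is illustrative only: it assumes the message number appears in parentheses at the end of the line, as in the two examples on the Index Issues slide, and the function and table names are hypothetical.

```python
import re

# Message numbers and recommended tools as given on the index slides.
INDEX_ACTIONS = {
    1422: ("logical", "proutil <db> -C idxfix"),
    4423: ("physical", "proutil <db> -C idxbuild"),
}


def suggest_index_action(message):
    """Map a trailing '(number)' to (corruption kind, tool), or None if unknown."""
    match = re.search(r"\((\d+)\)\s*$", message)
    if match is None:
        return None
    return INDEX_ACTIONS.get(int(match.group(1)))
```

Any message number not covered by the slides returns None rather than a guess.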

21 Index Repair Tools
proutil <db> -C idxbuild
Offline utility
Performance improvements since 10.2b06
Will repair: index block corruption (physical); orphan index blocks; adds missing index entries
Assumes record data is correct
Flexible options (db, area, table, index)
Truncates existing BI file
Does not record idxbuild changes into BI file

22 Index Rebuild Performance Parameter Suggestions
-TB               sort block size: 64
-datascanthreads  # threads for data scan phase: 1.5 * # CPUs
-TMB              merge block size (default -TB): 64
-TF               merge pool fraction of system memory: 80%
-mergethreads     # threads per concurrent merge: 1.5 * # CPUs
-threadnum        # concurrent sort groups merging: 2 to 4
-TM               # merge buffers to merge each pass: 32
-rusage           report system usage statistics
-silent           a bit quieter than before
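As a rough illustration, the suggestions above can be turned into an argument list from the CPU count. The sketch makes several assumptions that are not on the slide: 1.5 * CPUs is rounded down, -threadnum defaults to the low end of the 2 to 4 range, and the helper name is hypothetical. It only builds the list; it does not run anything, and the object selection (db, area, table, index) is left out.

```python
def idxbuild_args(db, cpus, sort_groups=2):
    """Build a proutil idxbuild argument list from the slide's suggestions."""
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    if not 2 <= sort_groups <= 4:
        raise ValueError("the slide suggests 2 to 4 concurrent sort groups")
    # 1.5 * CPUs, rounded down (assumption), but never below one thread.
    threads = max(1, (cpus * 3) // 2)
    return [
        "proutil", db, "-C", "idxbuild",
        "-TB", "64",                      # sort block size
        "-TMB", "64",                     # merge block size
        "-TF", "80",                      # merge pool fraction of system memory
        "-TM", "32",                      # merge buffers per pass
        "-datascanthreads", str(threads),
        "-mergethreads", str(threads),
        "-threadnum", str(sort_groups),
        "-rusage",
        "-silent",
    ]
```

For example, `" ".join(idxbuild_args("mydb", 8))` gives a command line with 12 data scan threads and 12 merge threads, to be reviewed against the documentation for the release in use before it is run.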

23 Index Build/Repair Tools
proutil <db> -C idxactivate <i1> useindex <i2>
Builds and activates index
Online
One concurrent idxactivate process per table
Requires client schema re-cache
Transaction size based on "recs" parameter
Deactivate requires exclusive access
Repairs logical and physical index corruption
Assumes valid record data
*** Static queries require recompile to consider new index

24 Record Issues
Record messaging (rm, bf, rec):
bffld: nxtfld: scan past last field. (16)
  Looking for field #5 but only 4 fields exist
Record continuation not found, fragment recid <r> area <a>. (10831)
  Pointer to next record fragment is invalid
How to proceed
[Diagram labels: Record, recid, rowid, Field1, Field2, Field3, Field4, Record Fragment 1]

25 Checking For Inconsistencies Online
proutil <db> -C dbanalys | tabanalys: reads records for statistics purposes
dbtool <db>
Physical validation: 5. Read or Validate Database Block(s)
  Validation levels:
  0: Block header info only
  1: Record header & record size
  2: Record overlap checking
Logical validation w/schema: 3. Record Validation; 4. Record Version Validation
Note: -MemCheck and -DbCheck are meant to ensure that the database remains consistent at runtime. Dbtool can be run to ensure that the database doesn't already contain physical inconsistencies.

26 Record Repair Tools
bffld: nxtfld: scan past last field. (16)
dbtool <db>: online and multi-threaded
  6. Record Fixup: adds missing fields; removes invalid "end-rec" indicator
proutil <db> -C idxfix
  6. Delete one record and it’s index entries

27 Record Repair Tools
Record continuation not found, fragment recid <r> area <a>.
proutil <db> -C dbrpr
  3. Remove Bad Record Fragment
  14. Display Record Contents
Exclusive access
Truncate bi file
[Diagram label: Record Fragment 1]

28 More Record Repair Tools
Record continuation not found, fragment recid <r> area <a>
[Diagram label: Record Fragment 1]
Warning: The use of dbrpr to fix problems in the database should be done with the assistance of Progress Technical Support.

29 Dbrpr Record Fix-up Example – Last resort
Before you do anything: validate current backup
Options:
proutil <db> -C truncate bi
proutil <db> -C dbrpr
  1. Database Scan Menu
  2. Test One or More Indexes
  3. Remove Bad Record Fragment
  4. Dump Block
  5. Load Block
  6. Copy Bytes Between Files
  7. Load RM Dump File
  8. Reformat Block to a Free Block
  9. Change Current Working Area
  10. Display the Free Chain
  11. Display the RM Chain
  12. Display the Index Delete Chain
  13. Display Block Contents
  14. Display Record Contents
  15. Display Cluster Chain
  16. Scan/Fix block checksum

30 Dbrpr Record Fix-up Example – Last resort
Record continuation not found, fragment recid area 8 3.
Before you do anything: validate current backup
Validate bad record info
proutil <db> -C truncate bi
proutil <db> -C dbrpr
1. Database Scan Menu
  1. Report Bad Blocks
  3. Fix Bad Blocks
  4. Report Bad Records
  5. Delete Bad Records
  6. Dump Records to RM File
  7. Rebuild Free Chain
  8. Rebuild RM Chain
  9. Rebuild Index Delete Chain
  10. Change Current Working Area
  11. Fix Cluster Chains in Type II Area

31 Dbrpr Record Fix-up Example – Last resort
Record continuation not found, fragment recid area 8 3.
proutil <db> -C truncate bi
proutil <db> -C dbrpr
Get a view of what you are going to delete:
  9. Change Current Working Area
  13. Display Block Contents
    1. Dump Data Block Details
    6. Start Dbkey
Delete partial record:
  3. Remove Bad Record Fragment
Re-validate (see previous screen)
Offset  Len  Hex       Ascii
19      1    0x64      d
21      5    0x        richb
30                     “”
35      2    0x6d61    MA
        3    0x626262  BBB

32 Other Record Oriented Repair Tools
proutil <db> -C dump <table> . -index <i>
Binary dump
Online & multi-threaded
Binary record format
May not fix individual record corruption
May fail when encountering physical corruption
Use selective binary dump to dump in ranges
-index defaults to primary index
Use a different index if the primary cannot be used
Use -index 0 if no valid index exists (Type II storage area)

33 Other Record Repair Mechanisms
Dump records in "PUB" schema by rowid
Manual ASCII dump and load "repair"
Reload w/bulk load or ABL import
Specify index to use or TABLE-SCAN

    DEFINE VARIABLE ix AS INTEGER NO-UNDO.
    FIND _file "item".
    OUTPUT TO item.d.
    /* Make sure the upper bound is large enough to cover every recid! */
    DO ix = 1 TO 10000:
        FIND item WHERE RECID(item) = ix NO-ERROR.
        /* Skip recids that hold no record, and skip the table's template record. */
        IF AVAILABLE item AND ix <> INTEGER(_file._template) THEN
            EXPORT item.
    END.
    OUTPUT CLOSE.

34 Block Issues
Block and shared memory buffer messages (bkio, bk, bm):
Wrong dbkey in block. Found <x>, should be <y> in area <z>. (1124)
  Read, write, modify, release
  Most often an O/S file system issue
  Reboot often fixes this error, but why?
bkioWrite:Unknown O/S error during write, errno 2, fd <x>, len <y>, offset <z>, filename <s> database <t>. (14676)
Attempt to read block <n> which does not exist in area <a>. (201)
  Often index rebuild will fix this error (rebuild on area level).
[Diagram labels: Block, Area, Dbkey, Buffer, Extent]

35 Block Repair Tools
Checksum validation of dbkey <d> block type 4 in area <a> does not match data. Expected: <e> received <r>. (14410)
proutil <db> -C dbrpr
  16. Scan/Fix block checksum (Type II Area)
    1. Report Bad Checksum
    2. Fix Bad Checksum
Ignore for free blocks (block type 4)
Validate database by other means prior to "fixing"
True corruption will require a database rebuild: dump and load, or restore/roll forward
Block types: Master block: 1; Record block: 3; Free block: 4; Index block: 5
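The "ignore for free blocks" rule lends itself to a quick triage pass over the log. The following sketch assumes the message text matches the (14410) wording on this slide with numeric values filled in for dbkey, block type and area; the function name and the returned dictionary are hypothetical.

```python
import re

# Block type numbers as listed on the slide.
BLOCK_TYPES = {1: "master", 3: "record", 4: "free", 5: "index"}

CHECKSUM_RE = re.compile(
    r"Checksum validation of dbkey (?P<dbkey>\d+) block type (?P<type>\d+) "
    r"in area (?P<area>\d+) does not match data"
)


def triage_checksum_message(message):
    """Extract dbkey, area and block type; flag free-block errors as ignorable."""
    match = CHECKSUM_RE.search(message)
    if match is None:
        return None
    block_type = int(match.group("type"))
    return {
        "dbkey": int(match.group("dbkey")),
        "area": int(match.group("area")),
        "block_type": BLOCK_TYPES.get(block_type, "unknown"),
        # The slide says checksum errors on free blocks (type 4) can be ignored.
        "ignorable": block_type == 4,
    }
```

Anything not flagged as ignorable still needs the database validated by other means before a checksum is "fixed".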

36 Block Chain Repair Tools
RM chain count inconsistency. 20 Blocks indicated on record free chain (actually 5)
RM block found not on RM chain, but flagged RM chain.
RM block free chain link error
<type> Block <number> with invalid chain type <number> on RM chain
Free block marked on free chain but linked into RM chain
[Diagram: chain of blocks RM, RM, RM, FREE]
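To make these messages concrete, here is a toy model of a chain walk. It is not how the storage engine represents chains: the dictionary layout, the field names and the function are all invented for illustration. It only shows why a stored count, a broken link and a wrongly typed block each produce a different complaint.

```python
def check_rm_chain(blocks, head, expected_count):
    """Walk a toy RM chain and return a list of inconsistencies found.

    blocks maps dbkey -> {"type": "RM" or "FREE", "next": dbkey or None}.
    """
    problems = []
    seen = set()
    dbkey = head
    while dbkey is not None:
        if dbkey in seen:
            problems.append(f"chain loops back to dbkey {dbkey}")
            break
        block = blocks.get(dbkey)
        if block is None:
            problems.append(f"chain link error: dbkey {dbkey} does not exist")
            break
        seen.add(dbkey)
        if block["type"] != "RM":
            problems.append(f"block {dbkey} has chain type {block['type']} on RM chain")
        dbkey = block["next"]
    if len(seen) != expected_count:
        problems.append(
            f"{expected_count} blocks indicated on chain (actually {len(seen)})"
        )
    return problems
```

An empty result means the stored count, the links and the block types agree with each other in the model.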

37 Block Chain Repair Tools
RM chain count inconsistency.
RM block found not on RM chain, but flagged RM chain.
<name> Block <number> with invalid chain type <number> on RM chain
proutil <db> -C dbrpr
1. Database Scan Menu
  7. Rebuild Free Chain
  8. Rebuild RM Chain
  9. Rebuild Index Delete Chain
  11. Fix Cluster Chains in Type II Area
Rebuild free chains/RM chains from dbrpr
Seek help from support

38 Recovery Manager Issues
Recovery messages:
** The after-image file expected Tue Feb 26 16:47: (832)
** Those dates don't match, so you have the wrong copy of one of them. (833)
Undo failed to reproduce the record in area <a> with rowid <r> and return code -1. (10566)
Invalid block <x> for file <y>.a3, max is 1024 (2329)
How to proceed: restore / roll forward; switch to hot standby
[Diagram labels: Recovery (rl); After image (ai, a<n>); Before image (bi, b<n>); Transaction (tm); Retry, Redo, Undo]
NOTE: tm may be a soft error

39 Recovering From Recovery Failures
I've got no backup and crash recovery won't work?
proutil <db> -C truncate bi -G 120
  Looks further back in BI. Should no longer be needed, but it's worth a try!
proutil <db> -C truncate bi -F
  **** As a very last resort, force truncate.
  What are the side effects of skipping crash recovery? -F: How bad could it be?
Afterwards:
  Dump and re-load into a new database
  Reconcile data contents and relationships after load
  Back up and enable AI
  Maintain a hot standby

40 Structural Repair
Those dates don't match, so you have the wrong copy of one of them.
  Usually the result of an OS copy or move
  Make sure all the right pieces are in place and the .st file identifies them correctly
prostrct repair <db> <x>.st
  Does NOT repair a corrupt database
  Updates path names to those specified in the .st file
prostrct unlock <db> <x>.st
  Use "sparingly"
  Patches date mismatch and creates dummy extents
  Use to recover whatever data remains when no backup exists

41 Structural Repair
rm x.db - Ooops!
prostrct builddb <db> <db>.st
  Rebuilds the database "control area" (.db file) from the .st file
  Changes to the control area are not logged
  Cancelling a txn that changes the control area may require builddb
  May force re-base for OpenEdge Replication
prostrct list <db>
  Always have an up-to-date .st file

42 Summary The many faces of corruption
Corruption shows itself in many different ways:
  Hard and soft corruption
  Memory and disk
  Record, index, block and db structure
Some repair tools are a loaded gun:
  In the wrong hands they can produce havoc
Preparation is your best way to recovery:
  Standard disaster recovery preparations
  Knowing options before problems occur

43 ? Questions

44 www.progress.com/exchange-pug October 6–9, 2013 • Boston #PRGS13
Special low rate of $495 for PUG Challenge attendees with the code PUGAM And visit the Progress booth to learn more about the Progress App Dev Challenge!

46 Recovering From Recovery Failures
Time to restore/roll forward
Or switch to hot standby
What if roll forward fails?
  SYSTEM ERROR: Attempt to read block which does not exist in area 8, database x.
  ** Save file named core for analysis by Progress Software Corporation.
Use roll forward verify
Roll forward to point in time or transaction #
[Diagram labels: myDb, ai, ftp, X, Hot Standby, Roll forward]

47 AI Validation Before Application
rfutil <db> -C aiverify <type>
Partial: AI block and note header validation
  Increases reliability of archived AI files
Full: partial + note data validation
  Identifies point in time recovery
Running:
  At AI switch or on AI archival
  Just before roll forward of extent
  Preferably on hot standby
  "Any" <db> will do
Note: Aiverify partial was released in 10.1b02, full in 10.1c. Run it if trying to recover as much as possible from a damaged AI file. It does more than ai scan verbose, and finds AI block corruption caused by such things as ftp problems.

48 Roll Forward Verification
rfutil <db> -C aiverify <type>
rlNoteVerify: Note dbkey is negative (14099)
Trid: 358 code = RL_CXINS version = 2 (12528)
Hot standby:
  Re-send broken AI file
Recovery scenario:
  Validate/fix production db
  Re-base hot standby
  Roll forward to transaction
[Diagram labels: myDb, ai, ftp, X, Hot Standby]
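The "verify just before roll forward" habit can be sketched as a small loop on the hot standby. This is an outline under stated assumptions, not a Progress utility: `verify` and `roll_forward` are hypothetical callables that the reader would implement, for example by wrapping the aiverify and roll forward commands and checking their exit status.

```python
def apply_ai_extents(extents, verify, roll_forward):
    """Verify each AI extent before applying it; stop at the first bad one.

    Returns (applied, failed): the extents rolled forward so far, and the
    first extent that failed verification (None if all were applied).
    """
    applied = []
    for extent in extents:
        if not verify(extent):
            # Do not apply a damaged extent; it has to be re-sent first.
            return applied, extent
        roll_forward(extent)
        applied.append(extent)
    return applied, None
```

A non-None second value names the extent to re-send from production, which matches the first step of the hot standby scenario above; nothing after it is applied, so the standby stays at a known point.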
