Backup & Recovery of Physics Databases

1 Backup & Recovery of Physics Databases
Jacek Wojcieszuk, IT-DM Technical Meeting, December 16th, 2008
Speaker notes: The title has to be worked on.

2 Outline
- Why back up?
- Current implementation: Maximum Availability Architecture
- Main concerns
- Possible improvements
IT/DM technical meeting - 2

3 Why back up?
- Backup is one of the main techniques for data protection
- A properly planned and implemented backup and recovery strategy is critical for business continuity
- Ideally, backups should allow recovery from any kind of failure without data loss

4 Types of failures
- Oracle instance failure
  - Usually due to an Oracle process failure
- Media failure
  - Disk failure, controller failure, etc.
- Physical data corruption
- Human error
  - In most cases accidentally deleted/updated data
  - Database user or DBA
- Disaster
  - Fire, flood, earthquake, plane crash, overvoltage, etc.

5 Available tools
- Oracle offers many tools that help to back up data and address failures:
  - Recovery Manager (RMAN)
  - Data Guard
  - Export/Import
  - Data Pump
  - Streams
- Oracle supports using OS and hardware features for taking backups:
  - snapshots
  - cp command
- None of the mentioned tools alone can protect the data from all types of failures

6 Oracle Maximum Availability Architecture (MAA)
- Oracle's best practices blueprint
- Goal: to achieve the optimal high availability architecture at the lowest cost and complexity
- Helps to minimize the impact of different types of unplanned and planned downtime
- Is based on Oracle products/features such as:
  - RAC
  - ASM
  - RMAN
  - Flashback
  - Data Guard
Speaker notes: This slide is just to provide some information about MAA and at the same time to give some extra context for the rest of the presentation. On OTN one can also find tables showing the impact (expressed as length of database unavailability) of different planned and unplanned interventions when the system implements the MAA recommendations. I was thinking about adding an extra slide with these tables.

7 Data protection at CERN – RAC + ASM + on-disk copies
[Diagram labels: Clients, WAN/Intranet, RAC database with ASM, SAN, RMAN, TSM]

8 Data Guard Physical Standby Database
[Diagram labels: WAN/Intranet, RMAN, Primary RAC database with ASM, Physical Standby RAC database with ASM, Data changes]
Speaker notes: Physical standby database added. Animation. Standby database and data flow path will show up after clicking.

9 Backup implementation - summary
- Tape-based incremental backup strategy
  - Bi-weekly full backups
  - Daily incremental backups
  - Hourly archivelog backups
- Disk-based incrementally updated image copies
  - The copy lags 2-3 days behind the database to facilitate handling human errors
- Oracle Data Guard physical standby databases
  - Configured for 1 day lag
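The reason for keeping the copies behind the database can be sketched in a few lines of Python. This is only an illustration, not the scripts used at CERN: the function name `usable_sources` is hypothetical, the image-copy lag is taken as 2 days (the conservative end of the 2-3 day range), and it is assumed that a lagging copy helps only while it still predates the human error and that a tape backup old enough is always retained.

```python
from datetime import datetime, timedelta

# Lag values from the slide; the constant names are hypothetical.
IMAGE_COPY_LAG = timedelta(days=2)   # conservative end of the 2-3 day range
STANDBY_LAG = timedelta(days=1)


def usable_sources(error_time, now):
    """List the recovery sources that still predate a human error.

    A lagging copy is useful only if the erroneous change has not yet
    been applied to it, i.e. the error is younger than the lag.
    Tape is listed last, assuming retention covers the error time.
    """
    age = now - error_time
    sources = []
    if age < STANDBY_LAG:
        sources.append("standby database")
    if age < IMAGE_COPY_LAG:
        sources.append("on-disk image copy")
    sources.append("tape backup")
    return sources


now = datetime(2008, 12, 16, 12, 0)
print(usable_sources(now - timedelta(hours=6), now))
print(usable_sources(now - timedelta(days=5), now))
```

An error noticed within a day can be handled from any of the three sources; one noticed after several days leaves only the tape backup, which is the slow path.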

10 Failure handling
- Oracle instance failure
  - Recovery: not needed, RAC keeps the database available
- Media failure
  - Recovery: not needed, ASM keeps data healthy
- Small physical data corruption
  - Recovery: RMAN block media recovery using on-disk or on-tape backup
  - Downtime: database: 0; affected application: few hours
- Wide-range physical data corruption
  - Recovery: switchover to the standby database, or RMAN full database restore using on-disk backup
  - Downtime: <1 hour with Data Guard; <1 hour with on-disk backup
- Human error
  - Recovery: RMAN + DataPump using on-disk backup; standby DB + DataPump; RMAN + DataPump using on-tape backup
  - Downtime: affected application: few hours or even days (if on-tape backup needed)
- Disaster
  - Recovery: switchover to the standby database (if available), or RMAN full database restore using on-tape backups
  - Downtime: hours or days in case of restore from tapes

11 Main concerns
- Data volume
  - Very quick, linear growth of databases (especially during LHC run)
  - Some databases may reach 10 TB already next year
- Database availability
  - On-line databases highly critical for data taking
    - A few hours of downtime can already lead to data loss
  - Off-line databases critical for data distribution and analysis
Speaker notes: This slide is quite clear, I think. I couldn't find any concrete numbers concerning maximum allowed unavailability. If you know some you can quote them.

12 ... More concerns
- With data volume increase:
  - The probability of physical data corruption increases
  - The frequency of human errors increases
- The traditional RMAN and tape-based approach doesn't fit well into this picture:
  - Leads to backup and recovery times proportional to or dependent on data volume
  - Currently at CERN the speed of backup/recovery to/from tapes is limited by the speed of 1 Gb Ethernet
- Standby database located in the same building as primary
Speaker notes: As databases grow, the number of hardware components used to support them grows as well, which in turn increases the probability of physical data corruption. The number of human errors can also be higher for big databases, although the dependence is less clear in this case. By 'recovery of a single database object' I mean tablespace point-in-time recovery.

13 Possible improvements
- LAN-free backups to tapes
- Disk pool instead of tapes
- Declaring data read-only
- Archiving old data to limit database growth

14 LAN-free tape backups
- Traditionally at CERN tape backups are sent over a general purpose network to a media management server:
  - This limits backup/recovery speed to ~80 MB/s
  - Backup/restore of a 10 TB database takes almost 40 hours!
- At the same time tape drives can archive data at a speed of 160 MB/s compressed
[Diagram labels: Database, RMAN backups, Metadata, 1Gb, TSM Server, Tape drives]
Speaker notes: Animation showing the difference between both types of backups.
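The "almost 40 hours" figure can be checked with a short calculation. The sketch below is only a back-of-the-envelope estimate: the function name `transfer_hours` is hypothetical, and the unit interpretation is an assumption, since the slide does not say whether TB is decimal or binary. With 10 TiB and 80 MB/s (decimal) the result is about 38 hours, which matches the slide.

```python
def transfer_hours(size_bytes, rate_bytes_per_s):
    """Hours needed to move size_bytes at a sustained rate."""
    return size_bytes / rate_bytes_per_s / 3600


TB = 10**12   # decimal terabyte
TiB = 2**40   # binary terabyte
MB = 10**6    # decimal megabyte

# 10 TB over the general purpose network at ~80 MB/s
print(round(transfer_hours(10 * TB, 80 * MB), 1))    # 34.7
print(round(transfer_hours(10 * TiB, 80 * MB), 1))   # 38.2

# The same volume at the 160 MB/s a tape drive can absorb
print(round(transfer_hours(10 * TiB, 160 * MB), 1))  # 19.1
```

Either way, the network path rather than the tape drive is the limiting factor: the drive could take the same data in roughly half the time.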

15 LAN-free tape backups (2)
- Tivoli Storage Manager supports so-called LAN-free backup
- When using LAN-free configuration:
  - Backup data flows to tape drives directly over SAN
  - Media Management Server used only to register backups
- Very good performance observed during tests (see next slide)
[Diagram labels: Database, RMAN backups, Metadata, 1Gb, TSM Server, Tape drives]
Speaker notes: Animation showing the difference between both types of backups.

16 LAN-free tape backups - tests
- 1 TB test database with contents very similar to one of the production DBs
  - ~5% of empty blocks
- Different TSM configurations: TCP and Shared Memory mode
- Backups taken using 1 or 2 streams
    Backup       TCP         Shared mem
    1 stream     198 MB/s    231 MB/s
    2 streams    361 MB/s    402 MB/s
- Restore tests done using 1 stream only
  - Performance of a test with 2 streams affected by Oracle software issues (followed up with Oracle Support)
    Restore      TCP         Shared mem
    1 stream     150 MB/s    158 MB/s
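To put these throughput figures in perspective, the following sketch computes how well the second stream scales and what the measured rates would mean for a 10 TB database. The helper names are hypothetical, decimal units are assumed, and extrapolating from the 1 TB test to 10 TB assumes the throughput can be sustained, which the tests do not show.

```python
# Measured backup throughput in MB/s, keyed by (TSM mode, number of streams).
BACKUP_MB_S = {
    ("TCP", 1): 198,
    ("TCP", 2): 361,
    ("Shared mem", 1): 231,
    ("Shared mem", 2): 402,
}


def scaling_efficiency(mode):
    """Two-stream throughput relative to twice the one-stream throughput."""
    return BACKUP_MB_S[(mode, 2)] / (2 * BACKUP_MB_S[(mode, 1)])


def hours_for(size_tb, rate_mb_s):
    """Hours to move size_tb decimal terabytes at rate_mb_s MB/s."""
    return size_tb * 10**6 / rate_mb_s / 3600


print(round(scaling_efficiency("TCP"), 2))         # 0.91
print(round(scaling_efficiency("Shared mem"), 2))  # 0.87
print(round(hours_for(10, 402), 1))  # 6.9, backup, 2 streams, shared memory
print(round(hours_for(10, 150), 1))  # 18.5, restore, 1 stream, TCP
```

Under these assumptions a 10 TB backup would take about 7 hours with two streams, compared with roughly 35 hours at 80 MB/s over the general purpose network.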

17 Using a disk pool instead of tapes
- Tape infrastructure is expensive and difficult to maintain:
  - costly hardware and software
  - noticeable maintenance effort
  - tape media is quite unreliable and needs to be validated
- At the same time disk space is getting cheaper and cheaper:
  - 1.5 TB SATA disks already available
- Pool of disks can be easily configured as destination for RMAN backups:
  - Can simplify backup infrastructure
  - Can improve backup performance
  - Can increase backup reliability
Speaker notes: I will extend this slide to provide more details about these improvements.

18 Using a disk pool instead of tapes (2)
- Several configurations possible:
  - NFS mounted disks
  - SAN-attached storage with a file system (tested)
  - SAN-attached storage with ASM (tested)
[Diagram labels: Database, RMAN backups, Storage Area Network, Remote storage]
Speaker notes: I will extend this slide to provide more details about these improvements.

19 Using a disk pool instead of tapes - tests
- Tests performed as part of the backup & recovery implementation for the LHCb on-line database
- 1x16-bay disk array used as backup storage
  - 2x8-disk RAID 5 devices
- Backup storage configured either as an ASM diskgroup or ext3 file system
- Tests performed with 4 streams
                 ext3        ASM
    4 streams    235 MB/s    369 MB/s
- Tests need to be repeated in the testbed used for testing LAN-free backups to tapes

20 Declaring data read-only
- Tablespaces containing static data can be declared read-only
- Read-only tablespaces do not need to be backed up as often as read-write ones
  - This may significantly reduce the amount of resources needed for backups
- In case of restores one can think of restoring read-only data after the read-write part of the database is restored
- We are improving our backup scripts to handle read-only data properly
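The idea of backing up read-only tablespaces less often can be sketched as a simple selection rule. This is not the CERN backup script: the `Tablespace` record, the function `due_for_backup`, the tablespace names and the 90-day interval are all hypothetical and serve only to illustrate the policy.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Tablespace:
    name: str
    read_only: bool
    last_backup: datetime


# Hypothetical interval for re-backing up read-only tablespaces.
READ_ONLY_INTERVAL = timedelta(days=90)


def due_for_backup(tablespaces, now):
    """Read-write tablespaces are always due; read-only ones only
    when their last backup is older than READ_ONLY_INTERVAL."""
    return [
        t.name
        for t in tablespaces
        if not t.read_only or now - t.last_backup >= READ_ONLY_INTERVAL
    ]


now = datetime(2008, 12, 16)
tablespaces = [
    Tablespace("DATA_2008", False, now - timedelta(days=1)),
    Tablespace("DATA_2007", True, now - timedelta(days=30)),
    Tablespace("DATA_2006", True, now - timedelta(days=120)),
]
print(due_for_backup(tablespaces, now))  # ['DATA_2008', 'DATA_2006']
```

With a rule like this, the volume written in a typical backup run shrinks to the read-write part of the database plus the occasional refresh of a read-only tablespace.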

21 Archiving legacy data
- Data collected by some big applications is accessed for a very limited period of time only
  - Later on it is not needed but has to be kept 'just in case'
- Keeping such data on-line may badly affect database performance and time-to-recover
  - To avoid that, legacy data should be archived
- Archiving of Oracle data is not an easy task and cannot be done transparently by DBAs
  - Proper application design is vital
  - Splitting data using Oracle partitioning or schemas
  - Ensuring self-containment of data from different periods

22 Data Archiving – possible implementation
- Several implementations possible: the simplest presumes creation of an archive DB

23 Conclusions
- Production databases are growing very large
  - Recovery time in case of failure becomes critical
- Certain types of failures require database restore from a tape backup
  - Time-to-recover proportional to the database size
  - Hours or days in case of big databases
- LAN-free backups to tapes can significantly shorten backup & recovery time and lead to better resource utilization
- Replacing tapes with a disk pool can also significantly decrease backup and recovery time
- Declaring data read-only and archiving can further help to keep the restore time reasonable

24 Acknowledgements
- Many thanks to Dawid, Luca and Lukasz, who helped with the tests and Oracle tuning

25 Q&A Thank you

