Presentation is loading. Please wait.

Presentation is loading. Please wait.

DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software

Similar presentations


Presentation on theme: "DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software"— Presentation transcript:

1 DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software

2 DB-20: Highly Parallel Dump and Load - 2 Agenda… n Why Dump and Load? n What Methods are Available? n Using a Highly Parallel Approach n Demo!

3 DB-20: Highly Parallel Dump and Load - 3 Why Dump and Load? n Because it is X months since the last time. n Because the vendor said so. n Because we must in order to upgrade Progress. n Because it will improve performance.

4 DB-20: Highly Parallel Dump and Load - 4 Why Dump and Load? n Because it is X months since the last time. n Because the vendor said so. n Because we must in order to upgrade Progress. n Because it will improve performance. X

5 DB-20: Highly Parallel Dump and Load - 5 Good Reasons to D&L n Improve sequential data access (reports). n Consolidate fragmented records. n To recover corrupted data. n In order to move to a new platform. n In order to reconfigure a storage management decision. n In order to take advantage of hot new db features!

6 DB-20: Highly Parallel Dump and Load - 6 Hot Features That Need D&L n Variable Block Sizes n Storage Areas n Variable Rows Per Block n Data Clusters (Type II Areas)

7 DB-20: Highly Parallel Dump and Load - 7 Some Dump & Load Choices n “Classic” Dictionary Dump & Load n Bulk Loader n Binary Dump & Load n Vendor Supplied Utilities n Parallelization –Many Benefits –Very difficult to effectively balance manually. n Automating the process

8 DB-20: Highly Parallel Dump and Load - 8 Classic Dictionary D&L Order-lineCustorderhist custOrder-lineorder 1 st the dump… … and then the load.

9 DB-20: Highly Parallel Dump and Load - 9 Classic Dictionary D&L This can take DAYS (or even WEEKS!)

10 DB-20: Highly Parallel Dump and Load - 10 Multi-threaded D&L Order-line cust orderhist cust Order-lineorder 1 st the dump… … and then the load.

11 DB-20: Highly Parallel Dump and Load - 11 Multi-threaded D&L n Much more reasonable than “classic.” n Generally can be done in a weekend. n But still too long for many situations: –Threads aren’t easily balanced. –Longest thread determines total time. –Timing varies – it is hard to predict.

12 DB-20: Highly Parallel Dump and Load - 12 Overlapping, Parallel, Balanced D&L Order-line cust orderhist cust Order-line order Both the dump… … and the load – in parallel, multi-threaded and load-balanced.

13 DB-20: Highly Parallel Dump and Load - 13 Parallel Balanced D&L n Usually a big improvement. n Often can be done in hours. n But difficult to configure and manage: –You need an intelligent and dynamic control program.

14 DB-20: Highly Parallel Dump and Load - 14 Highly Parallel D&L

15 DB-20: Highly Parallel Dump and Load - 15 Demo !!!

16 DB-20: Highly Parallel Dump and Load - 16 Resources n SUSE Linux 9 Server –2 x 2ghz (Xeon w/HT) –512 MB RAM –3 internal disks n Various External Disks n 5GB Demo Database –85 Tables –250 Indexes –Various table and record sizes

17 DB-20: Highly Parallel Dump and Load - 17 The Storage Area Design n Separate Tables from Indexes. n Use a single Type 1 Area for tiny “miscellaneous” tables that have low activity. n Isolate Small, but very active, tables in Type 2 Areas (or standalone areas if v9). n Organize Large Tables by RPB in Type 2 Areas –Any table over 1 million rows –Any table over 1GB n Isolate very large indexes

18 DB-20: Highly Parallel Dump and Load - 18 The Storage Area Design b exch06.b1 f # d "Schema Area":6,64. # d "Misc":10,256 exch06_10.d1 f 768 # d "SmallTables":12,128;8 exch06_12.d1 f 9120 # d "RPB256":14,256;512 exch06_14.d1 f # d "RPB128":16,128;512 exch06_16.d1 f # d "RPB64":18,64;512 exch06_18.d1 f d "RPB64":18,64;512 exch06_18.d2 f # d "RPB32":20,32;512 exch06_20.d1 f d "RPB32":20,32;512 exch06_20.d2 f # d "RPB16":22,16;512 exch06_22.d1 f # d "Indexes":99,8 exch06_99.d1 f

19 DB-20: Highly Parallel Dump and Load - 19 Highly Parallel Dump & Load n 90 threads, 16 concurrent. n Dump threads are the same as load threads. n Easy monitoring and control. n No latency – the load finishes when the dump finishes. n No intermediate disk IO: –More efficient usage of disk resources. –Higher overall throughput. n No index rebuild. n No need for table analysis.

20 DB-20: Highly Parallel Dump and Load - 20 The 80/20 Rule n 98% of the tables are done very quickly n 2% of the tables take most of the time… n … and nearly all of the space n These are the tables that we target with multiple threads!

21 DB-20: Highly Parallel Dump and Load - 21 Multiple Threads Table t Criteria ──────── ─ ────────────────────────────────────────────────────────────────── document 1 where src.document.app# < document 2 where src.document.app# >= and src.document.app# < document 3 where src.document.app# >= and src.document.app# < document 4 where src.document.app# >= and src.document.app# < document 5 where src.document.app# >= and src.document.app# < document 6 where src.document.app# >= and src.document.app# < document 7 where src.document.app# >= LOGS 1 where type = "P" and ( logdate <= 12/31/2003 ) LOGS 6 where type = "P" and ( logdate >= 01/01/2004 and logdate <= 06/30/2004 ) LOGS 7 where type = "P" and ( logdate >= 07/01/2004 and logdate <= 12/31/2004 ) LOGS 8 where type = "P" and ( logdate >= 01/01/2005 and logdate <= 06/30/2005 ) LOGS 9 where type = "P" and ( logdate >= 07/01/2005 and logdate <= 12/31/2005 ) LOGS 10 where type = "P" and ( logdate >= 01/01/2006 ) LOGS 14 where type = "L" and ( logdate <= 08/01/2002 ) LOGS 15 where type = "L" and ( logdate > 08/01/2002 ) LOGS 16 where type = "M" LOGS 17 where type = "U"

22 DB-20: Highly Parallel Dump and Load - 22 Dump Factors n Major factors that contribute to dump time: –Data distribution (scatter factor) –Index Selection –-B, -Bp, -RO –Storage Area Configuration (dedicated areas and type 2 areas can be very, very fast) –Disk Configuration and Throughput n proutil dbname –C tabanalys (when doing a binary d&l)

23 DB-20: Highly Parallel Dump and Load - 23 Concurrency -- Dumping

24 DB-20: Highly Parallel Dump and Load - 24 Take a Load Off n Factors that contribute to load time: –Progress version –Build indexes –-i, -bibufs, -bigrow, -B, -TB, -TM, -SG –Storage Area Configuration –Disk Configuration and Throughput –-T files –-noautoreslist (aka “forward-only”) n The importance of testing

25 DB-20: Highly Parallel Dump and Load - 25 Concurrency -- Loading

26 DB-20: Highly Parallel Dump and Load - 26 Buffer-Copy & Raw-Transfer n Very Fast n Eliminates The Middle Man (temp file IO operations) n Provides fine-grained, ABL level of control –Allows on the fly data manipulation –Useful when merging databases n Can use with remote connections to bridge version numbers. n In OE10.1 performance is essentially equal n Cannot use RAW-TRANSFER with -RO

27 DB-20: Highly Parallel Dump and Load - 27 The Code n start.p n dlctl.p n dlclear.p n dlwrap.p n dlx.p

28 DB-20: Highly Parallel Dump and Load - 28 start.p /* start.p * */ run./dotp/dlclear.p. pause 1 no-message. run./dotp/dlctl.p. pause 1 no-message. run./dotp/dlmon.p. return.

29 DB-20: Highly Parallel Dump and Load - 29 dlclear.p /* dlclear.p * */ define new global shared variable dlstart as character no-undo format "x(20)". dlstart = "". for each dlctl exclusive-lock: assign dlctl.dumped = 0 dlctl.loaded = 0 dlctl.err_count = 0 dlctl.note = "" dlctl.xstatus = "". end. return.

30 DB-20: Highly Parallel Dump and Load - 30 dlctl.p /* dlctl.p * */ for each dlctl no-lock where dlctl.active = yes by dlctl.expected descending: os-command silent value( "mbpro -pf dl.pf -p./dotp/dlwrap.p -param " + '"' + string( dlctl.thread ) + "|" + dlctl.tblname + "|" + dlctl.criteria +"|" + dlctl.pname + '"' + " >> /tmp/dl/error.log 2>&1 ~&" ). end. return.

31 DB-20: Highly Parallel Dump and Load - 31 dlwrap.p /* dlwrap.p * */ define variable thr as character no-undo. define variable tbl as character no-undo. define variable cri as character no-undo. define variable pnm as character no-undo. thr = entry( 1, session:parameter, "|" ). tbl = entry( 2, session:parameter, "|" ). cri = entry( 3, session:parameter, "|" ). pnm = entry( 4, session:parameter, "|" ). if pnm = "" then pnm = "dlx". pnm = "./dotp/" + pnm + ".p". run value( pnm ) value( thr ) value( tbl ) value( cri ). return.

32 DB-20: Highly Parallel Dump and Load - 32 dlx.p - part 1 /* {1} = thread number * {2} = table name * {3} = WHERE clause */ define variable flgname as char no-undo initial /tmp/dl/{2}.{1}.flg". define variable logname as char no-undo initial "/tmp/dl/{2}.{1}.log". define stream unloaded. output stream unloaded to value( "/tmp/dl/{2}.{1}.d" ). output to value( logname ) unbuffered. put today " " string( time, "hh:mm:ss" ) " {2} {1}" skip. disable triggers for dump of src.{2}. disable triggers for load of dst.{2}. define query q for src.{2}. open query q for each src.{2} no-lock {3}.

33 DB-20: Highly Parallel Dump and Load - 33 dlx.p - part 2 outer_loop: do for dlctl, dst.{2} while true transaction: inner_loop: do while true: get next q no-lock. if not available src.{2} then leave outer_loop. /* end of table */ create dst.{2}. buffer-copy src.{2} to dst.{2} no-error. d = d + 1. if error-status:num-messages = 0 then l = l + 1. else do: delete dst.{2}. e = e + 1. export stream unloaded src.{2}. find dl.dlctl exclusive-lock where dlctl.tblname = "{2}" and dlctl.thread = {1}. dlctl.err_count = e. next inner_loop. end.

34 DB-20: Highly Parallel Dump and Load - 34 dlx.p - part 3 if d modulo 100 = 0 then do: find dl.dlctl exclusive-lock where dlctl.tblname = "{2}" and dlctl.thread = {1}. assign dlctl.dumped = d dlctl.loaded = l. file-info:file-name = flgname. if file-info:full-pathname = ? then do: dlctl.note = "stopped". leave inner_loop. end. leave inner_loop. end. end. /* inner_loop */ end. /* outer_loop */

35 DB-20: Highly Parallel Dump and Load - 35 dlx.p - part 4 do for dl.dlctl transaction: find dl.dlctl exclusive-lock where dlctl.tblname = "{2}" and dlctl.thread = {1}. assign dlctl.dumped = d dlctl.loaded = l dlctl.err_count = e dlctl.xstatus = "done" dlctl.note = ( if dlctl.note <> "stopped" then "complete" else "stopped" ). if dlctl.note <> "stopped" then do: if d < dlctl.expected then dlctl.note = "short". if d > dlctl.expected then dlctl.note = "long". end. if e > 0 then dlctl.note = dlctl.note + "+errors". end.

36 DB-20: Highly Parallel Dump and Load - 36 Validation n Belt & Braces!!! –Compare Record Counts n dlcompare.p (for binary d&l) –Check Logs for Known & Unknown Errors n grep –i –f error.list n fail, error, (1124) –Check for Success n grep “ 0 errors” /tmp/dl/*.log | wc –l n grep “ dumped” /tmp/dl/*.log n grep “ loaded” /tmp/dl/*.log n ls –l /tmp/dl/*.d

37 DB-20: Highly Parallel Dump and Load - 37 Post Load Monitoring n Watch for very rapid growth: –May be a rows-per-block assignment error (i.e. data areas with RPB = 1). –May be a cluster-size error (i.e. index areas with 512 blocks per cluster). –May be an unannounced application change. n It is not unusual for there to be somewhat elevated IO for a short period of time (perhaps a week or two).

38 DB-20: Highly Parallel Dump and Load - 38 Tip Sheet! n Don’t rely on announcements to keep users out of your system! n Shut down unnecessary services. –Don’t forget cron jobs! n Disable remote access. n Rename the database, the directory, the server… n Take the server off the network. n Always perform NON-DESTRUCTIVE Dump & Load!

39 DB-20: Highly Parallel Dump and Load - 39 Recap n Excellent Reasons to Dump & Load: –Move to a new platform! –Reconfigure a storage configuration!! –Take advantage of a hot new db features!!! –Especially Type 2 Areas!!!! n How To Get There Safely… n … and Quickly!

40 DB-20: Highly Parallel Dump and Load - 40 Questions


Download ppt "DB-20: Highly Parallel Dump & Load Tom Bascom VP Technology White Star Software"

Similar presentations


Ads by Google