CapEx + OpEx. Pipelines; Sources: SQL Server; Transformations: Lookups, Full Blockers; Destinations: Partitioned Tables.


1

2 CapEx + OpEx

3 Pipelines; Sources: SQL Server; Transformations: Lookups, Full Blockers; Destinations: Partitioned Tables

4 Asynchronous; Synchronous; Blocking; Buffers

5

6 I've always believed in numbers and the equations and logics that lead to reason. But after a lifetime of such pursuits, I ask, “What truly is logic?” “Who decides reason?” My quest has taken me through the physical, the metaphysical, the delusional -- and back.

7 Late arriving dimensions; Denormalised data; Alternatives; Slowly changing dimensions; One to many lookups

8 Owner FirstName MiddleNames Surname Item Name CollectionName ValidFrom ValidTo

9

10 4,432,277 rows

11 01:24.672

12 31.547 sec

13 52.141 sec

14 Flowchart (BS6224:1987): Foreach Row; Send Row on; Dimension partial-cache lookup successful?; Dimension full-cache lookup successful?; Set resolution from full-cache; Set resolution with value in dimension; Add value to dimension; Set resolution from partial-cache

15 > 4 MINUTES!!!!! 4,432,277 0 1,401,849 3,030,428 But there are only 100 Owners!

16-19 Full cache lookup; Partial cache lookup; OLEDB To Write Record; No match; Match

20 Flowchart (BS6224:1987): Preload cache; Foreach Row; Cache lookup not successful?; Update dimension & cache; Update Row from cache; Send Row on
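The flow on this slide can be sketched roughly as below. This is a minimal illustration, not the presenter's component: the class name `DimensionCache`, the `string` business key, the `int` surrogate key and the `addToDimension` delegate (standing in for whatever inserts a new member into the dimension table) are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: a preloaded cache that adds late-arriving
// dimension members on a miss, so each distinct key costs one insert.
public sealed class DimensionCache
{
    private readonly Dictionary<string, int> cache = new Dictionary<string, int>();
    private readonly Func<string, int> addToDimension;

    // Number of lookups that had to fall through to the dimension.
    public int Misses { get; private set; }

    public DimensionCache(IEnumerable<KeyValuePair<string, int>> preload,
                          Func<string, int> addToDimension)
    {
        // "Preload cache": copy the existing dimension members once.
        foreach (var pair in preload)
        {
            cache[pair.Key] = pair.Value;
        }
        this.addToDimension = addToDimension;
    }

    // Called once per row; returns the surrogate key to write onto the row.
    public int Resolve(string businessKey)
    {
        int id;
        if (!cache.TryGetValue(businessKey, out id))
        {
            // "Update dimension & cache" on an unsuccessful lookup.
            Misses++;
            id = addToDimension(businessKey);
            cache[businessKey] = id;
        }
        return id;
    }
}
```

With only 100 distinct owners, a cache like this would touch the dimension at most 100 times regardless of how many rows flow through.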

21

22 Dictionary; Immutable Object; Configuration

23 Inputs

24 Outputs

25 “Clean Code” Robert C. Martin (Uncle Bob)

26 Connection Managers

27 Pre Execute
    private readonly Dictionary<Key, Value> cache = new Dictionary<Key, Value>();
Key’s properties set through constructor; Key’s properties are READ ONLY {get; private set;}; Need to override GetHashCode() & Equals() of Key
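A key type following the three rules on this slide might look like the sketch below. `OwnerKey` and its two properties are hypothetical names chosen for illustration; the slide does not show the actual key class.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical immutable dictionary key: values are set once in the
// constructor and equality is based on the values, not the reference.
public sealed class OwnerKey : IEquatable<OwnerKey>
{
    public string FirstName { get; private set; }
    public string Surname { get; private set; }

    public OwnerKey(string firstName, string surname)
    {
        FirstName = firstName ?? string.Empty;
        Surname = surname ?? string.Empty;
    }

    public bool Equals(OwnerKey other)
    {
        return other != null
            && FirstName == other.FirstName
            && Surname == other.Surname;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as OwnerKey);
    }

    public override int GetHashCode()
    {
        // Combine the field hashes; unchecked lets the multiply overflow.
        unchecked
        {
            return (FirstName.GetHashCode() * 397) ^ Surname.GetHashCode();
        }
    }
}

public static class OwnerCacheDemo
{
    public static int Lookup()
    {
        var cache = new Dictionary<OwnerKey, int>();
        cache[new OwnerKey("Ada", "Lovelace")] = 1;
        // A different instance with equal values finds the same entry.
        return cache[new OwnerKey("Ada", "Lovelace")];
    }
}
```

Without the `Equals` and `GetHashCode` overrides the second `OwnerKey` instance would be compared by reference and the lookup would throw `KeyNotFoundException`. Immutability matters because a key whose hash changes after insertion can no longer be found.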

28

29

30

31

32

33 Pre Execute

34 24.547 sec 22%

35 Pre Execute
    private readonly Dictionary<Key, IEnumerable<Value>> cache = new Dictionary<Key, IEnumerable<Value>>();
Output is Asynchronous; Output row count per input row = Number of values retrieved from dictionary; Input columns copied to output
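The one-to-many behaviour described here (one output row per value found for the input row's key) can be sketched as follows. The `int` key, `string` value and the names `OneToManyLookup`, `BuildCache` and `Expand` are assumptions for the example; in a real script component the output rows would be added to an asynchronous output buffer rather than yielded.

```csharp
using System.Collections.Generic;

// Hypothetical sketch of a one-to-many lookup backed by a dictionary
// whose values are lists.
public static class OneToManyLookup
{
    // Pre-execute step: group the reference rows by key.
    public static Dictionary<int, List<string>> BuildCache(
        IEnumerable<KeyValuePair<int, string>> referenceRows)
    {
        var cache = new Dictionary<int, List<string>>();
        foreach (var row in referenceRows)
        {
            List<string> values;
            if (!cache.TryGetValue(row.Key, out values))
            {
                values = new List<string>();
                cache[row.Key] = values;
            }
            values.Add(row.Value);
        }
        return cache;
    }

    // Per-row step: emit one output row per cached value. Keys with no
    // match produce no output in this sketch.
    public static IEnumerable<KeyValuePair<int, string>> Expand(
        IEnumerable<int> inputKeys, Dictionary<int, List<string>> cache)
    {
        foreach (var key in inputKeys)
        {
            List<string> values;
            if (cache.TryGetValue(key, out values))
            {
                foreach (var value in values)
                {
                    yield return new KeyValuePair<int, string>(key, value);
                }
            }
        }
    }
}
```

Because the number of output rows differs from the number of input rows, the output cannot share the input buffer, which is why the slide marks it as asynchronous.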

36 Diagram labels: Lookup 1; Lookup 2; Default value (fx); Matched; Not matched; Union All; Merge

37 Each lookup adds its own value; Conditionally apply slow lookups; Select most appropriate result. Diagram labels: Lookup 1; Lookup 2; Lookup 3; Coalesce
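One way to read "conditionally apply slow lookups" and "coalesce" is sketched below: lookups are tried in order, cheapest first, and a later lookup only runs when the earlier ones returned nothing. This is an interpretation of the slide rather than the presenter's design; `CoalescingLookup`, `Resolve` and the use of `null` to mean "no match" are assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: try each lookup in turn and keep the first
// non-null result, so slow lookups are skipped when a fast one matches.
public static class CoalescingLookup
{
    public static string Resolve(string key,
                                 IEnumerable<Func<string, string>> lookups,
                                 string defaultValue)
    {
        foreach (var lookup in lookups)
        {
            var result = lookup(key);
            if (result != null)
            {
                return result;
            }
        }
        // No lookup matched: fall back to the default value.
        return defaultValue;
    }
}
```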

38 Distinct; Sort; Aggregate

39 Flowchart (BS6224:1987): Start; Create empty cache for rows; Foreach row; Cache lookup not successful?; Create copy of row; Add to cache; Write cache out; End

40 Same flowchart as slide 39, annotated: 28%
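A minimal sketch of the distinct flow in the flowchart, assuming the row type has value-based `Equals` and `GetHashCode`. The names `DistinctRows`, `Run` and the `copyRow` delegate are hypothetical; the copy step is kept because pipeline buffer rows are typically reused and cannot be stored directly.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: keep the first copy of each distinct row and
// write the cache out once all input has been seen.
public static class DistinctRows
{
    public static List<TRow> Run<TRow>(IEnumerable<TRow> rows, Func<TRow, TRow> copyRow)
    {
        var seen = new HashSet<TRow>();     // membership test
        var output = new List<TRow>();      // preserves first-seen order
        foreach (var row in rows)
        {
            if (!seen.Contains(row))
            {
                var copy = copyRow(row);
                seen.Add(copy);
                output.Add(copy);
            }
        }
        return output;
    }
}
```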

41 Is all of the data required? Partitioning is your friend! 4,432,277 rows; 44,323 rows x100

42 Owner Id; Collection Id; Item Id; Valid From; Valid To. Partition Key; Sort Key; Sort Data. SortKey implements IComparable & IComparable<SortKey>. Dictionary points to IEnumerable<SortData> (1 to many)

43 Flowchart (BS6224:1987): Start; Foreach row: Create Partition Key for row; Create Sort Key for row; Create Sort Data for row; Is partition key of row different to current key? Write out then clear sorter; Set current key to partition key of row; Add sort key & data to sorter. After the last row: Write out then clear sorter; End

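44 Pre Execute
    private readonly SortedDictionary<SortKey, IEnumerable<SortData>> sorter = new SortedDictionary<SortKey, IEnumerable<SortData>>();
    private PartitionKey currentPartitionKey = null;

    // SSIS generates this override as Input0_ProcessInput for an input named "Input 0".
    public override void Input0_ProcessInput(Input0Buffer buffer)
    {
        while (buffer.NextRow())
        {
            ProcessRow(buffer);
        }
        // Flush the final partition once the last buffer has been processed.
        if (buffer.EndOfRowset())
        {
            WriteoutSorter();
        }
    }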
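The partitioned sort from the flowchart on slide 43 can be sketched outside SSIS as below. It assumes the input already arrives grouped by partition key, so only one partition is held in memory at a time. `PartitionedSorter`, the `int` keys, the `string` data and the `writeRow` delegate are all placeholders for the example, not the presenter's types.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: sort rows within each partition, flushing the
// sorter whenever the partition key changes.
public sealed class PartitionedSorter
{
    private readonly SortedDictionary<int, List<string>> sorter =
        new SortedDictionary<int, List<string>>();
    private readonly Action<int, string> writeRow;
    private bool hasPartition;
    private int currentPartitionKey;

    public PartitionedSorter(Action<int, string> writeRow)
    {
        this.writeRow = writeRow;
    }

    public void ProcessRow(int partitionKey, int sortKey, string data)
    {
        // A new partition means the previous one is complete.
        if (hasPartition && partitionKey != currentPartitionKey)
        {
            WriteOutSorter();
        }
        hasPartition = true;
        currentPartitionKey = partitionKey;

        // One sort key may map to many rows.
        List<string> bucket;
        if (!sorter.TryGetValue(sortKey, out bucket))
        {
            bucket = new List<string>();
            sorter[sortKey] = bucket;
        }
        bucket.Add(data);
    }

    // Emits rows in sort-key order, then clears the sorter.
    // Must also be called once after the last row.
    public void WriteOutSorter()
    {
        foreach (var pair in sorter)
        {
            foreach (var data in pair.Value)
            {
                writeRow(pair.Key, data);
            }
        }
        sorter.Clear();
    }
}
```

Sorting 100 partitions of roughly 44,000 rows each keeps the working set small compared with sorting all 4.4 million rows at once, which is the point made on slide 41.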

45 Pre Execute. Structure identical to sort except: One to one map between key and data; Data is now MUTABLE, updated as aggregation progresses
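As a rough illustration of the difference from the sort, the sketch below maps each key to a single mutable accumulator that is updated in place as rows arrive. `RunningTotal`, `KeyedAggregate` and the choice of count and sum as the aggregates are assumptions for the example.

```csharp
using System.Collections.Generic;

// Hypothetical mutable accumulator: one instance per key.
public sealed class RunningTotal
{
    public long Count;
    public decimal Sum;
}

public static class KeyedAggregate
{
    public static SortedDictionary<string, RunningTotal> Aggregate(
        IEnumerable<KeyValuePair<string, decimal>> rows)
    {
        var totals = new SortedDictionary<string, RunningTotal>();
        foreach (var row in rows)
        {
            RunningTotal total;
            if (!totals.TryGetValue(row.Key, out total))
            {
                total = new RunningTotal();
                totals[row.Key] = total;
            }
            // Update the existing accumulator in place.
            total.Count++;
            total.Sum += row.Value;
        }
        return totals;
    }
}
```

Only the dictionary values are mutable here; the keys should still be immutable so their ordering and hashing cannot change after insertion.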

46

47 38.890 sec (From 01:24.672) 54%

48 Helping out SQL Server

49

50 25 HOURS! (250 million rows)

51 Read Keys

52 Partitioned Left Hash Lookup

53 Read Keys; Partitioned Left Hash Lookup; Partitioned Right Hash Lookup

54 Read Keys; Partitioned Left Hash Lookup; Partitioned Right Hash Lookup; Partitioned Sort (to cluster key for data); Read Data (via cluster key); Merge Join. 90 minutes! (250 million rows)

55 Bulk Loading Partitioned Tables. Diagram labels: Data Source; Transform; Partitioned Table

56 Time based partitioning. Diagram labels: Data Source; Switch-in Tables; Physical partitions (sort order)

57

58

59 Data Source; Batch Manager; BatchBlocks (TPL component); ActionBlock (multi threaded wrapper around a table writer utilising SqlBulkCopy) (custom code). Configure: Size of batches; Number of writer threads
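The arrangement on this slide can be sketched with TPL Dataflow as below. This is an assumption-laden outline, not the presenter's batch manager: `BatchedWriter`, `Run` and the `writeBatch` delegate are hypothetical, rows are plain integers, and the delegate stands in for the table writer that would call `SqlBulkCopy`. The two configuration values from the slide map onto the `BatchBlock` batch size and the `ActionBlock` `MaxDegreeOfParallelism`.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

// Hypothetical sketch: group rows into fixed-size batches and hand each
// batch to one of several concurrent writers.
public static class BatchedWriter
{
    public static void Run(int rowCount, int batchSize, int writerThreads,
                           Action<int[]> writeBatch)
    {
        // Collects posted rows into arrays of batchSize.
        var batches = new BatchBlock<int>(batchSize);

        // Runs writeBatch on up to writerThreads batches at once.
        var writer = new ActionBlock<int[]>(
            writeBatch,
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = writerThreads });

        batches.LinkTo(writer, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < rowCount; i++)
        {
            batches.Post(i);
        }

        // Complete() flushes any final partial batch, then the writer drains.
        batches.Complete();
        writer.Completion.Wait();
    }
}
```

Because several batches may be written concurrently, `writeBatch` must be thread-safe, for example by giving each call its own connection.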

60

61

62

63 KISS: Keep It Short & Simple. New tools available but one size does not fit all; Utilise partitioning; You have the power

64

65 So, it is possible to move from the Asynchronous to(wards) the Synchronous

