
1
Optimizing Matrix Multiplication with a Classifier Learning System
Xiaoming Li (presenter), María Jesús Garzarán
University of Illinois at Urbana-Champaign

2
Tuning library for recursive matrix multiplication
Use cache-aware algorithms that take into account architectural features:
– Memory hierarchy
– Register file, …
Take into account input characteristics:
– Matrix sizes
The tuning process is automatic.

3–5
Recursive Matrix Partitioning
Previous approaches:
– Multiple recursive steps
– Only divide by half
[Figures: matrices A and B are divided in half at step 1, and each half is divided in half again at step 2]

6
Recursive Matrix Partitioning
Our approach is more general:
– No need to divide by half
– May use a single step to reach the same partition
– Faster and more general
[Figure: matrices A and B partitioned directly into the final tiling in a single step]

7
Our approach
A general framework to describe a family of recursive matrix multiplication algorithms, where, given the input dimensions of the matrices, we determine:
– The number of partition levels
– How to partition at each level
An intelligent search method based on a classifier learning system:
– Searches for the best partitioning strategy in a huge search space
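The framework above can be sketched in Python as follows. This is a minimal illustration, not the talk's implementation (the actual library generates tuned compiled code): `factors` supplies one (pm, pn, pk) triple per partition level, and the base case stands in for a tuned micro-kernel. All function names are hypothetical.

```python
def split(lo, hi, p):
    """Split the index range [lo, hi) into p nearly equal chunks."""
    step = -((lo - hi) // p)  # ceiling of (hi - lo) / p
    return [(s, min(s + step, hi)) for s in range(lo, hi, step)]

def rec_mmm(A, B, C, mr, nr, kr, factors):
    """Recursively compute C += A * B over the index ranges mr, nr, kr.

    factors is a list of (pm, pn, pk) partition factors, one per level;
    when it is exhausted, a plain triple loop acts as the micro-kernel.
    """
    if not factors:
        for i in range(*mr):
            for k in range(*kr):
                a = A[i][k]
                for j in range(*nr):
                    C[i][j] += a * B[k][j]
        return
    pm, pn, pk = factors[0]
    for m in split(*mr, pm):
        for n in split(*nr, pn):
            for k in split(*kr, pk):
                rec_mmm(A, B, C, m, n, k, factors[1:])

# Example: 4x4 matrices, one level of (2, 2, 2) partitioning
A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
I = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
C = [[0] * 4 for _ in range(4)]
rec_mmm(A, I, C, (0, 4), (0, 4), (0, 4), [(2, 2, 2)])
# Multiplying by the identity leaves A unchanged, so C == A
```

Because every level only narrows index ranges, any sequence of partition factors, including the single-step partitions the talk advocates, reuses the same recursion.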

8
Outline Background Partition Methods Classifier Learning System Experimental Results

9
Recursive layout framework
Multiple levels of recursion
– Takes into account the cache hierarchy
[Figure: an 8×8 matrix with elements numbered 1–64 in row-major order]


13
Recursive layout framework
Multiple levels of recursion
– Takes into account the cache hierarchy
[Figure: the same 8×8 matrix reordered into a recursive block layout: each 2×2 tile is stored contiguously, and the 2×2 tiles are themselves grouped into contiguous 4×4 blocks]
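To make the layout concrete, here is a one-level sketch of the offset computation for a blocked layout. The figure above uses two levels of recursion; the single-level simplification and the function name are my own.

```python
def block_offset(i, j, n, tile):
    """Offset of element (i, j) of an n x n matrix stored in a one-level
    blocked layout: tile x tile tiles in row-major order, and elements
    stored row-major within each tile (assumes tile divides n)."""
    ti, tj = i // tile, j // tile        # which tile holds (i, j)
    oi, oj = i % tile, j % tile          # position inside that tile
    tiles_per_row = n // tile
    return (ti * tiles_per_row + tj) * tile * tile + oi * tile + oj
```

With n = 8 and tile = 4, the top-left 4×4 tile occupies offsets 0–15 and the top-right tile occupies 16–31, so a tile sized for one cache level is contiguous in memory.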

14–17
Padding
Necessary when the partition factor is not a divisor of the matrix dimension.
[Figures: a dimension of 2000 divided by 3 requires padding to 2001, giving chunks of 667; a dimension of 2001 divided by 4 requires padding to 2004]
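The padded dimension is simply the smallest multiple of the partition factor that covers the original dimension; a one-line sketch (the helper name is hypothetical):

```python
def pad_dim(n, p):
    """Smallest multiple of the partition factor p that is >= n."""
    return -(-n // p) * p  # ceiling division, then scale back up

# Examples: 2000 divided by 3 pads to 2001 (three chunks of 667);
# 2001 divided by 4 pads to 2004.
```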

18
Recursive layout in our framework
Multiple levels of recursion
– Supports the cache hierarchy
Square tiles → rectangular tiles
– Fit non-square matrices

19–21
Recursive layout in our framework
Square tiles → rectangular tiles
– Fit non-square matrices
[Figures: a 9×8 matrix is padded to 10×8 so it can be covered exactly by rectangular tiles]

22
Outline Background Partition Methods Classifier Learning System Experimental Results

23
Two methods to partition matrices
Partition by Block (PB):
– Specify the size of each tile
– Example: dimensions (M, N, K) = (100, 100, 40); tile size (bm, bn, bk) = (50, 50, 20); partition factors (pm, pn, pk) = (2, 2, 2)
– Tiles need not be square
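For PB, the partition factors follow directly from the tile sizes. A sketch, assuming each factor is the ceiling of a dimension over its tile size so that ragged edges are handled by padding (the helper name is hypothetical):

```python
import math

def pb_factors(dims, tiles):
    """Partition factors for Partition by Block: one factor per dimension."""
    return tuple(math.ceil(d / t) for d, t in zip(dims, tiles))

# The slide's example: (M, N, K) = (100, 100, 40) with tiles (50, 50, 20)
# gives partition factors (2, 2, 2).
```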

24
Partition by Size (PS):
– Specify the maximum size of the three tiles
– Keep the ratios between the dimensions constant
– Example: (M, N, K) = (100, 100, 50); maximum tile size for M, N = 1250; (pm, pn, pk) = (2, 2, 1)
– Generalization of the divide-by-half approach: tile size = 1/4 × matrix size
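The slide does not fully spell out PS's size accounting, so the sketch below takes one plausible reading: grow a uniform factor on M and N until the M×N tile area fits under a size bound, leaving K unpartitioned. The bound of 2500 elements used in the example is hypothetical, chosen so the sketch reproduces the slide's factors (2, 2, 1).

```python
def ps_factors(M, N, K, max_tile_elems):
    """Partition by Size, one plausible reading: a uniform factor on M and N
    grows until the resulting M x N tile fits under max_tile_elems; the
    ratio between M and N is preserved because both are divided by p."""
    p = 1
    while (M // p) * (N // p) > max_tile_elems:
        p += 1
    return (p, p, 1)

# With (M, N, K) = (100, 100, 50) and a hypothetical bound of 2500 elements,
# the factors come out as (2, 2, 1), matching the slide's example.
```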

25
Outline Background Partition Methods Classifier Learning System Experimental Results

26
Classifier Learning System
Use the two partition primitives to determine how the input matrices are partitioned:
– Determine the partition factors at each level: f: (M, N, K) → (pm_i, pn_i, pk_i), i = 0, 1, 2 (only three levels are considered)
The partition factors depend on the matrix size:
– E.g., the partition factors of a 1000 × 1000 matrix should be different from those of a 50 × 1000 matrix.
The partition factors also depend on architectural characteristics, such as the cache size.

27
Determining the best partition factors
The search space is huge, so exhaustive search is impossible.
Our proposal: use a multi-step classifier learning system that
– Creates a table that, given the matrix dimensions, determines the partition factors

28
Classifier Learning System
The result of the classifier learning system is a table with two columns:
Column 1 (Pattern): a string of 0, 1, and * that encodes the dimensions of the matrices
Column 2 (Action): the partition method for one step
– Built using the partition-by-block and partition-by-size primitives with different parameters

29–36
Learn with Classifier System

Pattern            Action
(10***, 11***)     PS 100
…                  …
(010**, 011**)     PB (4,4)

Each dimension is encoded with 5 bits.
[Animation: dimensions (16, 24) = (10000, 11000) match the pattern (10***, 11***), so PS 100 is applied; the resulting sub-blocks of (8, 12) = (01000, 01100) match (010**, 011**), so PB (4,4) is applied, yielding final tiles of (4, 4)]
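The pattern column can be read as wildcard matching against the binary encoding of each dimension: in a 5-bit encoding, 16 = 10000 and 24 = 11000 match (10***, 11***), while 8 = 01000 and 12 = 01100 match (010**, 011**). A small sketch of that matching (function names are hypothetical):

```python
def encode(dim, bits=5):
    """Binary encoding of a dimension, zero-padded to the pattern width."""
    return format(dim, f'0{bits}b')

def matches(pattern, dims):
    """True if every dimension's encoding agrees with its pattern string,
    where '*' matches either bit."""
    return all(
        all(p == '*' or p == b for p, b in zip(pat, encode(d, len(pat))))
        for pat, d in zip(pattern, dims)
    )

# (16, 24) matches (10***, 11***); after one partition step,
# the sub-blocks (8, 12) match (010**, 011**).
```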

37
How does the classifier learning algorithm work?
Change the table based on the performance and accuracy feedback from previous runs.
Mutate the condition part of the table to adjust the range of matching matrix dimensions.
Mutate the action part to find the best partition method for the matching matrices.
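As an illustration of the condition-part mutation only: flipping a fixed bit to '*' widens the range of matrix sizes a row matches, while fixing a '*' to a concrete bit narrows it. The operator below is a generic sketch, not the talk's actual implementation.

```python
import random

def mutate_pattern(pattern, rate=0.2):
    """Randomly generalize or specialize a condition string: a '0'/'1'
    becomes '*' (the row matches more dimensions), or a '*' becomes a
    concrete bit (the row matches fewer)."""
    out = []
    for ch in pattern:
        if random.random() < rate:
            out.append('*' if ch in '01' else random.choice('01'))
        else:
            out.append(ch)
    return ''.join(out)
```

In a full classifier system, rows whose actions yielded good measured performance would survive and be mutated further, while poorly performing rows would be replaced.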

38
Outline Background Partition Methods Classifier Learning System Experimental Results

39
Experiments on three platforms:
– Sun UltraSPARC III
– Intel P4 Xeon
– Intel Itanium 2
Matrices of sizes from 1000 × 1000 to 5000 × 5000

40
Algorithms
Classifier MMM: our approach
– Includes the overhead of copying in and out of the recursive layout
ATLAS: library generated by ATLAS using the search procedure, without hand-written codes
– Has some type of blocking for L2
L1: one level of tiling
– Tile size: the same as ATLAS uses for L1
L2: two levels of tiling
– L1 tile and L2 tile: the same as ATLAS uses for L1

43
Conclusion and Future Work
Preliminary results prove the effectiveness of our approach:
– Sun UltraSPARC III and Xeon: 18% and 5% improvement, respectively
– Itanium: −14%
Need to improve the padding mechanism:
– Reduce the amount of padding
– Avoid unnecessary computation on the padding

44
Thank you!
