SYNCHRONIZATION USING REMOTE-SCOPE PROMOTION MARC S. ORR †§, SHUAI CHE §, AYSE YILMAZER §, BRADFORD M. BECKMANN §, MARK D. HILL †§, DAVID A. WOOD †§


1 SYNCHRONIZATION USING REMOTE-SCOPE PROMOTION
MARC S. ORR †§, SHUAI CHE §, AYSE YILMAZER §, BRADFORD M. BECKMANN §, MARK D. HILL †§, DAVID A. WOOD †§
† UW-MADISON, § AMD RESEARCH
ASPLOS, MARCH 16, 2015

2 EXECUTIVE SUMMARY
• Heterogeneous chips, like GPUs, have hierarchical memories
• All Global Synchronization
• Scoped Synchronization (7% Speedup)
• Work Stealing (18% Speedup)
• Best of Both? NEW: Remote-Scope Promotion (25% Speedup)

3 OUTLINE
• Background: Synchronization + Scopes
• Synchronization using Remote-Scope Promotion
• Results/Conclusion

4 BACKGROUND: SYNCHRONIZATION + SCOPES
• Parallel synchronization semantics
  ‒ acquire: pull latest data (to me)
  ‒ release: push latest data (to others)
• Scopes bound synchronization:
  ‒ Smaller scope → less synchronization overhead

5 ACQUIRE/RELEASE ANIMATION

    void incX_workgroup() {
        while (!CAS_acq_wg(&L, 0, 1));
        X = X + 1;
        st_rel_wg(&L, 0);
    }

    void incX_component() {
        while (!CAS_acq_cmp(&L, 0, 1));
        X = X + 1;
        st_rel_cmp(&L, 0);
    }

[Animation: CU0 and CU1 each have an L1 cache (wg scope 0 and wg scope 1) above a shared L2 (component scope). As the animation proceeds, L toggles between 0 and 1 and X takes the values 1, 2, 3 and 4 in the caches.]
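The effect of the two scopes on the caches can be sketched with a small single-threaded simulation. This is only an illustrative toy, not the hardware described in the talk: the `ToyGPU` class, its method names, and the write-back L1 behaviour are assumptions made for this example, and the lock L is left out because nothing runs concurrently here.

```python
class ToyGPU:
    """Hypothetical toy model: one private L1 per CU, one shared L2."""

    def __init__(self, num_cus, initial):
        self.l2 = dict(initial)                       # component-scope storage
        self.l1 = [dict() for _ in range(num_cus)]    # per-CU (wg-scope) caches
        self.dirty = [set() for _ in range(num_cus)]  # lines written but not flushed

    def load(self, cu, addr):
        # A miss fills the L1 from the L2; a hit may return stale data.
        if addr not in self.l1[cu]:
            self.l1[cu][addr] = self.l2[addr]
        return self.l1[cu][addr]

    def store(self, cu, addr, value):
        # Stores stay in the L1 until a component-scope release.
        self.l1[cu][addr] = value
        self.dirty[cu].add(addr)

    def acquire_cmp(self, cu):
        # "Pull latest data": drop clean (possibly stale) L1 lines.
        self.l1[cu] = {a: self.l1[cu][a] for a in self.dirty[cu]}

    def release_cmp(self, cu):
        # "Push latest data": write dirty L1 lines back to the L2.
        for addr in self.dirty[cu]:
            self.l2[addr] = self.l1[cu][addr]
        self.dirty[cu].clear()


def inc_x_wg(gpu, cu):
    # wg scope: acquire/release need no cache action in this model,
    # because a work-group's work-items share their CU's L1.
    gpu.store(cu, "X", gpu.load(cu, "X") + 1)


def inc_x_cmp(gpu, cu):
    # cmp scope: invalidate before the access, flush after it.
    gpu.acquire_cmp(cu)
    gpu.store(cu, "X", gpu.load(cu, "X") + 1)
    gpu.release_cmp(cu)


gpu = ToyGPU(num_cus=2, initial={"X": 0})
inc_x_wg(gpu, 0)
inc_x_wg(gpu, 0)
print(gpu.load(0, "X"), gpu.l2["X"])  # CU0 sees its own updates; the L2 does not
inc_x_cmp(gpu, 0)
gpu.acquire_cmp(1)
print(gpu.load(1, "X"))               # CU1 sees the value only after the cmp-scope pair
```

In this sketch the wg-scope increments are cheap because they never leave the L1, while the cmp-scope increment pays for an invalidate and a flush. That is the trade-off the scopes are meant to expose.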

6 SCOPED SYNCHRONIZATION’S STRENGTHS
• Static local sharing [Figure: within component scope, wg scope 0 holds data 0 and wg scope 1 holds data 1]
• Dynamic global sharing [Figure: wg scope 0 and wg scope 1 share a global data store]
On current hardware, wg scope can yield >20% speedup over cmp scope

7 SCOPED SYNCHRONIZATION’S LIMITATIONS
• Dynamic local sharing: some threads access shared data less frequently than others in an ad-hoc manner
• Example: work stealing
[Figure: within component scope, wg scope 0 owns queue 0 and wg scope 1 owns queue 1. Work is enqueued (enq) on queue 0, and a dequeue (deq) of queue 0 from the other work-group sees a stale copy of queue 0.]

8 OUTLINE
• Background: Synchronization + Scopes
• Synchronization using Remote-Scope Promotion
• Results/Conclusion

9 REMOTE-SCOPE PROMOTION
• Insight: wg1 needs to trigger the promotion of scope 0
• Contribution: hardware support for scope promotion & ISA instructions that utilize it
[Figure: within component scope, wg scope 0 owns queue 0 and wg scope 1 owns queue 1. Wg scope 1 holds a stale copy of queue 0; a promote triggers a flush of queue 0, after which the deq of queue 0 proceeds.]

10 PROMOTION SEMANTIC
• Prior memory models: HRF-direct, HRF-indirect
  ‒ Invariant: acquire/release pair must occur at the same scope
• Three new memory orders:
  ‒ remoteAcquire: Promote the scope of last release to the scope of this acquire, then perform acquire
  ‒ remoteRelease: Promote the scope of next acquire to the scope of this release, then perform release
  ‒ remoteAcquire+Release: combine remote acquire & remote release
[Animation: work-item 0 (in wg 0) executes st(V,2) followed by a release on L, shown as st_rel_cmp(L, 0) or st_rel_wg(L, 0). Work-item 1 (in wg 1) executes an acquire on L, shown as cas_acq_wg(&L, 0, 1), cas_acq_cmp(&L, 0, 1) or cas_rm_acq_cmp(&L, 0, 1), followed by ld(R1, V). The cases are labeled OK, RACE! and OK. Legend: synchronizes-with relationship, promotion.]
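The same-scope invariant and the effect of promotion can be written down as a small predicate. The sketch below is a simplification for illustration, not the formal model from the paper: it considers a single release/acquire pair, only two scope levels, and the names `SyncOp` and `synchronizes` are made up for this example.

```python
from dataclasses import dataclass

WG, CMP = 0, 1  # scope levels, ordered: work-group < component


@dataclass(frozen=True)
class SyncOp:
    work_group: int       # work-group of the issuing work-item
    scope: int            # WG or CMP
    remote: bool = False  # True for the remote memory orders


def synchronizes(release: SyncOp, acquire: SyncOp) -> bool:
    """Return True if the pair forms a synchronizes-with edge in this toy model."""
    # A remote acquire promotes the last release to the acquire's scope.
    rel_scope = max(release.scope, acquire.scope) if acquire.remote else release.scope
    # A remote release promotes the next acquire to the release's scope.
    acq_scope = max(acquire.scope, release.scope) if release.remote else acquire.scope
    if rel_scope != acq_scope:
        return False  # invariant: the pair must occur at the same scope
    if rel_scope == WG:
        # wg scope only covers work-items of one work-group.
        return release.work_group == acquire.work_group
    return True


st_rel_wg = SyncOp(work_group=0, scope=WG)
st_rel_cmp = SyncOp(work_group=0, scope=CMP)
cas_acq_cmp = SyncOp(work_group=1, scope=CMP)
cas_rm_acq_cmp = SyncOp(work_group=1, scope=CMP, remote=True)

print(synchronizes(st_rel_cmp, cas_acq_cmp))    # same scope on both sides
print(synchronizes(st_rel_wg, cas_acq_cmp))     # scopes differ: a race
print(synchronizes(st_rel_wg, cas_rm_acq_cmp))  # release promoted to cmp scope
```

The point of the third call is that the releasing work-item keeps using the cheap wg-scope release, and only the occasional remote acquirer pays for the promotion.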

11 IMPLEMENTATION
• remote_acq_cmp(L)
  1. Promote the scope of the last release on L
  2. Perform an acquire operation on L
• remote_rel_cmp(L)
  1. Perform a release operation on L
  2. Promote the scope of the next acquire on L
[Animation: CU0, CU1 and CU2 each have an L1 cache above a shared L2. Promote operations trigger FLUSH operations on the L1 caches, moving values of L (0, 1) and V (2, 3) between the L1 caches and the L2.]
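The two-step sequences above can be traced on a toy cache model. This is a sketch under stated assumptions rather than the hardware design from the talk: the `ToyGPU` class and its methods are hypothetical, the simulation is single-threaded, and promotion is approximated by flushing or invalidating every other CU's L1 because the model does not track which CU performed the last release on L.

```python
class ToyGPU:
    """Hypothetical toy model: one private write-back L1 per CU, one shared L2."""

    def __init__(self, num_cus, initial):
        self.l2 = dict(initial)
        self.l1 = [dict() for _ in range(num_cus)]
        self.dirty = [set() for _ in range(num_cus)]

    def load(self, cu, addr):
        if addr not in self.l1[cu]:
            self.l1[cu][addr] = self.l2[addr]
        return self.l1[cu][addr]

    def store(self, cu, addr, value):
        self.l1[cu][addr] = value
        self.dirty[cu].add(addr)

    def flush(self, cu):
        # Write dirty L1 lines back to the L2 (a cmp-scope release).
        for addr in self.dirty[cu]:
            self.l2[addr] = self.l1[cu][addr]
        self.dirty[cu].clear()

    def invalidate(self, cu):
        # Drop clean L1 lines so later loads refetch (a cmp-scope acquire).
        self.l1[cu] = {a: self.l1[cu][a] for a in self.dirty[cu]}

    def others(self, cu):
        return [c for c in range(len(self.l1)) if c != cu]

    def remote_acq_cmp(self, cu):
        for other in self.others(cu):  # 1. promote the last release (assumed: flush all)
            self.flush(other)
        self.invalidate(cu)            # 2. perform the acquire

    def remote_rel_cmp(self, cu):
        self.flush(cu)                 # 1. perform the release
        for other in self.others(cu):  # 2. promote the next acquire (assumed: invalidate all)
            self.invalidate(other)


gpu = ToyGPU(num_cus=3, initial={"V": 0})
gpu.store(0, "V", 2)     # CU0 writes V and releases at wg scope (no cache action)
gpu.invalidate(1)        # an ordinary cmp-scope acquire on CU1 ...
print(gpu.load(1, "V"))  # ... still reads the stale value from the L2
gpu.remote_acq_cmp(1)    # the remote acquire flushes CU0's L1 first
print(gpu.load(1, "V"))  # now CU1 reads CU0's value
```

A real implementation would presumably target only the caches that matter and would also have to order these sub-operations against concurrent synchronization, which this model ignores.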

12 IMPLEMENTATION DETAILS
• Hardware Support
  ‒ Sending/receiving sub-operations between CUs
  ‒ Cache line locking to resolve races
• Guarantee “coherence order” for read-modify-writes
  ‒ Hardware support to stall new synchronization operations at target scope
• Paper formalizes scope promotion
  ‒ Shows that scope promotion is compatible with coherence order

13 OUTLINE
• Background: Synchronization + Scopes
• Synchronization using Remote-Scope Promotion
• Results/Conclusion

14 METHODOLOGY
• Prototyped remote scoped synchronization in gem5
  ‒ Extended with internal GPU model
• Refactored 3 Pannotia workloads to retrieve graph nodes from task queues
  ‒ SSSP, Color, PageRank (each run with 3-4 inputs)

15 RESULTS
scenario      Scope of sync.?   Work stealing?
baseline      global            no
scope-only    local             no
steal-only    global            yes
rem-sync      local             yes
Speedup labels on the chart: 1.07x, 1.18x, 1.25x

16 CONCLUSION
• All Global Synchronization
• Scoped Synchronization (7% Speedup)
• Work Stealing (18% Speedup)
• Best of Both! NEW: Remote-Scope Promotion (25% Speedup)

17 Questions?

18 Backup

19 µ BENCHMARK RESULTS
• Scopes matter! Small tasks benefit from scopes

20 DISCLAIMER & ATTRIBUTION
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

