NoC Symposium07 Panel: Proliferating the Use and Acceptance of NoC Benchmark Standards


1 NoC Symposium07 Panel: Proliferating the Use and Acceptance of NoC Benchmark Standards
Timothy M. Pinkston
National Science Foundation (NSF), tpinksto@nsf.gov
University of Southern California (USC), tpink@usc.edu

2 Architecture Driving Forces
Applications (workloads) define what system functions should be supported.
Architecture defines how system functions are supported.
Implementation (circuit) technology defines the extent to which desired system functions can be implemented in hardware.
Diagram labels: Demand for System Functions; Hardware Functional Blocks; Performance of System Functions; Demand for Functional Blocks; Apps, Alg & SW, Arch, Tech.
Ref: Trends Towards On-chip Networked Microsystems, T. Pinkston and J. Shin, IJHPCN. (http://ceng.usc.edu/smart/publications/archives/CENG-2004-17.pdf)

3 Need for a NoC Benchmark Suite
A sampling of benchmark suites already out there (Gen-Purpose/PC, Embedded/SoC, Sci-Eng/HPC):
SPLASH, SPEC CPU, BAPCo SYSmark, Netperf, MediaBench, STREAM, EEMBC, ALPBench, MiBench, LAPACK, Dhry-/Whetstone, LINPACK, SparseBench, ScaLAPACK, NPB (NAS PB), HPL, LFK (Livermore), LMBench, BioBench, CommBench, GraalBench, NPCryptBench, BYTEmark, LLCbench, DMABench
Do we really need yet another benchmark suite?
Is There a ?

4 December 2006 NSF OCIN Workshop Recommendations (www.ece.ucdavis.edu/~ocin06)
A set of standard workloads/benchmarks and evaluation methods is needed to enable realistic evaluation and uniform (fair) comparison between various approaches:
  Need for cooperation (agreement) between academia and industry
  Need for qualified performance metrics: latency and bandwidth under power, energy, thermal, reliability, area, etc., constraints
  Need for standardization of metrics: clear definition of what is being represented by metrics (e.g., network latency, throughput, ...)
  Need for effective alternatives to time-consuming full-system execution-driven simulation, including use of microbenchmarks, parameterized synthetic traffic/workloads, traces, etc.
  Need for accurate characterization and modelling of system traffic behavior across various domains: general-purpose & embedded
  Need for analytical methods (complementary to simulation) to explore and quantitatively narrow down the large design space
Ref: Challenges in Computer Architecture Evaluation, K. Skadron, M. Martonosi, D. August, M. Hill, D. Lilja, V. Pai, in IEEE Computer, pp. 30-36, August 2003.
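The recommendation above names parameterized synthetic traffic as one alternative to full-system simulation. A minimal sketch of what such a generator could look like follows; the function names, the Bernoulli injection process, and the choice of the two patterns (uniform random and bit-complement) are illustrative assumptions, not something specified by the workshop.

```python
import random


def uniform_random_dest(src, num_nodes, rng):
    """Pick a destination uniformly among all nodes except the source."""
    dest = rng.randrange(num_nodes - 1)
    return dest if dest < src else dest + 1


def bit_complement_dest(src, num_nodes):
    """Destination is the bitwise complement of the source id.

    Assumes num_nodes is a power of two so the complement is a valid id.
    """
    if num_nodes <= 0 or num_nodes & (num_nodes - 1):
        raise ValueError("num_nodes must be a power of two")
    return ~src & (num_nodes - 1)


def generate_packets(num_nodes, cycles, injection_rate, pattern, seed=0):
    """Return (cycle, src, dest) tuples under Bernoulli injection.

    Each node injects one packet per cycle with probability injection_rate
    (hypothetical parameterization chosen for this sketch).
    """
    rng = random.Random(seed)
    packets = []
    for cycle in range(cycles):
        for src in range(num_nodes):
            if rng.random() < injection_rate:
                if pattern == "uniform":
                    dest = uniform_random_dest(src, num_nodes, rng)
                elif pattern == "bit_complement":
                    dest = bit_complement_dest(src, num_nodes)
                else:
                    raise ValueError("unknown pattern: " + pattern)
                packets.append((cycle, src, dest))
    return packets
```

Sweeping `injection_rate` from near zero upward while recording delivered traffic is one common way to locate the no-load and saturation regions that such a suite would need to define.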

5 Meaning of Latency and Throughput
Simulation: 3-D torus, 4,096 nodes (16 × 16 × 16), uniform traffic load, virtual cut-through switching, three-phase arbitration, 2 and 4 virtual channels. Bubble flow control is used in dimension order on one virtual channel; the other virtual channel(s) is supplied in dimension order (deterministic routing) or along any shortest path to destination (adaptive routing).
Latency: fabric only, endnode-to-endnode, average, no-load, saturation?
Throughput: peak, sustained, saturation, best-case, worst-case?
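To make the simulated configuration concrete, the sketch below computes minimal hop counts in a k-ary n-cube (torus) and the average hop count from one node to every other node, which is what a uniform traffic load sees on average. The helper names are hypothetical, and the calculation assumes minimal (shortest-path) routing with wraparound links in every dimension.

```python
from itertools import product


def ring_distance(a, b, k):
    """Shortest distance between positions a and b on a ring of k nodes."""
    diff = abs(a - b)
    return min(diff, k - diff)


def torus_hops(src, dst, k):
    """Minimal hop count between two coordinate tuples in a k-ary torus."""
    return sum(ring_distance(s, d, k) for s, d in zip(src, dst))


def average_hops_uniform(k, n):
    """Average minimal hop count from one node to all other nodes.

    The torus is node-symmetric, so measuring from the origin is enough.
    """
    origin = (0,) * n
    total = 0
    count = 0
    for dst in product(range(k), repeat=n):
        if dst != origin:
            total += torus_hops(origin, dst, k)
            count += 1
    return total / count


if __name__ == "__main__":
    # 16 x 16 x 16 torus, as in the simulation described on the slide.
    print(average_hops_uniform(16, 3))
```

For the 16 × 16 × 16 case the result is close to 12 hops, which gives a sense of how much of a no-load latency figure comes from hop count alone.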

6 Simple (Analytical) Latency and Throughput Models
Latency, lower bound (contention delay not included), cut-through switching:
$$\text{Latency} = \text{Sending latency} + T_{LinkProp} \times (d+1) + (T_r + T_a + T_s) \times d + \frac{\text{Packet} + (d \times \text{Header})}{\text{Bandwidth}} + \text{Receiving latency}$$
Effective bandwidth, upper bound (contention delay not fully included):
$$\text{Effective bandwidth} = \min\left(N \times BW_{LinkInjection},\; BW_{Network},\; \sigma \times N \times BW_{LinkReception}\right), \qquad BW_{Network} = \rho \times \frac{BW_{Bisection}}{\gamma}$$
Network traffic pattern/load determine $\sigma$ & $\gamma$, traffic-dependent parameters
Topology and switch architecture determine $d$, $T_r$, $T_a$, $T_s$, $BW_{Bisection}$
Routing, switching, FC, arch, etc., influence network efficiency factor, $\rho$:
  internal switch speedup & reduction of contention within switches
  buffer organizations to mitigate HOL blocking in and across switches
  balance load across network links & maximally utilize link bandwidth
$\rho = \rho_L \times \rho_R \times \rho_A \times \rho_S \times \rho_{Arch} \times \ldots$, architecture-dependent parameters
H&P Int.Net. chapter: ceng.usc.edu/smart/slides/appendixE.html
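The two models on this slide can be written down directly as functions. The sketch below is one possible transcription, assuming the symbol names used in the H&P interconnection networks chapter the slide cites: `sigma` for the average reception factor, `gamma` for the fraction of traffic crossing the bisection, and `rho` for the network efficiency factor. The parameter names and the unit choices (times in one consistent unit, sizes and bandwidths in matching units) are assumptions of this sketch.

```python
def cut_through_latency(sending, receiving, t_link_prop, t_r, t_a, t_s,
                        d, packet_size, header_size, bandwidth):
    """Lower-bound packet latency under cut-through switching.

    d is the number of switches (hops) traversed; contention is ignored.
    Only the header is re-serialized at each hop, hence d * header_size.
    """
    return (sending
            + t_link_prop * (d + 1)
            + (t_r + t_a + t_s) * d
            + (packet_size + d * header_size) / bandwidth
            + receiving)


def network_bandwidth(rho, bw_bisection, gamma):
    """BW_Network = rho * BW_Bisection / gamma."""
    return rho * bw_bisection / gamma


def effective_bandwidth(n, bw_link_injection, bw_network, sigma,
                        bw_link_reception):
    """Upper bound: the tightest of injection, network, and reception."""
    return min(n * bw_link_injection,
               bw_network,
               sigma * n * bw_link_reception)
```

Keeping `network_bandwidth` separate from `effective_bandwidth` mirrors the split on the slide between traffic-dependent parameters (`sigma`, `gamma`) and the architecture-dependent efficiency factor (`rho`).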

7 Modeling Throughput of Cell BE EIB (Worst-Case)
Network injection (12 nodes): injection bandwidth of 25.6 GB/s per element, 307.2 GB/s in total
Network reception (12 nodes): reception bandwidth of 25.6 GB/s per element, 307.2 GB/s in total
Aggregate bandwidth (4 rings each with 12 links): 1,228.8 GB/s
Command bus bandwidth: 204.8 GB/s
$BW_{Bisection}$ = 8 links = 204.8 GB/s
Traffic pattern: determines $\sigma$ & $\gamma$; $\sigma = 1$, $\gamma = 1$
$BW_{Network} = \rho \times BW_{Bisection} / \gamma = \rho \times 204.8 / 1$ GB/s = 78 GB/s (measured), so $\rho$ = 38%
$\rho$ is limited, at best, to only 50% due to ring interference
Peak $BW_{Network}$ of 25.6 GB/s × 3 × 4 = 307.2 GB/s (3 transfers per ring)
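The arithmetic behind these figures can be checked in a few lines. The sketch below recomputes each quantity from the per-link bandwidth of 25.6 GB/s; the function name and dictionary keys are hypothetical, the symbol names `rho`, `gamma`, and `sigma` follow the H&P interconnection networks chapter, and `sigma = 1` is an assumption of this sketch rather than a value read off the slide.

```python
LINK_BW_GBPS = 25.6   # per-element injection/reception and per-link bandwidth
NUM_NODES = 12
NUM_RINGS = 4
LINKS_PER_RING = 12
BISECTION_LINKS = 8
TRANSFERS_PER_RING = 3


def cell_eib_worst_case(rho=0.38, gamma=1.0, sigma=1.0):
    """Recompute the worst-case EIB bandwidth figures (all in GB/s).

    rho = 0.38 is the efficiency implied by the measured 78 GB/s;
    gamma = 1 means all traffic crosses the bisection (worst case);
    sigma = 1 is assumed here.
    """
    injection = NUM_NODES * LINK_BW_GBPS
    reception = sigma * NUM_NODES * LINK_BW_GBPS
    aggregate = NUM_RINGS * LINKS_PER_RING * LINK_BW_GBPS
    bisection = BISECTION_LINKS * LINK_BW_GBPS
    network = rho * bisection / gamma
    peak_network = LINK_BW_GBPS * TRANSFERS_PER_RING * NUM_RINGS
    return {
        "injection": injection,
        "reception": reception,
        "aggregate": aggregate,
        "bisection": bisection,
        "network": network,
        "peak_network": peak_network,
        "effective": min(injection, network, reception),
    }


if __name__ == "__main__":
    for name, value in cell_eib_worst_case().items():
        print(f"{name}: {value:.1f} GB/s")
```

With the default arguments the network term is the smallest of the three, so the effective bandwidth is bounded by the roughly 78 GB/s network figure rather than by injection or reception.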

8 [Figure: two chart panels, "Integer Programs" and "Floating-Point Programs"]
Ref: Hennessy & Patterson, Computer Architecture: A Quantitative Approach, 4th Ed.

9 In Conclusion: Answers to Panel Questions
What are the hallmarks of successful benchmark suites?
  Fairness: represent the proper workload behavior/characteristics
  Portability: open, free access, not architecture/vendor-specific
  Transparency: yield reproducible performance results (reporting)
  Evolutionary: adaptable over time in composition and reporting
How can industry and academia facilitate use?
What are the main obstacles to establishing a de facto NoC standard benchmark suite, and how to address?
  Establish need/importance for common evaluation best-practices
  Cross-cutting effort: architects, circuit designers, CAD researchers
  Need to place high value on developing and using eval. standards
  Capturing the diversity of NoC applications & computing domains
  Red herrings: converge on performance evaluation standards and agree on characteristic traffic loads and/or microbenchmarks
  Ultimately, system-level performance is important, not component

