Architectural Impact of Stateful Networking Applications Javier Verdú, Jorge García Mario Nemirovsky, Mateo Valero The 1st Symposium on Architectures for.

Presentation transcript:

Architectural Impact of Stateful Networking Applications
Javier Verdú, Jorge García, Mario Nemirovsky, Mateo Valero
The 1st Symposium on Architectures for Networking and Communications Systems (ANCS-I)
Princeton, New Jersey, USA, October 26-28, 2005

Slide 2: Trends of Internet
- Significant growth of Internet traffic
  - Consequent increase in traffic aggregation -> low packet/flow temporal locality
- End-point routers & appliances execute stateful apps
  - Upper layer packet processing -> larger workloads per packet
- Facing new security issues
  - Improvement of attack methods -> need to spread the knowledge further than a single packet

Slide 3: Granularity Levels
[Figure: Stateful Application Model. Granularity levels: Packet, Flow, Application, User, Department, Company, Holding, … State lifetime axis (- to +): Packet, Flow, User, Company, Department.]

Slide 4: Research Limitations on Stateful Apps
- Pool of benchmark suites for network processors
  - CommBench
  - NetBench
  - NpBench
  - NPForum
- Lack of stateful benchmarks
  - Most of them are stateless benchmarks
- Creating new benchmarks
  - Reliability???
  - State size
  - State management

Slide 5: Talk Outline
- Introduction
- Network Traffic Properties
- Description of Environment
- Architectural Impact Analysis
- Summary

Slides 6-8: Network Traffic Properties
- Traffic Aggregation Level
  - Unique flow rate in a given window
- Intra-Flow Temporal Distribution
  - How are the packets exchanged?
- Inter-Flow Temporal Distribution
  - Packet rate between packets of the same flow
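
The first and third properties above can be illustrated with a small sketch. It is not taken from the talk: it assumes a trace reduced to a list of flow identifiers in arrival order, reads "unique flow rate" as the fraction of distinct flows per fixed-size packet window, and reads the inter-flow distribution as the number of packets from other flows seen between two packets of the same flow. The function names `unique_flow_rate` and `same_flow_gaps` are hypothetical.

```python
def unique_flow_rate(flow_ids, window):
    """Fraction of distinct flows in each consecutive window of `window` packets.

    Assumed interpretation of "unique flow rate in a given window":
    values near 1.0 suggest high traffic aggregation (little flow locality).
    """
    rates = []
    for start in range(0, len(flow_ids) - window + 1, window):
        chunk = flow_ids[start:start + window]
        rates.append(len(set(chunk)) / window)
    return rates


def same_flow_gaps(flow_ids):
    """Packets of other flows seen between consecutive packets of the same flow."""
    last_seen = {}  # flow id -> index of its most recent packet
    gaps = []
    for index, flow in enumerate(flow_ids):
        if flow in last_seen:
            gaps.append(index - last_seen[flow] - 1)
        last_seen[flow] = index
    return gaps


# Hypothetical 8-packet trace; letters stand in for flow identifiers.
trace = ["a", "a", "b", "a", "c", "b", "d", "a"]
print(unique_flow_rate(trace, 4))  # [0.5, 1.0]
print(same_flow_gaps(trace))       # [0, 1, 2, 3]
```

Growing gaps and a unique flow rate close to 1.0 would both point to the low packet/flow temporal locality mentioned earlier in the talk.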

Slide 9: Benchmark Selection (I)
- Snort is tuned with four different configurations
  - Stream4: prevents Stick/Snot attacks
  - Flow-Portscan: detects portscanning attacks
  - SfPortscan: detects a variety of portscanning attacks
  - Merged Engines: the combination of the above engines
- Argus is a monitoring/billing benchmark
  - Currently it is included in no benchmark suite
  - Open source application
  - Equivalent to the commercial tool Cisco NetFlow

Slide 10: Benchmark Selection (& II)
- Stateless applications keep no flow state
- The state size may vary widely between applications
- State management may also differ considerably
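
To make the stateless/stateful distinction concrete, here is a minimal sketch, assuming a flow is identified by a 5-tuple and that the per-flow state is just a pair of counters. `FlowState`, `FlowTable` and `is_oversized` are hypothetical names for illustration; they do not describe how Snort or Argus actually manage their state.

```python
from dataclasses import dataclass


@dataclass
class FlowState:
    """Hypothetical per-flow record; real applications keep far more."""
    packets: int = 0
    byte_count: int = 0


class FlowTable:
    """Stateful processing: every packet looks up and updates its flow's record."""

    def __init__(self):
        self._table = {}  # 5-tuple -> FlowState

    def update(self, key, length):
        state = self._table.setdefault(key, FlowState())
        state.packets += 1
        state.byte_count += length
        return state

    def __len__(self):
        return len(self._table)


def is_oversized(length, limit=1500):
    """Stateless processing: the decision depends only on the current packet."""
    return length > limit


# Assumed 5-tuple: (src IP, dst IP, src port, dst port, protocol).
flow = ("10.0.0.1", "10.0.0.2", 1234, 80, "tcp")
table = FlowTable()
table.update(flow, 60)
state = table.update(flow, 1460)
print(state.packets, state.byte_count, len(table))  # 2 1520 1
```

The table grows with the number of active flows, which is why state size and state management matter for the cache behavior discussed later.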

Slide 11: Evaluation Methodology
- Instrumented binary code: ATOM
- Trace-driven simulation: modified version of the SMTSim simulator
- Simulation length
  - Warming period: 10K packets
  - Processing period: 50K packets
- Packet selection for the flow lifetime studies
  - Towards analysis of actual application behavior
- The baseline is an ample configuration
  - ROB size: 256 entries (no significant improvements with larger ROBs)
  - Physical regs: 192 int, 192 FP (no stress due to lack of regs)
  - Perceptron branch predictor (the most powerful configuration)
  - 64KB I$, 64KB DL1$, 2MB L2$ (no significant improvements with larger caches)

Slide 12: Architectural Impact Analysis
- Computational complexity
- Available parallelism
- Impact of bottlenecks
- Branch prediction
- Data cache behavior

Slide 13: Computational Complexity (I)
- There are no significant differences among benchmarks
  - Roughly 35%-45% memory accesses
  - Argus is more memory intensive

Slide 14: Computational Complexity (& II)
- The instruction mix is similar across all the packets
- Some applications generate the heaviest workload in the first packets
- Other applications show an almost constant workload

Slide 15: Available Parallelism
- Processor configuration modified to avoid any constraint
- The ILP is independent of the app category
  - It is inherent to the application itself
- The evaluated apps show low ILP: ~3.7 IPC

Slide 16: Impact of Bottlenecks
- Stateful apps show much lower performance
  - Roughly 0.6 IPC on average
- The importance of the packet processing
  - Constant vs concentrated workload
- Memory impact
  - 3x to 19x speedup

Slide 17: Branch Prediction (I)
- High branch prediction accuracy on average
- But there are two branch categories
  - Flow independent: similar among packets -> easy to predict
  - Flow dependent: flow related -> sensitive to traffic properties

Slide 18: Branch Prediction (& II)
- A single active connection
  - Higher accuracy and no variations among n-th packets
- High traffic aggregation level
  - Lower accuracy and variations among n-th packets
  - Negative aliasing due to flow dependent branches
- Most of our applications hide this effect due to concentrated workload
[Figure panels: No traffic aggregation level; High traffic aggregation level]

Slide 19: Data Cache Behavior (I)
- Stateful apps need only a small DL1$ to reach a steady miss rate
  - Taking advantage of flow independent memory references
- Almost 100% of DL2$ accesses are misses
  - It is unable to keep the state of the active flows
- Larger flow-states emphasize the impact of network properties
  - Getting higher steady state even with low traffic aggregation
  - The intra-flow distribution may be more helpful

Slide 20: Data Cache Behavior (& II)
- Negative memory effects are concentrated in the first packets
- Constant workload applications show a similar miss rate for every packet
- Extra miss rates for data structure maintenance
  - Merged Engines: from 1.5% to 5% on average

Slide 21: Summary (I)
- We present the architectural impact of stateful networking applications
  - An important new type of application
- The behavior along the packets of a TCP connection
  - Constant workload for the packets of a connection
  - Workload concentrated in the first packets of a connection
- Analysis of network traffic properties
  - Branch prediction and data cache are sensitive to them

Slide 22: Summary (& II)
- Reduced IPC on average
  - L2$ is unable to maintain the required states of active flows
  - Branch prediction may also improve once the memory bottleneck is solved
- Other stateful applications may present different valuable results, but…
  - The critical bottlenecks may be even more stressed
- Our concern is …
  - To have more sample applications to evaluate
  - To analyse the apps in a more realistic environment
    - Running a number of applications simultaneously

Slide 23: Questions...

Slide 24: Traffic Traces
- Filtered traffic trace
  - Bidirectional TCP connections
- Generating synthetic traffic traces
  - Mixing different traffic traces
    - Based on micro-timestamp sorting
  - We assume a set of traces from links with the same bandwidth
    - In our case: MRA link
  - Avoiding the aliasing of IP addresses among aggregated traces
    - The set of traces is originally sanitized
- The resulting traffic trace shows roughly 1 Gbps
  - 170K active flows
  - Achieved from the original OC12 MRA link (622 Mbps)
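
The trace-mixing step can be sketched as follows. This is only an illustration under stated assumptions, not the authors' tooling: each trace is assumed to be a list of `(timestamp_us, src, dst)` tuples already sorted by timestamp, and aliasing between sanitized traces is avoided here by tagging every address with the index of its source trace, which is one possible approach among several. `tag_trace` and `mix_traces` are hypothetical names.

```python
import heapq


def tag_trace(trace, trace_id):
    """Prefix every address with its trace index so that identical sanitized
    addresses from different traces are not mistaken for the same host."""
    for timestamp_us, src, dst in trace:
        yield timestamp_us, (trace_id, src), (trace_id, dst)


def mix_traces(traces):
    """Merge several timestamp-sorted traces into one timestamp-sorted trace."""
    tagged = [tag_trace(trace, index) for index, trace in enumerate(traces)]
    return list(heapq.merge(*tagged, key=lambda packet: packet[0]))


# Two hypothetical sanitized traces that happen to reuse the address 10.0.0.1.
trace_a = [(1, "10.0.0.1", "10.0.0.2"), (5, "10.0.0.2", "10.0.0.1")]
trace_b = [(2, "10.0.0.1", "10.0.0.9"), (7, "10.0.0.9", "10.0.0.1")]

mixed = mix_traces([trace_a, trace_b])
print([packet[0] for packet in mixed])  # [1, 2, 5, 7]
print(mixed[0][1], mixed[1][1])         # (0, '10.0.0.1') (1, '10.0.0.1')
```

Merging traces captured on links of the same bandwidth raises the aggregate packet rate and the number of concurrently active flows, which is the effect the synthetic trace is meant to produce.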