Exploiting Data Parallelism in SELinux Using a Multicore Processor Bodhisatta Barman Roy National University of Singapore, Singapore Arun Kalyanasundaram,

Presentation transcript:

Exploiting Data Parallelism in SELinux Using a Multicore Processor
Bodhisatta Barman Roy, National University of Singapore, Singapore
Arun Kalyanasundaram, Shrisha Rao, International Institute of Information Technology Bangalore, India
Computer Society of India, CSI 2012, Kolkata, India

Motivation
One of the major drawbacks of security:
– Reduction in efficiency

Motivation
Similarly, the performance overhead due to security features in software is considerable.
However, with the proliferation of multicore processors, we can introduce parallelism into software security validations.

Goal
Our aim is to optimize and evaluate the performance of SELinux (Security-Enhanced Linux) on a multicore processor.
– SELinux is a Linux operating system feature that provides fine-grained access control over system resources.
– We propose several techniques to introduce parallelism into the SELinux architecture.
– We evaluate our approach using a Cell Broadband Engine (CBE) multicore processor.

Background – SELinux
SELinux implements the Mandatory Access Control (MAC) security paradigm.
MAC uses a set of rules to constrain a process from performing an operation on a resource (e.g. a file).
Each process and resource is assigned a label called a security context, which eases the task of writing security policy rules.
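
As a rough illustration of how labels and rules fit together, the sketch below parses a security context and checks an operation against a rule set. It assumes the usual SELinux `user:role:type` label layout; the names `SecurityContext`, `parse_context`, `is_allowed`, `RULES`, and the sample rule are hypothetical and do not come from the slides.

```python
from typing import NamedTuple


class SecurityContext(NamedTuple):
    """A label with three components (assumed here to be user, role, type)."""
    user: str
    role: str
    type_: str


def parse_context(label: str) -> SecurityContext:
    """Split a 'user:role:type' label; an optional trailing level field is ignored."""
    parts = label.split(":")
    if len(parts) < 3:
        raise ValueError(f"malformed security context: {label!r}")
    return SecurityContext(*parts[:3])


# Hypothetical rule set: (source type, target type, object class) -> permitted operations.
RULES = {("httpd_t", "httpd_sys_content_t", "file"): {"read", "getattr"}}


def is_allowed(source: SecurityContext, target: SecurityContext,
               obj_class: str, op: str) -> bool:
    """Default deny: an operation is permitted only if some rule grants it."""
    return op in RULES.get((source.type_, target.type_, obj_class), set())
```

Because rules are written against labels rather than individual processes or files, one rule covers every subject and object that carries the same type.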

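Background – SELinux Architecture
[Figure: SELinux architecture. A subject (process xyz) requests read access to an object (file xyz.txt). The request passes Linux DAC and then, through the LSM hooks, SELinux MAC policy enforcement. The Access Vector Cache (AVC) is consulted with the security contexts; the security server makes the decision from the policy database. If allowed, access proceeds; otherwise it is denied.]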

Identifying SELinux Performance Bottlenecks
The decision to allow or deny an operation is a two-step process:
– Validating the security contexts (SC) of the source (process) and the target (resource).
– Determining whether a security policy rule exists for the requested operation.
We found the validation step to be a major cause of performance overhead.
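
The two steps can be sketched as follows. This is a simplified model, not SELinux code: it assumes each context component is checked against a linked list of valid values (the slides mention linked-list traversal during validation), and the names `Node`, `build_list`, `contains`, `validate_context`, and `decide` are hypothetical.

```python
class Node:
    """One entry in a singly linked list of valid component values."""

    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


def build_list(values):
    """Build a linked list that preserves the order of `values`."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def contains(head, wanted):
    """Walk the list node by node; cost grows with the number of entries."""
    node = head
    while node is not None:
        if node.value == wanted:
            return True
        node = node.next
    return False


def validate_context(context, valid_lists):
    """Step 1: every component must appear in its list of valid values."""
    return all(contains(head, part) for part, head in zip(context, valid_lists))


def decide(source, target, op, valid_lists, rules):
    """Step 2 (rule lookup) runs only if both contexts pass validation."""
    if not (validate_context(source, valid_lists)
            and validate_context(target, valid_lists)):
        return False
    return (source, target, op) in rules
```

In this model the rule lookup is a single set membership test, while validation walks up to three lists for each of the two contexts, which is consistent with validation dominating the cost.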

Hardware Setup – CBE
The CBE is a master-slave multicore processor consisting of one Power Processing Element (PPE) and eight Synergistic Processing Elements (SPEs).
Execution on an SPE is initiated by the PPE, and data is transferred using DMA controllers.
We used a Sony PlayStation 3 console powered by a CBE processor, with Yellow Dog Linux 6.1 installed.

Our Approach
We implement a parallel search using the SIMD programming paradigm in the validation of security contexts (SC).
Since the SC has three components, the validation requires traversing three linked-list data structures.
We use either 3 SPEs (3U) or 6 SPEs (6U) to perform the search, with one or two SPEs per component respectively.
We also evaluate a busy wait strategy on the SPE, where the SPE is not freed between node lookups.
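
The 3U and 6U partitioning can be illustrated with a small sketch. It only shows how the work is divided: Python threads stand in for SPEs, plain Python lists stand in for the linked lists, and no SIMD or DMA is modelled, so it gives no indication of the speedup on real hardware. The names `search` and `validate_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def search(segment, wanted):
    """Linear scan of one list segment, standing in for one SPE's work."""
    return wanted in segment


def validate_parallel(context, valid_lists, workers_per_component=1):
    """Validate a three-part context with 3 workers (3U) or 6 workers (6U)."""
    tasks = []
    for index, (wanted, values) in enumerate(zip(context, valid_lists)):
        # Split each component's list into contiguous segments, one per worker.
        step = max(1, -(-len(values) // workers_per_component))  # ceiling division
        for start in range(0, len(values), step):
            tasks.append((index, values[start:start + step], wanted))

    found = [False] * len(context)
    max_workers = len(context) * workers_per_component
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [(index, pool.submit(search, segment, wanted))
                   for index, segment, wanted in tasks]
        for index, future in futures:
            # A component is valid if any of its segments contains the value.
            found[index] = found[index] or future.result()
    return all(found)
```

With `workers_per_component=1` each component is searched by one worker (3U); with `workers_per_component=2` each list is halved and searched by two workers (6U).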

Our Approach – Different Number of SPEs

SPE Busy Wait Loading Strategy
Keep the SPE waiting until the data for the next node in the linked list is available.
Pros
– Improves performance by eliminating load time.
Cons
– Other processes that require the SPE may be blocked.
– Requires continuous polling of main memory, which impedes data access operations.

Optimizing DMA Transfers for Matching Strings
DMA double buffering for null-terminated strings.
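
A minimal sketch of the double-buffering idea follows, under the assumption that strings are fetched in fixed-size chunks because the length of a null-terminated string is not known before it is read. The function `fetch` is a hypothetical stand-in for a DMA transfer, and the code runs sequentially; on real hardware the transfer into the idle buffer would proceed while the other buffer is being compared.

```python
def fetch(memory, offset, size):
    """Stand-in for a DMA transfer of `size` bytes from main memory."""
    return memory[offset:offset + size]


def matches(memory, offset, target, chunk=16):
    """Compare the null-terminated string at `offset` with `target` using two buffers."""
    wanted = target + b"\0"
    buffers = [fetch(memory, offset, chunk), b""]
    current, position = 0, 0
    while True:
        # Request the next chunk before examining the current one, so that a
        # real asynchronous transfer could overlap with the comparison.
        buffers[1 - current] = fetch(memory, offset + position + chunk, chunk)
        data = buffers[current]
        if not data:
            return False  # ran past the end of memory without a terminator
        end = data.find(b"\0")
        if end != -1:
            data = data[:end + 1]  # ignore bytes after the terminator
        if data != wanted[position:position + len(data)]:
            return False
        if end != -1:
            return True
        position += chunk
        current = 1 - current
```

For example, with `memory = b"system_u:object_r:etc_t\0user_u\0"`, the call `matches(memory, 24, b"user_u")` compares against the second stored string. A mismatch in an early chunk ends the loop before later chunks are examined.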

Performance Measurement
Evaluation is based on two configurable parameters:
– Number of rules in the security policy. This determines the number of valid security contexts. We evaluate with policies containing 0 to 4000 rules.
– Size of the Access Vector Cache (AVC). This helps accurately measure the overhead due to the decision-making logic in the security server. We use two AVC sizes: 512 entries (optimal) and 1 entry (minimal).
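
To see why the AVC size matters for the measurement, the sketch below models a bounded decision cache in front of a slow decision function. It is only an illustration: the class `AccessVectorCache`, the `security_server` callable, and the least-recently-used eviction policy are assumptions made here, not a description of the kernel's AVC.

```python
from collections import OrderedDict


class AccessVectorCache:
    """Bounded cache of access decisions, consulted before the security server."""

    def __init__(self, capacity, security_server):
        self.capacity = capacity
        self.security_server = security_server  # callable: (source, target, op) -> bool
        self.entries = OrderedDict()
        self.misses = 0

    def check(self, source, target, op):
        key = (source, target, op)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        decision = self.security_server(source, target, op)  # slow path
        self.entries[key] = decision
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return decision
```

With a capacity of 1, any workload that alternates between two different requests misses on every call, so nearly all of the measured time is spent in the decision logic rather than in cache hits.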

Results: Single Core PPE Performance
As the number of rules increases, the running time grows by about 64% with the optimal AVC and by about 112% with the minimal AVC.
This establishes that security context validations are computationally intensive.

Results: Comparing Different Techniques
Counterintuitively, multicore performance is lower than single-core performance with the optimal AVC size.
However, with the minimal AVC size and the busy wait strategy, there is an efficiency gain of up to 43%.

Conclusion
The efficiency gain from optimizing security validations depends on the architecture of the software and on the hardware platform.
However, software applications designed for a uniprocessor system cannot be easily optimized for parallel computing.
The problem is especially prominent in security-related applications, since the priority is robustness rather than efficiency.

Future Work
One extension of our work is to apply the proposed techniques to other security features and applications, such as TOMOYO Linux and SMACK, and compare their performance.
Evaluating our approach on different multicore architectures, such as GPGPUs, could give greater insight into its effectiveness.
Another direction is to analyze the proposed techniques on distributed platforms such as Beowulf clusters and grid networks.

Questions?