4.1.5 System Management


Slide 1. 4.1.5 System Management: Background
What is in system management:
– Resource control and scheduling: booting, reconfiguration, defining limits for resource capacity and quality, provisioning the resources, workflow management, application-driven resource allocation
– Security: authentication and authorization; integrity of the system (ensuring users cannot interfere with other users); detection of anomalous behavior; detection of and response to inappropriate use
– Integration and test: managing and maintaining system health, continuous diagnostics, operational capabilities
– Logging, reporting, and analyzing information: analyzing data; types of data: static machine, dynamic machine, RAS events, session logs
– External coordination of resources: from the system, a common communication infrastructure and error reporting in a standardized way; distributed computing
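
The slide lists four types of management data and calls for reporting errors "in a standardized way." A minimal sketch of what one shared record format could look like, assuming JSON as the wire format; the class name, field names, and severity scale are hypothetical illustrations, not part of the original slides.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class DataType(str, Enum):
    """The four data categories named on the slide."""
    STATIC_MACHINE = "static_machine"    # e.g. hardware inventory, topology
    DYNAMIC_MACHINE = "dynamic_machine"  # e.g. temperatures, power readings
    RAS_EVENT = "ras_event"              # reliability/availability/serviceability events
    SESSION_LOG = "session_log"          # per-job or per-user session records


@dataclass(frozen=True)
class ManagementEvent:
    """Hypothetical common record that every subsystem could emit."""
    timestamp: float    # seconds since the epoch
    source: str         # producing component, e.g. "node0042"
    data_type: DataType
    severity: str       # hypothetical scale: "info", "warning", "error", "fatal"
    message: str

    def to_json(self) -> str:
        # Serialize with the enum stored as its plain string value.
        record = asdict(self)
        record["data_type"] = self.data_type.value
        return json.dumps(record, sort_keys=True)

    @staticmethod
    def from_json(text: str) -> "ManagementEvent":
        # Parse a record and restore the enum type.
        record = json.loads(text)
        record["data_type"] = DataType(record["data_type"])
        return ManagementEvent(**record)
```

Because every subsystem round-trips through the same record, a collector can treat events from nodes, the network, and the scheduler uniformly.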

Slide 2. 4.1.5 System Management: Technology drivers
– Changing relative cost of components (I/O bandwidth, I/O latency)
– Power considerations
– Increasing failure rate
– Exploding component count
– Vast volumes of data
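
The link between "exploding component count" and "increasing failure rate" can be made concrete under simplifying assumptions: N components fail independently with exponentially distributed lifetimes, each with rate λ_c = 1/MTBF_c, and any single failure interrupts the system. The numbers below are hypothetical, chosen only to illustrate the arithmetic.

```latex
\lambda_{\text{sys}} = \sum_{i=1}^{N} \lambda_c = N\,\lambda_c,
\qquad
\text{MTBF}_{\text{sys}} = \frac{1}{\lambda_{\text{sys}}} = \frac{\text{MTBF}_c}{N}

% Hypothetical example: MTBF_c = 5 years = 5 \times 8760 = 43{,}800 h, N = 100{,}000
\text{MTBF}_{\text{sys}} = \frac{43{,}800\ \text{h}}{100{,}000} = 0.438\ \text{h} \approx 26\ \text{min}
```

Even very reliable parts yield a whole-system failure interval measured in minutes at this scale, which is why the later slides assume continual failure.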

Slide 3. 4.1.5 System Management
– Security validation of diverse components
– Fine-grained authentication
– Dynamic provisioning of traditional resources
– Integrated non-traditional resources: bandwidth, power
– Continual resource failure and dynamic reallocation
System complexity (# transistors × lines of code)
Areas: resource control and scheduling; security; integration and test; logging, reporting, analyzing information; external coordination of resources
– Proactive failure detection
– Unified framework for event collection
– Hardware support for full system security
– Model and filter for event analysis
– Continual monitoring and test

Slide 4. 4.1.5 System Management: Recommended research agenda
– Better characterize non-traditional resources: power, I/O bandwidth
– Determine how to control communication (provisioning and control differ for HPC from WAN routing)
– Figure out the real-time aspect and feedback for resource control
– Develop techniques for dynamic provisioning under constant failure of components
– Coordinated resource discovery and scheduling aligned with Exascale resource management
– Fine-grained authentication and authorization by function/resource
– Security verification for software built from diverse components
– “Defense in depth” within systems without performance impact
   – Security-focused OS
   – End-to-end data integrity
– Tradeoffs of security and openness (e.g. grids)
– Determine key elements for monitoring
– Continue mining failure data and determining patterns
– Continuous monitoring and testing without affecting system behavior
– Investigate good filters; provide stateful filters for predicting potential incorrect behavior
– Determine statistical and data models that accurately capture behavior
– Determine proactive diagnostics and testing

Slide 5. 4.1.5 System Management: Alternative R&D strategies
– Explore overlap with telecommunications technology for bandwidth scheduling
– Examine data mining techniques
– Leverage real-time techniques for time-sensitive operations
– Apply security monitoring methodologies to system logging and activity analysis
– Look at performance filtering and data analysis, since many of the problems are the same
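
One telecommunications technique that fits "bandwidth scheduling" is the token bucket used for traffic shaping: a flow may burst up to a fixed capacity but is held to a sustained rate. A minimal sketch, assuming bandwidth is requested in abstract units (e.g. MB) and that the caller supplies the clock; the class and its parameters are hypothetical and not taken from the slides.

```python
class TokenBucket:
    """Token-bucket shaper: tokens accrue at `rate` per second up to
    `capacity`; a transfer of `size` units is admitted only if enough
    tokens are available at that moment."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate            # sustained bandwidth, units per second
        self.capacity = capacity    # maximum burst size, in units
        self.tokens = capacity      # start with a full bucket
        self.last = now             # time of the last refill

    def _refill(self, now: float) -> None:
        # Add tokens for the elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = max(self.last, now)

    def try_send(self, size: float, now: float) -> bool:
        """Admit `size` units at time `now` if the bucket allows it."""
        self._refill(now)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False
```

Passing the time explicitly keeps the sketch deterministic and lets a scheduler simulate admission decisions ahead of time.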

Slide 6. 4.1.5 System Management: Crosscutting considerations
– Resiliency
– Power
– Gather data from all subsystems and unify it
– Consistency of performance
– Usability
– Common communication and data format across subsystems

Slide 7. System Management Priority Research Direction (resource control and scheduling)
Key challenges / Summary of research direction / Potential impact on software component / Potential impact on usability, capability, and breadth of community
– How to control communication: is additional hardware needed?
– Better characterize non-traditional resources: power, I/O bandwidth
– Figure out the real-time aspect and feedback
– Scale: ensure time does not grow with size
– Scale: remain unobtrusive to users of the system
– Power: monitor the power of the machine
– Non-traditional resources must be managed
– Models must be developed
– Develop a framework for richer workflow
– Software will be functionally decomposed
– Software will be distributed to scale, be resilient, and avoid single points of failure
– New control points will be in the software
– More responsive resource provisioning
– More flexibility, control, and productivity for applications
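
"Ensure time does not grow with size" and "monitor power of machine" together suggest hierarchical aggregation: readings flow up a fan-out tree, so the number of aggregation rounds grows logarithmically rather than linearly with node count. A minimal sketch, assuming per-node power readings in watts are already available; the function name and fanout parameter are hypothetical.

```python
def aggregate_power(readings, fanout):
    """Sum per-node power readings (watts) through a k-ary aggregation tree.

    Returns (total_watts, levels), where `levels` is the number of
    aggregation rounds. Each round, every aggregator sums at most `fanout`
    children, so rounds grow as ceil(log_fanout(N)) rather than with N.
    """
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    if not readings:
        return 0.0, 0
    level = list(readings)
    levels = 0
    while len(level) > 1:
        # Each chunk of up to `fanout` children is summed by one aggregator.
        level = [sum(level[i:i + fanout]) for i in range(0, len(level), fanout)]
        levels += 1
    return level[0], levels
```

With a fanout of 10, going from 1,000 to 100,000 nodes adds only two rounds, and the aggregators within one round can run in parallel.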

Slide 8. System Management Priority Research Direction (open system security)
Key challenges / Summary of research direction / Potential impact on software component / Potential impact on usability, capability, and breadth of community
– Fine-grained authentication and authorization by function/resource
– Security verification for software built from diverse components
– “Defense in depth” within systems
– Security-focused OS
– End-to-end data integrity
– Tradeoffs of security and openness (e.g. grids)
– Scale and complexity
– Assume identity theft
– Security profiling and monitoring at scale (number of OSs, number of links, speed of links, etc.) without impact or restriction
– Use of commodity components increases vulnerability
– Detecting anomalous behavior by external profiling of “normal” behavior
– Better design
– Scalable security policy
– Finer-grained resource authentication
– Better monitoring
– Improved system uptime in the face of increased threat activity
– Better protection of data
– Improved access
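
"Detecting anomalous behavior by external profiling of 'normal' behavior" can be illustrated with the simplest possible profile: the mean and standard deviation of one metric observed during a known-good period, with values far outside that band flagged. A minimal sketch, assuming a single numeric metric (e.g. logins per hour for one account) and a hypothetical 3-sigma threshold; real profiling would use richer models.

```python
from statistics import mean, pstdev


class BehaviorProfile:
    """Profile of 'normal' values for one metric, built from observations
    collected while the system is believed to be healthy."""

    def __init__(self, baseline, threshold=3.0):
        if len(baseline) < 2:
            raise ValueError("need at least two baseline observations")
        self.mu = mean(baseline)
        self.sigma = pstdev(baseline)
        self.threshold = threshold  # hypothetical: flag beyond 3 standard deviations

    def is_anomalous(self, value):
        # With no variation in the baseline, any different value is unusual.
        if self.sigma == 0:
            return value != self.mu
        return abs(value - self.mu) / self.sigma > self.threshold
```

Because the profile is built from outside observation of the system, it needs no cooperation from the monitored component, which matters when that component may be compromised.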

Slide 9. System Management Priority Research Direction (integration and test)
Key challenges / Summary of research direction / Potential impact on software component / Potential impact on usability, capability, and breadth of community
– Determine key elements for monitoring
– Continue mining failure data and determining patterns
– Determine proactive diagnostics and testing
– Tradeoff between more extensive diagnostics and impact on system performance; run diagnostics while the system is up if possible
– Lots of warnings and errors: how to determine which ones matter
– Be able to quickly switch between different versions of the system
– Software components need to agree on a common format
– Need a format for versioning
– Better MTBF
– More consistent performance
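
Mining failure data toward "better MTBF" starts with two simple quantities: the observed MTBF (operating time divided by number of failures, a point estimate that assumes a roughly constant failure rate) and which components fail most often. A minimal sketch, assuming failures are logged as (hour, component) pairs; the function name and log format are hypothetical.

```python
from collections import Counter


def failure_summary(failures, observation_hours):
    """Return (mtbf_hours, ranked_components) for a failure log.

    `failures` is a list of (hour, component) tuples; only those inside
    [0, observation_hours] are counted. MTBF is observed operating time
    divided by the number of failures (infinite if none occurred).
    """
    if observation_hours <= 0:
        raise ValueError("observation_hours must be positive")
    in_window = [(t, c) for t, c in failures if 0 <= t <= observation_hours]
    mtbf = observation_hours / len(in_window) if in_window else float("inf")
    # Rank components by failure count, most frequent first.
    ranked = Counter(c for _, c in in_window).most_common()
    return mtbf, ranked
```

The ranking points at the component class whose improvement would raise system MTBF the most.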

Slide 10. System Management Priority Research Direction (logging, analyzing, and reporting)
Key challenges / Summary of research direction / Potential impact on software component / Potential impact on usability, capability, and breadth of community
– Investigate good filters
– Determine statistical models that accurately capture behavior
– Huge amount of data already, and increasing
– Determine common infrastructure across disparate components
– Provide a framework for component connection
– Ability to visualize condensed data
– Need software infrastructure to capture data
– Better MTBF
– Ability to better understand the system
– Ability to make better decisions about machine use
– Ability to design more reliable future machines by understanding their Achilles' heel
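
One common kind of "good filter" for RAS logs is temporal filtering: a single fault often produces a burst of identical reports, which should collapse into one event. A minimal stateful sketch, assuming events arrive as (timestamp, source, event_type) tuples sorted by time and a hypothetical window in seconds; this is one illustrative technique, not the filter the slides prescribe.

```python
def temporal_filter(events, window):
    """Coalesce repeated RAS events.

    `events` is an iterable of (timestamp, source, event_type) tuples sorted
    by timestamp. An event is dropped when the same (source, event_type) pair
    was last seen less than `window` seconds earlier; otherwise it is kept.
    """
    last_seen = {}   # state: last time each (source, event_type) pair appeared
    kept = []
    for ts, source, etype in events:
        key = (source, etype)
        prev = last_seen.get(key)
        if prev is None or ts - prev >= window:
            kept.append((ts, source, etype))
        # Update even for dropped events, so an ongoing burst stays one event.
        last_seen[key] = ts
    return kept
```

Updating the timestamp on dropped events is a design choice: continuous chatter is reported once, while a recurrence after a quiet period is reported again.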

Slide 11. System Management Priority Research Direction (external coordination of resources)
Key challenges / Summary of research direction / Potential impact on software component / Potential impact on usability, capability, and breadth of community
– Providing information about availability and use of system resources
– Coordinated aggregation of system resources (e.g. reservations)
– Better design
– Scalable security policy
– Finer-grained resource authentication
– Better monitoring
– Improved system uptime in the face of increased threat activity
– Better protection of data
– Improved access
– Coordinated resource discovery and scheduling aligned with Exascale resource management (RM)
– Coordinated distributed security
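
"Coordinated aggregation of system resources (e.g. reservations)" requires, at minimum, an admission check: can a new reservation fit alongside existing ones without oversubscribing the machine at any instant? A minimal sweep-line sketch, assuming reservations are (start, end, nodes) tuples over half-open time intervals on a single pool of identical nodes; the function name and data layout are hypothetical.

```python
def can_reserve(existing, total_nodes, start, end, nodes):
    """Return True if `nodes` nodes can be reserved over [start, end)
    without exceeding `total_nodes` at any moment, given existing
    reservations as (start, end, nodes) tuples with half-open intervals."""
    if start >= end or nodes > total_nodes:
        return False
    # Collect usage changes from existing reservations that overlap the window.
    deltas = []
    for s, e, n in existing:
        if s < end and e > start:
            deltas.append((max(s, start), n))   # acquisition
            deltas.append((min(e, end), -n))    # release
    # At equal times, releases (negative) sort before acquisitions.
    deltas.sort()
    in_use = 0
    for _, delta in deltas:
        in_use += delta
        if in_use + nodes > total_nodes:
            return False
    return True
```

Sorting releases first at equal timestamps means a reservation ending at time t and another starting at t do not count as overlapping.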

