Lilliput meets Brobdingnagian: Data Center Systems Management through Mobile Devices


Slide 1/18: Lilliput meets Brobdingnagian: Data Center Systems Management through Mobile Devices
IBM Research
Jan Rellermeyer, Thomas Osiecki, Michael Kistler, Ahmed Gheith
Saurabh Bagchi, Fahad Arshad
3rd International Workshop on Dependability of Clouds, Data Centers and Virtual Machine Technology (DCDV)
Held in conjunction with Dependable Systems and Networks (DSN)
Budapest, Hungary, June 18, 2013

Slide 2/18: System Management Workflow
[Figure: workflow diagram with the labels "Something is wrong!" and "Patch"]

Slide 3/18: Systems Management: A Changed View
[Figure: diagram with the labels "Filtering" and "Patch"]

Slide 4/18: So What Exactly Are the Changes?
1. Platform being used for doing the systems management
Server:
  1. Large screen
  2. Resource rich
  3. Within organization's security perimeter
  4. High dependability
Mobile devices:
  1. Small screen
  2. Resource constrained
  3. Outside organization's security perimeter
  4. Lower dependability

Slide 5/18: So What Exactly Are the Changes?
2. Layered systems management to flat hierarchy
[Figure: diagram with the label "Filtering"]

Slide 6/18: Case Study: IBM Research's IBM Remote Project
Always Connected, Instantaneous, Focused, Simple
User Interface:
  - visualization of complex data
  - relevance first
  - drill-down UI
Communication:
  - direct connection to the managed machines
  - refresh rate vs. power consumption
[Figure label: IBM Blade Centers]

Slide 7/18: Case Study: IBM Remote Project

Slide 8/18: Research Challenges Due To The Changes
1. Platform being used for doing the systems management: Server to Mobile Devices
  I. How do we optimize the scarce resources of the systems management platforms? Primarily, battery and communication bandwidth.
  II. How do we handle the fact that the platforms will be insecure and fault-intolerant for parts of their operation?
  III. How do we visualize the (hopefully) rare failure event in a deluge of systems monitoring data?

Slide 9/18: Research Challenges Due To The Changes
2. Layered systems management to flat hierarchy
  I. Can we avoid chaos due to the looser coordination?
  II. Can we leverage overlap between interests to cut down on traffic to individual mobile devices?

Slide 10/18: Solution Directions for Question 1
1. Platform being used for doing the systems management: Server to Mobile Devices
  I. How do we optimize the scarce resources of the systems management platforms? Primarily, battery and communication bandwidth.
- Minimize number of messages, while still receiving enough to reliably detect failures
  - Use publish-subscribe or other push mechanism, in preference to pull mechanism
  - BUT: Most hardware management modules do not support push
  - Use an intermediate server for aggregation and filtering
- Apply principles of rare event detection
  - Non-events occur with much higher frequency than events of interest
  - BUT: Requires model of events: time distribution, correlation, etc.
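
The intermediate-server idea above can be sketched in a few lines. This is only an illustrative sketch, not the IBM Remote implementation: the class `AggregationServer`, its methods, and the status labels are all hypothetical. It assumes the server itself polls the management modules (which do not support push) and forwards a reading to subscribed mobile clients only when a machine's status changes, so that repeated non-events cost no messages.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Status = str  # hypothetical labels such as "ok", "warning", "critical"
Callback = Callable[[str, Status], None]


class AggregationServer:
    """Hypothetical intermediate server: pull from hardware, push to mobiles."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Callback]] = defaultdict(list)
        self._last_status: Dict[str, Status] = {}

    def subscribe(self, machine: str, callback: Callback) -> None:
        """Register a mobile client's interest in one machine."""
        self._subscribers[machine].append(callback)

    def ingest(self, machine: str, status: Status) -> bool:
        """Record one polled reading; push it only if the status changed.

        Returns True if the reading was forwarded, False if suppressed.
        The first reading for a machine always counts as a change.
        """
        if self._last_status.get(machine) == status:
            return False  # non-event: nothing is sent to the mobile devices
        self._last_status[machine] = status
        for callback in self._subscribers[machine]:
            callback(machine, status)
        return True
```

With six polled readings of which only three differ from their predecessor, a subscriber receives three messages instead of six. A rare-event model (time distribution, correlation) could replace the simple change test, but that would need the event model the slide mentions.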

Slide 11/18: Solution Directions for Question 1
1. Platform being used for doing the systems management: Server to Mobile Devices
  II. How do we handle mismatch in dependability characteristics (between target platform and management platform)?
    - Mobile device can be physically compromised and OS-level protection can be bypassed
    - Mobile devices are often employee owned
- Application security and server-side security need to be built in
  - Periodic authentications, not one-time authentications
  - Biometric-based authentication
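
Periodic authentication, as opposed to a one-time login, could look roughly like the following server-side sketch. The class `PeriodicAuthSession`, the re-authentication interval, and the boolean `credential_ok` (standing in for whatever check is used, biometric or otherwise) are assumptions made for illustration; the slides do not specify a mechanism.

```python
import time
from typing import Callable, Optional


class PeriodicAuthSession:
    """Hypothetical session that expires unless the user re-authenticates."""

    def __init__(self, reauth_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._interval = reauth_interval_s
        self._clock = clock  # injectable so the expiry logic can be tested
        self._last_auth: Optional[float] = None

    def authenticate(self, credential_ok: bool) -> bool:
        """Record a successful check; a failed check invalidates the session."""
        self._last_auth = self._clock() if credential_ok else None
        return credential_ok

    def is_valid(self) -> bool:
        """True only while the last successful check is recent enough."""
        if self._last_auth is None:
            return False
        return (self._clock() - self._last_auth) < self._interval

    def authorize(self, command: str) -> str:
        """Server-side gate: refuse commands from a stale session."""
        if not self.is_valid():
            raise PermissionError("re-authentication required")
        return f"accepted: {command}"
```

Enforcing the check on the server rather than on the device matches the slide's point that OS-level protection on a mobile device can be bypassed.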

Slide 12/18: Solution Directions for Question 1
1. Platform being used for doing the systems management: Server to Mobile Devices
  III. How do we visualize the needle in the haystack?
    - Needle: Outages, failures, or behavior that is indicative of an imminent failure
    - Haystack: Deluge of monitored data about target platforms
    - Screen real estate is limited
- First off, deliver only a small superset of relevant messages
  - Push notifications, for example through Google Cloud Messaging (GCM)
- Drill-down views, starting with summary alert view for all machines in data center
  - Followed up with root cause analysis techniques that run on servers
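
A drill-down view of this kind can be sketched as two functions: a summary with one row per machine, worst first, and a per-machine detail list. The `Alert` type, the three severity labels, and the function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical severity scale; higher rank means more urgent.
SEVERITY_RANK = {"info": 0, "warning": 1, "critical": 2}


@dataclass(frozen=True)
class Alert:
    machine: str
    severity: str
    message: str


def summarize(alerts: List[Alert]) -> List[Tuple[str, str, int]]:
    """Summary view: (machine, worst severity, alert count), worst first."""
    worst: Dict[str, str] = {}
    count: Dict[str, int] = {}
    for alert in alerts:
        count[alert.machine] = count.get(alert.machine, 0) + 1
        current = worst.get(alert.machine)
        if current is None or SEVERITY_RANK[alert.severity] > SEVERITY_RANK[current]:
            worst[alert.machine] = alert.severity
    rows = [(machine, worst[machine], count[machine]) for machine in worst]
    # Most severe machines first; ties broken by machine name.
    rows.sort(key=lambda row: (-SEVERITY_RANK[row[1]], row[0]))
    return rows


def drill_down(alerts: List[Alert], machine: str,
               min_severity: str = "warning") -> List[Alert]:
    """Detail view: one machine's alerts at or above a severity floor."""
    floor = SEVERITY_RANK[min_severity]
    return [a for a in alerts
            if a.machine == machine and SEVERITY_RANK[a.severity] >= floor]
```

The summary fits a small screen because it is bounded by the number of machines, not the number of messages; the detail list is fetched only for the machine the administrator selects.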

Slide 13/18: Solution Directions for Question 2
2. Layered systems management to flat hierarchy, OR Crowdsourcing systems management
  I. Tight vertical integration of different software layers implies different domain experts will be concurrently involved in problem troubleshooting
- Relevant features of social media will be used
  - Example: At IBM, you can "friend" specific Blade Centers and have "circles" of administrators
- Role-based Access Control (RBAC) can be used for security control of different software layers
  - Fine-grained roles can be assigned
  - RBAC solutions exist for sophisticated management of these roles, such as hierarchies, overlaps, and transience
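
The three RBAC features named above (hierarchies, overlaps, transience) can be illustrated with a small sketch. Everything here is hypothetical: the `RBAC` class, the role names, and the permission strings are invented for the example and do not describe any particular RBAC product.

```python
from typing import Dict, Iterable, Set


class RBAC:
    """Minimal role-based access control with role inheritance."""

    def __init__(self) -> None:
        self._perms: Dict[str, Set[str]] = {}
        self._parents: Dict[str, Set[str]] = {}
        self._user_roles: Dict[str, Set[str]] = {}

    def add_role(self, role: str, permissions: Iterable[str],
                 inherits: Iterable[str] = ()) -> None:
        """Define a role; `inherits` lists roles whose permissions it gains."""
        self._perms[role] = set(permissions)
        self._parents[role] = set(inherits)

    def assign(self, user: str, role: str) -> None:
        """Overlap: a user may hold several roles at once."""
        self._user_roles.setdefault(user, set()).add(role)

    def revoke(self, user: str, role: str) -> None:
        """Transience: a role can be withdrawn when it is no longer needed."""
        self._user_roles.get(user, set()).discard(role)

    def permissions_of(self, role: str) -> Set[str]:
        """Hierarchy: collect permissions of the role and all its ancestors."""
        seen: Set[str] = set()
        result: Set[str] = set()
        stack = [role]
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # guards against cycles in the role graph
            seen.add(current)
            result |= self._perms.get(current, set())
            stack.extend(self._parents.get(current, ()))
        return result

    def is_allowed(self, user: str, permission: str) -> bool:
        return any(permission in self.permissions_of(role)
                   for role in self._user_roles.get(user, ()))
```

For example, a firmware administrator role could inherit read access from a viewer role, while a temporary hypervisor role is assigned to the same person for the duration of one troubleshooting session and revoked afterwards.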

Slide 14/18: Solution Directions for Question 2
2. Layered systems management to flat hierarchy, OR Crowdsourcing systems management
  II. Overlap between interests of multiple mobile devices and their geographical proximity
- Commonalities of interest can be used to cut down on cellular bandwidth usage
  - Commonalities can exist due to proximal geographic location or overlap among system administration responsibilities
  - Distribute information to a subset of mobile devices and then use local communication (Bluetooth, Wi-Fi) to disseminate information among proximal devices
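
One way to picture the "subset plus local dissemination" idea is a planning step that picks a single relay per group of nearby devices. The sketch below is an assumption-laden illustration: it presumes each device reports exactly one proximity group, that device IDs are unique, and that the relay is simply the first ID in sorted order. The function names are hypothetical, and a real system would also weigh battery level and link quality.

```python
from typing import Dict, Iterable, List, Tuple


def plan_dissemination(devices: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Choose one cellular relay per proximity group.

    `devices` yields (device_id, proximity_group) pairs. The result maps
    each relay to the peers it forwards to over Bluetooth or Wi-Fi.
    """
    groups: Dict[str, List[str]] = {}
    for device_id, group in devices:
        groups.setdefault(group, []).append(device_id)

    plan: Dict[str, List[str]] = {}
    for members in groups.values():
        ordered = sorted(members)  # deterministic relay choice
        plan[ordered[0]] = ordered[1:]
    return plan


def cellular_messages_saved(plan: Dict[str, List[str]]) -> int:
    """Each locally reached peer is one cellular delivery avoided."""
    return sum(len(peers) for peers in plan.values())
```

With three devices in one room and one elsewhere, two cellular deliveries are sent instead of four.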

Slide 15/18: Case Study: IBM Remote
- Health view (left) broken into critical, non-critical, and system-level health messages
- Event log view (right) is filtered to show only warnings and errors

Slide 16/18: Related Work
- Much work exists on managing mobile devices, which is the opposite direction from what we discuss in this paper
  - Some work on mobile agents for managing servers [18 – NOMS02, 19 – Software07]
  - Sophistication lies in designing a dynamic set of agents whose monitoring policies can be changed on the fly
- Some commercial prototypes for monitoring and control of target end points from mobile devices
  - UCSand for Android devices [21] for Cisco Unified Systems monitoring and control
  - PCMonitor [22] from MMSoft Design Ltd.
  - VMware vCenter Mobile Access [23] is a virtual appliance on the server side for managing a datacenter from mobile devices
  - Recent offering from HP [18]

Slide 17/18: Take-away Lessons
- A changed vision of systems management is happening: mobile clients being used to manage large masses of physical and virtual servers
- This throws open some technical challenges
  1. Management to be done through resource-constrained mobile devices which have lower dependability than target devices
  2. Crowd-sourcing of systems management, rather than linear flow of control through hierarchies of sysadmins
- These challenges are being addressed in multiple projects at commercial organizations, including in the IBM Remote project at IBM Research

Slide 18/18
Presentation available at the Dependable Computing Systems Lab (DCSL) web site: engineering.purdue.edu/dcsl

