Presentation is loading. Please wait.

Presentation is loading. Please wait.

Microsoft Project Olympus Rack Management

Similar presentations


Presentation on theme: "Microsoft Project Olympus Rack Management"— Presentation transcript:

1

2 Microsoft Project Olympus Rack Management
Mallik Bulusu Cloud Firmware Architect Microsoft Corporation

3 What are these images? Super view of London – courtesy NASA
TITLE SLIDE What are these images? Super view of London – courtesy NASA Super view of Phoenix, AZ – courtesy NASA

4 Building Block for Data Center Management
TITLE SLIDE Building Block for Data Center Management Cloud tenants – Set of VMs and OS instances Set of Servers Rack is the fundamental building block for effective Data Center Management. Rack manager is the brain. Rack Manager Hardware Rack Manager Firmware / Software Set of Racks

5 Rack Manager Requirements
TITLE SLIDE Rack Manager Requirements Rack Boundary Rack Manager Functions Power Management Out-of-band Server Management RM Instances in Cloud Discrete (Consumes 1U) Integrated into PMDU Communication Network TTY Console Hardware Signaling Server Presence Server On/Off Server Throttle Power Metering & Control Remote Debug Remote Media Out of band FW update and recovery UEFI, CPLD, FPGA, PSU Rack Manager Fabric Switch Server Server Server Server Server Server Server Server

6 Rack Manager Design Choices
TITLE SLIDE Rack Manager Design Choices ASPEED vs ARM Dual NIC Different Peripherals Higher accuracy ADC Offload Power Calculations Miscellaneous provisioning of components etc. Network Port: Shared vs. Discrete NCSI: single port with two MAC address Rack Manager: Dedicated

7 Rack Manager Data Traffic Interfaces
REDFISH REDFISH Rack bound Fabric bound SSH SSH Legacy REST IPMI

8 Rack Manager Block Diagram
TITLE SLIDE Rack Manager Block Diagram LEDs – Attention, Power, Debug, Status PCIe x16 edge finger & PCIe x8 edge finder to Interface with PMDU or backplane Temp Sensor Humidity Sensor To Fabric To Mgmt. Switch FRU GPIO Buffers x48 for Blade presence x48 for Blade enable To Mgmt. Switch eMMC GPIOs for Boot strap, Throttle bypass, power control etc. Debug QSPI To DIGI 1GB DDR3L

9 Management Switch Requirements
Rack Boundary General purpose switch (L2+) 48x RJ45 ports Management Requirements Console Port (OOB) Ethernet 1+1 AC Redundant Power Supply PSUs must support Hot on-line repair Fans are not required Orientation Cold Isle  RJ45 ports, Uplinks, UART Hot Isle  PSUs (for replaceability) Configuration Requirements 48 individual VLANs, 48 DHCP address pools Single leasable address per pool Support for indefinite DHCP leases Configuration update via UART RJ45 port Supports TFTP firmware recovery Rack Manager Fabric Switch Server Server Server Server Server Server Server Server

10 Rack Manager Build Process
Dependencies Recipes Repo Kernel Recipes UBOOT YOCTO QEMU BSP Sources Sources BIN Applications + Services BITBAKE toolchain

11 Rack Manager Firmware Stack
Embedded Firmware Small foot print (min-kernel) Yocto build framework Connectivity Ethernet: SSH for CLI Redfish HTTPS UART: CLI Console Controls Hardware OFF/ON, AC Relay, Server OOB BMC Interface. Robust FW Update Recovery Boot Loader Recovery Console Factory Restore or rollback Ethernet + TFTP Bare-metal imaging Ethernet – bootp + TFTP

12 Rack Manager Firmware Recovery
Normal Update TFTP/SCP Secure File Copy Forklift - Execute in place uboot Recovery Read-only partition WDT Recovery TFTP + Ethernet Uboot recovery console Remote Pin Strap Ethernet – bootp + TFTP

13 Rack Manager Firmware Logical Flow
Auth at Interface: Ethernet / REST CLI / TTY Privilege: Permissions token Log / Audit Execution Task execution Logical to Physical Action Hardware Control

14 Firmware Commands Rack Manager Rack Manager Network Rack Power Users
Rack Power Status Rack Power Reading Log Power Faults Rack Manager Power Limit Control Rack Power Limit Row Alert Rack Power Reading, per phase Rack Power Meter version Clear Rack Power Fault Log Clear Rack Max Power Rack Power Telemetry Rack Power Throttle Control Rack Power Throttle Report Rack Throttle HW Bypass Users shows roles and users shows users by role shows users by user group add new users update user role update user password delete existing user Rack Manager show rack manager version show rack information show inventory of rack rack manager health PDU power port status PDU On/Off Rack Attention LED Rack Manager Status LED Rack Audit Log Rack Telemetry Log Log management, clearing/moving Rack Manager asset info Rack Manager FRU programming Rack Manager Relay Status Rack Manager Relay Control Firmware Update Firmware Recovery Rack Manager Console Rack Manager Network Configure network settings Configure network interface add static routes enable/disable interfaces Rack Manager Serial Serial Port Console Control Management Switch Switch Firmware Update Switch Config Update Switch Reset Switch Console Switch Status Switch Port Status Rack Services TFTP Server Control TFTP Client NFS Server Control NTP server and Control Rack Manager Configuration Rack Manger session list Rack Manager session kill Native IPMI commands

15 Firmware Commands – Contd.
Server Management Server information Server health Sensors and health Server Fan health Server Rack location Server LED status/control Server State Server Power Control Server Soft Power Control Default Power State System Presence POST code logging System Event Logs Power Cycle Control FPGA health status FPGA temperature FPGA version Server Management FPGA pass-through mode FPGA asset info FPGA Update FPGA Recovery Remote Media booting Remote HW Debug Remote Kernel Debug Boot order control Next boot control BIOS Configuration BIOS Update BIOS Physical Presence BMC Update BMC Command Console BMC Cmd pass-through Server Console Session Console Session Control Server Power Server Power Limit Control Server Power Alert Actions Server Power Policy Actions Server Power Limiting Server Throttle Status PSU Status Battery presence Battery test PSU Firmware Status PSU Firmware Update Firmware and boot loader version Clear PSU Fault log Clear Phase Fault Log PSU Status, Battery Status

16 Additional Firmware Commands
Server Management Server Presence/Location Server Extended Logs Server Temperature / Fans Server IPT (JTAG debug) Server FPGA (debug /recovery) Server Kernel Debug Server Remote Media Server Power Policies Server Firmware Update Server Firmware Recovery Server PSU health Server Cmd pass-through Rack Power Rack Temperature Rack Inventory Firmware Update Rack Telemetry Log SSH Shell TFTF Server / Client Management Switch Switch Firmware Update Switch Config Update Switch Console Switch Power Control Switch Status

17 Rack Manager Interfaces
SSH CLS Interface SSH Serial Console Complete Walled Garden CLI Environment REST Interface Functionally comparable to CMv1/CMv2 Redfish support Example Schema Hierarchy Rack vs. Server Schema Comparison Example Schema Extensions

18 Links Project Olympus Software Implementation
@ OCP: @ Github : Software Implementation @ Github : Texas Instruments AM4376 Datasheet: Redfish

19 Back-up

20 AC Power Monitoring Monitors AC Power at the Rack Level
Senses AC voltage for each phase Sense AC current for each phase (Current Transformer) 12 A/D Converters on the Rack Manager Scale and calculate rack power Accuracy +/-2.5% 20% max load to 100% Error increases .5% below 20% Requires factory calibration

21


Download ppt "Microsoft Project Olympus Rack Management"

Similar presentations


Ads by Google