WHEA System Design And Implementation


1 WHEA System Design And Implementation
4/20/2017 6:42 AM WHEA System Design And Implementation John Strange Software Design Engineer Microsoft Corporation © 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.

2 Key Takeaways
Understand what it takes to implement a WHEA-enabled platform
Improve server reliability by implementing required WHEA features
Differentiate server products by extending WHEA capabilities

3 Agenda
WHEA overview
Description of a WHEA-enabled platform
Key requirements of a WHEA-enabled platform
Dell WHEA system implementation
Dell demo of a WHEA-enabled platform

4 WHEA Objective
To make Windows-based platforms more resilient in the face of hardware errors
    Better root cause analysis
    Better support for hardware error recovery
    Error avoidance with health monitoring

5 WHEA Overview: Operating System (OS) Support
Windows is much more agile with respect to hardware error sources
    Platform describes error sources to Windows
Standardized hardware error record format
    UEFI 2.1 Common Platform Error Record
Hardware error recovery support
Hardware error events
    Event Tracing for Windows (ETW)

6 WHEA Overview: Platform integration
Platforms retain their existing investment in error handling features
    Controls the level of integration with the OS
    Leverages existing error handling and reporting features

7 WHEA Overview: Platform implementation
Satisfying Windows Server 2008 logo requirements
    Error record persistence
    Error injection
    WHEA _OSC method
    BOOT Error Source (x86/x64 platforms only)
Extending WHEA feature set to add value
    Add richer error data content (e.g. FRU info)
    Participate in error recovery

8 WHEA Components
[Diagram: WHEA components in user mode, kernel mode, and the platform. Labeled blocks: WHEA-Enabled Management Applications; OS Hardware Error Handlers; Platform Specific Hardware Error Driver (PSHED); PSHED Plug-ins; Platform Hardware Error Handlers; WHEA ACPI Tables. Legend "Provided by": Microsoft, ISV, IHV, OEM.]

9 WHEA-Enabled Platform
Feature requirements by processor architecture (x86, x64, Itanium)
Error Source Enumeration
    Optional: HEST or PSHED plug-in
Error Record Persistence
    x86/x64: Required: ERST or PSHED plug-in
    Itanium: Required: ERST, PSHED plug-in, or UEFI 2.1 variable services
BOOT Error Source
    x86/x64: Required
    Itanium: Optional
Error Injection
    Required: EINJ or PSHED plug-in
    Optional if PAL-based or MSR-based error injection is supported
Error Information Retrieval
    Optional: PSHED plug-in
Error Source Control
Error Recovery
_OSC

10 Reporting Error Sources
The platform must report error sources to Windows only in the following cases
    To override the default error source configuration
    To report error sources that Windows does not support by default
    When it needs firmware-first control of one or more error sources
    When it uses a generic error source to inject errors

11 Error Source Defaults: x86/x64 Machine check
x86/x64 Machine Check Settings
    IA32_MCG_CTL: 0xFFFFFFFFFFFFFFFF
    IA32_MCi_CTL: 0xFFFFFFFFFFFFFFFF
    OS respects settings in IA32_MC0_CTL
x86/x64 Corrected Machine Checks
    Polling interval is 60 seconds
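To make the corrected machine check polling concrete, here is a minimal sketch of one polling pass over the machine check banks. The bit positions of VAL and UC in IA32_MCi_STATUS come from the Intel architecture manuals rather than from this presentation, and `read_status` is a hypothetical stand-in for a privileged MSR read; this is not how Windows implements its poller.

```python
from typing import Callable, List, Optional, Tuple

POLL_INTERVAL_SECONDS = 60  # default polling interval stated on the slide

# IA32_MCi_STATUS bits (assumed from the Intel architecture manuals).
MCI_STATUS_VAL = 1 << 63  # register holds a valid error
MCI_STATUS_UC = 1 << 61   # error was not corrected by hardware


def classify_bank_status(status: int) -> Optional[str]:
    """Return None when the bank holds no error, else the error class."""
    if not status & MCI_STATUS_VAL:
        return None
    return "uncorrected" if status & MCI_STATUS_UC else "corrected"


def poll_corrected_errors(read_status: Callable[[int], int],
                          bank_count: int) -> List[Tuple[int, int]]:
    """One polling pass: collect (bank, status) for banks with corrected errors.

    read_status is a hypothetical callable that returns the IA32_MCi_STATUS
    value for a bank index.
    """
    found = []
    for bank in range(bank_count):
        status = read_status(bank)
        if classify_bank_status(status) == "corrected":
            found.append((bank, status))
    return found
```

A real poller would run a pass like this once per interval and clear each status register after logging it; uncorrected errors are left out here because they arrive through the machine check exception instead.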

12 Error Source Defaults: PCI Express AER
Endpoint Devices                              Default Value
    Device Control                            0x0007
    Uncorrectable Error Mask                  0x
    Uncorrectable Error Severity              0x
    Correctable Error Mask                    0x
    Capabilities and Control                  0x
Root Ports                                    Default Value
    Root Error Command                        0x0007
Bridges                                       Default Value
    Secondary Uncorrectable Error Mask        0x000017A8
    Secondary Uncorrectable Error Severity    0x
    Secondary Capabilities and Control        0x
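The two complete defaults in the table, 0x0007 for Device Control and for Root Error Command, can be read bit by bit. The sketch below decodes them; the bit names are assumed from the PCI Express Base Specification rather than from this presentation, and the helper name is hypothetical.

```python
from typing import Dict, List

# Low bits of the PCI Express Device Control register (names assumed from
# the PCI Express Base Specification).
DEVICE_CONTROL_BITS: Dict[int, str] = {
    0: "Correctable Error Reporting Enable",
    1: "Non-Fatal Error Reporting Enable",
    2: "Fatal Error Reporting Enable",
    3: "Unsupported Request Reporting Enable",
}

# AER Root Error Command register bits (same source, same caveat).
ROOT_ERROR_COMMAND_BITS: Dict[int, str] = {
    0: "Correctable Error Reporting Enable",
    1: "Non-Fatal Error Reporting Enable",
    2: "Fatal Error Reporting Enable",
}


def decode_bits(value: int, names: Dict[int, str]) -> List[str]:
    """List the named bits that are set in value, lowest bit first."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]
```

Under these assumptions, `decode_bits(0x0007, DEVICE_CONTROL_BITS)` lists the correctable, non-fatal, and fatal reporting enables and leaves Unsupported Request reporting off.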

13 Error Record Persistence
In Windows Server 2008
    Windows writes an error record only when the system is about to bugcheck
    Windows only requires space for one error record
    Platform must implement the persistence interface to get the logo
Storage requirements
    x64/x86 platforms require a minimum of 1K
    Itanium platforms require a minimum of 128K

14 Error Record Persistence: Platform implementation
ACPI ERST Table
UEFI 2.1 Variable Services
    Error Record Extensions for EFI-based platforms
PSHED plug-in
    This solution is generally discouraged

15 Error Injection
Error injection interface allows hardware errors to be injected on a platform for the following purposes
    Validation of OS/platform error handling flows
    Validation of platform logo support for WHEA
    Exercising hardware/firmware error flows for diagnostic purposes

16 Error Injection: Platform implementation
Prefer true hardware error injection if possible
    Enables system/component diagnostics
In cases where no true hardware injection is possible, a generic error source can be used to simulate errors
    Enables feature validation

17 WHEA _OSC Method
New \_SB _OSC method
    GUID {ed855e0c-6c90-47bf-a62a-26de0fc5ad5c}
    Notifies the platform that Windows implements WHEA so the platform can perform any necessary configuration
    If the platform does not implement \_SB _OSC, or if the platform returns "Unrecognized UUID", Windows does not configure WHEA support for the platform
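The decision described on this slide can be sketched as follows. The mixed-endian buffer encoding of the UUID and the position of the "Unrecognized UUID" bit in the first DWORD of the _OSC return buffer are assumed from the ACPI specification, not from this presentation; the function names are hypothetical and this is not the Windows implementation.

```python
import uuid
from typing import Optional

# UUID from the slide that identifies the WHEA _OSC interface.
WHEA_OSC_UUID = uuid.UUID("ed855e0c-6c90-47bf-a62a-26de0fc5ad5c")

# Status bit in the first DWORD of the _OSC return buffer (assumed from the
# ACPI specification).
OSC_STATUS_UNRECOGNIZED_UUID = 1 << 2


def uuid_to_acpi_buffer(value: uuid.UUID) -> bytes:
    """Encode a UUID the way an ACPI buffer stores it (first three fields little-endian)."""
    return value.bytes_le


def whea_configured(osc_return: Optional[bytes]) -> bool:
    """Model the slide's rule: no _OSC method, or Unrecognized UUID, means no WHEA.

    osc_return is the raw return buffer, or None when \\_SB _OSC is absent.
    """
    if osc_return is None or len(osc_return) < 4:
        return False
    status = int.from_bytes(osc_return[:4], "little")
    return not status & OSC_STATUS_UNRECOGNIZED_UUID
```

Only the two conditions named on the slide are modeled; other status bits are ignored in this sketch.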

18 Boot Error Source
For fatal errors that cannot be processed by the OS
    Firmware-initiated reset
    BMC-initiated reset
    Sync-flood reset
Platform describes the error to Windows using the BOOT error source
    The ACPI BERT table describes the platform's BOOT error source to Windows

19 Firmware First
Platform can indicate that error sources should be handled first by firmware
    Via error source enumeration interface
    Some error sources cannot do firmware-first (e.g. machine check exception)
Generally, an error source reported as firmware-first is configured by the platform to generate an SMI

20 Firmware First: Enumerating error sources
The error source for which the platform wants firmware-first control is marked as FIRMWARE_FIRST
A paired generic error source must be enumerated
    This error source is how the platform will signal errors from the firmware-first source to the OS
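The pairing rule above lends itself to a simple consistency check. The sketch below uses a hypothetical in-memory model of enumerated error sources; the `ErrorSource` class, the flag value, and the `related_source_id` field are illustrative assumptions and do not reproduce the actual table layout.

```python
from dataclasses import dataclass
from typing import List, Optional

FLAG_FIRMWARE_FIRST = 0x1  # hypothetical flag value for this sketch


@dataclass
class ErrorSource:
    source_id: int
    kind: str                                # e.g. "pcie_aer_root_port" or "generic"
    flags: int = 0
    related_source_id: Optional[int] = None  # set on a generic source to name its pair


def unpaired_firmware_first(sources: List[ErrorSource]) -> List[int]:
    """Return IDs of firmware-first sources that lack a paired generic source."""
    paired = {s.related_source_id for s in sources
              if s.kind == "generic" and s.related_source_id is not None}
    return [s.source_id for s in sources
            if s.flags & FLAG_FIRMWARE_FIRST and s.source_id not in paired]
```

An empty result means every firmware-first source has a generic source through which the platform can signal its errors to the OS.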

21 Firmware First: Error handling flow
Platform gains control when error occurs (SMI)
Platform processes and possibly logs the error
    Platform may void errors in some cases
Platform fills in error status block with information describing the error
Platform is responsible for clearing HW error status

22 Firmware First: Error handling flow (continued)
Platform signals the error to Windows using the notification mechanism it reported when it enumerated the error source
    This means the platform generates an NMI or an interrupt, or allows Windows to poll, etc.
    Signaling mechanism depends on type of error (i.e. corrected/uncorrected)
Windows clears bits in block status to signal that it has processed the error
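The handshake in the firmware-first flow can be sketched as a small simulation over a shared buffer. The header layout of the error status block (block status, raw data offset, raw data length, data length, error severity) and the two valid bits are assumed from the ACPI generic error status block definition, not from this presentation; both function names are hypothetical, and the notification step (NMI, interrupt, or polling) is left out.

```python
import struct
from typing import Optional, Tuple

# Error status block header (layout assumed from the ACPI specification):
# block status, raw data offset, raw data length, data length, error severity.
_BLOCK_HEADER = struct.Struct("<IIIII")

BLOCK_STATUS_UNCORRECTABLE_VALID = 1 << 0
BLOCK_STATUS_CORRECTABLE_VALID = 1 << 1


def firmware_report(block: bytearray, uncorrectable: bool,
                    severity: int, data: bytes) -> None:
    """Platform side: fill in the error status block before signaling the OS."""
    start = _BLOCK_HEADER.size
    if len(block) < start + len(data):
        raise ValueError("error status block too small")
    status = (BLOCK_STATUS_UNCORRECTABLE_VALID if uncorrectable
              else BLOCK_STATUS_CORRECTABLE_VALID)
    _BLOCK_HEADER.pack_into(block, 0, status, 0, 0, len(data), severity)
    block[start:start + len(data)] = data


def os_consume(block: bytearray) -> Optional[Tuple[int, bytes]]:
    """OS side: read a pending error, then clear block status to acknowledge it."""
    status, _, _, data_length, severity = _BLOCK_HEADER.unpack_from(block, 0)
    if not status & (BLOCK_STATUS_UNCORRECTABLE_VALID | BLOCK_STATUS_CORRECTABLE_VALID):
        return None
    start = _BLOCK_HEADER.size
    data = bytes(block[start:start + data_length])
    struct.pack_into("<I", block, 0, 0)  # cleared status tells firmware the block is free
    return severity, data
```

Clearing the status word is the only acknowledgment in this model, which mirrors the last step of the flow: the platform can reuse the block once Windows has cleared the bits.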

23 Dell WHEA System Design
Mukund Khatri, Server Strategist, Dell Inc.

24 Overview
Close collaboration between Dell and Microsoft on WHEA feature design over the last couple of years
    Design enhancements
    Prototype efforts
WHEA architecture holds great promise for future server designs
    Enables OS participation in error handling flows
    Flexibility to retain full value in existing error handling infrastructure

25 WHEA: Dell Implementation
Dell implementation incorporates support for
    Error enumeration and control
    Error record persistence
    WHEA _OSC method
    BOOT Error Source
    Error injection
Implementation uses Firmware First Mode
    Complementary to OS-first mode

26 Firmware First Mode: Implementation considerations
Ability at platform firmware level to override defaults in OS without PSHED plug-ins
    Silicon errata management
    Updates to interface specifications
Control over level of integration with OS
Extend WHEA feature set to add value
    Add richer error data content (e.g. FRU info)
Retain existing investments in error handling infrastructures

27 Platform Hardware Event Flow: Firmware First Mode
[Diagram: error flow with no WHEA compared with the flow that is new with WHEA. Labels: Platform Errors; Errors handled by Platform Firmware; Service Processor and Management Consoles; Errors handled by OS; ETW Data (new for ecosystem consumption). Notes: existing error management paradigm still retained; richer error records and ETW available for consumption.]

28 Demo: Dell PowerEdge™ System
Mukund Khatri, Server Strategist, Dell Inc.

29 Demo: Dell PowerEdge™ Server
Injection of PCI Express uncorrectable error
Error captured and processed by platform firmware
Firmware creates and uploads GES data packet and triggers NMI to OS
WHEA error record stored in persistent storage
System bug-checks and subsequently reboots
OS retrieves WHEA error record on next boot
Event viewer reports the event along with error record

30 PCI-Express Error Record Output

31 Dell: Key Takeaways
New Dell servers will include full support for WHEA
We intend to build on WHEA architecture to add end customer value in future Dell servers
Dell and Microsoft partnering on WHEA architecture and implementation

32 Call To Action
WHEA-enable your server platforms now
    Work with Microsoft to get BIOS reference implementations
Validate WHEA support
    Run Logo Tests to validate WHEA implementation
Fully implement and validate Advanced Error Reporting capability in PCI Express devices

33 Additional Resources
Related Sessions
    SVR-T464 WHEA Platform Implementation
    SVR-C460 WHEA PSHED Plug-in
    SVR-T325 Dynamic Partition: Windows Server
WHEA Feedback:
WHEA introduction:
Specifications
    WHEA Platform Design Guide
    UEFI 2.1 Specification
microsoft.com

34 © 2007 Microsoft Corporation. All rights reserved

