Dell 3250 Operation Manual - Page 14

Product guide (.pdf) for the Dell 3250 server, 37 pages.

SR870BH2 Machine Check Error Handling

5.3 Error Signaling

There are two classes of error events:

Machine Check Error Events: A processor machine check occurs when the processor detects a fatal or recoverable error during execution of instructions or when the processor is signaled by the platform to enter machine check.

Machine Check Architecture (MCA): The MCA can be either local or global. In the event of an MCA, the processor takes the exception at an instruction boundary with the highest priority. If the event is local, only the affected processor enters MCA handling mode. If the event is global, all processors enter MCA handling mode.
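
The local/global distinction can be modeled in a few lines. The following Python sketch is illustrative only: `McaScope` and `processors_entering_mca` are hypothetical names, not part of any SAL, PAL, or platform interface, and processors are represented by plain integer ids.

```python
from enum import Enum


class McaScope(Enum):
    LOCAL = "local"    # only the affected processor enters MCA handling mode
    GLOBAL = "global"  # every processor enters MCA handling mode


def processors_entering_mca(scope, affected_cpu, all_cpus):
    """Return the set of CPU ids that enter MCA handling mode.

    Hypothetical helper: CPU ids are plain integers chosen for illustration.
    """
    cpus = set(all_cpus)
    if affected_cpu not in cpus:
        raise ValueError(f"CPU {affected_cpu} is not part of the system")
    if scope is McaScope.GLOBAL:
        return cpus
    return {affected_cpu}
```

On a hypothetical four-processor system, a local MCA on CPU 2 involves only that CPU, while a global MCA involves all four.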

Uncorrectable Error Events:

• Local MCA: A local MCA occurs when an individual processor enters machine check. The processor takes a local MCA when it reads data with uncorrectable errors or receives a hard-fail response to a transaction. Examples of local machine checks include a Data Translation Lookaside Buffer (DTLB) data parity error and the processor consuming data with an uncorrectable error.

• Global MCA: A machine check is global when all processors enter machine check. On the SR870BH2 platform, the BINIT# and BERR# signals are used to bring all processors into machine check: either the processor asserts BINIT#, or the processor or platform asserts BERR#. The processor can assert BINIT# on a transaction time-out event. The platform asserts BERR# on platform-fatal errors, and it can be programmed to assert BERR# when an uncorrectable error is detected on I/O read data.
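
The uncorrectable cases listed above can be summarized as a lookup from error cause to signaling path. This is a minimal sketch under stated assumptions: the `Cause` members, the `Signaling` tuple, and the `berr_on_io_read` flag are names invented for this example, and the behavior when BERR# is not programmed for I/O read errors is not described in this section, so the sketch returns `None` for that case.

```python
from enum import Enum, auto
from typing import NamedTuple, Optional


class Cause(Enum):
    # Hypothetical labels for the causes named in the text.
    UNCORRECTABLE_DATA_CONSUMED = auto()
    HARD_FAIL_RESPONSE = auto()
    DTLB_DATA_PARITY = auto()
    TRANSACTION_TIMEOUT = auto()
    PLATFORM_FATAL = auto()
    IO_READ_UNCORRECTABLE = auto()


class Signaling(NamedTuple):
    scope: str                  # "local" or "global"
    signal: Optional[str]       # bus signal name, None for a local MCA
    asserted_by: Optional[str]  # "processor" or "platform", None for a local MCA


_LOCAL_CAUSES = {
    Cause.UNCORRECTABLE_DATA_CONSUMED,
    Cause.HARD_FAIL_RESPONSE,
    Cause.DTLB_DATA_PARITY,
}


def classify(cause, berr_on_io_read=False):
    """Map a cause to its signaling path, or None if the text does not say."""
    if cause in _LOCAL_CAUSES:
        return Signaling("local", None, None)
    if cause is Cause.TRANSACTION_TIMEOUT:
        return Signaling("global", "BINIT#", "processor")
    if cause is Cause.PLATFORM_FATAL:
        return Signaling("global", "BERR#", "platform")
    if cause is Cause.IO_READ_UNCORRECTABLE and berr_on_io_read:
        # Only applies when the platform has been programmed to do so.
        return Signaling("global", "BERR#", "platform")
    return None
```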

Correctable Error Events:

• Corrected Machine Check (CMC), signaled as a Corrected Machine Check Interrupt (CMCI): Corrected processor errors are signaled to system software as a CMCI. For example, L1 tag parity errors on shared lines and thermal events are corrected by the processor (by its logic or by the PAL). System software must ensure that the interrupt handler for the CMCI executes on the same processor that signaled the corrected error event.

• Corrected Platform Errors (CPE): These interrupts are signaled by the platform or the SAL. They cover errors that are corrected by the platform (such as a single-bit ECC error in memory) as well as errors that are not correctable by the platform. In either case, the error is contained (i.e., by data poisoning), and the platform can still function reliably. One example of an uncorrected error is a 2XECC error detected on a write to memory.
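
The handler-placement rule for corrected errors can be sketched as follows. The function name `handler_cpu`, the string event labels, and the `cpe_cpu` default are assumptions made for this example. The text only requires that a CMCI handler run on the signaling processor; it states no such requirement for CPE, so the sketch routes CPE to a caller-chosen CPU.

```python
def handler_cpu(event, signaling_cpu=None, cpe_cpu=0):
    """Pick the CPU on which a corrected-error interrupt handler should run.

    Hypothetical helper: "CMCI" and "CPE" are plain string labels and
    cpe_cpu is an arbitrary illustrative default.
    """
    if event == "CMCI":
        # The handler must run on the processor that signaled the event.
        if signaling_cpu is None:
            raise ValueError("a CMCI must identify the signaling processor")
        return signaling_cpu
    if event == "CPE":
        # No processor affinity is stated for CPE; use the configured CPU.
        return cpe_cpu
    raise ValueError(f"unknown corrected-error event: {event!r}")
```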

Intel® Server Platform SR870BH2

Revision 1.1