h4. Chapter 13
h1. Hardening Solaris Drivers
Fault Management Architecture (FMA) I/O Fault Services enable driver developers to integrate fault management capabilities into I/O device drivers. The Solaris I/O fault services framework defines a set of interfaces that enable all drivers to coordinate and perform basic error handling tasks and activities. The Solaris FMA as a whole provides for error handling and fault diagnosis, in addition to response and recovery. FMA is a component of Sun's Predictive Self-Healing strategy.
A driver is considered hardened when it uses the defensive programming practices described in this document in addition to the I/O fault services framework for error handling and diagnosis. The driver hardening test harness verifies that the I/O fault services and defensive programming requirements have been correctly fulfilled.
This document contains the following sections:
* [Sun Fault Management Architecture I/O Fault Services|#fmaiofs] provides a reference for driver developers who want to integrate fault management capabilities into I/O device drivers.
* [Defensive Programming Techniques for Solaris Device Drivers|Hardening-DefProg] provides general information about how to defensively write a Solaris device driver.
* [Driver Hardening Test Harness|Hardening-BOFI] describes a driver development tool that injects simulated hardware faults when the driver under development accesses its hardware.
{anchor:fmaiofs}
h2. Sun Fault Management Architecture I/O Fault Services
This section explains how to integrate fault management error reporting, error handling, and diagnosis for I/O device drivers. This section provides an in-depth examination of the I/O fault services framework and how to utilize the I/O fault service APIs within a device driver.
This section discusses the following topics:
* [What Is Predictive Self-Healing?|#gemgv] provides background and an overview of the Sun Fault Management Architecture.
* [Solaris Fault Manager|#gemgw] describes additional background with a focus on a high-level overview of the Solaris Fault Manager, {{fmd}}(1M).
* [Error Handling|#gemgl] is the primary section for driver developers. This section highlights best-practice coding techniques for high availability and the use of I/O fault services in driver code to interact with the FMA.
* [Diagnosing Faults|#gemfs] describes how faults are diagnosed from the errors detected by drivers.
* [Event Registry|#gemhe] provides information on Sun's Event Registry.
{anchor:gemgv}
h3. What Is Predictive Self-Healing?
Traditionally, systems have exported hardware and software error information directly to human administrators and to management software in the form of syslog messages. Often, error detection, diagnosis, reporting, and handling were embedded in the code of each driver.
A predictive self-healing system such as the one in the Solaris OS is first and foremost self-diagnosing. Self-diagnosing means the system provides technology to automatically diagnose problems from observed symptoms, and the results of the diagnosis can then be used to trigger automated response and recovery. A *fault* in hardware or a defect in software can be associated with a set of possible observed symptoms called *errors*. The data generated by the system as the result of observing an error is called an error report or *ereport*.
In a system capable of self-healing, ereports are captured by the system and are encoded as a set of name-value pairs described by an extensible event protocol to form an *ereport event*. Ereport events and other data are gathered to facilitate self-healing, and are dispatched to software components called diagnosis engines designed to diagnose the underlying problems corresponding to the error symptoms observed by the system. A *diagnosis engine* runs in the background and silently consumes error telemetry until it can produce a diagnosis or predict a fault.
After processing sufficient telemetry to reach a conclusion, a diagnosis engine produces another event called a *fault event*. The fault event is then broadcast to all agents that are interested in the specific fault event. An *agent* is a software component that initiates recovery and responds to specific fault events. A software component known as the Solaris Fault Manager, [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view], manages the multiplexing of events between ereport generators, diagnosis engines, and agent software.
{anchor:gemgw}
h3. Solaris Fault Manager
The Solaris Fault Manager, [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view], is responsible for dispatching in-bound error telemetry events to the appropriate diagnosis engines. The diagnosis engine is responsible for identifying the underlying hardware faults or software defects that are producing the error symptoms.
The {{fmd}}(1M) daemon is the Solaris OS implementation of a fault manager. It starts at boot time and loads all of the diagnosis engines and agents available on the system. The Solaris Fault Manager also provides interfaces for system administrators and service personnel to observe fault management activity.
{anchor:gemft}
h4. Diagnosis, Suspect Lists, and Fault Events
Once a diagnosis has been made, the diagnosis is output in the form of a *list.suspect* event. A list.suspect event is composed of one or more possible fault or defect events. Sometimes the diagnosis cannot narrow the cause of errors to a single fault or defect. For example, the underlying problem might be a broken wire connecting controllers to the main system bus. The problem might be with a component on the bus or with the bus itself. In this specific case, the list.suspect event will contain multiple fault events: one for each controller attached to the bus, and one for the bus itself.
In addition to describing the fault that was diagnosed, a fault event also contains four payload members that describe the components to which the diagnosis applies.
* The *resource* is the component that was diagnosed as faulty. The [{{fmdump}}(1M)|http://docs.sun.com/doc/819-2240/fmdump-1m?a=view] command shows this payload member as “Problem in.”
* The *Automated System Recovery Unit* (ASRU) is the hardware or software component that must be disabled to prevent further error symptoms from occurring. The {{fmdump}}(1M) command shows this payload member as “Affects.”
* The *Field Replaceable Unit* (FRU) is the component that must be replaced or repaired to fix the underlying problem. The {{fmdump}}(1M) command shows this payload member as “FRU.”
* The *Label* payload is a string that gives the location of the FRU in the same form as it is printed on the chassis or motherboard, for example, next to a DIMM slot or PCI card slot. The {{fmdump}}(1M) command shows this payload member as “Location.”
For example, after receiving a certain number of ECC correctable errors in a given amount of time for a particular memory location, the CPU and memory diagnosis engine issues a diagnosis (list.suspect event) for a faulty DIMM. The following {{fmdump}}(1M) output shows a similar diagnosis that names a faulty CPU rather than a DIMM:
{code}
# fmdump -v -u 38bd6f1b-a4de-4c21-db4e-ccd26fa8573c
TIME                 UUID                                 SUNW-MSG-ID
Oct 31 13:40:18.1864 38bd6f1b-a4de-4c21-db4e-ccd26fa8573c AMD-8000-8L
  100%  fault.cpu.amd.icachetag

        Problem in: hc:///motherboard=0/chip=0/cpu=0
           Affects: cpu:///cpuid=0
               FRU: hc:///motherboard=0/chip=0
          Location: SLOT 2
{code}
In this example, [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view] has identified a problem in a resource, specifically a CPU ({{hc:///motherboard=0/chip=0/cpu=0}}). To suppress further error symptoms and to prevent an uncorrectable error from occurring, an ASRU ({{cpu:///cpuid=0}}) is identified for retirement. The component that needs to be replaced is the FRU ({{hc:///motherboard=0/chip=0}}).
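The threshold rule described above (a certain number of errors from one resource in a given amount of time) can be sketched in user-space C as follows. This is a minimal sketch that assumes a fixed-size buffer of error timestamps; the {{err_window}} structure, the {{err_window_record()}} function, and the threshold and window values are hypothetical and are not the diagnosis engine's actual implementation.

{code}
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define	WINDOW_MAX	16	/* hypothetical upper bound on the threshold */

/* Hypothetical per-resource error history. */
struct err_window {
	long	times[WINDOW_MAX];	/* timestamps (seconds) of recent errors */
	size_t	count;			/* valid entries in times[] */
	size_t	threshold;		/* errors that trigger a diagnosis */
	long	window_secs;		/* length of the sliding window */
};

/*
 * Record one error observed at time 'now'. Returns true when 'threshold'
 * errors have been seen within the last 'window_secs' seconds, in which
 * case the history is cleared so that the next diagnosis starts afresh.
 */
static bool
err_window_record(struct err_window *w, long now)
{
	size_t i, kept = 0;

	assert(w->threshold > 0 && w->threshold <= WINDOW_MAX);

	/* Discard errors that have aged out of the window. */
	for (i = 0; i < w->count; i++) {
		if (now - w->times[i] < w->window_secs)
			w->times[kept++] = w->times[i];
	}
	w->count = kept;

	/* count < threshold here, so there is always room for one more. */
	w->times[w->count++] = now;
	if (w->count >= w->threshold) {
		w->count = 0;
		return (true);
	}
	return (false);
}
{code}

With {{threshold}} set to 3 and {{window_secs}} set to 60, errors at 0, 10, and 20 seconds trigger a diagnosis, while errors spaced more than 60 seconds apart never accumulate.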
{anchor:gemgg}
h4. Response Agents
An *agent* is a software component that takes action in response to a diagnosis or repair. For example, the CPU and memory retire agent is designed to act on list.suspect events that contain a fault.cpu event. The {{cpumem-retire}} agent attempts to off-line a CPU or retire a physical memory page from service. If the agent is successful, an entry for the retired page or CPU is added to the fault manager's ASRU cache. The [{{fmadm}}(1M)|http://docs.sun.com/doc/819-2240/fmadm-1m?a=view] utility, as shown in the example below, shows an entry for a memory rank that has been diagnosed as having a fault. ASRUs that the system cannot off-line, retire, or disable also have an entry in the ASRU cache, but they are shown as degraded. Degraded means that the resource associated with the ASRU is faulty, but the ASRU cannot be removed from service. Currently, Solaris agent software cannot act upon I/O ASRUs (device instances), so all faulty I/O resource entries in the cache are in the degraded state.
{code}
# fmadm faulty
   STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded mem:///motherboard=0/chip=1/memory-controller=0/dimm=3/rank=0
         ccae89df-2217-4f5c-add4-d920f78b4faf
-------- ----------------------------------------------------------------------
{code}
The primary purpose of a *retire agent* is to isolate (safely remove from service) the piece of hardware or software that has been diagnosed as faulty.
Agents can also take other important actions, such as the following:
* Send alerts via SNMP traps. An agent can translate a diagnosis into an SNMP alert that plugs into existing software mechanisms.
* Post a syslog message. Message-specific agents (for example, the syslog message agent) can take the result of a diagnosis and translate it into a syslog message that administrators can use to take a specific action.
* Take other actions, such as updating the FRUID. Response agents can be platform-specific.
{anchor:gemfg}
h4. Message IDs and Dictionary Files
The syslog message agent takes the output of the diagnosis (the list.suspect event) and writes specific messages to the console or {{/var/adm/messages}}. Console messages are often difficult to understand. FMA remedies this problem by providing a defined fault message structure that is generated every time a list.suspect event is delivered to the syslog message agent.
The syslog agent generates a message identifier (MSG ID). The event registry generates dictionary files ({{.dict}} files) that map a list.suspect event to a structured message identifier that should be used to identify and view the associated knowledge article. Message files ({{.po}} files) map the message ID to localized messages for every possible list of suspected faults that the diagnosis engine can generate. The following is an example of a fault message emitted on a test system.
{code}
SUNW-MSG-ID: AMD-8000-7U, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Fri Jul 28 04:26:51 PDT 2006
PLATFORM: Sun Fire V40z, CSN: XG051535088, HOSTNAME: parity
SOURCE: eft, REV: 1.16
EVENT-ID: add96f65-5473-69e6-dbe1-8b3d00d5c47b
DESC: The number of errors associated with this CPU has exceeded
acceptable levels. Refer to http://sun.com/msg/AMD-8000-7U for
more information.
AUTO-RESPONSE: An attempt will be made to remove this CPU from service.
IMPACT: Performance of this system may be affected.
REC-ACTION: Schedule a repair procedure to replace the affected CPU.
Use fmdump -v -u <EVENT_ID> to identify the module.
{code}
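The lookup that a dictionary file enables can be pictured as a table from fault classes to message IDs. The following user-space C sketch is a simplified model that assumes each suspect list contains a single fault. The {{dict_entry}} table and the {{dict_lookup()}} and {{msg_url()}} helpers are hypothetical and do not reproduce the real {{.dict}} or {{.po}} file formats. The single table entry is the class and message ID pair from the {{fmdump}} example earlier in this section, and the URL form follows the fault message above.

{code}
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a dictionary entry (single-fault lists only). */
struct dict_entry {
	const char *fault_class;	/* fault event class in the suspect list */
	const char *msg_id;		/* structured message identifier */
};

static const struct dict_entry dict[] = {
	/* Pair taken from the fmdump example earlier in this section. */
	{ "fault.cpu.amd.icachetag", "AMD-8000-8L" },
};

/* Return the message ID for a fault class, or NULL if none is known. */
static const char *
dict_lookup(const char *fault_class)
{
	size_t i;

	assert(fault_class != NULL);
	for (i = 0; i < sizeof (dict) / sizeof (dict[0]); i++) {
		if (strcmp(dict[i].fault_class, fault_class) == 0)
			return (dict[i].msg_id);
	}
	return (NULL);
}

/* Build a knowledge-article URL in the form shown in the message above. */
static int
msg_url(char *buf, size_t len, const char *msg_id)
{
	return (snprintf(buf, len, "http://sun.com/msg/%s", msg_id));
}
{code}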
{anchor:gemfo}
h4. System Topology
To identify where a fault might have occurred, diagnosis engines need to have the topology for a given software or hardware system represented. The [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view] daemon provides diagnosis engines with a handle to a topology snapshot that can be used during diagnosis. Topology information is used to represent the resource, ASRU, and FRU found in each fault event. The topology can also be used to store the platform label, FRUID, and serial number identification.
The resource payload member in the fault event is always represented by the physical path location from the platform chassis outward. For example, a PCI controller function that is bridged from the main system bus to a PCI local bus is represented by its {{hc}} scheme path name:
{code}
hc:///motherboard=0/hostbridge=1/pcibus=0/pcidev=13/pcifn=0
{code}
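Because an {{hc}} scheme path is a sequence of name=instance pairs ordered from the chassis outward, it can be split mechanically. The following user-space C sketch parses such a path into its components. The {{hc_parse()}} function and {{struct hc_pair}} are hypothetical helpers for illustration, not the FMA library routines that handle these names.

{code}
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One name=instance component of an hc scheme path (hypothetical type). */
struct hc_pair {
	char	name[32];
	int	instance;
};

/*
 * Split a path such as "hc:///motherboard=0/pcibus=0" into components.
 * Returns the number of components parsed, or -1 if the path is
 * malformed or has more than 'max' components.
 */
static int
hc_parse(const char *path, struct hc_pair *pairs, int max)
{
	const char *prefix = "hc:///";
	const char *p;
	int n = 0;

	if (strncmp(path, prefix, strlen(prefix)) != 0)
		return (-1);
	p = path + strlen(prefix);

	while (*p != '\0') {
		int consumed = 0;

		if (n == max)
			return (-1);
		/* Read "name=instance"; the name stops at '=' or '/'. */
		if (sscanf(p, "%31[^=/]=%d%n", pairs[n].name,
		    &pairs[n].instance, &consumed) != 2)
			return (-1);
		p += consumed;
		n++;
		if (*p == '/')
			p++;
		else if (*p != '\0')
			return (-1);
	}
	return (n);
}
{code}

For the path above, the sketch yields five components, the last of which is {{pcifn=0}}. A {{dev}} scheme name is rejected because it does not start with {{hc:///}}.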
The ASRU payload member in the fault event is typically represented by the Solaris device tree instance name that is bound to a hardware controller, device, or function. FMA uses the {{dev}} scheme to represent the ASRU in its native format for actions that might be taken by a future implementation of a retire agent specifically designed for I/O devices:
{code}
dev:////pci@1e,600000/ide@d
{code}
The FRU payload representation in the fault event varies depending on the closest replaceable component to the I/O resource that has been diagnosed as faulty. For example, a fault event for a broken embedded PCI controller might name the motherboard of the system as the FRU that needs to be replaced:
{code}
hc:///motherboard=0
{code}
The label payload is a string that gives the location of the FRU in the same form as it is printed on the chassis or motherboard, for example next to a DIMM slot or PCI card slot:
{code}
Label: SLOT 2
{code}
{anchor:gemgl}
h3. Error Handling
This section describes how to use I/O fault services APIs to handle errors within a driver. This section discusses how drivers should indicate and initialize their fault management capabilities, generate error reports, and register the driver's error handler routine.
This section includes excerpts from the Broadcom 1Gb NIC driver, {{bge}}, that demonstrate the use of the I/O fault services API. Follow these examples as a model for how to integrate fault management capability into your own drivers. Take the following steps to study the complete {{bge}} driver code:
* Go to [ON (OS/Net) Sources|http://src.opensolaris.org/source/].
* Enter {{bge}} in the File Path field.
* Click the Search button.
Drivers that have been instrumented to provide FMA error report telemetry detect errors and determine the impact of those errors on the services provided by the driver. Following the detection of an error, the driver should determine whether its services have been impacted and to what degree.
An I/O driver must respond immediately to detected errors. Appropriate responses include:
* Attempt recovery
* Retry an I/O transaction
* Attempt fail-over techniques
* Report the error to the calling application/stack
* If the error cannot be constrained any other way, then panic
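The order of these responses can be modeled as an escalation ladder. The following user-space C sketch retries a transaction a bounded number of times, then tries an alternate path, and finally reports the error to the caller. Recovery and panic are left out. The {{io_respond()}} function, its callback type, and the test doubles are hypothetical.

{code}
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes, ordered from least to most drastic. */
enum io_response {
	IO_OK,		/* first attempt succeeded */
	IO_RETRIED,	/* succeeded after one or more retries */
	IO_FAILED_OVER,	/* alternate path used */
	IO_REPORTED	/* error returned to the caller */
};

/* Hypothetical attempt callback: returns true on success. */
typedef bool (*io_attempt_fn)(void *arg);

static enum io_response
io_respond(io_attempt_fn primary, io_attempt_fn alternate, void *arg,
    int max_retries)
{
	int i;

	assert(primary != NULL && max_retries >= 0);
	if (primary(arg))
		return (IO_OK);
	for (i = 0; i < max_retries; i++) {
		if (primary(arg))
			return (IO_RETRIED);
	}
	if (alternate != NULL && alternate(arg))
		return (IO_FAILED_OVER);
	/* A panic, if any, is the caller's decision as a last resort. */
	return (IO_REPORTED);
}

/* Test double: fails 'fail_count' times, then succeeds. */
struct flaky {
	int	fail_count;
	int	calls;
};

static bool
flaky_attempt(void *arg)
{
	struct flaky *f = arg;

	f->calls++;
	return (f->calls > f->fail_count);
}

static bool always_ok(void *arg) { (void) arg; return (true); }
static bool always_fail(void *arg) { (void) arg; return (false); }
{code}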
Errors detected by the driver are communicated to the fault management daemon as an *ereport*. An ereport is a structured event defined by the FMA event protocol. The event protocol is a specification for a set of common data fields that must be used to describe all possible error and fault events, in addition to the list of suspected faults. Ereports are gathered into a flow of error telemetry and dispatched to the diagnosis engine.
{anchor:gemfi}
h4. Declaring Fault Management Capabilities
A hardened device driver must declare its fault management capabilities to the I/O Fault Management framework. Use the [{{ddi_fm_init}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-init-9f?a=view] function to declare the fault management capabilities of your driver.
{code}
void ddi_fm_init(dev_info_t *dip, int *fmcap, ddi_iblock_cookie_t *ibcp)
{code}
The {{ddi_fm_init()}} function can be called from kernel context in a driver [{{attach}}()|http://docs.sun.com/doc/819-2255/attach-9e?a=view] or [{{detach}}()|http://docs.sun.com/doc/819-2255/detach-9e?a=view] entry point. The {{ddi_fm_init()}} function usually is called from the {{attach()}} entry point. The {{ddi_fm_init()}} function allocates and initializes resources according to _fmcap_. The _fmcap_ parameter must be set to the bitwise-inclusive-OR of the following fault management capabilities:
* {{DDI_FM_EREPORT_CAPABLE}} - Driver is responsible for and capable of generating FMA protocol error events (ereports) upon detection of an error condition.
* {{DDI_FM_ACCCHK_CAPABLE}} - Driver is responsible for and capable of checking for errors upon completion of one or more access I/O transactions.
* {{DDI_FM_DMACHK_CAPABLE}} - Driver is responsible for and capable of checking for errors upon completion of one or more DMA I/O transactions.
* {{DDI_FM_ERRCB_CAPABLE}} - Driver has an error callback function.
A hardened leaf driver generally sets all these capabilities. However, if its parent nexus is not capable of supporting any one of the requested capabilities, the associated bit is cleared and returned as such to the driver. Before returning from {{ddi_fm_init}}(9F), the I/O fault services framework creates a set of fault management capability properties: {{fm-ereport-capable}}, {{fm-accchk-capable}}, {{fm-dmachk-capable}} and {{fm-errcb-capable}}. The currently supported fault management capability level is observable by using the [{{prtconf}}(1M)|http://docs.sun.com/doc/819-2240/prtconf-1m?a=view] command.
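The negotiation described above amounts to a bitwise AND of the requested capabilities with those the parent nexus supports. The following user-space C sketch models it with stand-in constants. The {{CAP_*}} values and helper names are assumptions for illustration; the real {{DDI_FM_*_CAPABLE}} constants should be taken from the DDI headers.

{code}
#include <assert.h>

/* Assumed stand-ins for the four capability bits. */
#define	CAP_EREPORT	0x1	/* models DDI_FM_EREPORT_CAPABLE */
#define	CAP_ACCCHK	0x2	/* models DDI_FM_ACCCHK_CAPABLE */
#define	CAP_DMACHK	0x4	/* models DDI_FM_DMACHK_CAPABLE */
#define	CAP_ERRCB	0x8	/* models DDI_FM_ERRCB_CAPABLE */
#define	CAP_ALL		(CAP_EREPORT | CAP_ACCCHK | CAP_DMACHK | CAP_ERRCB)

/* Tests in the spirit of DDI_FM_EREPORT_CAP() and DDI_FM_ERRCB_CAP(). */
#define	HAS_EREPORT(c)	(((c) & CAP_EREPORT) != 0)
#define	HAS_ERRCB(c)	(((c) & CAP_ERRCB) != 0)

/*
 * Model of the negotiation: each requested bit that the parent nexus
 * cannot support is cleared, and the result is what the driver gets back.
 */
static int
negotiate_caps(int requested, int parent_supported)
{
	assert((requested & ~CAP_ALL) == 0);
	return (requested & parent_supported);
}
{code}

Because bits can be cleared, a driver should test the value returned through _fmcap_ rather than assume its request was granted, which is what {{bge_fm_init()}} below does with {{DDI_FM_EREPORT_CAP()}} and {{DDI_FM_ERRCB_CAP()}}.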
To make your driver support administrative selection of fault management capabilities, export and set the fault management capability level properties to the values described above in the [{{driver.conf}}|http://docs.sun.com/doc/819-2251/driver.conf-4?a=view] file. The {{fm-capable}} properties must be set and read prior to calling {{ddi_fm_init()}} with the desired capability list.
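The "read the property, else use a default" pattern can be sketched in user-space C as follows. The {{fm_capable_from_prop()}} helper and the {{CAP_ALL}} value are hypothetical. In a kernel driver the value would come from an integer property lookup such as {{ddi_prop_get_int}}(9F), and the result would then be passed to {{ddi_fm_init}}(9F).

{code}
#include <assert.h>
#include <stdlib.h>

#define	CAP_ALL	0xf	/* assumed: all four capability bits set */

/*
 * Hypothetical helper: 'prop' is the administrator-supplied capability
 * level as text, or NULL if the property was not set. Unset or
 * unparsable values fall back to 'def'; unknown bits are masked off.
 */
static int
fm_capable_from_prop(const char *prop, int def)
{
	char *end;
	long v;

	assert((def & ~CAP_ALL) == 0);
	if (prop == NULL)
		return (def);
	v = strtol(prop, &end, 0);	/* base 0 accepts "15", "0xf", "017" */
	if (end == prop || *end != '\0' || v < 0)
		return (def);
	return ((int)(v & CAP_ALL));
}
{code}

A configured value of 0 disables all fault management capabilities, which corresponds to the {{else}} branch of {{bge_fm_init()}}.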
The following example from the {{bge}} driver shows the {{bge_fm_init()}} function, which calls the {{ddi_fm_init}}(9F) function. The {{bge_fm_init()}} function is called in the {{bge_attach()}} function.
{code}
static void
bge_fm_init(bge_t *bgep)
{
	ddi_iblock_cookie_t iblk;

	/* Only register with IO Fault Services if we have some capability */
	if (bgep->fm_capabilities) {
		bge_reg_accattr.devacc_attr_access = DDI_FLAGERR_ACC;
		bge_desc_accattr.devacc_attr_access = DDI_FLAGERR_ACC;
		dma_attr.dma_attr_flags = DDI_DMA_FLAGERR;

		/*
		 * Register capabilities with IO Fault Services
		 */
		ddi_fm_init(bgep->devinfo, &bgep->fm_capabilities, &iblk);

		/*
		 * Initialize pci ereport capabilities if ereport capable
		 */
		if (DDI_FM_EREPORT_CAP(bgep->fm_capabilities) ||
		    DDI_FM_ERRCB_CAP(bgep->fm_capabilities))
			pci_ereport_setup(bgep->devinfo);

		/*
		 * Register error callback if error callback capable
		 */
		if (DDI_FM_ERRCB_CAP(bgep->fm_capabilities))
			ddi_fm_handler_register(bgep->devinfo,
			    bge_fm_error_cb, (void *) bgep);
	} else {
		/*
		 * These fields have to be cleared of FMA if there are no
		 * FMA capabilities at runtime.
		 */
		bge_reg_accattr.devacc_attr_access = DDI_DEFAULT_ACC;
		bge_desc_accattr.devacc_attr_access = DDI_DEFAULT_ACC;
		dma_attr.dma_attr_flags = 0;
	}
}
{code}
{anchor:gemhm}
h4. Cleaning Up Fault Management Resources
The [{{ddi_fm_fini}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-fini-9f?a=view] function cleans up resources allocated to support fault management for _dip_.
{code}
void ddi_fm_fini(dev_info_t *dip)
{code}
The {{ddi_fm_fini()}} function can be called from kernel context in a driver [{{attach}}()|http://docs.sun.com/doc/819-2255/attach-9e?a=view] or [{{detach}}()|http://docs.sun.com/doc/819-2255/detach-9e?a=view] entry point.
The following example from the {{bge}} driver shows the {{bge_fm_fini()}} function, which calls the {{ddi_fm_fini}}(9F) function. The {{bge_fm_fini()}} function is called in the {{bge_unattach()}} function, which is called in both the {{bge_attach()}} and {{bge_detach()}} functions.
{code}
static void
bge_fm_fini(bge_t *bgep)
{
	/* Only unregister FMA capabilities if we registered some */
	if (bgep->fm_capabilities) {
		/*
		 * Release any resources allocated by pci_ereport_setup()
		 */
		if (DDI_FM_EREPORT_CAP(bgep->fm_capabilities) ||
		    DDI_FM_ERRCB_CAP(bgep->fm_capabilities))
			pci_ereport_teardown(bgep->devinfo);

		/*
		 * Un-register error callback if error callback capable
		 */
		if (DDI_FM_ERRCB_CAP(bgep->fm_capabilities))
			ddi_fm_handler_unregister(bgep->devinfo);

		/*
		 * Unregister from IO Fault Services
		 */
		ddi_fm_fini(bgep->devinfo);
	}
}
{code}
{anchor:gemgx}
h4. Getting the Fault Management Capability Bit Mask
The [{{ddi_fm_capable}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-capable-9f?a=view] function returns the capability bit mask currently set for _dip_.
{code}
int ddi_fm_capable(dev_info_t *dip)
{code}
{anchor:gemfl}
h4. Reporting Errors
This section provides information about the following topics:
* [Queueing an Error Event|#gemfu] discusses how to queue error events.
* [Detecting and Reporting PCI-Related Errors|#gemfk] describes how to report PCI-related errors.
* [Reporting Standard I/O Controller Errors|#gemha] describes how to report standard I/O controller errors.
* [Service Impact Function|#gemgp] discusses how to report whether an error has impacted the services provided by a device.
{anchor:gemfu}
h5. Queueing an Error Event
The [{{ddi_fm_ereport_post}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-ereport-post-9f?a=view] function causes an ereport event to be queued for delivery to the fault manager daemon, [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view].
{code}
void ddi_fm_ereport_post(dev_info_t *dip,
    const char *error_class,
    uint64_t ena,
    int sflag, ...)
{code}
The _sflag_ parameter ({{DDI_SLEEP}} or {{DDI_NOSLEEP}}) indicates whether the caller is willing to wait for system memory and event channel resources to become available.
The _ena_ parameter is the *Error Numeric Association* (ENA) for this error report. The ENA might have been initialized and obtained from another error-detecting software module such as a bus nexus driver. If the ENA is set to 0, it is initialized by {{ddi_fm_ereport_post()}}.
The name-value pair (_nvpair_) variable argument list contains one or more name, type, value pointer _nvpair_ tuples for non-array {{data_type_t}} types or one or more name, type, number of elements, value pointer tuples for {{data_type_t}} array types. The _nvpair_ tuples make up the ereport event payload required for diagnosis. The end of the argument list is specified by {{NULL}}.
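The tuple convention can be illustrated with a small user-space variadic function that walks a NULL-terminated list of name, type, and value arguments. This sketch handles only three hypothetical scalar types and passes values directly, as the {{bge}} example below does. It is not the implementation of {{ddi_fm_ereport_post()}}, and array types are omitted.

{code}
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical subset of payload types; the real list is data_type_t. */
enum sk_type { SK_UINT8, SK_UINT64, SK_STRING };

/*
 * Walk a NULL-terminated list of (name, type, value) tuples and render
 * them into 'buf' as space-separated "name=value" pairs. Returns the
 * number of tuples, or -1 on an unknown type or if 'buf' is too small.
 */
static int
sk_render_payload(char *buf, size_t len, ...)
{
	va_list ap;
	char *name;
	size_t off = 0;
	int n = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	va_start(ap, len);
	while ((name = va_arg(ap, char *)) != NULL) {
		int type = va_arg(ap, int);	/* enum arguments arrive as int */
		char val[32];
		int w;

		switch (type) {
		case SK_UINT8:
			/* A uint8_t argument is promoted to int by "...". */
			(void) snprintf(val, sizeof (val), "%d", va_arg(ap, int));
			break;
		case SK_UINT64:
			(void) snprintf(val, sizeof (val), "%llu",
			    (unsigned long long)va_arg(ap, uint64_t));
			break;
		case SK_STRING:
			(void) snprintf(val, sizeof (val), "%s", va_arg(ap, char *));
			break;
		default:
			va_end(ap);
			return (-1);
		}
		w = snprintf(buf + off, len - off, "%s%s=%s",
		    n > 0 ? " " : "", name, val);
		if (w < 0 || (size_t)w >= len - off) {
			va_end(ap);
			return (-1);
		}
		off += (size_t)w;
		n++;
	}
	va_end(ap);
	return (n);
}
{code}

The sketch casts the terminator to {{(char *)}} when calling it, so that the terminator is passed with pointer type; a bare {{NULL}} in a variadic call is not guaranteed to be a pointer.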
The ereport class names and payloads described in [Reporting Standard I/O Controller Errors|#gemha] for I/O controllers are used as appropriate for _error_class_. Other ereport class names and payloads can be defined, but they must be registered in the Sun *event registry* and accompanied by driver-specific diagnosis engine software or Eversholt fault tree (eft) rules. For more information about the Sun event registry and about Eversholt fault tree rules, see the [Fault Management community|http://www.opensolaris.org/os/community/fm/] on the [OpenSolaris project|http://www.opensolaris.org/os/]. The following {{bge}} example shows {{bge_fm_ereport()}}, a wrapper that builds the ereport class name, generates an ENA, and calls {{ddi_fm_ereport_post}}(9F).
{code}
void
bge_fm_ereport(bge_t *bgep, char *detail)
{
	uint64_t ena;
	char buf[FM_MAX_CLASS];

	(void) snprintf(buf, FM_MAX_CLASS, "%s.%s", DDI_FM_DEVICE, detail);
	ena = fm_ena_generate(0, FM_ENA_FMT1);
	if (DDI_FM_EREPORT_CAP(bgep->fm_capabilities)) {
		ddi_fm_ereport_post(bgep->devinfo, buf, ena, DDI_NOSLEEP,
		    FM_VERSION, DATA_TYPE_UINT8, FM_EREPORT_VERS0, NULL);
	}
}
{code}
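{{bge_fm_ereport()}} discards the return value of {{snprintf()}}, which is safe for its short, fixed class names. When class names are assembled from less predictable parts, the truncation check can be made explicit, as in the following user-space C sketch. The {{SK_MAX_CLASS}} limit is an assumed stand-in for {{FM_MAX_CLASS}}, and {{SK_FM_DEVICE}} assumes that {{DDI_FM_DEVICE}} expands to {{"device"}}, which matches the {{ereport.io.device.*}} class names used later in this section.

{code}
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define	SK_MAX_CLASS	100		/* assumed stand-in for FM_MAX_CLASS */
#define	SK_FM_DEVICE	"device"	/* assumed value of DDI_FM_DEVICE */

/* Build "<device>.<detail>"; return false if the result was truncated. */
static bool
sk_build_class(char *buf, size_t len, const char *detail)
{
	int n;

	assert(buf != NULL && detail != NULL && len > 0);
	n = snprintf(buf, len, "%s.%s", SK_FM_DEVICE, detail);
	return (n >= 0 && (size_t)n < len);
}
{code}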
{anchor:gemfk}
h5. Detecting and Reporting PCI-Related Errors
PCI-related errors, including PCI, PCI-X, and PCI-E, are automatically detected and reported when you use [{{pci_ereport_post}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/pci-ereport-post-9f?a=view].
{code}
void pci_ereport_post(dev_info_t *dip, ddi_fm_error_t *derr, uint16_t *xx_status)
{code}
Drivers do not need to generate driver-specific ereports for errors that occur in the PCI Local Bus configuration status registers. The {{pci_ereport_post()}} function can report data parity errors, master aborts, target aborts, signaled system errors, and much more.
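To show the kind of information {{pci_ereport_post()}} works from, the following user-space C sketch decodes the error bits of a PCI configuration Status register value. The bit positions come from the PCI Local Bus Specification; the {{SK_*}} names and helper functions are hypothetical, and this is not how {{pci_ereport_post()}} is implemented.

{code}
#include <assert.h>
#include <stdint.h>

/* Error bits of the PCI configuration Status register. */
#define	SK_STAT_MDPE	0x0100	/* master data parity error */
#define	SK_STAT_STA	0x0800	/* signaled target abort */
#define	SK_STAT_RTA	0x1000	/* received target abort */
#define	SK_STAT_RMA	0x2000	/* received master abort */
#define	SK_STAT_SSE	0x4000	/* signaled system error */
#define	SK_STAT_DPE	0x8000	/* detected parity error */
#define	SK_STAT_ERRS	(SK_STAT_MDPE | SK_STAT_STA | SK_STAT_RTA | \
			SK_STAT_RMA | SK_STAT_SSE | SK_STAT_DPE)

/*
 * Return only the error bits of a Status register value. These bits are
 * write-one-to-clear, so writing the same value back clears exactly the
 * errors that were observed.
 */
static uint16_t
sk_pci_status_errors(uint16_t status)
{
	return ((uint16_t)(status & SK_STAT_ERRS));
}

/* Count how many distinct error conditions are flagged. */
static int
sk_pci_status_nerrors(uint16_t status)
{
	uint16_t e = sk_pci_status_errors(status);
	int n = 0;

	for (; e != 0; e &= (uint16_t)(e - 1))
		n++;		/* each step clears the lowest set bit */
	return (n);
}
{code}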
If {{pci_ereport_post()}} is to be used by a driver, then [{{pci_ereport_setup}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/pci-ereport-setup-9f?a=view] must have been previously called during the driver's {{attach}}(9E) routine, and [{{pci_ereport_teardown}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/pci-ereport-teardown-9f?a=view] must subsequently be called during the driver's {{detach}}(9E) routine.
The following {{bge}} code sample shows the {{bge}} driver invoking the {{pci_ereport_post()}} function from the driver's error handler. See also [Registering an Error Handler|#gemie].
{code}
/*
 * The I/O fault service error handling callback function
 */
/*ARGSUSED*/
static int
bge_fm_error_cb(dev_info_t *dip, ddi_fm_error_t *err, const void *impl_data)
{
	/*
	 * as the driver can always deal with an error
	 * in any dma or access handle, we can just return
	 * the fme_status value.
	 */
	pci_ereport_post(dip, err, NULL);
	return (err->fme_status);
}
{code}
{anchor:gemha}
h5. Reporting Standard I/O Controller Errors
A standard set of device ereports is defined for commonly seen errors for I/O controllers. These ereports should be generated whenever one of the error symptoms described in this section is detected.
The ereports described in this section are dispatched for diagnosis to the eft diagnosis engine, which uses a common set of standard rules to diagnose them. Any other errors detected by device drivers must be defined as ereport events in the Sun event registry and must be accompanied by device specific diagnosis software or eft rules.
h6. DDI_FM_DEVICE_INVAL_STATE
The driver has detected that the device is in an invalid state.
A driver should post an error when it detects that the data it transmits or receives appear to be invalid. For example, in the {{bge}} code, the {{bge_chip_reset()}} and {{bge_receive_ring()}} routines generate the {{ereport.io.device.inval_state}} error when these routines detect invalid data.
{code}
/*
 * The SEND INDEX registers should be reset to zero by the
 * global chip reset; if they're not, there'll be trouble
 * later on.
 */
sx0 = bge_reg_get32(bgep, NIC_DIAG_SEND_INDEX_REG(0));
if (sx0 != 0) {
	BGE_REPORT((bgep, "SEND INDEX - device didn't RESET"));
	bge_fm_ereport(bgep, DDI_FM_DEVICE_INVAL_STATE);
	return (DDI_FAILURE);
}

/* ... */

/*
 * Sync (all) the receive ring descriptors
 * before accepting the packets they describe
 */
DMA_SYNC(rrp->desc, DDI_DMA_SYNC_FORKERNEL);
if (*rrp->prod_index_p >= rrp->desc.nslots) {
	bgep->bge_chip_state = BGE_CHIP_ERROR;
	bge_fm_ereport(bgep, DDI_FM_DEVICE_INVAL_STATE);
	return (NULL);
}
{code}
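The receive-ring check above treats the producer index written by the device as untrusted until it has been range-checked. The following user-space C sketch shows the same idea together with wrap-around arithmetic. The {{struct rx_ring}} type and the {{rx_ring_pending()}} function are hypothetical and are not taken from the {{bge}} driver.

{code}
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring state kept by the driver. */
struct rx_ring {
	uint32_t	nslots;		/* number of descriptors in the ring */
	uint32_t	cons_index;	/* next slot the driver will consume */
};

/*
 * Return the number of new descriptors, or -1 if the device-supplied
 * producer index is out of range. On -1 a driver would post an
 * inval_state ereport and stop using the ring.
 */
static int32_t
rx_ring_pending(const struct rx_ring *r, uint32_t prod_index)
{
	assert(r->nslots > 0 && r->cons_index < r->nslots);
	if (prod_index >= r->nslots)
		return (-1);
	/* Distance from consumer to producer, allowing for wrap-around. */
	return ((int32_t)((prod_index + r->nslots - r->cons_index) %
	    r->nslots));
}
{code}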
h6. DDI_FM_DEVICE_INTERN_CORR
The device has reported a self-corrected internal error. For example, a correctable ECC error has been detected by the hardware in an internal buffer within the device.
This error flag is not used in the {{bge}} driver. See the {{nxge_fm.c}} file on OpenSolaris for examples that use this error. Take the following steps to study the {{nxge}} driver code:
* Go to [http://www.opensolaris.org/os/].
* Click [http://cvs.opensolaris.org/source/] under the Code heading in the menu on the left side of the page.
* Enter {{nxge}} in the File Path field.
* Click the Search button.
h6. DDI_FM_DEVICE_INTERN_UNCORR
The device has reported an uncorrectable internal error. For example, an uncorrectable ECC error has been detected by the hardware in an internal buffer within the device.
This error flag is not used in the {{bge}} driver. See the {{nxge_fm.c}} file on OpenSolaris for examples that use this error.
h6. DDI_FM_DEVICE_STALL
The driver has detected that data transfer has stalled unexpectedly.
The {{bge_factotum_stall_check()}} routine provides an example of stall detection.
{code}
dogval = bge_atomic_shl32(&bgep->watchdog, 1);
if (dogval < bge_watchdog_count)
	return (B_FALSE);

/* A string literal cannot span lines, so the message is kept on one line */
BGE_REPORT((bgep, "Tx stall detected, watchdog code 0x%x", dogval));
bge_fm_ereport(bgep, DDI_FM_DEVICE_STALL);
return (B_TRUE);
{code}
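The {{bge}} driver detects stalls with a shifting watchdog value. A different but common technique, sketched here in user-space C, is to flag a stall when work is outstanding but a completion counter has not advanced for several consecutive periodic checks. The {{stall_check}} structure, the {{stall_detected()}} function, and the limit are hypothetical.

{code}
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for a periodic (for example, timer-driven) check. */
struct stall_check {
	uint64_t	last_completed;	/* completion count at previous check */
	unsigned	idle_checks;	/* consecutive checks without progress */
	unsigned	limit;		/* checks tolerated before a stall */
};

/* Returns true when a stall should be reported. */
static bool
stall_detected(struct stall_check *s, uint64_t completed, bool work_pending)
{
	assert(s->limit > 0);
	if (!work_pending || completed != s->last_completed) {
		/* Idle, or progress was made: start counting again. */
		s->last_completed = completed;
		s->idle_checks = 0;
		return (false);
	}
	return (++s->idle_checks >= s->limit);
}
{code}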
h6. DDI_FM_DEVICE_NO_RESPONSE
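The device is not responding to a driver command. The {{bge_chip_poll_engine()}} routine polls a register until it reads the expected value, and posts this ereport if the device does not respond within the allotted number of polls.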
{code}
static boolean_t
bge_chip_poll_engine(bge_t *bgep, bge_regno_t regno,
    uint32_t mask, uint32_t val)
{
	uint32_t regval;
	uint32_t n;

	/* Poll up to 200 times, 100 microseconds apart */
	for (n = 200; n; --n) {
		regval = bge_reg_get32(bgep, regno);
		if ((regval & mask) == val)
			return (B_TRUE);
		drv_usecwait(100);
	}

	/* The device never reached the expected state */
	bge_fm_ereport(bgep, DDI_FM_DEVICE_NO_RESPONSE);
	return (B_FALSE);
}
{code}
h6. DDI_FM_DEVICE_BADINT_LIMIT
The device has raised too many consecutive invalid interrupts.
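Detecting this condition usually means counting consecutive invalid interrupts and resetting the count whenever a valid one arrives. The following user-space C sketch shows that pattern. The {{struct intr_guard}} type and the {{intr_guard_update()}} function are hypothetical; the {{bge}} excerpt that follows uses its own counter of missed DMAs.

{code}
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-instance counter of consecutive invalid interrupts. */
struct intr_guard {
	unsigned	consecutive_bad;
	unsigned	limit;
};

/*
 * Call once per interrupt with whether the device actually had work.
 * Returns true when the limit is reached, that is, when the driver
 * should post a badint_limit ereport and stop the device.
 */
static bool
intr_guard_update(struct intr_guard *g, bool valid)
{
	assert(g->limit > 0);
	if (valid) {
		g->consecutive_bad = 0;
		return (false);
	}
	return (++g->consecutive_bad >= g->limit);
}
{code}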
The {{bge_intr()}} routine within the {{bge}} driver provides an example of stuck interrupt detection. The {{bge_fm_ereport()}} function is a wrapper for the {{ddi_fm_ereport_post}}(9F) function. See the {{bge_fm_ereport()}} example in [Queueing an Error Event|#gemfu].
{code}
if (bgep->missed_dmas >= bge_dma_miss_limit) {
	/*
	 * If this happens multiple times in a row,
	 * it means DMA is just not working. Maybe
	 * the chip has failed, or maybe there's a
	 * problem on the PCI bus or in the host-PCI
	 * bridge (Tomatillo).
	 *
	 * At all events, we want to stop further
	 * interrupts and let the recovery code take
	 * over to see whether anything can be done
	 * about it ...
	 */
	bge_fm_ereport(bgep,
	    DDI_FM_DEVICE_BADINT_LIMIT);
	goto chip_stop;
}
{code}
{anchor:gemgp}
h5. Service Impact Function
A fault management capable driver must indicate whether or not an error has impacted the services provided by a device. Following detection of an error and, if necessary, a shutdown of services, the driver should invoke the [{{ddi_fm_service_impact}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-service-impact-9f?a=view] routine to reflect the current service state of the device instance. The service state can be used by diagnosis and recovery software to help identify or react to the problem.
The {{ddi_fm_service_impact()}} routine should be called both when an error has been detected by the driver itself, and when the framework has detected an error and marked an access or DMA handle as faulty.
{code}
void ddi_fm_service_impact(dev_info_t *dip, int svc_impact)
{code}
The following service impact values (_svc_impact_) are accepted by {{ddi_fm_service_impact()}}:
h6. DDI_SERVICE_LOST
The service provided by the device is unavailable due to a device fault or software defect.
h6. DDI_SERVICE_DEGRADED
The driver is unable to provide normal service, but the driver can provide a partial or degraded level of service. For example, the driver might have to make repeated attempts to perform an operation before it succeeds, or it might be running at less than its configured speed.
h6. DDI_SERVICE_UNAFFECTED
The driver has detected an error, but the services provided by the device instance are unaffected.
h6. DDI_SERVICE_RESTORED
All of the device's services have been restored.
The call to {{ddi_fm_service_impact()}} generates the following ereports on behalf of the driver, based on the service impact argument to the service impact routine:
* {{ereport.io.service.lost}}
* {{ereport.io.service.degraded}}
* {{ereport.io.service.unaffected}}
* {{ereport.io.service.restored}}
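The correspondence between the _svc_impact_ argument and the ereport posted on the driver's behalf can be written as a simple mapping. In the following user-space C sketch, {{enum svc_impact}} is a hypothetical stand-in for the {{DDI_SERVICE_*}} values; only the class strings come from the list above.

{code}
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the DDI_SERVICE_* values. */
enum svc_impact { SVC_LOST, SVC_DEGRADED, SVC_UNAFFECTED, SVC_RESTORED };

/* Map a service impact to the ereport class posted on the driver's behalf. */
static const char *
svc_impact_class(enum svc_impact impact)
{
	switch (impact) {
	case SVC_LOST:
		return ("ereport.io.service.lost");
	case SVC_DEGRADED:
		return ("ereport.io.service.degraded");
	case SVC_UNAFFECTED:
		return ("ereport.io.service.unaffected");
	case SVC_RESTORED:
		return ("ereport.io.service.restored");
	}
	return (NULL);	/* not a recognized impact value */
}
{code}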
In the following {{bge}} code, the driver determines that it is unable to successfully restart transmitting or receiving packets as the result of an error. The service state of the device transitions to DDI_SERVICE_LOST.
{code}
/*
 * All OK, reinitialize hardware and kick off GLD scheduling
 */
mutex_enter(bgep->genlock);
if (bge_restart(bgep, B_TRUE) != DDI_SUCCESS) {
	(void) bge_check_acc_handle(bgep, bgep->cfg_handle);
	(void) bge_check_acc_handle(bgep, bgep->io_handle);
	ddi_fm_service_impact(bgep->devinfo, DDI_SERVICE_LOST);
	mutex_exit(bgep->genlock);
	return (DDI_FAILURE);
}
{code}
{info:title=Note}The {{ddi_fm_service_impact()}} function should not be called from the registered callback routine.{info}
{anchor:gemhz}
h4. Access Attributes Structure
A {{DDI_FM_ACCCHK_CAPABLE}} device driver must set its access attributes to indicate that it is capable of handling programmed I/O (PIO) access errors that occur during a register read or write. The {{devacc_attr_access}} field in the [{{ddi_device_acc_attr}}(9S)|http://docs.sun.com/doc/819-2257/ddi-device-acc-attr-9s?a=view] structure should be set as an indicator to the system that the driver is capable of checking for and handling data path errors. The {{ddi_device_acc_attr}} structure contains the following members:
{code}
ushort_t devacc_attr_version;
uchar_t devacc_attr_endian_flags;
uchar_t devacc_attr_dataorder;
uchar_t devacc_attr_access; /* access error protection */
{code}
Errors detected in the data path to or from a device can be processed by one or more of the device driver's nexus parents.
The {{devacc_attr_access}} field can be set to the following values:
h6. DDI_DEFAULT_ACC
This flag indicates that the system will take the default action (panic if appropriate) when an error occurs. This attribute cannot be used by DDI_FM_ACCCHK_CAPABLE drivers.
h6. DDI_FLAGERR_ACC
This flag indicates that the system will attempt to handle and recover from an error associated with the access handle. The driver should use the techniques described in [Defensive Programming Techniques for Solaris Device Drivers|Hardening-DefProg] and should use [{{ddi_fm_acc_err_get}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-acc-err-get-9f?a=view] to regularly check for errors before the driver allows data to be passed back to the calling application.
The DDI_FLAGERR_ACC flag provides:
* Error notification via the driver callback
* An error condition observable via {{ddi_fm_acc_err_get}}(9F)
h6. DDI_CAUTIOUS_ACC
The DDI_CAUTIOUS_ACC flag provides a high level of protection for each programmed I/O access made by the driver.
{info:title=Note}Using this flag significantly degrades the performance of the driver.{info}
The DDI_CAUTIOUS_ACC flag signifies that an error is anticipated by the accessing driver. The system attempts to handle and recover from an error associated with this handle as gracefully as possible. No error reports are generated as a result, but the handle's {{fme_status}} flag is set to DDI_FM_NONFATAL. This flag is functionally equivalent to [{{ddi_peek}}(9F)|http://docs.sun.com/doc/819-2256/ddi-peek-9f?a=view] and [{{ddi_poke}}(9F)|http://docs.sun.com/doc/819-2256/ddi-poke-9f?a=view].
The DDI_CAUTIOUS_ACC flag provides:
* Exclusive access to the bus
* On-trap protection ({{ddi_peek()}} and {{ddi_poke()}})
* Error notification through the driver callback registered with [{{ddi_fm_handler_register}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-handler-register-9f?a=view]
* An error condition observable through {{ddi_fm_acc_err_get}}(9F)
Generally, drivers should check for data path errors at appropriate junctures in the code path to guarantee consistent data and to ensure that proper error status is presented in the I/O software stack.
DDI_FM_ACCCHK_CAPABLE device drivers must set their {{devacc_attr_access}} field to DDI_FLAGERR_ACC or DDI_CAUTIOUS_ACC.
{anchor:gemhh}
h4. DMA Attributes Structure
As with access handle setup, a DDI_FM_DMACHK_CAPABLE device driver must set the DDI_DMA_FLAGERR flag in the {{dma_attr_flags}} field of its [{{ddi_dma_attr}}(9S)|http://docs.sun.com/doc/819-2257/ddi-dma-attr-9s?a=view] structure. The system attempts to recover from an error associated with a handle that has DDI_DMA_FLAGERR set. The {{ddi_dma_attr}} structure contains the following members:
{code}
uint_t dma_attr_version; /* version number */
uint64_t dma_attr_addr_lo; /* low DMA address range */
uint64_t dma_attr_addr_hi; /* high DMA address range */
uint64_t dma_attr_count_max; /* DMA counter register */
uint64_t dma_attr_align; /* DMA address alignment */
uint_t dma_attr_burstsizes; /* DMA burstsizes */
uint32_t dma_attr_minxfer; /* min effective DMA size */
uint64_t dma_attr_maxxfer; /* max DMA xfer size */
uint64_t dma_attr_seg; /* segment boundary */
int dma_attr_sgllen; /* s/g length */
uint32_t dma_attr_granular; /* granularity of device */
uint_t dma_attr_flags; /* Bus specific DMA flags */
{code}
Drivers that set the DDI_DMA_FLAGERR flag should use the techniques described in [Defensive Programming Techniques for Solaris Device Drivers|Hardening-DefProg] and should use [{{ddi_fm_dma_err_get}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-dma-err-get-9f?a=view] to check for data path errors whenever DMA transactions are completed or at significant points within the code path. This ensures consistent data and proper error status presented to the I/O software stack.
Use of DDI_DMA_FLAGERR provides:
* Error notification via the driver callback registered with {{ddi_fm_handler_register()}}
* An error condition observable by calling {{ddi_fm_dma_err_get()}}
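The attribute rules for access and DMA handles can be summarized as a consistency check. The sketch below is a hypothetical user-space model: the {{HYP_*}} constants and {{fm_attrs_consistent()}} are invented for illustration, and their values are not taken from the real DDI_FM_ACCCHK_CAPABLE, DDI_FM_DMACHK_CAPABLE, DDI_FLAGERR_ACC, or DDI_DMA_FLAGERR definitions. In a driver, the corresponding settings live in the {{devacc_attr_access}} and {{dma_attr_flags}} members of the attribute structures.
{code}
#include <stdbool.h>

/* Hypothetical capability bits, modeled on DDI_FM_ACCCHK/DMACHK_CAPABLE. */
#define	HYP_FM_ACCCHK	0x1
#define	HYP_FM_DMACHK	0x2

/* Hypothetical access modes, modeled on the devacc_attr_access values. */
typedef enum {
	HYP_DEFAULT_ACC,	/* models DDI_DEFAULT_ACC */
	HYP_FLAGERR_ACC,	/* models DDI_FLAGERR_ACC */
	HYP_CAUTIOUS_ACC	/* models DDI_CAUTIOUS_ACC */
} hyp_acc_mode_t;

/* Hypothetical DMA attribute flag bit, modeled on DDI_DMA_FLAGERR. */
#define	HYP_DMA_FLAGERR	0x1

/*
 * Return true if the access mode and DMA flags satisfy the rules for
 * the declared capabilities:
 *  - access-check capable: access mode must not be the default mode;
 *  - DMA-check capable: the FLAGERR bit must be set in the DMA flags.
 */
bool
fm_attrs_consistent(int caps, hyp_acc_mode_t acc, unsigned int dma_flags)
{
	if ((caps & HYP_FM_ACCCHK) && acc == HYP_DEFAULT_ACC)
		return (false);
	if ((caps & HYP_FM_DMACHK) && (dma_flags & HYP_DMA_FLAGERR) == 0)
		return (false);
	return (true);
}
{code}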
{anchor:gemfy}
h4. Getting Error Status
If a fault has occurred that affects the resource mapped by the handle, the error status structure is updated to reflect error information captured during error handling by a bus or other device driver in the I/O data path.
{code}
void ddi_fm_dma_err_get(ddi_dma_handle_t handle, ddi_fm_error_t *de, int version)
void ddi_fm_acc_err_get(ddi_acc_handle_t handle, ddi_fm_error_t *de, int version)
{code}
The {{ddi_fm_dma_err_get}}(9F) and {{ddi_fm_acc_err_get}}(9F) functions return the error status for a DMA or access handle, respectively. The _version_ argument should be set to DDI_FME_VERSION.
An error on an access handle means that an error has been detected that affected PIO transactions to or from the device through that access handle. Any data received by the driver, for example via a recent [{{ddi_get8}}(9F)|http://docs.sun.com/doc/819-2256/ddi-get8-9f?a=view] call, should be considered potentially corrupt. Any data sent to the device, for example via a recent [{{ddi_put32}}(9F)|http://docs.sun.com/doc/819-2256/ddi-put32-9f?a=view] call, might also have been corrupted or might not have been received at all. The underlying fault might, however, be transient, and the driver can therefore attempt to recover by calling [{{ddi_fm_acc_err_clear}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-dma-err-clear-9f?a=view], resetting the device to get it back into a known state, and retrying any potentially failed transactions.
If an error is indicated for a DMA handle, an error has been detected that has affected (or will affect) DMA transactions between the device and the memory currently bound to the handle (or most recently bound, if the handle is currently unbound). Possible causes include the failure of a component in the DMA data path, or an attempt by the device to make an invalid DMA access. The driver might be able to continue by retrying and reallocating memory. The contents of the memory currently (or previously) bound to the handle should be regarded as indeterminate and should be released back to the system. The fault indication associated with the current transaction is lost once the handle is bound or rebound, but because the fault might persist, future DMA operations might not succeed.
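The advice to check error status before passing data upward can be sketched as a staged read. In this user-space model, {{hyp_acc_handle_t}}, {{hyp_get8()}}, and the {{HYP_FM_*}} values are hypothetical stand-ins for an access handle, {{ddi_get8}}(9F), and the status that {{ddi_fm_acc_err_get}}(9F) would report. The point is only the ordering: read, check the handle, and release data to the caller only if no error was flagged.
{code}
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical status values modeled on DDI_FM_OK and DDI_FM_NONFATAL. */
enum { HYP_FM_OK = 0, HYP_FM_NONFATAL = 1 };

/* Hypothetical access handle: simulated registers plus an error status. */
typedef struct {
	const uint8_t	*regs;		/* simulated device registers */
	size_t		nregs;
	int		fme_status;	/* set when a fault is detected */
} hyp_acc_handle_t;

/* Simulated register read; a real driver would call ddi_get8(9F). */
static uint8_t
hyp_get8(hyp_acc_handle_t *h, size_t off)
{
	return (off < h->nregs ? h->regs[off] : 0xff);
}

/*
 * Read len registers into out. Data is staged locally and copied to the
 * caller only if the handle reports no error. Returns 0 on success, or
 * -1 if len is too large or the handle reported an error.
 */
int
read_regs_checked(hyp_acc_handle_t *h, uint8_t *out, size_t len)
{
	uint8_t staged[64];

	if (len > sizeof (staged))
		return (-1);
	for (size_t i = 0; i < len; i++)
		staged[i] = hyp_get8(h, i);

	/* Check the handle before any data leaves the driver. */
	if (h->fme_status != HYP_FM_OK)
		return (-1);	/* staged data is potentially corrupt */

	memcpy(out, staged, len);
	return (0);
}
{code}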
{anchor:gemfr}
h4. Clearing Errors
Call these routines when the driver wants to retry a request after an error has been detected on a handle, without first freeing and reallocating the handle.
{code}
void ddi_fm_acc_err_clear(ddi_acc_handle_t handle, int version)
void ddi_fm_dma_err_clear(ddi_dma_handle_t handle, int version)
{code}
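The recovery sequence described under Getting Error Status (clear the error, reset the device, retry) can be sketched as a bounded retry loop. Everything here is a hypothetical user-space simulation: {{hyp_dev_t}} counts down a number of simulated transient faults, and {{hyp_err_clear()}} and {{hyp_reset()}} stand in for {{ddi_fm_acc_err_clear}}(9F) or {{ddi_fm_dma_err_clear}}(9F) and a driver-specific reset routine. The retry limit is an assumption; choose one appropriate to the device.
{code}
#include <stdbool.h>

/*
 * Hypothetical simulated device: the next faults_left transactions fail
 * (modeling a transient fault); after that, transactions succeed.
 */
typedef struct {
	int	err_status;	/* models the handle's error state; 0 = OK */
	int	faults_left;	/* remaining simulated transient faults */
	int	resets;		/* number of device resets performed */
} hyp_dev_t;

/* Simulated transaction: flags an error on the handle while faults remain. */
static void
hyp_do_io(hyp_dev_t *d)
{
	if (d->faults_left > 0) {
		d->faults_left--;
		d->err_status = 1;
	}
}

/* Stand-in for ddi_fm_acc_err_clear(9F) or ddi_fm_dma_err_clear(9F). */
static void
hyp_err_clear(hyp_dev_t *d)
{
	d->err_status = 0;
}

/* Stand-in for a driver routine that returns the device to a known state. */
static void
hyp_reset(hyp_dev_t *d)
{
	d->resets++;
}

/*
 * Try a transaction up to max_tries times. After each failure, clear the
 * handle's error state and reset the device, reusing the same handle.
 * Returns true if a transaction completed without error.
 */
bool
io_with_retry(hyp_dev_t *d, int max_tries)
{
	for (int attempt = 0; attempt < max_tries; attempt++) {
		hyp_do_io(d);
		if (d->err_status == 0)
			return (true);
		hyp_err_clear(d);
		hyp_reset(d);
	}
	return (false);
}
{code}
For example, with two simulated faults and three allowed attempts, the loop succeeds on the third attempt after two resets.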
{anchor:gemie}
h4. Registering an Error Handler
Error handling activity might begin at the time that the error is detected by the operating system via a trap or error interrupt. If the software responsible for handling the error (the error handler) cannot immediately isolate the device that was involved in the failed I/O operation, it must attempt to find a software module within the device tree that can perform the error isolation. The Solaris device tree provides a structural means to propagate nexus driver error handling activities to child drivers that might have a more detailed understanding of the error and can capture error state and isolate the problem device.
A driver can register an error handler callback with the I/O Fault Services Framework. The error handler should be specific to the type of error and subsystem where error detection has occurred. When the driver's error handler routine is invoked, the driver must check for any outstanding errors associated with device transactions and generate ereport events. The driver must also return error handler status in its [{{ddi_fm_error}}(9S)|http://docs.sun.com/app/docs/doc/819-2257/ddi-fm-error-9s?a=view] structure. For example, if it has been determined that the system's integrity has been compromised, the most appropriate action might be for the error handler to panic the system.
The callback is invoked by a parent nexus driver when an error might be associated with a particular device instance. Device drivers that register error handlers must be DDI_FM_ERRCB_CAPABLE.
{code}
void ddi_fm_handler_register(dev_info_t *dip, ddi_err_func_t handler, void *impl_data)
{code}
The {{ddi_fm_handler_register}}(9F) routine registers an error handler callback with the I/O fault services framework. The {{ddi_fm_handler_register()}} function should be called in the driver's [{{attach}}(9E)|http://docs.sun.com/doc/819-2255/attach-9e?a=view] entry point for callback registration following driver fault management initialization ({{ddi_fm_init()}}).
The error handler callback function must do the following:
* Check for any outstanding hardware errors associated with device transactions, and generate ereport events for diagnosis. For a PCI, PCI-X, or PCI Express device, this can generally be done using {{pci_ereport_post()}} as described in [Detecting and Reporting PCI-Related Errors|#gemfk].
* Return error handler status in its {{ddi_fm_error}} structure:
** DDI_FM_OK
** DDI_FM_FATAL
** DDI_FM_NONFATAL
** DDI_FM_UNKNOWN
Driver error handlers receive the following:
* A pointer to a device instance (_dip_) under the driver's control
* A data structure ({{ddi_fm_error}}) that contains common fault management data and status for error handling
* A pointer to any implementation specific data (_impl_data_) specified at the time of the handler's registration
The [{{ddi_fm_handler_register(9F)}}|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-handler-register-9f?a=view] and [{{ddi_fm_handler_unregister(9F)}}|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-handler-unregister-9f?a=view] routines must be called from kernel context in a driver's {{attach}}(9E) or {{detach}}(9E) entry point. The registered error handler callback can be called from kernel, interrupt, or high-level interrupt context. Therefore the error handler:
* Must not hold locks
* Must not sleep waiting for resources
A device driver is responsible for:
* Isolating the device instance that might have caused errors
* Recovering transactions associated with errors
* Reporting the service impact of errors
* Scheduling device shutdown for errors considered fatal
These actions can be carried out within the error handler function. However, because of the restrictions on locking and because the error handler function does not always know the context of what the driver was doing at the point where the fault occurred, it is more usual for these actions to be carried out following inline calls to {{ddi_fm_acc_err_get}}(9F) and {{ddi_fm_dma_err_get}}(9F) within the normal paths of the driver as described previously.
{code}
/*
* The I/O fault service error handling callback function
*/
/*ARGSUSED*/
static int
bge_fm_error_cb(dev_info_t *dip, ddi_fm_error_t *err, const void *impl_data)
{
/*
* as the driver can always deal with an error
* in any dma or access handle, we can just return
* the fme_status value.
*/
pci_ereport_post(dip, err, NULL);
return (err->fme_status);
}
{code}
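Because the callback can run in high-level interrupt context, one way to respect the no-locks, no-sleep rule is for the callback to record that an error occurred and leave the recovery work to a normal code path. The sketch below is a hypothetical user-space model of that hand-off: it uses C11 atomics in place of kernel atomic primitives, and {{hyp_softc_t}}, {{hyp_error_cb()}}, and {{hyp_check_and_recover()}} are invented names. The framework does not require this pattern; as noted above, drivers commonly rely on inline {{ddi_fm_acc_err_get}}(9F) and {{ddi_fm_dma_err_get}}(9F) checks instead.
{code}
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical per-instance soft state. The error callback only sets a
 * lock-free flag; the normal path, where locking and sleeping are
 * allowed, consumes the flag and performs recovery.
 */
typedef struct {
	atomic_bool	err_pending;
	int		recoveries;	/* touched only by the normal path */
} hyp_softc_t;

/* Models the registered callback: record the error and return at once. */
void
hyp_error_cb(hyp_softc_t *sc)
{
	atomic_store(&sc->err_pending, true);
}

/*
 * Models a normal driver path (for example, an I/O completion routine).
 * Atomically consumes the pending-error indication and, if one was set,
 * performs recovery. Returns true if recovery was performed.
 */
bool
hyp_check_and_recover(hyp_softc_t *sc)
{
	if (!atomic_exchange(&sc->err_pending, false))
		return (false);
	/* Isolate, recover transactions, and report service impact here. */
	sc->recoveries++;
	return (true);
}
{code}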
{anchor:gemhd}
h4. Fault Management Data and Status Structure
Driver error handling callbacks are passed a pointer to a data structure that contains common fault management data and status for error handling.
The data structure {{ddi_fm_error}} contains an FMA protocol ENA for the current error, the status of the error handler callback, an error expectation flag, and any potential access or DMA handles associated with an error detected by the parent nexus.
h6. {{fme_ena}}
This field is initialized by the calling parent nexus and might have been incremented along the error handling propagation chain before reaching the driver's registered callback routine. If the driver detects a related error of its own, it should increment this ENA prior to calling {{ddi_fm_ereport_post()}}.
h6. {{fme_acc_handle}}, {{fme_dma_handle}}
These fields contain a valid access or DMA handle if the parent was able to associate an error detected at its level to a handle mapped or bound by the device driver.
h6. {{fme_flag}}
The {{fme_flag}} is set to DDI_FM_ERR_EXPECTED if the calling parent determines the error was the result of a DDI_CAUTIOUS_ACC protected operation. In this case, the {{fme_acc_handle}} is valid and the driver should check for and report only errors not associated with the DDI_CAUTIOUS_ACC protected operation. Otherwise, {{fme_flag}} is set to DDI_FM_ERR_UNEXPECTED and the driver must perform the full range of error handling tasks.
h6. {{fme_status}}
Before its error handler callback returns, the driver must set {{fme_status}} to one of the following values:
* DDI_FM_OK – No errors were detected and the operational state of this device instance remains the same.
* DDI_FM_FATAL – An error has occurred and the driver considers it to be fatal to the system. For example, a call to {{pci_ereport_post}}(9F) might have detected a system fatal error. In this case, the driver should report any additional error information it might have in the context of the driver.
* DDI_FM_NONFATAL – An error has been detected by the driver but is not considered fatal to the system. The driver has identified the error and has either isolated the error or is committing that it will isolate the error.
* DDI_FM_UNKNOWN – An error has been detected, but the driver is unable to isolate the device or determine the impact of the error on the operational state of the system.
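The four definitions above can be read as a decision sequence. The following sketch encodes one reasonable ordering as a hypothetical user-space function; {{hyp_findings_t}} and the {{HYP_FM_*}} values are invented for illustration and are not the {{ddi_fm_error}}(9S) definitions.
{code}
#include <stdbool.h>

/* Hypothetical mirror of the four fme_status values. */
typedef enum {
	HYP_FM_OK,
	HYP_FM_FATAL,
	HYP_FM_NONFATAL,
	HYP_FM_UNKNOWN
} hyp_fm_status_t;

/* Hypothetical summary of what the driver's callback found. */
typedef struct {
	bool	error_detected;	/* any error found for this instance? */
	bool	system_fatal;	/* e.g. PCI error checks found a fatal error */
	bool	isolated;	/* driver isolated, or commits to isolate, it */
} hyp_findings_t;

/* Choose the status to return, following the definitions above. */
hyp_fm_status_t
choose_fme_status(const hyp_findings_t *f)
{
	if (!f->error_detected)
		return (HYP_FM_OK);
	if (f->system_fatal)
		return (HYP_FM_FATAL);
	if (f->isolated)
		return (HYP_FM_NONFATAL);
	return (HYP_FM_UNKNOWN);	/* cannot isolate or assess impact */
}
{code}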
{anchor:gemfs}
h3. Diagnosing Faults
The fault management daemon, [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view], provides a programming interface for the development of diagnosis engine (DE) plug-in modules. A DE can be written to consume and diagnose all error telemetry or only specific classes of error telemetry. The eft DE was designed to diagnose any number of ereport classes based on diagnosis rules specified in the Eversholt language.
{anchor:gemge}
h4. Standard Leaf Device Diagnosis
Most I/O subsystems use the eft DE and rules sets to diagnose device and device driver related problems. A standard set of ereports, listed in [Reporting Standard I/O Controller Errors|#gemha], has been specified for PCI leaf devices. Accompanying these ereports are eft diagnosis rules that take the telemetry and identify the associated device fault. Drivers that generate these ereports do not need to deliver any additional diagnosis software or eft rules.
The detection and generation of these ereports produces the following fault events:
h6. {{fault.io.pci.bus-linkerr}}
A hardware fault on the PCI bus
h6. {{fault.io.pci.device-interr}}
A hardware fault within the device
h6. {{fault.io.pci.device-invreq}}
A hardware fault in the device or a defect in the driver that causes the device to send an invalid request
h6. {{fault.io.pci.device-noresp}}
A hardware fault in the device that causes the device not to respond to a valid request
h6. {{fault.io.pciex.bus-linkerr}}
A hardware fault on the link
h6. {{fault.io.pciex.bus-noresp}}
The link going down so that a device cannot respond to a valid request
h6. {{fault.io.pciex.device-interr}}
A hardware fault within the device
h6. {{fault.io.pciex.device-invreq}}
A hardware fault in the device or a defect in the driver that causes the device to send an invalid request
h6. {{fault.io.pciex.device-noresp}}
A hardware fault in the device causing it not to respond to a valid request
{anchor:gemia}
h4. Specialized Device Diagnosis
Driver developers who want to generate additional ereports or provide more specialized diagnosis software or eft rules can do so by writing a C-based DE or an eft diagnosis rules set. See the [Fault Management community|http://www.opensolaris.org/os/community/fm/] on the [OpenSolaris project|http://www.opensolaris.org/os/] for information.
{anchor:gemhe}
h3. Event Registry
The Sun event registry is the central repository of all class names, ereports, faults, defects, upsets, and suspect list (list.suspect) events. The event registry also contains the current definitions of all event member payloads, as well as important non-payload information such as internal documentation, suspect lists, dictionaries, and knowledge articles. For example, {{ereport.io}} and {{fault.io}} are two of the base class names that are of particular importance to I/O driver developers.
The FMA event protocol defines a base set of payload members that is supplied with each of the registered events. Developers can also define additional events that help diagnosis engines (or eft rules) to narrow a suspect list down to a specific fault.
{anchor:gemgu}
h3. Glossary
This section uses the following terms:
h6. Agent
A generic term used to describe fault manager modules that subscribe to fault.* or list.* events. Agents are used to retire faulty resources, communicate diagnosis results to administrators, and bridge to higher-level management frameworks.
h6. ASRU (Automated System Reconfiguration Unit)
The ASRU is a resource that can be disabled by software or hardware in order to isolate a problem in the system and suppress further error reports.
h6. DE (Diagnosis Engine)
A fault management module whose purpose is to diagnose problems by subscribing to one or more classes of incoming error events and using these events to solve cases associated with each problem on the system.
h6. ENA (Error Numeric Association)
An Error Numeric Association (ENA) is an encoded integer that uniquely identifies an error report within a given fault region and time period. The ENA also indicates the relationship of the error to previous errors as a secondary effect.
h6. Error
An unexpected condition, result, signal, or datum. An error is the symptom of a problem on the system. Each problem typically produces many different kinds of errors.
h6. ereport (Error Report)
The data captured with a particular error. Error report formats are defined in advance by creating a class naming the error report and defining a schema using the Sun event registry.
h6. ereport event (Error Event)
The data structure that represents an instance of an error report. Error events are represented as name-value pair lists.
h6. Fault
Malfunctioning behavior of a hardware component.
h6. Fault Boundary
Logical partition of hardware or software elements for which a specific set of faults can be enumerated.
h6. Fault Event
An instance of a fault diagnosis encoded in the protocol.
h6. Fault Manager
Software component responsible for fault diagnosis via one or more diagnosis engines and state management.
h6. FMRI (Fault Managed Resource Identifier)
An FMRI is a URL-like identifier that acts as the canonical name for a particular resource in the fault management system. Each FMRI includes a scheme that identifies the type of resource, and one or more values that are specific to the scheme. An FMRI can be represented as a URL-like string or as a name-value pair list data structure.
h6. FRU (Field Replaceable Unit)
The FRU is a resource that can be replaced in the field by a customer or service provider. FRUs can be defined for hardware (for example system boards) or for software (for example software packages or patches).
{anchor:gemhq}
h3. Resources
The following resources provide additional information:
* [Fault Management OpenSolaris community|http://www.opensolaris.org/os/community/fm/]
* [FMA Messaging web site|http://www.sun.com/msg/]
----
Next: [Defensive Programming Techniques for Solaris Device Drivers|Hardening-DefProg] > [Driver Hardening Test Harness|Hardening-BOFI]
{anchor:gemgv}
h3. What Is Predictive Self-Healing?
Traditionally, systems have exported hardware and software error information directly to human administrators and to management software in the form of syslog messages. Often, error detection, diagnosis, reporting, and handling were embedded in the code of each driver.
A predictive self-healing system such as the one in the Solaris OS is first and foremost self-diagnosing. Self-diagnosing means the system provides technology to automatically diagnose problems from observed symptoms, and the results of the diagnosis can then be used to trigger automated response and recovery. A *fault* in hardware or a defect in software can be associated with a set of possible observed symptoms called *errors*. The data generated by the system as the result of observing an error is called an error report or *ereport*.
In a system capable of self-healing, ereports are captured by the system and are encoded as a set of name-value pairs described by an extensible event protocol to form an *ereport event*. Ereport events and other data are gathered to facilitate self-healing, and are dispatched to software components called diagnosis engines designed to diagnose the underlying problems corresponding to the error symptoms observed by the system. A *diagnosis engine* runs in the background and silently consumes error telemetry until it can produce a diagnosis or predict a fault.
After processing sufficient telemetry to reach a conclusion, a diagnosis engine produces another event called a *fault event*. The fault event is then broadcast to all agents that are interested in the specific fault event. An *agent* is a software component that initiates recovery and responds to specific fault events. A software component known as the Solaris Fault Manager, [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view], manages the multiplexing of events between ereport generators, diagnosis engines, and agent software.
{anchor:gemgw}
h3. Solaris Fault Manager
The Solaris Fault Manager, [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view], is responsible for dispatching in-bound error telemetry events to the appropriate diagnosis engines. The diagnosis engine is responsible for identifying the underlying hardware faults or software defects that are producing the error symptoms.
The {{fmd}}(1M) daemon is the Solaris OS implementation of a fault manager. It starts at boot time and loads all of the diagnosis engines and agents available on the system. The Solaris Fault Manager also provides interfaces for system administrators and service personnel to observe fault management activity.
{anchor:gemft}
h4. Diagnosis, Suspect Lists, and Fault Events
Once a diagnosis has been made, the diagnosis is output in the form of a *list.suspect* event. A list.suspect event is an event composed of one or more possible fault or defect events. Sometimes the diagnosis cannot narrow the cause of errors to a single fault or defect. For example, the underlying problem might be a broken wire connecting controllers to the main system bus. The problem might be with a component on the bus or with the bus itself. In this specific case, the list.suspect event will contain multiple fault events: one for each controller attached to the bus, and one for the bus itself.
In addition to describing the fault that was diagnosed, a fault event also contains four payload members that identify the components to which the diagnosis applies.
* The *resource* is the component that was diagnosed as faulty. The [{{fmdump}}(1M)|http://docs.sun.com/doc/819-2240/fmdump-1m?a=view] command shows this payload member as “Problem in.”
* The *Automated System Reconfiguration Unit* (ASRU) is the hardware or software component that must be disabled to prevent further error symptoms from occurring. The {{fmdump}}(1M) command shows this payload member as “Affects.”
* The *Field Replaceable Unit* (FRU) is the component that must be replaced or repaired to fix the underlying problem.
* The *Label* payload is a string that gives the location of the FRU in the same form as it is printed on the chassis or motherboard, for example next to a DIMM slot or PCI card slot. The {{fmdump}} command shows this payload member as “Location.”
For example, after receiving a certain number of ECC correctable errors in a given amount of time for a particular memory location, the CPU and memory diagnosis engine issues a diagnosis (list.suspect event) for a faulty DIMM. The following output shows a diagnosis of a faulty CPU rather than a DIMM.
{code}
# fmdump -v -u 38bd6f1b-a4de-4c21-db4e-ccd26fa8573c
TIME UUID SUNW-MSG-ID
Oct 31 13:40:18.1864 38bd6f1b-a4de-4c21-db4e-ccd26fa8573c AMD-8000-8L
100% fault.cpu.amd.icachetag
Problem in: hc:///motherboard=0/chip=0/cpu=0
Affects: cpu:///cpuid=0
FRU: hc:///motherboard=0/chip=0
Location: SLOT 2
{code}
In this example, [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view] has identified a problem in a resource, specifically a CPU ({{hc:///motherboard=0/chip=0/cpu=0}}). To suppress further error symptoms and to prevent an uncorrectable error from occurring, an ASRU ({{cpu:///cpuid=0}}) is identified for retirement. The component that needs to be replaced is the FRU ({{hc:///motherboard=0/chip=0}}).
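The four payload members can be modeled as a small structure. The sketch below is hypothetical: {{hyp_fault_event_t}} and {{format_fault_event()}} are invented names, and the output only approximates the {{fmdump -v}} layout shown above (it omits the time, UUID, message ID, certainty percentage, and column alignment).
{code}
#include <stdio.h>
#include <string.h>

/* Hypothetical in-memory form of a fault event and its payload members. */
typedef struct {
	const char	*fault_class;	/* e.g. fault.cpu.amd.icachetag */
	const char	*resource;	/* shown by fmdump as "Problem in" */
	const char	*asru;		/* shown as "Affects" */
	const char	*fru;		/* shown as "FRU" */
	const char	*label;		/* shown as "Location" */
} hyp_fault_event_t;

/*
 * Format the payload members in the style of the fmdump -v output.
 * Returns the number of characters snprintf() would have written.
 */
int
format_fault_event(const hyp_fault_event_t *ev, char *buf, size_t len)
{
	return (snprintf(buf, len,
	    "%s\nProblem in: %s\nAffects: %s\nFRU: %s\nLocation: %s\n",
	    ev->fault_class, ev->resource, ev->asru, ev->fru, ev->label));
}
{code}
Filled in with the values from the example above, the result begins with {{fault.cpu.amd.icachetag}}, followed by the Problem in, Affects, FRU, and Location lines.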
{anchor:gemgg}
h4. Response Agents
An *agent* is a software component that takes action in response to a diagnosis or repair. For example, the CPU and memory retire agent is designed to act on list.suspects that contain a fault.cpu event. The {{cpumem-retire}} agent attempts to off-line a CPU or retire a physical memory page from service. If the agent is successful, an entry for the page or CPU that was successfully retired is added to the fault manager's ASRU cache. The [{{fmadm}}(1M)|http://docs.sun.com/doc/819-2240/fmadm-1m?a=view] utility, as shown in the example below, shows an entry for a memory rank that has been diagnosed as having a fault. ASRUs that the system cannot off-line, retire, or disable also have an entry in the ASRU cache, but they are shown as degraded. Degraded means the resource associated with the ASRU is faulty, but the ASRU cannot be removed from service. Currently, Solaris agent software cannot act upon I/O ASRUs (device instances), so all faulty I/O resource entries in the cache are in the degraded state.
{code}
# fmadm faulty
STATE RESOURCE / UUID
-------- ----------------------------------------------------------------------
degraded mem:///motherboard=0/chip=1/memory-controller=0/dimm=3/rank=0
ccae89df-2217-4f5c-add4-d920f78b4faf
-------- ----------------------------------------------------------------------
{code}
The primary purpose of a *retire agent* is to isolate (safely remove from service) the piece of hardware or software that has been diagnosed as faulty.
Agents can also take other important actions, such as the following:
* Send alerts via SNMP traps. This can translate a diagnosis into an SNMP alert that plugs into existing software mechanisms.
* Post a syslog message. Message-specific agents (for example, the syslog message agent) can take the result of a diagnosis and translate it into a syslog message that administrators can use to take a specific action.
* Perform other actions, such as updating the FRUID. Response agents can be platform-specific.
{anchor:gemfg}
h4. Message IDs and Dictionary Files
The syslog message agent takes the output of the diagnosis (the list.suspect event) and writes specific messages to the console or {{/var/adm/messages}}. Console messages are often difficult to understand. FMA remedies this problem by providing a defined fault message structure that is generated every time a list.suspect event results in a syslog message.
The syslog agent generates a message identifier (MSG ID). The event registry generates dictionary files ({{.dict}} files) that map a list.suspect event to a structured message identifier that should be used to identify and view the associated knowledge article. Message files ({{.po}} files) map the message ID to localized messages for every possible list of suspected faults that the diagnosis engine can generate. The following is an example of a fault message emitted on a test system.
{code}
SUNW-MSG-ID: AMD-8000-7U, TYPE: Fault, VER: 1, SEVERITY: Major
EVENT-TIME: Fri Jul 28 04:26:51 PDT 2006
PLATFORM: Sun Fire V40z, CSN: XG051535088, HOSTNAME: parity
SOURCE: eft, REV: 1.16
EVENT-ID: add96f65-5473-69e6-dbe1-8b3d00d5c47b
DESC: The number of errors associated with this CPU has exceeded
acceptable levels. Refer to http://sun.com/msg/AMD-8000-7U for
more information.
AUTO-RESPONSE: An attempt will be made to remove this CPU from service.
IMPACT: Performance of this system may be affected.
REC-ACTION: Schedule a repair procedure to replace the affected CPU.
Use fmdump -v -u <EVENT_ID> to identify the module.
{code}
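The role of a dictionary file, mapping a diagnosis to a message ID that in turn names a knowledge article, can be sketched with an in-memory table. This is a hypothetical simplification: a real {{.dict}} file is generated by the event registry and maps a whole list.suspect event, which can contain several faults, rather than a single fault class. The only table entry pairs the fault class and message ID from the {{fmdump}} example shown earlier, and the URL form follows the console message above.
{code}
#include <stdio.h>
#include <string.h>

/* Hypothetical dictionary entry: suspect fault class to message ID. */
typedef struct {
	const char	*fault_class;
	const char	*msg_id;
} hyp_dict_entry_t;

/* Illustrative table; the entry comes from the fmdump example. */
static const hyp_dict_entry_t hyp_dict[] = {
	{ "fault.cpu.amd.icachetag", "AMD-8000-8L" },
};

/* Return the message ID for a fault class, or NULL if it is not listed. */
const char *
lookup_msg_id(const char *fault_class)
{
	for (size_t i = 0; i < sizeof (hyp_dict) / sizeof (hyp_dict[0]); i++) {
		if (strcmp(hyp_dict[i].fault_class, fault_class) == 0)
			return (hyp_dict[i].msg_id);
	}
	return (NULL);
}

/* Build the knowledge-article URL in the form used by the console message. */
int
msg_url(const char *msg_id, char *buf, size_t len)
{
	return (snprintf(buf, len, "http://sun.com/msg/%s", msg_id));
}
{code}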
{anchor:gemfo}
h4. System Topology
To identify where a fault might have occurred, diagnosis engines need to have the topology for a given software or hardware system represented. The [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view] daemon provides diagnosis engines with a handle to a topology snapshot that can be used during diagnosis. Topology information is used to represent the resource, ASRU, and FRU found in each fault event. The topology can also be used to store the platform label, FRUID, and serial number identification.
The resource payload member in the fault event is always represented by the physical path location from the platform chassis outward. For example, a PCI controller function that is bridged from the main system bus to a PCI local bus is represented by its {{hc}} scheme path name:
{code}
hc:///motherboard=0/hostbridge=1/pcibus=0/pcidev=13/pcifn=0
{code}
The ASRU payload member in the fault event is typically represented by the Solaris device tree instance name that is bound to a hardware controller, device, or function. FMA uses the {{dev}} scheme to represent the ASRU in its native format for actions that might be taken by a future implementation of a retire agent specifically designed for I/O devices:
{code}
dev:////pci@1e,600000/ide@d
{code}
The FRU payload representation in the fault event varies depending on the closest replaceable component to the I/O resource that has been diagnosed as faulty. For example, a fault event for a broken embedded PCI controller might name the motherboard of the system as the FRU that needs to be replaced:
{code}
hc:///motherboard=0
{code}
The label payload is a string that gives the location of the FRU in the same form as it is printed on the chassis or motherboard, for example next to a DIMM slot or PCI card slot:
{code}
Label: SLOT 2
{code}
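The {{hc}} scheme path shown above is a chain of {{name=instance}} components ordered from the chassis outward. The following userland sketch splits such a path into its components to make that structure concrete. It is an illustrative parser written for this discussion, not the libtopo or fmd FMRI interface; the {{hc_pair_t}} type, the {{HC_MAX_PAIRS}} limit, and {{hc_parse()}} are hypothetical, and the sketch assumes every component has the simple {{name=number}} form used in the example.
{code}
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define	HC_MAX_PAIRS	16	/* hypothetical limit for this sketch */
#define	HC_NAME_LEN	32

/* One "name=instance" component of an hc scheme path (hypothetical). */
typedef struct hc_pair {
	char	name[HC_NAME_LEN];
	long	instance;
} hc_pair_t;

/*
 * Split an "hc:///" path into components, outermost first.
 * Returns the number of pairs parsed, or -1 if the path is malformed.
 */
static int
hc_parse(const char *fmri, hc_pair_t *pairs, int max)
{
	const char *prefix = "hc:///";
	const char *p;
	int n = 0;

	if (strncmp(fmri, prefix, strlen(prefix)) != 0)
		return (-1);
	p = fmri + strlen(prefix);

	while (*p != '\0') {
		const char *eq = strchr(p, '=');
		const char *slash = strchr(p, '/');
		char *end;
		size_t len;

		/* Each component needs an '=' before the next '/'. */
		if (eq == NULL || (slash != NULL && slash < eq) || n >= max)
			return (-1);
		len = (size_t)(eq - p);
		if (len == 0 || len >= HC_NAME_LEN)
			return (-1);
		memcpy(pairs[n].name, p, len);
		pairs[n].name[len] = '\0';

		/* The instance must be a decimal number ending the component. */
		pairs[n].instance = strtol(eq + 1, &end, 10);
		if (end == eq + 1 || (*end != '/' && *end != '\0'))
			return (-1);
		n++;
		p = (*end == '/') ? end + 1 : end;
	}
	return (n);
}
{code}
For the PCI controller path above, this sketch yields five pairs, from {{motherboard=0}} through {{pcifn=0}}, and rejects a {{dev}} scheme string because the prefix does not match.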
{anchor:gemgl}
h3. Error Handling
This section describes how to use I/O fault services APIs to handle errors within a driver. This section discusses how drivers should indicate and initialize their fault management capabilities, generate error reports, and register the driver's error handler routine.
The following sections include excerpts from the Broadcom 1Gb NIC driver, {{bge}}, that demonstrate the use of the I/O fault services API. Follow these examples as a model for how to integrate fault management capability into your own drivers. Take the following steps to study the complete {{bge}} driver code:
* Go to [ON (OS/Net) Sources|http://src.opensolaris.org/source/].
* Enter {{bge}} in the File Path field.
* Click the Search button.
Drivers that have been instrumented to provide FMA error report telemetry detect errors and determine the impact of those errors on the services provided by the driver. After detecting an error, the driver should determine whether its services have been impacted and to what degree.
An I/O driver must respond immediately to detected errors. Appropriate responses include:
* Attempt recovery
* Retry an I/O transaction
* Attempt fail-over techniques
* Report the error to the calling application/stack
* If the error cannot be constrained any other way, then panic
Errors detected by the driver are communicated to the fault management daemon as an *ereport*. An ereport is a structured event defined by the FMA event protocol. The event protocol is a specification for a set of common data fields that must be used to describe all possible error and fault events, in addition to the list of suspected faults. Ereports are gathered into a flow of error telemetry and dispatched to the diagnosis engine.
{anchor:gemfi}
h4. Declaring Fault Management Capabilities
A hardened device driver must declare its fault management capabilities to the I/O Fault Management framework. Use the [{{ddi_fm_init}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-init-9f?a=view] function to declare the fault management capabilities of your driver.
{code}
void ddi_fm_init(dev_info_t *dip, int *fmcap, ddi_iblock_cookie_t *ibcp)
{code}
The {{ddi_fm_init()}} function can be called from kernel context in a driver [{{attach}}()|http://docs.sun.com/doc/819-2255/attach-9e?a=view] or [{{detach}}()|http://docs.sun.com/doc/819-2255/detach-9e?a=view] entry point, and is usually called from {{attach()}}. It allocates and initializes resources according to _fmcap_, which must be set to the bitwise inclusive OR of the following fault management capabilities:
* {{DDI_FM_EREPORT_CAPABLE}} - Driver is responsible for and capable of generating FMA protocol error events (ereports) upon detection of an error condition.
* {{DDI_FM_ACCCHK_CAPABLE}} - Driver is responsible for and capable of checking for errors upon completion of one or more access I/O transactions.
* {{DDI_FM_DMACHK_CAPABLE}} - Driver is responsible for and capable of checking for errors upon completion of one or more DMA I/O transactions.
* {{DDI_FM_ERRCB_CAPABLE}} - Driver has an error callback function.
A hardened leaf driver generally sets all these capabilities. However, if its parent nexus is not capable of supporting any one of the requested capabilities, the associated bit is cleared and returned as such to the driver. Before returning from {{ddi_fm_init}}(9F), the I/O fault services framework creates a set of fault management capability properties: {{fm-ereport-capable}}, {{fm-accchk-capable}}, {{fm-dmachk-capable}} and {{fm-errcb-capable}}. The currently supported fault management capability level is observable by using the [{{prtconf}}(1M)|http://docs.sun.com/doc/819-2240/prtconf-1m?a=view] command.
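The capability negotiation described above behaves like a bitwise AND: the driver requests a set of bits, and any bit that the parent nexus cannot support comes back cleared. The following userland model illustrates that, along with the kind of per-bit tests that the {{DDI_FM_EREPORT_CAP()}} and {{DDI_FM_ERRCB_CAP()}} macros perform in {{bge_fm_init()}}. The {{FMCAP_*}} values, {{negotiate_fm_caps()}}, and {{needs_pci_ereport_setup()}} are hypothetical stand-ins written for this sketch; a real driver uses the {{DDI_FM_*}} constants and lets {{ddi_fm_init}}(9F) perform the clearing.
{code}
#include <assert.h>

/* Hypothetical stand-ins for the DDI_FM_*_CAPABLE bits (values assumed). */
#define	FMCAP_EREPORT	0x1
#define	FMCAP_ACCCHK	0x2
#define	FMCAP_DMACHK	0x4
#define	FMCAP_ERRCB	0x8
#define	FMCAP_ALL	\
	(FMCAP_EREPORT | FMCAP_ACCCHK | FMCAP_DMACHK | FMCAP_ERRCB)

/* Per-bit test in the spirit of DDI_FM_EREPORT_CAP() and friends. */
#define	FMCAP_HAS(caps, bit)	(((caps) & (bit)) != 0)

/*
 * Model of the negotiation: requested bits that the parent nexus
 * cannot support are cleared, and the surviving mask is returned.
 */
static int
negotiate_fm_caps(int requested, int parent_supported)
{
	return (requested & parent_supported);
}

/*
 * Mirrors the decision in bge_fm_init(): PCI ereport setup is needed
 * if the driver is ereport capable or error-callback capable.
 */
static int
needs_pci_ereport_setup(int caps)
{
	return (FMCAP_HAS(caps, FMCAP_EREPORT) ||
	    FMCAP_HAS(caps, FMCAP_ERRCB));
}
{code}
Because the returned mask may be smaller than the request, the driver should base every later decision on the value that {{ddi_fm_init}}(9F) writes back, as {{bge_fm_init()}} does, rather than on the mask it originally asked for.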
To support administrative selection of fault management capabilities, export an {{fm-capable}} property in the driver's [{{driver.conf}}|http://docs.sun.com/doc/819-2251/driver.conf-4?a=view] file and set it to the values described above. The driver must read the {{fm-capable}} property before calling {{ddi_fm_init()}} and pass the resulting capability list to that call.
The following example from the {{bge}} driver shows the {{bge_fm_init()}} function, which calls the {{ddi_fm_init}}(9F) function. The {{bge_fm_init()}} function is called in the {{bge_attach()}} function.
{code}
static void
bge_fm_init(bge_t *bgep)
{
	ddi_iblock_cookie_t iblk;

	/* Only register with IO Fault Services if we have some capability */
	if (bgep->fm_capabilities) {
		bge_reg_accattr.devacc_attr_access = DDI_FLAGERR_ACC;
		bge_desc_accattr.devacc_attr_access = DDI_FLAGERR_ACC;
		dma_attr.dma_attr_flags = DDI_DMA_FLAGERR;

		/*
		 * Register capabilities with IO Fault Services
		 */
		ddi_fm_init(bgep->devinfo, &bgep->fm_capabilities, &iblk);

		/*
		 * Initialize pci ereport capabilities if ereport capable
		 */
		if (DDI_FM_EREPORT_CAP(bgep->fm_capabilities) ||
		    DDI_FM_ERRCB_CAP(bgep->fm_capabilities))
			pci_ereport_setup(bgep->devinfo);

		/*
		 * Register error callback if error callback capable
		 */
		if (DDI_FM_ERRCB_CAP(bgep->fm_capabilities))
			ddi_fm_handler_register(bgep->devinfo,
			    bge_fm_error_cb, (void*) bgep);
	} else {
		/*
		 * These fields have to be cleared of FMA if there are no
		 * FMA capabilities at runtime.
		 */
		bge_reg_accattr.devacc_attr_access = DDI_DEFAULT_ACC;
		bge_desc_accattr.devacc_attr_access = DDI_DEFAULT_ACC;
		dma_attr.dma_attr_flags = 0;
	}
}
{code}
{anchor:gemhm}
h4. Cleaning Up Fault Management Resources
The [{{ddi_fm_fini}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-fini-9f?a=view] function cleans up resources allocated to support fault management for _dip_.
{code}
void ddi_fm_fini(dev_info_t *dip)
{code}
The {{ddi_fm_fini()}} function can be called from kernel context in a driver [{{attach}}()|http://docs.sun.com/doc/819-2255/attach-9e?a=view] or [{{detach}}()|http://docs.sun.com/doc/819-2255/detach-9e?a=view] entry point.
The following example from the {{bge}} driver shows the {{bge_fm_fini()}} function, which calls the {{ddi_fm_fini}}(9F) function. The {{bge_fm_fini()}} function is called in the {{bge_unattach()}} function, which is called in both the {{bge_attach()}} and {{bge_detach()}} functions.
{code}
static void
bge_fm_fini(bge_t *bgep)
{
	/* Only unregister FMA capabilities if we registered some */
	if (bgep->fm_capabilities) {
		/*
		 * Release any resources allocated by pci_ereport_setup()
		 */
		if (DDI_FM_EREPORT_CAP(bgep->fm_capabilities) ||
		    DDI_FM_ERRCB_CAP(bgep->fm_capabilities))
			pci_ereport_teardown(bgep->devinfo);

		/*
		 * Un-register error callback if error callback capable
		 */
		if (DDI_FM_ERRCB_CAP(bgep->fm_capabilities))
			ddi_fm_handler_unregister(bgep->devinfo);

		/*
		 * Unregister from IO Fault Services
		 */
		ddi_fm_fini(bgep->devinfo);
	}
}
{code}
{anchor:gemgx}
h4. Getting the Fault Management Capability Bit Mask
The [{{ddi_fm_capable}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-capable-9f?a=view] function returns the capability bit mask currently set for _dip_.
{code}
int ddi_fm_capable(dev_info_t *dip)
{code}
{anchor:gemfl}
h4. Reporting Errors
This section provides information about the following topics:
* [Queueing an Error Event|#gemfu] discusses how to queue error events.
* [Detecting and Reporting PCI-Related Errors|#gemfk] describes how to report PCI-related errors.
* [Reporting Standard I/O Controller Errors|#gemha] describes how to report standard I/O controller errors.
* [Service Impact Function|#gemgp] discusses how to report whether an error has impacted the services provided by a device.
{anchor:gemfu}
h5. Queueing an Error Event
The [{{ddi_fm_ereport_post}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-ereport-post-9f?a=view] function causes an ereport event to be queued for delivery to the fault manager daemon, [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view].
{code}
void ddi_fm_ereport_post(dev_info_t *dip,
    const char *error_class,
    uint64_t ena,
    int sflag, ...)
{code}
The _sflag_ parameter indicates whether the caller is willing to wait for system memory and event channel resources to become available (DDI_SLEEP) or not (DDI_NOSLEEP).
The _ena_ parameter holds the *Error Numeric Association* (ENA) for this error report. The ENA might have been initialized and obtained from another error-detecting software module, such as a bus nexus driver. If _ena_ is 0, {{ddi_fm_ereport_post()}} initializes it.
The name-value pair (_nvpair_) variable argument list contains one or more name, type, value pointer _nvpair_ tuples for non-array {{data_type_t}} types or one or more name, type, number of elements, value pointer tuples for {{data_type_t}} array types. The _nvpair_ tuples make up the ereport event payload required for diagnosis. The end of the argument list is specified by {{NULL}}.
The ereport class names and payloads described in [Reporting Standard I/O Controller Errors|#gemha] for I/O controllers are used as appropriate for _error_class_. Other ereport class names and payloads can be defined, but they must be registered in the Sun *event registry* and accompanied by driver-specific diagnosis engine software or by Eversholt fault tree (eft) rules. For more information about the Sun event registry and about Eversholt fault tree rules, see the [Fault Management community|http://www.opensolaris.org/os/community/fm/] on the [OpenSolaris project|http://www.opensolaris.org/os/]. The following {{bge}} function, {{bge_fm_ereport()}}, is the wrapper around {{ddi_fm_ereport_post()}} that the driver uses to post its ereports.
{code}
void
bge_fm_ereport(bge_t *bgep, char *detail)
{
	uint64_t ena;
	char buf[FM_MAX_CLASS];

	(void) snprintf(buf, FM_MAX_CLASS, "%s.%s", DDI_FM_DEVICE, detail);
	ena = fm_ena_generate(0, FM_ENA_FMT1);
	if (DDI_FM_EREPORT_CAP(bgep->fm_capabilities)) {
		ddi_fm_ereport_post(bgep->devinfo, buf, ena, DDI_NOSLEEP,
		    FM_VERSION, DATA_TYPE_UINT8, FM_EREPORT_VERS0, NULL);
	}
}
{code}
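{{bge_fm_ereport()}} builds only the {{device.<detail>}} part of the class name, yet the ereport that reaches the fault manager is named {{ereport.io.device.inval_state}} (see [Reporting Standard I/O Controller Errors|#gemha]), which suggests that the framework supplies the {{ereport.io}} prefix. The following userland sketch models the full composition so that the resulting name is easy to see. The string values of the {{SK_*}} macros are assumptions inferred from that class name, not copied from the DDI headers, and {{build_ereport_class()}} is a hypothetical helper.
{code}
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed stand-ins, inferred from "ereport.io.device.inval_state". */
#define	SK_EREPORT_PREFIX		"ereport.io"
#define	SK_FM_DEVICE			"device"
#define	SK_FM_DEVICE_INVAL_STATE	"inval_state"
#define	SK_MAX_CLASS			100	/* assumed buffer size */

/*
 * Compose "<prefix>.device.<detail>" into buf.
 * Returns 0 on success, or -1 if the name would be truncated.
 */
static int
build_ereport_class(char *buf, size_t len, const char *detail)
{
	int n = snprintf(buf, len, "%s.%s.%s",
	    SK_EREPORT_PREFIX, SK_FM_DEVICE, detail);

	return ((n < 0 || (size_t)n >= len) ? -1 : 0);
}
{code}
Unlike the {{bge}} wrapper, which discards the {{snprintf()}} result, this sketch reports truncation, because a silently shortened class name would no longer match any registered ereport.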
{anchor:gemfk}
h5. Detecting and Reporting PCI-Related Errors
PCI-related errors, including PCI, PCI-X, and PCI Express errors, are automatically detected and reported when you use [{{pci_ereport_post}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/pci-ereport-post-9f?a=view].
{code}
void pci_ereport_post(dev_info_t *dip, ddi_fm_error_t *derr, uint16_t *xx_status)
{code}
Drivers do not need to generate driver-specific ereports for errors that occur in the PCI Local Bus configuration status registers. The {{pci_ereport_post()}} function can report data parity errors, master aborts, target aborts, and signaled system errors, among other conditions.
If {{pci_ereport_post()}} is to be used by a driver, then [{{pci_ereport_setup}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/pci-ereport-setup-9f?a=view] must have been previously called during the driver's {{attach}}(9E) routine, and [{{pci_ereport_teardown}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/pci-ereport-teardown-9f?a=view] must subsequently be called during the driver's {{detach}}(9E) routine.
The following {{bge}} code shows the driver invoking the {{pci_ereport_post()}} function from its error handler. See also [Registering an Error Handler|#gemie].
{code}
/*
 * The I/O fault service error handling callback function
 */
/*ARGSUSED*/
static int
bge_fm_error_cb(dev_info_t *dip, ddi_fm_error_t *err, const void *impl_data)
{
	/*
	 * as the driver can always deal with an error
	 * in any dma or access handle, we can just return
	 * the fme_status value.
	 */
	pci_ereport_post(dip, err, NULL);
	return (err->fme_status);
}
{code}
{anchor:gemha}
h5. Reporting Standard I/O Controller Errors
A standard set of device ereports is defined for commonly seen errors for I/O controllers. These ereports should be generated whenever one of the error symptoms described in this section is detected.
The ereports described in this section are dispatched for diagnosis to the eft diagnosis engine, which uses a common set of standard rules to diagnose them. Any other errors detected by device drivers must be defined as ereport events in the Sun event registry and must be accompanied by device specific diagnosis software or eft rules.
h6. DDI_FM_DEVICE_INVAL_STATE
The driver has detected that the device is in an invalid state.
A driver should post an error when it detects that the data it transmits or receives appear to be invalid. For example, in the {{bge}} code, the {{bge_chip_reset()}} and {{bge_receive_ring()}} routines generate the {{ereport.io.device.inval_state}} error when these routines detect invalid data.
{code}
	/*
	 * The SEND INDEX registers should be reset to zero by the
	 * global chip reset; if they're not, there'll be trouble
	 * later on.
	 */
	sx0 = bge_reg_get32(bgep, NIC_DIAG_SEND_INDEX_REG(0));
	if (sx0 != 0) {
		BGE_REPORT((bgep, "SEND INDEX - device didn't RESET"));
		bge_fm_ereport(bgep, DDI_FM_DEVICE_INVAL_STATE);
		return (DDI_FAILURE);
	}

	/* ... */

	/*
	 * Sync (all) the receive ring descriptors
	 * before accepting the packets they describe
	 */
	DMA_SYNC(rrp->desc, DDI_DMA_SYNC_FORKERNEL);
	if (*rrp->prod_index_p >= rrp->desc.nslots) {
		bgep->bge_chip_state = BGE_CHIP_ERROR;
		bge_fm_ereport(bgep, DDI_FM_DEVICE_INVAL_STATE);
		return (NULL);
	}
{code}
h6. DDI_FM_DEVICE_INTERN_CORR
The device has reported a self-corrected internal error. For example, a correctable ECC error has been detected by the hardware in an internal buffer within the device.
This error flag is not used in the {{bge}} driver. See the {{nxge_fm.c}} file on OpenSolaris for examples that use this error. Take the following steps to study the {{nxge}} driver code:
* Go to [http://www.opensolaris.org/os/].
* Click [http://cvs.opensolaris.org/source/] under the Code heading in the menu on the left side of the page.
* Enter {{nxge}} in the File Path field.
* Click the Search button.
h6. DDI_FM_DEVICE_INTERN_UNCORR
The device has reported an uncorrectable internal error. For example, an uncorrectable ECC error has been detected by the hardware in an internal buffer within the device.
This error flag is not used in the {{bge}} driver. See the {{nxge_fm.c}} file on OpenSolaris for examples that use this error.
h6. DDI_FM_DEVICE_STALL
The driver has detected that data transfer has stalled unexpectedly.
The {{bge_factotum_stall_check()}} routine provides an example of stall detection.
{code}
	dogval = bge_atomic_shl32(&bgep->watchdog, 1);
	if (dogval < bge_watchdog_count)
		return (B_FALSE);

	BGE_REPORT((bgep, "Tx stall detected, watchdog code 0x%x", dogval));
	bge_fm_ereport(bgep, DDI_FM_DEVICE_STALL);
	return (B_TRUE);
{code}
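The excerpt shifts {{bgep->watchdog}} left on each check and reports a stall once the value reaches {{bge_watchdog_count}}. One way such a shift-register watchdog can work is sketched below: the transmit path arms the counter, transmit completion clears it, and each periodic check shifts it, so a counter that keeps growing means no completion has arrived. This is a simplified, single-threaded userland model, not the {{bge}} implementation. The {{tx_watchdog_t}} type, the {{wd_arm()}}, {{wd_clear()}}, and {{wd_tick()}} functions, and the threshold of 8 are hypothetical, and the model shifts the counter non-atomically, whereas the driver uses {{bge_atomic_shl32()}}.
{code}
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define	WD_STALL_THRESHOLD	8u	/* hypothetical stand-in threshold */

/* Hypothetical watchdog state for this single-threaded model. */
typedef struct tx_watchdog {
	uint32_t	value;	/* 0 = idle, nonzero = transmit outstanding */
} tx_watchdog_t;

/* Transmit path: arm the watchdog if it is not already running. */
static void
wd_arm(tx_watchdog_t *wd)
{
	if (wd->value == 0)
		wd->value = 1;
}

/* Transmit completion: progress was made, so reset the watchdog. */
static void
wd_clear(tx_watchdog_t *wd)
{
	wd->value = 0;
}

/*
 * Periodic check: shift the counter left and report a stall once it
 * reaches the threshold. An idle (zero) counter never trips.
 */
static bool
wd_tick(tx_watchdog_t *wd)
{
	wd->value <<= 1;
	return (wd->value >= WD_STALL_THRESHOLD);
}
{code}
With an armed counter, three ticks without a completion take the value from 1 to 8 and trip the check. The shift keeps the per-tick cost to a single operation, and the threshold directly sets how many idle intervals are tolerated.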
h6. DDI_FM_DEVICE_NO_RESPONSE
The device is not responding to a driver command.
{code}
static boolean_t
bge_chip_poll_engine(bge_t *bgep, bge_regno_t regno,
    uint32_t mask, uint32_t val)
{
	uint32_t regval;
	uint32_t n;

	for (n = 200; n; --n) {
		regval = bge_reg_get32(bgep, regno);
		if ((regval & mask) == val)
			return (B_TRUE);
		drv_usecwait(100);
	}

	bge_fm_ereport(bgep, DDI_FM_DEVICE_NO_RESPONSE);
	return (B_FALSE);
}
{code}
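{{bge_chip_poll_engine()}} bounds its wait at 200 iterations of 100 microseconds, roughly 20 ms, before declaring the device unresponsive. The same bounded-poll pattern can be exercised in userland against a simulated register, as in the following sketch. The {{sim_reg_t}} type, {{sim_reg_read()}}, and {{poll_until()}} are hypothetical; the delay between reads is omitted, whereas a driver would call {{drv_usecwait}}(9F) as {{bge}} does.
{code}
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical simulated register: reads ready_value after N reads. */
typedef struct sim_reg {
	uint32_t	reads;
	uint32_t	ready_after;
	uint32_t	ready_value;
} sim_reg_t;

static uint32_t
sim_reg_read(sim_reg_t *r)
{
	r->reads++;
	return (r->reads >= r->ready_after ? r->ready_value : 0);
}

/*
 * Poll until (reg & mask) == val or the retry budget runs out.
 * A false return is where a driver would post
 * DDI_FM_DEVICE_NO_RESPONSE.
 */
static bool
poll_until(sim_reg_t *r, uint32_t mask, uint32_t val, uint32_t tries)
{
	uint32_t n;

	for (n = tries; n != 0; --n) {
		if ((sim_reg_read(r) & mask) == val)
			return (true);
		/* A driver would delay here, for example drv_usecwait(100). */
	}
	return (false);
}
{code}
The fixed retry budget guarantees that a dead device costs a bounded amount of time rather than hanging the calling thread.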
h6. DDI_FM_DEVICE_BADINT_LIMIT
The device has raised too many consecutive invalid interrupts.
The {{bge_intr()}} routine within the {{bge}} driver provides an example of stuck interrupt detection. The {{bge_fm_ereport()}} function is a wrapper for the {{ddi_fm_ereport_post}}(9F) function. See the {{bge_fm_ereport()}} example in [Queueing an Error Event|#gemfu].
{code}
	if (bgep->missed_dmas >= bge_dma_miss_limit) {
		/*
		 * If this happens multiple times in a row,
		 * it means DMA is just not working.  Maybe
		 * the chip has failed, or maybe there's a
		 * problem on the PCI bus or in the host-PCI
		 * bridge (Tomatillo).
		 *
		 * At all events, we want to stop further
		 * interrupts and let the recovery code take
		 * over to see whether anything can be done
		 * about it ...
		 */
		bge_fm_ereport(bgep,
		    DDI_FM_DEVICE_BADINT_LIMIT);
		goto chip_stop;
	}
{code}
{anchor:gemgp}
h5. Service Impact Function
A fault management capable driver must indicate whether or not an error has impacted the services provided by a device. Following detection of an error and, if necessary, a shutdown of services, the driver should invoke the [{{ddi_fm_service_impact}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-service-impact-9f?a=view] routine to reflect the current service state of the device instance. The service state can be used by diagnosis and recovery software to help identify or react to the problem.
The {{ddi_fm_service_impact()}} routine should be called both when an error has been detected by the driver itself, and when the framework has detected an error and marked an access or DMA handle as faulty.
{code}
void ddi_fm_service_impact(dev_info_t *dip, int svc_impact)
{code}
The following service impact values (_svc_impact_) are accepted by {{ddi_fm_service_impact()}}:
h6. DDI_SERVICE_LOST
The service provided by the device is unavailable due to a device fault or software defect.
h6. DDI_SERVICE_DEGRADED
The driver is unable to provide normal service, but the driver can provide a partial or degraded level of service. For example, the driver might have to make repeated attempts to perform an operation before it succeeds, or it might be running at less than its configured speed.
h6. DDI_SERVICE_UNAFFECTED
The driver has detected an error, but the services provided by the device instance are unaffected.
h6. DDI_SERVICE_RESTORED
All of the device's services have been restored.
The call to {{ddi_fm_service_impact()}} generates the following ereports on behalf of the driver, based on the service impact argument to the service impact routine:
* {{ereport.io.service.lost}}
* {{ereport.io.service.degraded}}
* {{ereport.io.service.unaffected}}
* {{ereport.io.service.restored}}
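Each service impact value corresponds to one of the ereport classes listed above. The following userland sketch captures that correspondence as a lookup function. The {{svc_impact_t}} enumeration and {{svc_impact_class()}} are hypothetical and deliberately do not reuse the numeric values of the real {{DDI_SERVICE_*}} constants; the class strings are the ones listed above.
{code}
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the DDI_SERVICE_* impact values. */
typedef enum svc_impact {
	SVC_LOST,
	SVC_DEGRADED,
	SVC_UNAFFECTED,
	SVC_RESTORED
} svc_impact_t;

/* Return the ereport class posted for an impact value, or NULL. */
static const char *
svc_impact_class(svc_impact_t impact)
{
	switch (impact) {
	case SVC_LOST:
		return ("ereport.io.service.lost");
	case SVC_DEGRADED:
		return ("ereport.io.service.degraded");
	case SVC_UNAFFECTED:
		return ("ereport.io.service.unaffected");
	case SVC_RESTORED:
		return ("ereport.io.service.restored");
	}
	return (NULL);	/* not a recognized impact value */
}
{code}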
In the following {{bge}} code, the driver determines that it is unable to successfully restart transmitting or receiving packets as the result of an error. The service state of the device transitions to DDI_SERVICE_LOST.
{code}
/*
 * All OK, reinitialize hardware and kick off GLD scheduling
 */
mutex_enter(bgep->genlock);
if (bge_restart(bgep, B_TRUE) != DDI_SUCCESS) {
	(void) bge_check_acc_handle(bgep, bgep->cfg_handle);
	(void) bge_check_acc_handle(bgep, bgep->io_handle);
	ddi_fm_service_impact(bgep->devinfo, DDI_SERVICE_LOST);
	mutex_exit(bgep->genlock);
	return (DDI_FAILURE);
}
{code}
{info:title=Note}The {{ddi_fm_service_impact()}} function should not be called from the registered callback routine.{info}
{anchor:gemhz}
h4. Access Attributes Structure
A {{DDI_FM_ACCCHK_CAPABLE}} device driver must set its access attributes to indicate that it is capable of handling programmed I/O (PIO) access errors that occur during a register read or write. The {{devacc_attr_access}} field in the [{{ddi_device_acc_attr}}(9S)|http://docs.sun.com/doc/819-2257/ddi-device-acc-attr-9s?a=view] structure should be set as an indicator to the system that the driver is capable of checking for and handling data path errors. The {{ddi_device_acc_attr}} structure contains the following members:
{code}
ushort_t devacc_attr_version;
uchar_t devacc_attr_endian_flags;
uchar_t devacc_attr_dataorder;
uchar_t devacc_attr_access; /* access error protection */
{code}
Errors detected in the data path to or from a device can be processed by one or more of the device driver's nexus parents.
The {{devacc_attr_access}} field can be set to the following values:
h6. DDI_DEFAULT_ACC
This flag indicates that the system will take the default action (panic if appropriate) when an error occurs. This attribute cannot be used by DDI_FM_ACCCHK_CAPABLE drivers.
h6. DDI_FLAGERR_ACC
This flag indicates that the system will attempt to handle and recover from an error associated with the access handle. The driver should use the techniques described in [Defensive Programming Techniques for Solaris Device Drivers|Hardening-DefProg] and should use [{{ddi_fm_acc_err_get}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-acc-err-get-9f?a=view] to regularly check for errors before the driver allows data to be passed back to the calling application.
The DDI_FLAGERR_ACC flag provides:
* Error notification via the driver callback
* An error condition observable via {{ddi_fm_acc_err_get}}(9F)
h6. DDI_CAUTIOUS_ACC
The DDI_CAUTIOUS_ACC flag provides a high level of protection for each Programmed I/O access made by the driver.
{info:title=Note}Use of this flag will cause a significant impact on the performance of the driver.{info}
The DDI_CAUTIOUS_ACC flag signifies that an error is anticipated by the accessing driver. The system attempts to handle and recover from an error associated with this handle as gracefully as possible. No error reports are generated as a result, but the handle's {{fme_status}} flag is set to DDI_FM_NONFATAL. This flag is functionally equivalent to [{{ddi_peek}}(9F)|http://docs.sun.com/doc/819-2256/ddi-peek-9f?a=view] and [{{ddi_poke}}(9F)|http://docs.sun.com/doc/819-2256/ddi-poke-9f?a=view].
The use of the DDI_CAUTIOUS_ACC provides:
* Exclusive access to the bus
* On-trap protection, as provided by {{ddi_peek()}} and {{ddi_poke()}}
* Error notification through the driver callback registered with [{{ddi_fm_handler_register}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-handler-register-9f?a=view]
* An error condition observable through {{ddi_fm_acc_err_get}}(9F)
Generally, drivers should check for data path errors at appropriate junctures in the code path to guarantee consistent data and to ensure that proper error status is presented in the I/O software stack.
DDI_FM_ACCCHK_CAPABLE device drivers must set their {{devacc_attr_access}} field to DDI_FLAGERR_ACC or DDI_CAUTIOUS_ACC.
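The rule above, together with the attribute selection in {{bge_fm_init()}}, can be expressed as two small checks. In the following userland sketch, the {{acc_mode_t}} enumeration stands in for the real {{DDI_*_ACC}} values and a boolean stands in for the DDI_FM_ACCCHK_CAPABLE bit; {{acc_attr_is_valid()}} and {{select_acc_mode()}} are hypothetical helpers, not DDI functions, and the selection policy is one reasonable choice rather than a requirement.
{code}
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the devacc_attr_access values. */
typedef enum acc_mode {
	ACC_DEFAULT,	/* like DDI_DEFAULT_ACC */
	ACC_FLAGERR,	/* like DDI_FLAGERR_ACC */
	ACC_CAUTIOUS	/* like DDI_CAUTIOUS_ACC */
} acc_mode_t;

/* The stated rule: an ACCCHK-capable driver must not use the default. */
static bool
acc_attr_is_valid(bool accchk_capable, acc_mode_t mode)
{
	if (!accchk_capable)
		return (true);	/* this sketch only enforces the ACCCHK rule */
	return (mode == ACC_FLAGERR || mode == ACC_CAUTIOUS);
}

/*
 * One possible policy: default access without the capability (as
 * bge_fm_init() does when it has no FMA capabilities at all), cautious
 * access only where an error is anticipated, because it is costly,
 * and flagged access otherwise.
 */
static acc_mode_t
select_acc_mode(bool accchk_capable, bool error_expected)
{
	if (!accchk_capable)
		return (ACC_DEFAULT);
	return (error_expected ? ACC_CAUTIOUS : ACC_FLAGERR);
}
{code}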
{anchor:gemhh}
h4. DMA Attributes Structure
As with access handle setup, a DDI_FM_DMACHK_CAPABLE device driver must set the {{dma_attr_flags}} field of its [{{ddi_dma_attr}}(9S)|http://docs.sun.com/doc/819-2257/ddi-dma-attr-9s?a=view] structure to DDI_DMA_FLAGERR. The system attempts to recover from an error associated with a handle that has DDI_DMA_FLAGERR set. The {{ddi_dma_attr}} structure contains the following members:
{code}
uint_t dma_attr_version; /* version number */
uint64_t dma_attr_addr_lo; /* low DMA address range */
uint64_t dma_attr_addr_hi; /* high DMA address range */
uint64_t dma_attr_count_max; /* DMA counter register */
uint64_t dma_attr_align; /* DMA address alignment */
uint_t dma_attr_burstsizes; /* DMA burstsizes */
uint32_t dma_attr_minxfer; /* min effective DMA size */
uint64_t dma_attr_maxxfer; /* max DMA xfer size */
uint64_t dma_attr_seg; /* segment boundary */
int dma_attr_sgllen; /* s/g length */
uint32_t dma_attr_granular; /* granularity of device */
uint_t dma_attr_flags; /* Bus specific DMA flags */
{code}
Drivers that set the DDI_DMA_FLAGERR flag should use the techniques described in [Defensive Programming Techniques for Solaris Device Drivers|Hardening-DefProg] and should use [{{ddi_fm_dma_err_get}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-dma-err-get-9f?a=view] to check for data path errors whenever DMA transactions are completed or at significant points within the code path. This ensures consistent data and proper error status presented to the I/O software stack.
Use of DDI_DMA_FLAGERR provides:
* Error notification via the driver callback registered with {{ddi_fm_handler_register()}}
* An error condition observable by calling {{ddi_fm_dma_err_get()}}
{anchor:gemfy}
h4. Getting Error Status
If a fault has occurred that affects the resource mapped by the handle, the error status structure is updated to reflect error information captured during error handling by a bus or other device driver in the I/O data path.
{code}
void ddi_fm_dma_err_get(ddi_dma_handle_t handle, ddi_fm_error_t *de, int version)
void ddi_fm_acc_err_get(ddi_acc_handle_t handle, ddi_fm_error_t *de, int version)
{code}
The {{ddi_fm_dma_err_get}}(9F) and {{ddi_fm_acc_err_get}}(9F) functions store the error status for a DMA or access handle, respectively, in the {{ddi_fm_error_t}} structure pointed to by {{de}}. The {{version}} argument should be set to DDI_FME_VERSION.
An error for an access handle means that an error has been detected that has affected PIO transactions to or from the device using that access handle. Any data received by the driver, for example via a recent [{{ddi_get8}}(9F)|http://docs.sun.com/doc/819-2256/ddi-get8-9f?a=view] call, should be considered potentially corrupt. Any data sent to the device, for example via a recent [{{ddi_put32}}(9F)|http://docs.sun.com/doc/819-2256/ddi-put32-9f?a=view] call might also have been corrupted or might not have been received at all. The underlying fault might, however, be transient, and the driver can therefore attempt to recover by calling [{{ddi_fm_acc_err_clear}}(9F)|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-dma-err-clear-9f?a=view], resetting the device to get it back into a known state, and retrying any potentially failed transactions.
If an error is indicated for a DMA handle, an error has been detected that has affected (or will affect) DMA transactions between the device and the memory currently bound to the handle (or most recently bound, if the handle is currently unbound). Possible causes include the failure of a component in the DMA data path, or an attempt by the device to make an invalid DMA access. The driver might be able to continue by retrying and reallocating memory. The contents of the memory currently (or previously) bound to the handle should be regarded as indeterminate and should be released back to the system. The fault indication associated with the current transaction is lost once the handle is bound or re-bound, but because the fault might persist, future DMA operations might not succeed.
{anchor:gemfr}
h4. Clearing Errors
Call these routines when the driver wants to retry a request after an error has been detected on a handle, without first freeing and reallocating the handle.
{code}
void ddi_fm_acc_err_clear(ddi_acc_handle_t handle, int version)
void ddi_fm_dma_err_clear(ddi_dma_handle_t handle, int version)
{code}
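The get and clear routines together support a check, clear, and retry pattern: after PIO, read the handle's error status, discard the data and clear the status if an error is flagged, and retry before trusting the result. The following userland model illustrates that flow. The {{sim_acc_handle_t}} type and the {{sim_get32()}}, {{sim_acc_err_get()}}, and {{sim_acc_err_clear()}} functions are hypothetical stand-ins for an access handle, {{ddi_get32}}(9F), {{ddi_fm_acc_err_get}}(9F), and {{ddi_fm_acc_err_clear}}(9F); the status values and the retry limit are also assumptions of the sketch.
{code}
#include <assert.h>
#include <stdint.h>

/* Hypothetical status values, modeled on DDI_FM_OK / DDI_FM_NONFATAL. */
#define	SIM_FM_OK	0
#define	SIM_FM_NONFATAL	1

/* Hypothetical access handle: a register plus a sticky error status. */
typedef struct sim_acc_handle {
	uint32_t	reg;
	int		fme_status;
	int		faults_left;	/* number of reads that will fault */
} sim_acc_handle_t;

/* Stand-in for a PIO read; may flag an error on the handle. */
static uint32_t
sim_get32(sim_acc_handle_t *h)
{
	if (h->faults_left > 0) {
		h->faults_left--;
		h->fme_status = SIM_FM_NONFATAL;
		return (0xffffffffu);	/* data is not trustworthy */
	}
	return (h->reg);
}

static int
sim_acc_err_get(const sim_acc_handle_t *h)
{
	return (h->fme_status);
}

static void
sim_acc_err_clear(sim_acc_handle_t *h)
{
	h->fme_status = SIM_FM_OK;
}

/*
 * Read a register and check the handle afterward. On error, clear the
 * handle and retry up to max_retries times. Returns 0 and stores the
 * value on success, or -1 if every attempt faulted.
 */
static int
checked_read32(sim_acc_handle_t *h, uint32_t *valp, int max_retries)
{
	int attempt;

	for (attempt = 0; attempt <= max_retries; attempt++) {
		uint32_t v = sim_get32(h);

		if (sim_acc_err_get(h) == SIM_FM_OK) {
			*valp = v;
			return (0);
		}
		/* Discard v; a real driver might also reset the device. */
		sim_acc_err_clear(h);
	}
	return (-1);
}
{code}
When every attempt fails, a driver would typically go on to report the service impact rather than keep retrying.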
{anchor:gemie}
h4. Registering an Error Handler
Error handling activity might begin at the time that the error is detected by the operating system via a trap or error interrupt. If the software responsible for handling the error (the error handler) cannot immediately isolate the device that was involved in the failed I/O operation, it must attempt to find a software module within the device tree that can perform the error isolation. The Solaris device tree provides a structural means to propagate nexus driver error handling activities to children who might have a more detailed understanding of the error and can capture error state and isolate the problem device.
A driver can register an error handler callback with the I/O Fault Services Framework. The error handler should be specific to the type of error and subsystem where error detection has occurred. When the driver's error handler routine is invoked, the driver must check for any outstanding errors associated with device transactions and generate ereport events. The driver must also return error handler status in its [{{ddi_fm_error}}(9S)|http://docs.sun.com/app/docs/doc/819-2257/ddi-fm-error-9s?a=view] structure. For example, if it has been determined that the system's integrity has been compromised, the most appropriate action might be for the error handler to panic the system.
The callback is invoked by a parent nexus driver when an error might be associated with a particular device instance. Device drivers that register error handlers must be DDI_FM_ERRCB_CAPABLE.
{code}
void ddi_fm_handler_register(dev_info_t *dip, ddi_err_func_t handler, void *impl_data)
{code}
The {{ddi_fm_handler_register}}(9F) routine registers an error handler callback with the I/O fault services framework. The {{ddi_fm_handler_register()}} function should be called in the driver's [{{attach}}(9E)|http://docs.sun.com/doc/819-2255/attach-9e?a=view] entry point for callback registration following driver fault management initialization ({{ddi_fm_init()}}).
The error handler callback function must do the following:
* Check for any outstanding hardware errors associated with device transactions, and generate ereport events for diagnosis. For a PCI, PCI-X, or PCI Express device, this can generally be done using {{pci_ereport_post()}} as described in [Detecting and Reporting PCI-Related Errors|#gemfk].
* Return error handler status in its {{ddi_fm_error}} structure:
** DDI_FM_OK
** DDI_FM_FATAL
** DDI_FM_NONFATAL
** DDI_FM_UNKNOWN
Driver error handlers receive the following:
* A pointer to a device instance (_dip_) under the driver's control
* A data structure ({{ddi_fm_error}}) that contains common fault management data and status for error handling
* A pointer to any implementation specific data (_impl_data_) specified at the time of the handler's registration
The [{{ddi_fm_handler_register(9F)}}|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-handler-register-9f?a=view] and [{{ddi_fm_handler_unregister(9F)}}|http://docs.sun.com/app/docs/doc/819-2256/ddi-fm-handler-unregister-9f?a=view] routines must be called from kernel context in a driver's {{attach}}(9E) or {{detach}}(9E) entry point. The registered error handler callback can be called from kernel, interrupt, or high-level interrupt context. Therefore the error handler:
* Must not hold locks
* Must not sleep waiting for resources
A device driver is responsible for:
* Isolating the device instance that might have caused errors
* Recovering transactions associated with errors
* Reporting the service impact of errors
* Scheduling device shutdown for errors considered fatal
These actions can be carried out within the error handler function. However, because of the restrictions on locking and because the error handler function does not always know the context of what the driver was doing at the point where the fault occurred, it is more usual for these actions to be carried out following inline calls to {{ddi_fm_acc_err_get}}(9F) and {{ddi_fm_dma_err_get}}(9F) within the normal paths of the driver as described previously.
{code}
/*
 * The I/O fault service error handling callback function
 */
/*ARGSUSED*/
static int
bge_fm_error_cb(dev_info_t *dip, ddi_fm_error_t *err, const void *impl_data)
{
	/*
	 * as the driver can always deal with an error
	 * in any dma or access handle, we can just return
	 * the fme_status value.
	 */
	pci_ereport_post(dip, err, NULL);
	return (err->fme_status);
}
{code}
{anchor:gemhd}
h4. Fault Management Data and Status Structure
Driver error handling callbacks are passed a pointer to a data structure that contains common fault management data and status for error handling.
The data structure {{ddi_fm_error}} contains an FMA protocol ENA for the current error, the status of the error handler callback, an error expectation flag, and any potential access or DMA handles associated with an error detected by the parent nexus.
h6. {{fme_ena}}
This field is initialized by the calling parent nexus and might have been incremented along the error handling propagation chain before reaching the driver's registered callback routine. If the driver detects a related error of its own, it should increment this ENA prior to calling {{ddi_fm_ereport_post()}}.
h6. {{fme_acc_handle}}, {{fme_dma_handle}}
These fields contain a valid access or DMA handle if the parent was able to associate an error detected at its level to a handle mapped or bound by the device driver.
h6. {{fme_flag}}
The {{fme_flag}} is set to DDI_FM_ERR_EXPECTED if the calling parent determines the error was the result of a DDI_CAUTIOUS_ACC protected operation. In this case, the {{fme_acc_handle}} is valid and the driver should check for and report only errors not associated with the DDI_CAUTIOUS_ACC protected operation. Otherwise, {{fme_flag}} is set to DDI_FM_ERR_UNEXPECTED and the driver must perform the full range of error handling tasks.
h6. {{fme_status}}
Before returning from its error handler callback, the driver must set {{fme_status}} to one of the following values:
* DDI_FM_OK – No errors were detected and the operational state of this device instance remains the same.
* DDI_FM_FATAL – An error has occurred, and the driver considers it fatal to the system. For example, a call to {{pci_ereport_post}}(9F) might have detected a system-fatal error. In this case, the driver should report any additional error information that it has in the context of the driver.
* DDI_FM_NONFATAL – The driver has detected an error that it does not consider fatal to the system. The driver has identified the error and has either isolated it or commits to isolating it.
* DDI_FM_UNKNOWN – An error has been detected, but the driver is unable to isolate the device or determine the impact of the error on the operational state of the system.
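The following is a minimal user-space sketch of how a callback might combine {{fme_flag}}, the status already set by the parent nexus or by {{pci_ereport_post}}(9F), and the driver's own findings into a final {{fme_status}}. Because the real DDI types are not available outside the kernel, every name in it ({{model_status}}, {{model_flag}}, {{driver_findings}}, {{model_error_cb}}) is hypothetical, the enum values are not the real DDI_FM_* values, and the severity ranking OK < NONFATAL < UNKNOWN < FATAL is an assumption made for illustration. In a real driver, equivalent logic would run inside the registered callback and store its result in {{err->fme_status}}.
{code}
#include <stdbool.h>

/*
 * Hypothetical stand-ins for DDI_FM_OK, DDI_FM_NONFATAL, DDI_FM_UNKNOWN,
 * DDI_FM_FATAL and DDI_FM_ERR_UNEXPECTED / DDI_FM_ERR_EXPECTED.
 * These are NOT the values from <sys/ddifm.h>; the enum order encodes
 * an assumed severity ranking (a higher value is worse).
 */
enum model_status { MODEL_OK, MODEL_NONFATAL, MODEL_UNKNOWN, MODEL_FATAL };
enum model_flag { MODEL_ERR_UNEXPECTED, MODEL_ERR_EXPECTED };

/* Hypothetical summary of what the driver found when it checked its device. */
struct driver_findings {
	bool cautious_err;	/* error attributable to a DDI_CAUTIOUS_ACC access */
	bool other_err;		/* any other error the driver detected */
	bool can_isolate;	/* driver can isolate the device, e.g. stop using it */
};

/* Return the more severe of two statuses under the assumed ranking. */
static enum model_status
worse(enum model_status a, enum model_status b)
{
	return (a > b ? a : b);
}

/*
 * Model of the value a callback would leave in fme_status.
 * nexus_status is what the parent (or pci_ereport_post) already set.
 */
enum model_status
model_error_cb(enum model_flag flag, enum model_status nexus_status,
    const struct driver_findings *f)
{
	bool must_handle;
	enum model_status own = MODEL_OK;

	if (flag == MODEL_ERR_EXPECTED) {
		/* Report only errors unrelated to the protected access. */
		must_handle = f->other_err;
	} else {
		/* Unexpected error: full error handling applies to any error. */
		must_handle = f->cautious_err || f->other_err;
	}

	if (must_handle)
		own = f->can_isolate ? MODEL_NONFATAL : MODEL_UNKNOWN;

	/* Never downgrade the status that the parent reported. */
	return (worse(nexus_status, own));
}
{code}
The sketch only ever raises the status reported by the parent, never lowers it. That choice is consistent with the {{bge}} callback shown earlier, which returns {{err->fme_status}} unchanged after calling {{pci_ereport_post}}(9F).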
{anchor:gemfs}
h3. Diagnosing Faults
The fault management daemon, [{{fmd}}(1M)|http://docs.sun.com/doc/819-2240/fmd-1m?a=view], provides a programming interface for the development of diagnosis engine (DE) plug-in modules. A DE can be written to consume and diagnose any error telemetry, or only specific classes of error telemetry. The eft DE is designed to diagnose any number of ereport classes based on diagnosis rules written in the Eversholt language.
{anchor:gemge}
h4. Standard Leaf Device Diagnosis
Most I/O subsystems use the eft DE and rule sets to diagnose device and device driver problems. A standard set of ereports, listed in [Reporting Standard I/O Controller Errors|#gemha], has been specified for PCI leaf devices. These ereports are accompanied by eft diagnosis rules that take the telemetry and identify the associated device fault. Drivers that generate these ereports do not need to deliver any additional diagnosis software or eft rules.
The detection and generation of these ereports produce the following fault events:
h6. {{fault.io.pci.bus-linkerr}}
A hardware fault on the PCI bus
h6. {{fault.io.pci.device-interr}}
A hardware fault within the device
h6. {{fault.io.pci.device-invreq}}
A hardware fault in the device or a defect in the driver that causes the device to send an invalid request
h6. {{fault.io.pci.device-noresp}}
A hardware fault in the device that causes the device not to respond to a valid request
h6. {{fault.io.pciex.bus-linkerr}}
A hardware fault on the link
h6. {{fault.io.pciex.bus-noresp}}
The link going down so that a device cannot respond to a valid request
h6. {{fault.io.pciex.device-interr}}
A hardware fault within the device
h6. {{fault.io.pciex.device-invreq}}
A hardware fault in the device or a defect in the driver that causes the device to send an invalid request
h6. {{fault.io.pciex.device-noresp}}
A hardware fault in the device that causes it not to respond to a valid request
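Each fault class listed above has the shape {{fault.io.}}<bus>.<component>-<problem>, where <bus> is {{pci}} or {{pciex}} and <component> is {{bus}} or {{device}}. As an illustration only, the hypothetical helper {{split_io_fault_class()}} below splits such a name into those three parts. It is not part of any FMA library, and it simply rejects names that do not have exactly this shape.
{code}
#include <string.h>

/* Copy [begin, end) into dst; fail if the part is empty or does not fit. */
static int
copy_part(char *dst, size_t dstlen, const char *begin, const char *end)
{
	size_t n = (size_t)(end - begin);

	if (n == 0 || n >= dstlen)
		return (-1);
	memcpy(dst, begin, n);
	dst[n] = '\0';
	return (0);
}

/*
 * Hypothetical helper: split "fault.io.<bus>.<component>-<problem>"
 * into its parts. Returns 0 on success, -1 if the name has another shape.
 */
int
split_io_fault_class(const char *cls, char *bus, size_t buslen,
    char *comp, size_t complen, char *prob, size_t problen)
{
	static const char prefix[] = "fault.io.";
	const char *start, *dot, *dash, *end;

	if (strncmp(cls, prefix, sizeof (prefix) - 1) != 0)
		return (-1);
	start = cls + sizeof (prefix) - 1;

	/* The bus name ends at the next '.', and no further '.' may follow. */
	dot = strchr(start, '.');
	if (dot == NULL || strchr(dot + 1, '.') != NULL)
		return (-1);

	/* The component ends at the first '-'; the problem is the rest. */
	dash = strchr(dot + 1, '-');
	if (dash == NULL)
		return (-1);
	end = dash + 1 + strlen(dash + 1);

	if (copy_part(bus, buslen, start, dot) != 0 ||
	    copy_part(comp, complen, dot + 1, dash) != 0 ||
	    copy_part(prob, problen, dash + 1, end) != 0)
		return (-1);
	return (0);
}
{code}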
{anchor:gemia}
h4. Specialized Device Diagnosis
Driver developers who want to generate additional ereports or provide more specialized diagnosis software or eft rules can do so by writing a C-based DE or an eft diagnosis rule set. See the [Fault Management community|http://www.opensolaris.org/os/community/fm/] on the [OpenSolaris project|http://www.opensolaris.org/os/] for more information.
{anchor:gemhe}
h3. Event Registry
The Sun event registry is the central repository of all class names for ereport, fault, defect, upset, and suspect list (list.suspect) events. The event registry also contains the current definitions of all event member payloads, as well as important non-payload information such as internal documentation, suspect lists, dictionaries, and knowledge articles. For example, {{ereport.io}} and {{fault.io}} are two of the base class names that are of particular importance to I/O driver developers.
The FMA event protocol defines a base set of payload members that is supplied with each of the registered events. Developers can also define additional events that help diagnosis engines (or eft rules) to narrow a suspect list down to a specific fault.
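Event class names are hierarchical, dot-separated strings, and consumers select events by class pattern; the Glossary section, for example, describes agents that subscribe to {{fault.*}} or {{list.*}} events. The sketch below shows one simple way such a pattern could be matched against a class name. The function {{class_match()}} is hypothetical and is not the fault manager's own matching code; it treats {{*}} as matching any run of characters, including dots.
{code}
#include <stdbool.h>

/*
 * Hypothetical glob-style matcher for event class names.
 * '*' matches any (possibly empty) run of characters, including '.';
 * every other character must match exactly.
 */
bool
class_match(const char *pat, const char *cls)
{
	if (*pat == '\0')
		return (*cls == '\0');

	if (*pat == '*') {
		/* Either '*' matches nothing, or it consumes one more char. */
		return (class_match(pat + 1, cls) ||
		    (*cls != '\0' && class_match(pat, cls + 1)));
	}

	return (*cls != '\0' && *pat == *cls &&
	    class_match(pat + 1, cls + 1));
}
{code}
Because the recursion tries both alternatives at every {{*}}, it can backtrack heavily on pathological patterns. That is acceptable for short class names but is a deliberate simplification.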
{anchor:gemgu}
h3. Glossary
This section uses the following terms:
h6. Agent
A generic term for fault manager modules that subscribe to fault.* or list.* events. Agents are used to retire faulty resources, communicate diagnosis results to administrators, and bridge to higher-level management frameworks.
h6. ASRU (Automated System Reconfiguration Unit)
The ASRU is a resource that can be disabled by software or hardware in order to isolate a problem in the system and suppress further error reports.
h6. DE (Diagnosis Engine)
A fault management module whose purpose is to diagnose problems by subscribing to one or more classes of incoming error events and using these events to solve cases associated with each problem on the system.
h6. ENA (Error Numeric Association)
An Error Numeric Association (ENA) is an encoded integer that uniquely identifies an error report within a given fault region and time period. The ENA also indicates the relationship of the error to previous errors, such as whether it is a secondary effect of an earlier error.
h6. Error
An unexpected condition, result, signal, or datum. An error is the symptom of a problem on the system. Each problem typically produces many different kinds of errors.
h6. ereport (Error Report)
The data captured with a particular error. Error report formats are defined in advance by creating a class naming the error report and defining a schema using the Sun event registry.
h6. ereport event (Error Event)
The data structure that represents an instance of an error report. Error events are represented as name-value pair lists.
h6. Fault
Malfunctioning behavior of a hardware component.
h6. Fault Boundary
Logical partition of hardware or software elements for which a specific set of faults can be enumerated.
h6. Fault Event
An instance of a fault diagnosis encoded in the protocol.
h6. Fault Manager
Software component responsible for fault diagnosis through one or more diagnosis engines, and for state management.
h6. FMRI (Fault Managed Resource Identifier)
An FMRI is a URL-like identifier that acts as the canonical name for a particular resource in the fault management system. Each FMRI includes a scheme that identifies the type of resource, and one or more values that are specific to the scheme. An FMRI can be represented as a URL-like string or as a name-value pair list data structure.
h6. FRU (Field Replaceable Unit)
The FRU is a resource that can be replaced in the field by a customer or service provider. FRUs can be defined for hardware (for example system boards) or for software (for example software packages or patches).
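To make the URL-like string form of an FMRI more concrete, the sketch below splits a string of the assumed generic shape <scheme>://<authority>/<path> into its scheme, authority, and scheme-specific path. The {{fmri_parts}} structure and {{parse_fmri()}} function are hypothetical. Real fault management code also handles the name-value pair list form and applies scheme-specific rules that are not modeled here.
{code}
#include <string.h>

/* Hypothetical container for the pieces of a URL-like FMRI string. */
struct fmri_parts {
	char scheme[16];
	char authority[64];
	char path[128];
};

/* Copy n bytes of src into dst (size dstlen); fail if they do not fit. */
static int
copy_n(char *dst, size_t dstlen, const char *src, size_t n)
{
	if (n >= dstlen)
		return (-1);
	memcpy(dst, src, n);
	dst[n] = '\0';
	return (0);
}

/*
 * Split "<scheme>://<authority>/<path>" (an assumed generic shape).
 * The authority may be empty; the path keeps its leading '/'.
 * Returns 0 on success, -1 otherwise.
 */
int
parse_fmri(const char *s, struct fmri_parts *out)
{
	const char *sep, *auth, *slash;

	sep = strstr(s, "://");
	if (sep == NULL || sep == s)
		return (-1);
	if (copy_n(out->scheme, sizeof (out->scheme), s,
	    (size_t)(sep - s)) != 0)
		return (-1);

	auth = sep + 3;
	slash = strchr(auth, '/');
	if (slash == NULL)
		slash = auth + strlen(auth);	/* no path component */
	if (copy_n(out->authority, sizeof (out->authority), auth,
	    (size_t)(slash - auth)) != 0)
		return (-1);

	return (copy_n(out->path, sizeof (out->path), slash, strlen(slash)));
}
{code}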
{anchor:gemhq}
h3. Resources
The following resources provide additional information:
* [Fault Management OpenSolaris community|http://www.opensolaris.org/os/community/fm/]
* [FMA Messaging web site|http://www.sun.com/msg/]
----
Next: [Defensive Programming Techniques for Solaris Device Drivers|Hardening-DefProg] > [Driver Hardening Test Harness|Hardening-BOFI]