Fault Management

Version 4 by Alta
on Nov 25, 2008 17:32.

compared with
Current by Alta
on Nov 25, 2008 18:00.

h1.Fault Management for System Administrators
* [Solaris Fault Management Overview|#overview]
* [Fault Notification|#notification]
* [Displaying Faults|#DisplayingFaults]
** [fmadm faulty|#faulty]
** [fmdump|#dump]
* [Repairing Faults|#repair]
** [fmadm repaired|#repaired]
** [fmadm replaced|#replaced]
** [fmadm acquit|#acquit]
* [Fault Management Administration Example|#example]
* [Fault Management Log Files|#logfiles]
* [Fault Statistics|#statistics]

{anchor:overview}
h2.Solaris Fault Management Overview
{code}
{anchor:notification}
h2.Fault Notification
The preferred method to display fault information and determine the FRUs involved is the {{fmadm faulty}} command. However, the {{fmdump}} command is still supported.
{anchor:faulty}
h3.fmadm faulty
The {{fmadm faulty}} command is used to display any faulty components in the system, as shown in the following example:
{code}
{anchor:dump}
h3.fmdump
As mentioned above, some console messages and knowledge articles might instruct you to use the older {{fmdump -v -u UUID}} command to display fault information. While the {{fmadm faulty}} command is preferred, the {{fmdump}} command still operates, as shown in the following example:
The information about the impacted FRUs is still present, although separated across two lines (lines 8 and 9). The Location string presents the human-readable FRU string, and the FRU line presents the formal FMRI. Note that the severity, descriptive text, and action are not shown with the {{fmdump}}(1M) command.
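The two ways of displaying a fault described above can be sketched as a small shell fragment. This is only an illustration: the helper names {{run}}, {{show_all_faults}}, and {{show_fault_by_uuid}}, the {{FM_DRY_RUN}} variable, and the placeholder UUID are assumptions introduced here, and by default the fragment prints the command lines instead of executing them. Only {{fmadm faulty}} and {{fmdump -v -u UUID}} come from the text.

```shell
#!/bin/sh
# Dry run by default: print the command instead of executing it.
# Set FM_DRY_RUN=0 on a real system to actually run the commands.
FM_DRY_RUN=${FM_DRY_RUN:-1}

run() {
    if [ "$FM_DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Preferred: list every faulty component, including the FRUs involved.
show_all_faults() {
    run fmadm faulty
}

# Older form: show the fault log entry for one UUID.
# $1 is the UUID quoted in the console message or knowledge article.
show_fault_by_uuid() {
    run fmdump -v -u "$1"
}

show_all_faults
show_fault_by_uuid "00000000-0000-0000-0000-000000000000"   # placeholder UUID
```

Keeping the dry-run switch in one place makes it easy to review the exact command line before running it with the required privileges.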
{anchor:repair}
h2.Repairing Faults
Once Fault Management has faulted a component in your system, you will want to repair it. A repair can happen in one of two ways: implicitly or explicitly.
{info:title=Note}Although these four commands can take FMRIs and UUIDs as arguments, the preferred argument to use is the Label. If a FRU has multiple faults against it, you want to replace the FRU only one time. If you issue the {{fmadm replaced}} command against the Label, the FRU is reflected as such in any outstanding cases.{info}
{anchor:repaired}
h3.fmadm repaired
The {{fmadm repaired}} command should be used when some physical repair has been carried out that might resolve the problem. Examples of such repairs include reseating a card or straightening a bent pin.
{anchor:replaced}
h3.fmadm replaced
The {{fmadm replaced}} command should be used to indicate that the suspect FRU has been replaced.
If the system automatically discovers that a FRU has been removed but not replaced, the current behavior is unchanged: the suspect is displayed as "not present", but it is not considered permanently removed until the {{rsrc.aged}} time has expired.
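As a hedged sketch of the difference between the two commands above, the following fragment wraps {{fmadm repaired}} and {{fmadm replaced}} in small helpers that take the FRU label, which the note above recommends as the argument. The helper names, the {{FM_DRY_RUN}} variable, and the label {{MB/CMP0}} are hypothetical and are not taken from this page. The fragment prints the commands by default instead of running them.

```shell
#!/bin/sh
# Dry run by default: print the command instead of executing it.
# Set FM_DRY_RUN=0 on a real system to actually run fmadm.
FM_DRY_RUN=${FM_DRY_RUN:-1}

run() {
    if [ "$FM_DRY_RUN" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# A physical repair (reseated card, straightened pin) was carried out
# on the FRU identified by label $1.
mark_repaired() {
    run fmadm repaired "$1"
}

# The suspect FRU identified by label $1 was swapped for a new part.
mark_replaced() {
    run fmadm replaced "$1"
}

mark_repaired "MB/CMP0"   # hypothetical label
mark_replaced "MB/CMP0"   # hypothetical label
```

Passing the label rather than a UUID means a single call covers every outstanding case against that FRU.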
{anchor:acquit}
h3.fmadm acquit
Replacement takes precedence over repair and both replacement and repair take precedence over acquittal. Thus, you can acquit something and then subsequently repair it, but you cannot acquit something that has already been repaired.
{code}fmadm acquit uuid [ fmri | label ]{code}
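The precedence rule stated above can be modeled as a simple ranking. This is a sketch of the rule as written on this page, not of how {{fmd}} implements it. The function names {{rank}} and {{may_follow}} are hypothetical, and refusing a transition to the same rank is an assumption, because the text does not cover that case.

```shell
#!/bin/sh
# Rank each resolution; a higher rank takes precedence.
rank() {
    case "$1" in
        acquit)   echo 1 ;;
        repaired) echo 2 ;;
        replaced) echo 3 ;;
        *)        return 1 ;;   # unknown resolution
    esac
}

# may_follow CURRENT REQUESTED
# Succeeds only if REQUESTED outranks CURRENT.
# Returns 2 if either argument is not a known resolution.
may_follow() {
    cur=$(rank "$1") || return 2
    req=$(rank "$2") || return 2
    [ "$req" -gt "$cur" ]
}

may_follow acquit repaired && echo "acquit, then repaired: allowed"
may_follow repaired acquit || echo "repaired, then acquit: refused"
```

With this ordering, an acquitted suspect can still be marked repaired or replaced later, while a repaired or replaced suspect cannot be acquitted.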
{anchor:example}
h3.Fault Management Administration Example
These four commands are expected to be used most often in the following three ways:
{code}
{anchor:logfiles}
h2.Fault Management Log Files
For an overview of the FMA log files, see:
* [Managing Fault Management Log Files|http://blogs.sun.com/sdaven/entry/fma_log_files], by Scott Davenport
{anchor:statistics}
h2.Fault Statistics
The Fault Manager daemon, {{fmd}}(1M), and many of its modules track statistics. The [{{fmstat}}(1M)|http://docs.sun.com/app/docs/doc/819-2240/fmstat-1m?a=view] command reports those statistics. Without options, {{fmstat}} gives a high-level overview of the events, processing times, and memory usage of the loaded modules. For example:

Copyright 1994-2009 Sun Microsystems, Inc.