How do I analyze a system crash dump?

Added: 07/24/02


The tool for reading crash dumps is called q4, although you can also get
some information from adb by changing into the dump directory and running:

# echo "msgbuf+8/s" | adb -m vmunix .

Don't forget the dot on the end :-)
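
If you look at dumps often, the command above can be wrapped in a small
shell function. This is only a sketch: the function name show_msgbuf and
the ADB override variable are made up for illustration, and it assumes the
dump directory contains a kernel file named vmunix, as in the command above.

```shell
# Print the kernel message buffer from a crash dump directory.
# Assumption: the directory holds the dump and a kernel file named vmunix.
# ADB is a hypothetical override, useful for trying the function on a
# machine that has no adb installed.
show_msgbuf() {
    if [ -z "$1" ]; then
        echo "usage: show_msgbuf DUMPDIR" >&2
        return 2
    fi
    if [ ! -f "$1/vmunix" ]; then
        echo "show_msgbuf: no vmunix in $1" >&2
        return 1
    fi
    # Run in a subshell so the caller's working directory is unchanged.
    # The trailing dot tells adb to use the current (dump) directory.
    ( cd "$1" && echo "msgbuf+8/s" | "${ADB:-adb}" -m vmunix . )
}
```

Call it as "show_msgbuf /path/to/dump"; it refuses to run adb if vmunix
is missing instead of letting adb fail with a less obvious message.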

If, as a customer, you really want to learn as much as possible about the
HP-UX kernel, and so gain some insight into what's going on inside a
dump, then the best thing is to take the "Inside HP-UX" course from HP;
its course number is H5081S.

The normal way to deal with crash dumps is to have an HP support
contract, and then get their Response Center to have a look at the dump,
ascertain if it's a known problem, and if so tell you which patch(es)
you need to apply.

Analyzing a crash dump takes a great deal of experience, knowledge and
access to the source code for HP-UX and a large data mine full of
similar crash dumps. The msgbuf will say something like:

data segmentation violation

which means: the kernel made a mistake and a pointer to an integer has an
odd address (must be an even address). However, this is of no use until
you try to unwind why the kernel made this mistake, look at the source
code to see if a fix has been made, and finally recommend a patch.

Here are a few reasons that you'll see in msgbuf (and also
/etc/shutdownlog):

freeing free frag
freeing free inode

This means severe filesystem corruption exists on the disk (not to be
confused with I/O errors unless the error is undetected).

HPMC

This is a hardware failure and a board will have to be replaced.
Determining which board is actually easier than finding a patch for a
software problem.
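
The reasons above can be matched mechanically as a first pass. The sketch
below recognizes only the strings quoted in this FAQ; the function names
classify_panic and scan_shutdownlog are made up for illustration, the
category labels are just summaries of the explanations above, and the
exact line format of /etc/shutdownlog is not assumed (the strings are
matched as case-sensitive substrings).

```shell
# Map one line of panic text to the rough category described in this FAQ.
# Anything not quoted in the FAQ is reported as "unknown".
classify_panic() {
    case "$1" in
        *"freeing free frag"*|*"freeing free inode"*)
            echo "filesystem corruption" ;;
        *HPMC*)
            echo "hardware failure" ;;
        *"data segmentation violation"*)
            echo "kernel software fault" ;;
        *)
            echo "unknown" ;;
    esac
}

# Classify every line of a log file; defaults to /etc/shutdownlog.
scan_shutdownlog() {
    while IFS= read -r line; do
        printf '%s: %s\n' "$(classify_panic "$line")" "$line"
    done < "${1:-/etc/shutdownlog}"
}
```

This only sorts messages into buckets; it does not replace having the
dump itself looked at.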

The vast majority of bugs that cause crashes have been corrected through
patches, so you can save yourself a lot of time by applying the complete
set of patch bundles from your SupportPlus CD-ROM. Any CD from the past
year is a good choice. Or you can download the latest SupportPlus patch
bundles from:

o <http://software.hp.com/SUPPORT_PLUS/>


