
Re: Tracking Down Intermittent Kernel Panics



Hi David,

  At my last job, we had a similar situation: RedHat servers whose hardware was generic and nearly identical. One thing we always did before building a new server, and whenever we had to diagnose problems like the one you're experiencing, was run a tool like "Tuff-Test-Pro". Tuff Test lets you create a bootable floppy disk that runs a self-contained OS written in assembler and can diagnose all the basic hardware components (memory, hard disk, motherboard, CPU, serial ports, parallel ports, video cards, etc.). Tuff-Test-Pro can test memory up to 3.5 GB and processors anywhere from an 8086 to a P4. It costs just $30; I've seen similar but less effective HW diagnostic tools that cost more than $100. You can run the HW certification tests in a continuous loop for your own "burn-in" tests on systems. I've used it on literally thousands of PCs in the last 5 years, and it has never let me down.

  My guess from past experience is that you might have a bad memory chip. On the RedHat side, the first thing I'd do is make sure all the HW in that system is on the RedHat HCL.

http://hardware.redhat.com/hcl/

Link for Tuff-test Pro 

http://www.tufftest.com/ttp01.htm
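
On the "burn-in" loop idea: Tuff Test boots from its own floppy, so it can't be scripted from Linux, but the same approach can be applied to any stress tool that runs under the OS. Below is a rough sketch in Python, not something from Tuff Test itself. The function name burn_in and its parameters are my own invention, and the command you pass in is whatever test program you trust to exit non-zero on failure. If the box panics outright, the script dies with it, so the last pass number written to the log is the useful part.

```python
import subprocess
import sys
import time


def burn_in(command, max_passes=None, pause_seconds=0.0):
    """Run `command` over and over until it fails.

    Returns the 1-based pass number of the first failing pass, or None
    if max_passes passes all succeeded. With max_passes=None the loop
    runs until a failure (or until the machine goes down).
    """
    passes = 0
    while max_passes is None or passes < max_passes:
        passes += 1
        # A non-zero exit status is treated as a failed pass (assumption:
        # the test tool reports errors through its exit status).
        result = subprocess.run(command)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        if result.returncode != 0:
            print(f"{stamp} pass {passes}: FAILED (exit {result.returncode})",
                  file=sys.stderr, flush=True)
            return passes
        # Flush every line so the log survives as far as possible
        # if the system locks up mid-run.
        print(f"{stamp} pass {passes}: ok", file=sys.stderr, flush=True)
        time.sleep(pause_seconds)
    return None
```

You would call it with the test command as a list, for example burn_in(["some-stress-tool", "--its-options"]), where the tool name and options are placeholders. Redirecting stderr to a file on another machine (or to a serial console) keeps the log readable after a crash.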

HTH

Roger Hill
A+, Network+, SCO CUSA, SCSA(solaris 8.0) 

============================================================
From: "David" <david@lolling.org>
Date: 2004/09/30 Thu AM 03:57:53 CDT
To: "Silug Discuss" <silug-discuss@silug.org>
Subject: Tracking Down Intermittent Kernel Panics

Hi,

I have a number of identical generic servers running Red Hat 7.3
in a server farm.  All of these servers are dedicated to running
the same application, which requires Red Hat 7.3.

One server is experiencing a kernel panic at what appears to be
random intervals.  At this point we feel the problem is hardware
related. The other servers have never had this problem.  We have
also tried Red Hat 9.0 on this machine in the past and it also
experienced kernel panics.

We have run several utilities to try to stress different components
trying to get another kernel panic so we can possibly pinpoint
the hardware problem.

So far we have run things like memtest and bonnie.

At this point we are at a loss as to how to track the problem
down and fix it.

Does anyone know of any other diagnostics or resources that would
help trace the kernel panics back to the source?

Thanks,

David


-
To unsubscribe, send email to majordomo@silug.org with
"unsubscribe silug-discuss" in the body.
============================================================


