Re: tracing mysterious rebooting on centos 5
What are you using auditd to monitor? Certain kernels have had problems in conjunction with an older version of auditd.
We found auditd to be too resource-intensive and problematic to run on any customer's production servers.
Having had at least two direct experiences with auditd causing a kernel panic, we don't recommend that our customers run it at all.
---- Brandon Joseph Adams <firstname.lastname@example.org> wrote:
I actually found out that auditd won't catch reboot(2) very well. The reason
is that detection of reboot() is fairly slow: the audit record doesn't seem
to survive the restart, because it never gets written out to disk in time.
Part of the reason it doesn't reach disk is that all reboots except the
actual reboot command (or something similar that calls sync() and then
reboot() at the syscall level) go through init, which in turn shuts down
auditd, the daemon that does the actual logging.
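Because a shutdown that goes through init normally leaves a `shutdown` record in wtmp, while a crash or hardware reset does not, `last -x` can give a rough answer to which kind of reboot happened. The awk filter below is a minimal sketch assuming the usual `last -x` layout (newest record first, `reboot` and `shutdown` pseudo-user lines, boot time in fields 5 to 8); `classify_boots` is a hypothetical helper name, not an existing tool.

```sh
# classify_boots (hypothetical helper): read `last -x` output, newest record
# first, on stdin. For every boot session that has already ended, report
# whether a "shutdown" record was written before the next "reboot" record.
#   clean   -> the session ended through the normal shutdown path
#   unclean -> no shutdown record before the next boot
# Assumes fields 5-8 of a reboot line are weekday, month, day and time.
classify_boots() {
    awk '
        # A shutdown record seen since the previous (newer) reboot line.
        $1 == "shutdown" { saw_shutdown = 1 }
        $1 == "reboot" {
            # Records between the previous reboot line and this one
            # describe how the session that started here ended.
            if (seen_reboot) {
                status = saw_shutdown ? "clean" : "unclean"
                print status, $5, $6, $7, $8
            }
            seen_reboot = 1
            saw_shutdown = 0
        }
    '
}
```

Run it as `last -x | classify_boots`. An `unclean` line only means no shutdown record was written before the next boot; it cannot tell a kernel panic from a power fault or from a program calling reboot(2) directly, so it narrows things down rather than naming a cause.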
I didn't have physical access to the machine and technically am not
responsible for the hardware, but after some trial and error I figured out
that the cause appears to be one of the network-stack sysctl settings, which
I was using a little too aggressively. Whether this is an issue in the TCP/IP
stack itself, sloppy configuration on my part (although the docs appear to
agree with me), or a kernel issue that is exposing a hardware problem, the
fix, at least for now, appears to be scaling back those settings. More careful
examination of dmesg shows that the kernel has been crashing (but since it's
a stock CentOS 5 install, none of the useful kernel debugging utilities are
installed); it isn't anything calling reboot() that I might catch with
debugging tools.
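Since the crash evidence here came from the kernel log, a small filter can pull the usual crash signatures out of a log. This is a sketch: `crash_markers` is a hypothetical helper, and the pattern list is an assumption based on common kernel message prefixes, not an exhaustive list.

```sh
# crash_markers (hypothetical helper): print lines read from stdin that look
# like kernel crash or oops reports. The patterns are common kernel message
# prefixes; extend the list as needed.
crash_markers() {
    grep -E 'Kernel panic|Oops|BUG: |general protection fault|kernel BUG at'
}
```

For example, `dmesg | crash_markers` or `crash_markers < /var/log/messages`. The `dmesg` command only shows the kernel ring buffer for the current boot, so earlier crashes have to be looked for in persisted logs, and a hard panic may never reach disk at all.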
Thanks for the hardware suggestions, though. I'll pass them on to the person
handling the hardware, and maybe he'll come back and tell me it wasn't my
fault after all.
Brandon Joseph Adams
Tel: +1 217-953-0257
GPG Key: 1024D/2F2EFCCF
Fingerprint = 2AEE AA3F B5C9 409D 2EE6 E38B CA06 8CDF 2F2E FCCF
On Thursday 29 October 2009 11:51:10 am Robert G. (Doc) Savage wrote:
> On Thu, 2009-10-29 at 01:29 -0500, Brandon Joseph Adams wrote:
> > Hi
> > So I'm back doing Linux stuff after a few years and I ran into a rather
> > significant hangup. The system I'm working on is a CentOS 5.4 machine that
> > keeps rebooting at semi-random intervals whenever the iptables rules
> > change. I'm not currently sure if this is a fault in my iptables rules, a
> > hardware problem, or the result of a rogue legacy script finding
> > something amiss and rebooting. The solution I came up with is as follows
> > (with auditd enabled):
> > auditctl -a entry,always -S reboot
> > My only concern is that there might be another way to reboot the system
> > that isn't using reboot(2). init 6 seems to use reboot(2), and the
> > userspace reboot command seems to use it as well. Is there another way
> > (outside of magic SysRq) that the system might be rebooting via?
> > Also, is there a better (maybe more obvious) way to be checking for this?
> > The logs in /var/log aren't very helpful, and last & friends just tell me
> > that a reboot happened, not what caused it.
> Welcome back to the "big leagues". I'm running RHEL5.4 Server
> (supplemented by some CentOS workstation packages) on a SuperMicro
> dual-socket Opteron server. I've never seen a spontaneous reboot
> like you describe, so it's probably not due to any underlying software
> bug or misconfiguration.
> What's your hardware? More specifically, how old is the power supply?
> The last time I had spontaneous reboots, I traced them to broken-down
> electrolytic filter capacitors in the power supply that were allowing AC
> spikes onto the +5V and +12V power lines. Electrolytics are the big
> beer-can-size capacitors filled with a moist paste (the electrolyte).
> With time and temperature, that paste eventually dries out and the
> capacitor loses its ability to filter AC voltages applied across its
> terminals. Any power supply older than 4-5 years should be considered
> suspect.
> It's easy enough to diagnose this. Take a volt-ohm meter (VOM) set to
> measure AC volts and measure between the +5V or +12V leads and ground on an
> unused Molex power connector. It should read zero on all scale settings.
> If you have access to an oscilloscope, you should see no spikes riding
> on the DC voltages.
> To unsubscribe, send email to email@example.com with
> "unsubscribe silug-discuss" in the body.