
Re: tracing mysterious rebooting on centos 5



I actually found out that auditd won't catch reboot(2) very well. Detecting 
the reboot() call is fairly slow, and the audit record doesn't survive the 
restart: the machine goes down before the record gets written out to disk. 
Part of the reason it never reaches disk is that every reboot except the 
actual reboot command (or something similar that does sync() then reboot() 
at the syscall level) goes through init, which in turn shuts off auditd, the 
daemon that does the actual logging. 
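For the cases where the record does make it to disk, here is a minimal sketch 
(in Python, my own choice, not part of the original setup) of picking reboot(2) 
calls out of auditd's log. It assumes the usual `type=SYSCALL ... arch=... 
syscall=...` record layout; the syscall numbers (169 on x86_64, 88 on i386) and 
the hex arch codes come from the kernel's tables, and `find_reboot_records` / 
`parse_fields` are hypothetical helper names.

```python
import re
from typing import Dict, Iterable, List

# reboot(2) syscall number per audit "arch" value (hex AUDIT_ARCH_* codes):
#   c000003e = x86_64, where reboot is syscall 169
#   40000003 = i386,   where reboot is syscall 88
REBOOT_SYSCALL_BY_ARCH = {"c000003e": "169", "40000003": "88"}

# key=value pairs; values are either "quoted strings" or runs of non-space
_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_fields(line: str) -> Dict[str, str]:
    """Split one audit.log line into a dict, dropping surrounding quotes."""
    return {key: value.strip('"') for key, value in _FIELD.findall(line)}


def find_reboot_records(lines: Iterable[str]) -> List[Dict[str, str]]:
    """Return the SYSCALL records whose syscall number is reboot(2)."""
    hits = []
    for line in lines:
        if not line.startswith("type=SYSCALL "):
            continue  # PATH, CWD, EXECVE, etc. records are not needed here
        fields = parse_fields(line)
        # Syscall numbers differ per architecture, so look up the right one
        wanted = REBOOT_SYSCALL_BY_ARCH.get(fields.get("arch", ""))
        if wanted is not None and fields.get("syscall") == wanted:
            hits.append(fields)
    return hits
```

Usage would be something like `find_reboot_records(open("/var/log/audit/audit.log"))` 
(assuming the default log location). A reboot that goes through init, or one 
that kills the box before the record is flushed, simply leaves nothing to find.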

I didn't have physical access to the machine, and technically I'm not 
responsible for the hardware, but after some trial and error I figured out 
that the problem appears to be one of the network-stack sysctl settings, which 
I was using a little too aggressively. Whether this is an issue in the TCP/IP 
stack itself, sloppy configuration on my part (although the docs appear to 
agree with me), or a kernel issue that is exposing a hardware problem, the 
fix, at least for now, appears to be scaling back the settings. A more careful 
look at dmesg shows that the kernel has been crashing, rather than something 
calling reboot() that I might have caught with debugging tools anyway (and 
since it's a stock CentOS 5 install, all of the useful kernel debugging 
utilities are missing).
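Since the evidence turned out to be in the kernel log rather than the audit 
trail, here is a rough sketch (again Python, my own choice) of scanning saved 
kernel log text, such as /var/log/messages or a saved dmesg dump, for the 
strings the kernel typically prints when it oopses or panics. The marker list 
and the helper name `find_crash_lines` are illustrative assumptions, not an 
exhaustive list; a panic that takes the machine down immediately may never 
reach the on-disk log at all.

```python
from typing import Iterable, List, Tuple

# Strings the Linux kernel commonly prints when it oopses or panics.
# Illustrative only: other failure modes print other messages.
CRASH_MARKERS = (
    "Kernel panic - not syncing",
    "Oops:",
    "BUG: unable to handle kernel",
    "general protection fault",
    "kernel BUG at",
)


def find_crash_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for each line with a crash marker."""
    hits = []
    for lineno, line in enumerate(lines, start=1):
        if any(marker in line for marker in CRASH_MARKERS):
            hits.append((lineno, line.rstrip("\n")))
    return hits
```

For example, `find_crash_lines(open("/var/log/messages"))` would list any 
oops or panic lines that happened to be written out before the machine went 
down; plain substring matching is used so the list is easy to extend.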

Thanks for the hardware suggestions, though. I'll pass them along to the 
person handling the hardware, and maybe he'll come back and tell me it wasn't 
my fault after all. 

Thanks,
-- 
Brandon Joseph Adams
Email: emidln@gmail.com
Tel: +1 217-953-0257
GPG Key: 1024D/2F2EFCCF
Fingerprint = 2AEE AA3F B5C9 409D 2EE6  E38B CA06 8CDF 2F2E FCCF

On Thursday 29 October 2009 11:51:10 am Robert G. (Doc) Savage wrote:
> On Thu, 2009-10-29 at 01:29 -0500, Brandon Joseph Adams wrote:
> > Hi
> >
> > So I'm back doing linux stuff after a few years and I ran into a rather
> > significant hangup. The system I'm working on is a CentOS 5.4 machine
> > that keeps rebooting at semi-random intervals whenever the iptables rules
> > change. I'm not currently sure if this is a fault in my iptables rules, a
> > hardware problem, or the result of a rogue legacy script finding
> > something amiss and rebooting. The solution I came up with is as follows
> > (with auditd enabled):
> >
> > auditctl -a entry,always -S reboot
> >
> > My only concern is that there might be another way to reboot the system
> > that isn't using reboot(2). init 6 seems to use reboot(2) and the
> > userspace reboot command seems to use it as well. Is there another way
> > (outside of magic sysrq) that the system might be rebooting via software?
> >
> > Also, is there a better (maybe more obvious) way to be checking for this?
> > The logs in /var/log aren't very helpful and last & friends just tell me
> > that a reboot happened, not what caused it.
> 
> Brandon,
> 
> Welcome back to the "big leagues". I'm running RHEL5.4 Server
> (supplemented by some CentOS workstation packages) on a SuperMicro
> dual-socket Opteron server. I've never seen a spontaneous reboot problem
> like you describe, so it's probably not due to any underlying software
> bug or misconfiguration.
> 
> What's your hardware? More specifically, how old is the power supply?
> The last time I had spontaneous reboots I traced it to broken-down
> electrolytic filter capacitors in the power supply that was allowing AC
> spiking onto the +5V and +12V power lines. Electrolytics are the big
> beer-can size capacitors filled with a moist paste (the electrolyte).
> With time and temperature, that paste eventually dries out and the
> capacitor loses its ability to filter AC voltages applied across its
> terminals. Any power supply older than 4-5 years should be considered
> suspect.
> 
> It's easy enough to diagnose this. Take a volt-ohm meter (VOM) set to
> measure AC volts and measure between the +5V or +12V leads and ground on any
> unused Molex power connector. It should read zero on all scale settings.
> If you have access to an oscilloscope, you should see no spikes riding
> on the DC voltages.
> 
> --Doc
> 
