I actually found out that auditd won't reliably catch reboot(2). Detecting the reboot() call is fairly slow. The audit record doesn't survive the restart, and detection is slow enough that the record never gets written to disk either. Part of the reason it never reaches the disk is that every reboot goes through init, except one made by the actual reboot command (or anything similar that calls sync() and then reboot() at the syscall level). init in turn shuts down auditd, which is the part that does the actual logging.

I didn't have physical access to the machine and technically am not responsible for the hardware. After some trial and error, though, I figured out that the problem appears to be one of the network-stack sysctl settings, which I was tuning a little too aggressively. The cause could be an issue in the TCP/IP stack itself, sloppy configuration on my part (although the docs appear to agree with me), or a kernel issue that is exposing a hardware problem. Whichever it is, the fix for now appears to be scaling back those settings.

A more careful look at dmesg shows that the kernel has been crashing. Nothing was calling reboot() that I might have caught with debugging tools anyway. (Since this is a stock CentOS 5 install, none of the useful kernel debugging utilities are available.)

Thanks for the hardware suggestions, though. I'll pass them along to the person handling the hardware, and maybe he'll come back and tell me it wasn't my fault after all.

Thanks,
--
Brandon Joseph Adams
Email: emidln@gmail.com
Tel: +1 217-953-0257
GPG Key: 1024D/2F2EFCCF
Fingerprint = 2AEE AA3F B5C9 409D 2EE6  E38B CA06 8CDF 2F2E FCCF

On Thursday 29 October 2009 11:51:10 am Robert G. (Doc) Savage wrote:
> On Thu, 2009-10-29 at 01:29 -0500, Brandon Joseph Adams wrote:
> > Hi
> >
> > So I'm back doing Linux stuff after a few years and I ran into a rather
> > significant hangup.
> > The system I'm working on is a CentOS 5.4 machine that
> > keeps rebooting at semi-random intervals whenever the iptables rules
> > change. I'm not currently sure if this is a fault in my iptables rules, a
> > hardware problem, or the result of a rogue legacy script finding
> > something amiss and rebooting. The solution I came up with is as follows
> > (with auditd enabled):
> >
> > auditctl -a entry,always -S reboot
> >
> > My only concern is that there might be another way to reboot the system
> > that isn't using reboot(2). init 6 seems to use reboot(2) and the
> > userspace reboot command seems to use it as well. Is there another way
> > (outside of magic sysrq) that the system might be rebooting via software?
> >
> > Also, is there a better (maybe more obvious) way to be checking for this?
> > The logs in /var/log aren't very helpful and last & friends just tell me
> > that a reboot happened, not what caused it.
>
> Brandon,
>
> Welcome back to the "big leagues". I'm running RHEL5.4 Server
> (supplemented by some CentOS workstation packages) on a SuperMicro
> dual-socket Opteron server. I've never seen a spontaneous reboot problem
> like you describe, so it's probably not due to any underlying software
> bug or misconfiguration.
>
> What's your hardware? More specifically, how old is the power supply?
> The last time I had spontaneous reboots I traced it to broken-down
> electrolytic filter capacitors in the power supply that were allowing AC
> spiking onto the +5V and +12V power lines. Electrolytics are the big
> beer-can-size capacitors filled with a moist paste (the electrolyte).
> With time and temperature, that paste eventually dries out and the
> capacitor loses its ability to filter AC voltages applied across its
> terminals. Any power supply older than 4-5 years should be considered
> suspect.
>
> It's easy enough to diagnose this.
> Take a volt-ohm meter (VOM) set to
> measure AC volts and measure between the +5V or +12V leads and ground on any
> unused Molex power connector. It should read zero on all scale settings.
> If you have access to an oscilloscope, you should see no spikes riding
> on the DC voltages.
>
> --Doc