[GRLUG] Rebooting linux server
Collin Kidder
collink at kkmfg.com
Tue Jul 3 10:43:47 EDT 2012
On 7/3/2012 10:24 AM, L. V. Lammert wrote:
> On Tue, 3 Jul 2012, Collin Kidder wrote:
>
>> Hmm... That does sound like something to try. I'm currently running
>> memtest86+ but there have been no errors so far. It does have ECC memory
>> so maybe there is a way to query that.
>>
> Do you run SmartDrive <IIRC? don't do hardware a lot> on your disks?
> Hitting a bad sector can also cause reboots, .. also check syslog for
> possible clues just before the reboot entries.
>
> Lee
>
It has a hardware RAID controller (SmartArray 5i). In theory you'd think
that a bad sector would get flagged by the controller and I'd see some
indication that something had gone wrong. But I don't seem to see any
RAID errors or anything.
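
For the syslog suggestion, here is a rough Python sketch that pulls out the last few lines logged before each boot. It assumes the kernel writes a line containing "Linux version" early in every boot, and that the log lives somewhere like /var/log/syslog or /var/log/messages depending on the distro; the function name and the marker string are just illustrative choices, not anything specific to this server.

```python
from collections import deque

# Assumption: a line containing this text appears early in every boot.
BOOT_MARKER = "Linux version"


def lines_before_boot(lines, context=5, marker=BOOT_MARKER):
    """For each boot marker found, return the `context` lines logged just before it."""
    recent = deque(maxlen=context)  # rolling window of the most recent lines
    hits = []
    for line in lines:
        if marker in line:
            # A new boot starts here: keep whatever was logged right before it.
            hits.append(list(recent))
            recent.clear()
        else:
            recent.append(line.rstrip("\n"))
    return hits
```

Passing it an open log file (any iterable of lines works) gives one list per boot; a crash with no warning would show up as a block that ends in ordinary, unrelated messages.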
I have installed mcelog so that I can catch MCE (machine check exception)
events, hopefully before it gets so bad that it reboots. I've also disabled
ASR, which monitors the server and reboots it automatically if there is an
error. This way
hopefully I can look at the frozen screen to see the last thing it said
before it died. If that doesn't work I'll probably have to find a laptop
to hook up to the server and let it log through the serial port until it
dies.
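
On the ECC question, one possible way to query it from Linux is the kernel's EDAC sysfs interface, which exposes corrected (ce_count) and uncorrected (ue_count) error counters per memory controller. The sketch below assumes an EDAC driver for this chipset is actually loaded, which I have not verified; if it isn't, the directory will be missing and the function just returns an empty result. The function name is made up for illustration.

```python
import glob
import os

# Assumption: an EDAC driver is loaded and exposes its counters under this path.
EDAC_ROOT = "/sys/devices/system/edac/mc"


def read_ecc_counts(root=EDAC_ROOT):
    """Map each memory controller (mc0, mc1, ...) to (corrected, uncorrected) counts."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        values = []
        for name in ("ce_count", "ue_count"):
            # Each counter is a small text file holding a single integer.
            with open(os.path.join(mc_dir, name)) as fh:
                values.append(int(fh.read().strip()))
        counts[os.path.basename(mc_dir)] = tuple(values)
    return counts
```

A corrected count that keeps climbing between checks would point at a failing DIMM even when memtest86+ comes back clean.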