[GRLUG] Rebooting linux server

Collin Kidder collink at kkmfg.com
Tue Jul 3 10:43:47 EDT 2012


On 7/3/2012 10:24 AM, L. V. Lammert wrote:
> On Tue, 3 Jul 2012, Collin Kidder wrote:
>
>>    Hmm... That does sound like something to try. I'm currently running
>> memtest86+ but there have been no errors so far. It does have ECC memory
>> so maybe there is a way to query that.
>>
> Do you run SmartDrive <IIRC? don't do hardware a lot> on your disks?
> Hitting a bad sector can also cause reboots, .. also check syslog for
> possible clues just before the reboot entries.
>
> 	Lee
>

It has a hardware RAID controller (SmartArray 5i). In theory you'd think 
that a bad sector would get flagged by the controller and I'd see some 
indication that something had gone wrong. But I don't seem to see any 
RAID errors or anything.

I have installed mcelog so that I can catch MCE (machine check 
exception) events, hopefully before things get bad enough that it 
reboots. I've also disabled ASR (Automatic Server Recovery), which 
monitors the server and reboots it automatically if there is an error. 
This way, hopefully, I can look at the frozen screen to see the last 
thing it said before it died. If that doesn't work, I'll probably have 
to find a laptop to hook up to the server and let it log through the 
serial port until it dies.
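
For the serial logging idea, something like the following might do on 
the laptop side. This is only a rough sketch, not something I have run 
against this server: it assumes the server is already sending its 
console output to the serial port, that the laptop's port has already 
been set to a matching speed (with stty, for example), and the name 
log_console is just made up for illustration. It copies each line it 
receives to a log with a timestamp and flushes right away, so the last 
line before a freeze should survive.

```python
import datetime
import io
import os


def log_console(src, dst, now=datetime.datetime.now):
    """Copy lines from src to dst, prefixing each with a timestamp.

    src: binary file-like object (for example an opened serial device).
    dst: text file-like object to append the log lines to.
    now: callable returning the current datetime (replaceable for tests).
    Returns the number of lines written.
    """
    count = 0
    # readline() returns b"" at end of input, which ends the loop.
    for raw in iter(src.readline, b""):
        text = raw.decode("ascii", errors="replace").rstrip("\r\n")
        dst.write(f"{now().isoformat(timespec='seconds')} {text}\n")
        dst.flush()
        # Force the line to disk when dst is a real file, so a crash on
        # the logging side loses as little as possible.
        try:
            os.fsync(dst.fileno())
        except (AttributeError, io.UnsupportedOperation, OSError):
            pass  # dst has no usable file descriptor (e.g. StringIO)
        count += 1
    return count
```

To use it, I would open the device in unbuffered binary mode, e.g. 
open("/dev/ttyUSB0", "rb", buffering=0), open a log file in append 
mode, and pass both to log_console(). The device path is a guess and 
depends on the laptop and the serial adapter.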

