I don’t often do much in the way of hardware, but recently I had some problems with a couple of our Proliant DL180 G5 servers and the controller for the RAID array.
We took a power outage to our building and the generator that was supposed to come on never did. The UPS was specced to cover only the gap between a power outage and the generator coming on, so after a few minutes all of the servers powered down. Not what I wanted to come in to on Monday morning, but it did highlight a couple of needs I had expressed over the past couple of years.
When I powered everything back up, only 2 servers came up with any major problems. Both appeared to be issues with the Smart Array controller: each server would get to that point in the POST, sit at "Smart Array E200 initializing" for a few minutes, and then fail to boot. I contacted support and they had me do a few things:
1. Boot from the Easy Setup CD, go to Maintenance, and run Array Diagnostics – the diagnostics couldn't find any array controllers installed.
2. Update the flash ROM for the server – there is a download for the server that creates a bootable USB stick you can use to update a server that won't boot into the OS.
3. Reseat the cache module on the controller card – there's a chip on the Smart Array controller card that at first glance looks like a chubby piece of memory.
4. Upgrade the storage firmware using the Maintenance CD – there is another download on the HP site that takes their Maintenance CD and creates a bootable USB drive; you can then update/replace some of the drivers on it. Once you boot off the USB stick, it automatically detects any applicable updates and applies them. It also failed to see that a storage controller was installed.
5. Boot with the cache removed from the SA E200 – they had me remove the cache module and boot the server with the slot on the controller empty.
6. Move the Smart Array card to another slot, clear the CMOS, and upgrade the firmware with Smart Update Manager – I considered this their Hail Mary before they replaced the controller. They had me move the controller card to another slot on the server, clear the CMOS (this is done by holding down a button on the motherboard labeled CMOS; when you power back up, the system date and time will need to be reset), and then try to update using the USB key from step 4.
After all this we were still where we started, so they sent out a new Smart Array E200 controller card.
I replaced the controller card and was still having the exact same problem.
I got back on the phone with support after quickly running through the above six steps again, trying to save myself trips back and forth to the server when support inevitably asked me to repeat them. Since I had already exhausted all the prompts on their screen (I guess), they considered this an odd one-off issue (funny that I had two servers doing the same thing) and had to get special instructions, which amounted to: replace the cache module.
The new cache module came the next day, and once it was installed the server booted normally.
Oddly, for the second server, the next support person I got insisted it was the motherboard, since booting with the cache module removed made no difference, and set up for a tech to be dispatched with a motherboard.
I spoke with the tech, and he confirmed my suspicion that the motherboard was most likely not faulty and that the culprit was more likely the cache module. He explained that it used to be that removing the cache module would allow the server to boot, but he had recently found that if the server shipped with the cache module installed, it seems to expect the module to be there, and if it's missing (or faulty) the controller can't initialize.
So he came out the next day with a new cache module, installed it, and the server worked fine.