We went down last weekend due to a bad hard drive; all customers in the accounting system got emails about it. If you are a customer and didn't get an email, please complain loudly, as it means I don't have you in the accounting system.
The problem was resolved by Monday; if you are still having problems, again, complain loudly. lsc@prgmr.com is a good address to complain to.
So, our servers are co-located in Sacramento, at rippleweb. (I like them quite a lot for hosting 1U boxes; they are an especially good deal if you have high-density, high-power boxes, as they use 208v power and don't charge extra for the power-sucking older dual Xeons that draw 100-200 watts per U.)
So late Friday night a drive failed, and the way I have things set up, mirroring is optional and not the default, which means most customers don't have it. Compounding matters, I striped the swap for the Dom0 across both drives (I mirrored everything else), so the box went down with the bad drive. We did, however, manage to get the bad drive to spin up (after sticking it in the freezer for a while), so we should be back in business sometime late tonight.
--
Also, as I pointed out, these servers are in Sacramento. I am in Sunnyvale. First, I log in to the remote KVM setup and see what I can do in the BIOS. Nothing; it can't even see the drive. So I ask my friend Chris, the guy who is partnering with me on my Xen book venture, to take a look at it. So Chris drags the server back to his house. He plugs the drive into a new computer. No dice. He even swaps the circuit board with that of a spare drive of the same model. Nothing. (The spare drive, with the circuit board from the bad drive, works fine.) Clearly, we have a catastrophic failure in the drive itself. Violence wasn't helping, either. (Sometimes drives suffering from stiction can be cured with concussive force.) So, we think, why not try freezing it? People say freezing a drive can sometimes help if your bearings are failing, but it's never worked for me.
After several hours in the freezer, however, the drive spins up, and it stays up long enough for us to retrieve all the data. As Chris said, "freezing it. . . totally not myth."