[GRLUG] Raid, LVM, and Cheap Storage

Adam Tauno Williams adamtaunowilliams at gmail.com
Mon Oct 13 15:45:15 EDT 2008


> Introduction:
> Disk storage - the disks themselves - is rather cheap.

"There are two things that you need to know about storage. The first
is
that it keeps getting cheaper---unbelievably so.  The second is that it
keeps getting more expensive---unbelievably so."
--- The Practice of System and Network Administration

> In Linux this is furthered by the fact we have LVM2 and Software Raid
> support in our Kernels.

Every other current OS provides these as well.
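
For anyone who hasn't played with them, stacking LVM on top of md RAID
is only a handful of commands.  A minimal sketch, assuming two spare
disks that show up as /dev/sdb and /dev/sdc (device and volume names
here are just placeholders):

    # Build a RAID1 mirror across the two disks.
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

    # Layer LVM on top of the mirror.
    pvcreate /dev/md0
    vgcreate vg_data /dev/md0
    lvcreate --name lv_home --size 100G vg_data
    mkfs.ext3 /dev/vg_data/lv_home

The nice part is that lv_home can later be extended, snapshotted, or
migrated to other disks with pvmove without unmounting it.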

> Some of my biggest problems with the commercial storage world:
> They oversell performance in applications where it's not needed,
> sacrificing overall storage space or simplicity.

They have a second reason:  a solution that ends up being inadequate is
probably going to get blogged about by an angry IT tech who didn't do
his homework.  Regarding these things I have a fair amount of sympathy
for vendors - they often take the rap for cheap customers;  and
reputation is very important in the storage market.

> My Experience:
> I'm all about saving money: if I can buy 3 servers for the cost of 1,
> at the cost of more time, I'll do that - solutions that cost half of
> my salary can't ever justify themselves by being 'managed'.  I'm
> sorry, that argument just doesn't apply to most medium-sized
> businesses.

Adding management to a server certainly doesn't triple the cost.  Being
able to remotely manage a server (a) can make life in IT bearable and
(b) helps the company have better uptime.

> If you do it yourself you know it's done right; now obviously this
> can't always be the case, but it's worked for me so far.

I strongly agree with this;  however one also needs to beware of highly
customized / tweaked solutions - no one else could support them.  People
leave, get sick, die....

> Layering Abstraction technologies:
> Scenario -
>  Raid 1+0 
>  LVM2
>  ISCSI Partition
>  Contains a Virtual File System.
> The only situation in which this is a good idea, is if you are
> mirroring the data to another Raid array.
> So much can go wrong to corrupt data in this scenario that the
> complexity can outweigh the benefits very quickly.

It may be technologically more complex;  but management and flexibility
are vastly improved.  It makes no sense not to do this.  Almost our
entire enterprise runs on LVM over iSCSI over RAID.  All these
technologies are extremely reliable;  in many years I've *never* had
data corruption resulting from LVM or RAID.  The only blown filesystem
I've ever had was a ReiserFS - the fix for that was simple: don't use
ReiserFS.
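
For example, the initiator side of that stack is just a few commands
with open-iscsi; a rough sketch (the portal address, IQN, and volume
group names are made up):

    # Discover and log in to the iSCSI target.
    iscsiadm -m discovery -t sendtargets -p 192.168.1.50
    iscsiadm -m node -T iqn.2008-10.com.example:storage.lun1 \
        -p 192.168.1.50 --login

    # The exported LUN appears as an ordinary block device (say
    # /dev/sdd); fold it into an existing volume group and grow a
    # logical volume.
    pvcreate /dev/sdd
    vgextend vg_san /dev/sdd
    lvextend -L +200G /dev/vg_san/lv_mail
    resize2fs /dev/vg_san/lv_mail

The RAID happens on the target side, so from the server's point of view
it is just more LVM.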

> Raid Controllers:
> My experiences with raid controllers and their interfaces vary.  Many
> raid controllers have batteries on the controller itself, so in the
> event of a hardware failure the controller's buffer can be flushed and
> the raid array will not be damaged by incomplete writes.
> My experiences with rebuilding Raid arrays or restoring data from a
> crashed raid array are all painful.  
> Does anyone have a good story or know of technologies that allow you
> to move hardware or a portion of a raid array to restore data on a
> different piece of non-identical hardware in the event of a server
> failure?

No, that can't be expected to work.  RAID insulates you from failures of
individual drives, and nothing else.

> In the end I've come to realize the more ideal solution is being
> hardware agnostic, and the best way I've accomplished this is through
> a combination of Software Raid, DRBD, and LVM - no physical raid.

Depends upon the RAID card.  Get one with a good reputation and an
actively supported driver - I've never had a problem with RAID
corruption using a supported driver.  I have had hardware-RAID
corruption several times, but only using hardware I shouldn't have been
using in the first place (my fault).
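
To be fair to the md/DRBD approach: the big win of Linux software RAID
is that the metadata lives on the disks themselves, so an array can
usually be reassembled on a completely different box.  A rough sketch,
assuming the transplanted members show up as /dev/sdb1 and /dev/sdc1 on
the new host:

    # Look at the md superblocks to confirm the members and array UUID.
    mdadm --examine /dev/sdb1 /dev/sdc1

    # Reassemble the array, then bring up any LVM volumes living on it.
    mdadm --assemble /dev/md0 /dev/sdb1 /dev/sdc1
    vgscan
    vgchange -ay

That is exactly the kind of recovery that is hit-or-miss with a
proprietary controller.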

> Preventative Maintenance:
> These are the things I encounter most that amaze me about IT
> departments:
> 1.) The Vendors hardware monitoring software is not installed.

Often because it is crap.

> 2.) The only way of detecting a disk failure at a remote site is for
> someone to notice there is a 'red' or 'amber' light on.
> 3.) No one monitors system voltages
> 4.) No one monitors system temperatures.

OpenNMS! <http://www.opennms.org/index.php/Main_Page>  I guarantee that
if the above have to be done manually, it just won't happen.
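
On the voltage/temperature point: lm_sensors plus something like
OpenNMS polling the values covers most of it.  A quick sketch (the cron
one-liner is only a stopgap, not a substitute for a real NMS):

    # Detect the board's sensor chips once, then read them.
    sensors-detect
    sensors

    # Stopgap: cron job that mails root if lm_sensors flags an alarm.
    if sensors | grep -qi ALARM ; then
        echo "sensor alarm on $(hostname)" | mail -s "sensor alarm" root
    fi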

> 5.) No one performs read / access tests or write tests or head tests -
> if this is just done once a year you can almost always predict a drive
> failure.

SMART will do this automatically on current systems.
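
For the curious, smartd will run the tests on a schedule, and a one-off
check is just (device name is hypothetical):

    # Enable SMART and automatic offline data collection, then run a
    # long self-test.
    smartctl --smart=on --offlineauto=on /dev/sda
    smartctl --test=long /dev/sda

    # Later: overall health verdict and the self-test log.
    smartctl --health /dev/sda
    smartctl --log=selftest /dev/sda

A line in smartd.conf gets the same thing run unattended every week.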

> Each type of data has its own unique needs - 
> -archival importance (how far back to revision, this type of data can
> be compressed, and is less important for FAST restoration)
> -security importance (should the data be encrypted when transported to
> a remote site, should you limit departmental access)
> -User data (The data can be split up, and managed separately from
> other forms of data, an individual user being affected is less
> important than a server going down)
> -application data (matters more about availability and restorability
> than archival)
> -Hardware data (once again, restorability and point-in-time
> backups / hot standby are more important than continual backup and
> archival)
> Many organizations don't recognize these different requirements of
> their data, and fail to develop different storage plans around the
> purpose of data.

These questions are all part of creating a *legally required* data
retention policy.  Acting contrary to an organization's data retention
policy opens the possibility of *criminal* prosecution.  The fear of
lawyers is very effective in making an organization get proactive about
such things.
<http://www.whitemiceconsulting.com/node/157>

> Now I've pointed out all of the short-sightedness I've experienced in
> my journeys - I'm sure the Linux IT community being as informed as it
> is will not be among these organizations I speak of - so I'll be
> interested to hear about your experiences and setups and uses of linux
> technology to come up with unique and cost effective storage
> environments.

We've recently started scrapping all our old hardware and consolidating
on VMware ESX on a pair of Silicon Mechanics servers connected to an
EMC SAN via iSCSI.  A myriad of physical servers is too much of a
maintenance burden, too hot, and too inflexible.  This frees up IT to
focus on interesting/useful problems.



