[GRLUG] Raid, LVM, and Cheap Storage

Erik Southworth erik.southworth at gmail.com
Wed Oct 15 22:07:38 EDT 2008


On Mon, Oct 13, 2008 at 2:36 PM, Ben DeMott <ben.demott at gmail.com> wrote:

> Hey GRLUG, I wanted to open a discussion on storage, share my thoughts,
> opinions, and observations about the storage world, and get some input.
> I hope you find my observations somewhat thought-provoking, and I look
> forward to hearing different opinions and experiences.
>
> *Introduction:*
> Disk storage, and the disks themselves, are rather cheap.  In Linux this is
> furthered by the fact that we have LVM2 and software RAID support in the
> kernel.  Combined with hardware RAID, that makes the possible disk/storage
> setups just about endless.
>
> I'm curious to know about other people's experience with LVM2, virtual file
> systems, and RAID, and how they have combined these technologies to serve a
> particular storage environment's needs and purpose.
>
> This isn't a forum about 'boxed solutions' - if you feel you must share how
> IBM or HP does it, fine, but keep it brief.
> This is about solutions you actually manage yourself and don't depend on a
> vendor for.
>
> Some of my biggest problems with the commercial storage world:
> They oversell performance in applications where it's not needed, sacrificing
> overall storage space or simplicity.
> They adopt new disk technologies (SAS) too quickly and too widely, when they
> are NOT the best solution for everyone, simply to keep their profit margins
> higher (surprise, surprise).
>
> *My Experience:*
> I'm all about saving money: if I can buy 3 servers for the cost of 1, at the
> cost of more of my time, I'll do that.  Solutions that cost half of my salary
> can't justify themselves simply by being 'managed' - I'm sorry, that argument
> just doesn't apply to most medium-sized businesses.  I call it the 'lazy IT
> department argument'.
> In my consulting work I've discovered that many IT managers don't know that
> much, or don't force themselves to keep up with technology.  I can't tell you
> how many places I've been to that had IBM or HP or Dell servers, set up by
> someone else, that weren't even configured for RAID!  If you do it yourself
> you know it's done right; obviously that can't always be the case, but it's
> worked for me so far.
>
> When I started at my last employer they were paying $1,200 a month to take
> 500 GB of data a week offsite.  If that is the best solution you can come up
> with, you just lack imagination...
> I could be throwing hard drives in the dumpster after each use and still
> have a better, more cost-effective solution.
>
> *Layering Abstraction Technologies:*
> Scenario -
>  RAID 1+0
>  LVM2
>  iSCSI-exported volume
>  containing a virtual file system.
>
> The only situation in which this is a good idea is if you are mirroring the
> data to another RAID array.
> So much can go wrong and corrupt data in this scenario that the complexity
> can outweigh the benefits very quickly.
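
For what it's worth, here is a minimal sketch of how that kind of stack gets
assembled with stock Linux tools (mdadm + LVM2), just to make the layering
concrete.  The device names and sizes are made up, and the iSCSI export step
is only indicated in a comment:

# Rough sketch of the RAID 1+0 -> LVM2 -> iSCSI layering, using subprocess
# to drive the usual command-line tools.  Device names are hypothetical.
import subprocess

def run(cmd):
    """Run one step and fail loudly, so a half-built stack is obvious."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

disks = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]  # hypothetical members

# 1. RAID 1+0 across the four disks (kernel md driver).
run(["mdadm", "--create", "/dev/md0", "--level=10", "--raid-devices=4"] + disks)

# 2. LVM2 on top of the md device.
run(["pvcreate", "/dev/md0"])
run(["vgcreate", "vg_storage", "/dev/md0"])
run(["lvcreate", "-L", "200G", "-n", "lv_export", "vg_storage"])

# 3. /dev/vg_storage/lv_export would then be handed to an iSCSI target daemon
#    and formatted by the initiator -- the layer where, as the post says, the
#    complexity starts to outweigh the benefit.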
>
> *Why Mirror:*
> It's true, mirrors lack the performance of the striped RAID levels.
> However:
> GMTD - "Give me the drive" - is a philosophy that says a single physical
> disk must always have value on its own, no matter what, and mirroring
> supports this.
> With a striped layout, a read touches more drives simultaneously, and that
> has consequences of its own.
>
>
> *RAID Controllers:*
> My experiences with RAID controllers and their interfaces vary.  Many
> controllers carry a battery on the card itself, so that in the event of a
> power or hardware failure the controller's write cache can still be flushed
> and the array is not damaged by incomplete writes.
> My experiences with rebuilding RAID arrays, or restoring data from a crashed
> array, have all been painful.
> Does anyone have a good story, or know of technologies that let you move
> disks or a portion of an array and restore the data on different,
> non-identical hardware in the event of a server failure?
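
For software RAID at least, the answer is that the md superblock travels with
the disks, so the array can usually be reassembled on completely different
hardware.  A rough sketch, assuming the member disks survived and mdadm is
installed on the new box (it does not help with proprietary
hardware-controller formats):

# Reassembling a Linux software-RAID set on a different machine.
import subprocess

def mdadm(*args):
    result = subprocess.run(["mdadm", *args], capture_output=True, text=True)
    print(result.stdout or result.stderr)

# 1. Look for md superblocks on whatever the new machine calls the disks.
mdadm("--examine", "--scan")

# 2. Let mdadm pull the surviving members back together under /dev/md*.
mdadm("--assemble", "--scan")

# 3. Any LVM volumes layered on top can then be reactivated with
#    `vgscan` and `vgchange -ay` before mounting and copying data off.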
>
> *RAID Thoughts:*
> Hardware RAID has become a standard among servers, for redundancy,
> scalability, and uptime.
> In my journeys as an engineer I've encountered more failures due to human
> error, improper setup, bad hardware, and never verifying the data itself -
> the list goes on - than from anything else.
> In the end I've come to realize the more ideal solution is to be hardware
> agnostic, and the best way I've accomplished this is through a combination
> of software RAID, DRBD, and LVM - no physical RAID.
> At the end of the day a RAID array cannot guarantee anything about data
> integrity; it is only a small piece of the puzzle, and often a convoluted
> one when the day comes that you need to rely on it.
> Instead we need to focus on the data itself more than we do: use checksums
> and data-comparison strategies while backing up and restoring (a quick
> manifest sketch follows below).  This is completely overlooked at many
> organizations.
> Corrupt data being mirrored is no more valuable than no data at all.
> And keep in mind that the Linux kernel caches/buffers disk I/O - if you
> have 36 GB of memory in a server, the majority of it will be used for cached
> data, which offsets many performance concerns and adds to the longevity of
> the hardware.
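
On the checksum point, here is a minimal sketch of the idea: build a SHA-256
manifest of the data before it is backed up, and verify the restored copy
against it afterwards.  The paths are hypothetical; a real job would hook this
into whatever does the backup and restore:

# Build a checksum manifest before backup, verify after restore.
import hashlib, json, os

def manifest(root):
    sums = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(path, "rb") as fh:
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            sums[os.path.relpath(path, root)] = digest.hexdigest()
    return sums

# Before backup: record what the data looked like.
before = manifest("/srv/data")
with open("/var/backups/data.manifest.json", "w") as fh:
    json.dump(before, fh)

# After restore (possibly on different hardware): compare against the record.
after = manifest("/srv/restore/data")
bad = [p for p, s in before.items() if after.get(p) != s]
print("mismatched or missing files:", bad or "none")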
>
> *Preventative Maintenance:*
> These are the things I encounter most often that amaze me about IT
> departments:
> 1.) The vendor's hardware monitoring software is not installed.
> 2.) The only way a disk failure at a remote site gets detected is for
> someone to notice that a 'red' or 'amber' light is on.
> 3.) No one monitors system voltages.
> 4.) No one monitors system temperatures.
> 5.) No one performs read/access tests, write tests, or head tests - done
> even once a year, these catch most drives before they fail outright (see
> the sketch after this list).
> 6.) There is too much confidence in hardware.
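
On points 3 through 5: smartmontools already exposes most of this, and it only
takes a few lines in cron to get ahead of failures.  A small sketch, assuming
smartctl is installed and the device list is adjusted per machine:

# Periodic SMART health check, suitable for cron.
import subprocess

DEVICES = ["/dev/sda", "/dev/sdb"]  # hypothetical; list the real drives here

for dev in DEVICES:
    # Overall verdict from the drive's own SMART self-assessment.
    health = subprocess.run(["smartctl", "-H", dev],
                            capture_output=True, text=True).stdout
    if "PASSED" not in health:
        print(f"ALERT: {dev} did not pass its SMART health check")

    # Kick off a short self-test (read/surface scan); results appear later
    # in `smartctl -a`.  Voltages and temperatures (points 3 and 4) can be
    # polled the same way with `sensors` from lm-sensors.
    subprocess.run(["smartctl", "-t", "short", dev],
                   capture_output=True, text=True)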
>
> *Managing Storage:*
> Storage isn't just about picking good hardware to put data on - it's about
> weighing the uses and cost of your data and having a ready-made solution
> built around what that data is for.
> Is the data application data?
> Is the data user data?
> Is the data corporate data?
> Is the data for a hardware system (the operating system)?
>
> Each type of data has its own unique needs (a rough policy sketch follows
> this list):
> -archival importance (how far back to keep revisions; this type of data can
> be compressed, and FAST restoration is less important)
> -security importance (should the data be encrypted when transported to a
> remote site, should you limit departmental access)
> -user data (this data can be split up and managed separately from other
> kinds of data; one affected user is less important than a server going down)
> -application data (availability and restorability matter more than
> archival)
> -hardware data (once again, restorability and point-in-time backups / hot
> standby matter more than continual backup and archival)
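
One way to keep those distinctions from living only in someone's head is to
write them down as an explicit policy table that the backup and replication
scripts read.  Purely illustrative - every class name and value below is a
made-up example:

# Data classes mapped to (made-up) storage/backup policies.
POLICIES = {
    "archival":    {"retention_days": 2555, "compress": True,
                    "encrypt_offsite": True,  "restore_priority": "low"},
    "user":        {"retention_days": 90,   "compress": True,
                    "encrypt_offsite": True,  "restore_priority": "medium"},
    "application": {"retention_days": 30,   "compress": False,
                    "encrypt_offsite": True,  "restore_priority": "high"},
    "system":      {"retention_days": 14,   "compress": False,
                    "encrypt_offsite": False, "restore_priority": "hot-standby"},
}

def plan_for(data_class):
    """Return the policy a backup script should apply to this kind of data."""
    return POLICIES.get(data_class, POLICIES["user"])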
>
> Many organizations don't recognize these different requirements, and fail to
> develop storage plans built around the purpose of their data.
>
> Now that I've pointed out all of the short-sightedness I've experienced in
> my journeys - and I'm sure the Linux IT community, being as informed as it
> is, will not be among the organizations I speak of - I'll be interested to
> hear about your experiences, setups, and uses of Linux technology to build
> unique and cost-effective storage environments.
>

Sun's Thumper - it's got ZFS!
http://www.sun.com/servers/x64/x4540/index.xml

Jungle Disk for networked storage or backups with no hardware to manage:
http://jungledisk.com/workgroup/index.aspx
Starts at $0.15/GB, and Amazon now has tiered pricing, so the more you store
the cheaper it gets.
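
To make the tiered-pricing point concrete, here is a toy calculator.  The tier
boundaries and rates are placeholders, not Amazon's actual schedule, so check
the current S3 pricing page before quoting numbers to anyone:

# Toy tiered-pricing calculator: average $/GB drops as volume grows.
TIERS = [(50_000, 0.15), (450_000, 0.14), (float("inf"), 0.13)]  # (GB, $/GB)

def monthly_cost(gb):
    cost, remaining = 0.0, gb
    for tier_size, rate in TIERS:
        used = min(remaining, tier_size)
        cost += used * rate
        remaining -= used
        if remaining <= 0:
            break
    return cost

for gb in (500, 5_000, 100_000):
    print(f"{gb:>7} GB -> ${monthly_cost(gb):,.2f}/mo "
          f"(avg ${monthly_cost(gb) / gb:.4f}/GB)")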