<div dir="ltr"><br><br><div class="gmail_quote">On Mon, Oct 13, 2008 at 2:36 PM, Ben DeMott <span dir="ltr"><<a href="mailto:ben.demott@gmail.com">ben.demott@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div dir="ltr">Hey GRLUG I wanted to open a discussion on storage, and share my thoughts, opinions, and observations about the storage world, and get some input.<br>I hope you find my observations somewhat thought provoking, and I look forward to hearing different opinons and experiences.<br>
<br><b>Introduction:</b><br>Disks themselves are rather cheap, and on Linux this is furthered by the fact that we have LVM2 and software RAID support in the kernel.<br>Combined with hardware RAID, the possible disk and storage setups are just about endless.<br>
<br>I'm curious to hear about other people's experiences with LVM2, virtual file systems, and RAID, and how they have combined these technologies to serve a particular storage environment's needs and purpose.<br>
<br>This isn't a thread about 'boxed solutions' - if you feel you must share how IBM or HP does it, fine, but keep it brief.<br>This is about actual managed solutions that you don't depend on a vendor for.<br>
<br>Some of my biggest problems with the commercial storage world:<br>They oversell performance in applications where it's not needed, sacrificing overall storage space or simplicity.<br>They adopt new disk technologies (SAS) too quickly and too widely, when it is NOT the best solution for everyone, simply to keep their profit margins higher (surprise, surprise).<br>
<br><b>My Experience:</b><br>I'm all about saving money: if I can buy 3 servers for the cost of 1, at the cost of more time, I'll do that. Solutions that cost half of my salary can't ever justify themselves just by being 'managed' - I'm sorry, that argument just doesn't apply to most medium-sized businesses. I call it the 'lazy IT department argument'.<br>
In my consulting work I've discovered that most IT managers don't know that much, or don't force themselves to keep up with technology. I can't tell you how many places I've been to that had IBM, HP, or Dell servers, set up by someone else, that weren't even configured for RAID! If you do it yourself, you know it's done right. Obviously this can't always be the case, but it's worked for me so far.<br>
<br>When I started at my last employer, they were paying $1,200 a month to offsite 500 GB of data a week. If this is the best solution you can come up with, you just lack imagination...<br>I could be throwing hard drives in the dumpster after each use and still have a better, more cost-effective solution.<br>
<br><b>Layering Abstraction Technologies:</b><br>Scenario -<br> RAID 1+0<br> LVM2<br> iSCSI partition<br> containing a virtual file system.<br><br>The only situation in which this is a good idea is if you are mirroring the data to another RAID array.<br>
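To make the scenario concrete, a rough sketch - assuming mdadm and the LVM2 tools; device names and sizes are hypothetical, and the iSCSI layer is only noted in a comment:
<pre>
#!/usr/bin/env python
# Rough sketch of the layered stack above -- all device names hypothetical.
import subprocess

def run(cmd):
    print("+ " + cmd)
    subprocess.check_call(cmd, shell=True)

# 1. RAID 1+0 across four disks (Linux software RAID via mdadm)
run("mdadm --create /dev/md0 --level=10 --raid-devices=4 "
    "/dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1")

# 2. LVM2 on top of the array
run("pvcreate /dev/md0")
run("vgcreate vg_storage /dev/md0")
run("lvcreate -L 200G -n lv_export vg_storage")

# 3. /dev/vg_storage/lv_export would then be exported as an iSCSI target
#    (e.g. with iscsitarget/ietd) and formatted with a file system on the
#    initiator side -- four layers of abstraction between data and disk.
</pre>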
So much can go wrong to corrupt data in this scenario that the complexity can outweigh the benefits very quickly.<br><br><b>Why Mirror:</b><br>It's true, mirrors lack the performance of other RAID striping methods. However:<br>
GMTD - "Give me the drive" is a philosophy that says the physical disk drive must always have value nomatter what, mirroring supports this.<br>When you go to a striping methodology during a read more drives are being accessed simultaneously, this has consequences of its own.<br>
<br><br>
<b>RAID Controllers:</b><br>My experiences with RAID controllers and their interfaces vary. Many controllers carry a battery on the card itself, so in the event of a power or hardware failure the contents of the controller's write cache are preserved and can still be flushed to disk, and the array is not damaged by incomplete writes.<br>
My experiences with rebuilding RAID arrays or restoring data from a crashed array have all been painful.<br>Does anyone have a good story, or know of technologies that let you move hardware, or a portion of a RAID array, to restore data on different, non-identical hardware in the event of a server failure?<br>
<br><b>RAID Thoughts:</b><br>Hardware RAID has become a standard amongst servers, for redundancy, scalability, and uptime reliability.<br>Yet in my journeys as an engineer I've encountered more failures due to human error, improper setup, bad hardware, and never verifying the data itself than due to the disks alone - the list goes on.<br>
In the end I've come to realize the more ideal solution is to be hardware agnostic, and the best way I've accomplished this is through a combination of software RAID, DRBD, and LVM - no hardware RAID at all.<br>At the end of the day a RAID array cannot guarantee anything about data integrity; it is only a small piece of the puzzle, and often a convoluted one when the day comes that you need to rely on it.<br>
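For example, after moving the disks from a dead server into any other Linux box, a sketch like this brings the whole stack back (device and volume names are hypothetical):
<pre>
#!/usr/bin/env python
# Sketch: reassemble a software RAID + LVM stack on replacement hardware.
# No controller card required -- the metadata lives on the disks themselves.
import subprocess

def run(cmd):
    print("+ " + cmd)
    subprocess.check_call(cmd, shell=True)

run("mdadm --examine --scan")   # list the arrays found in the disk superblocks
run("mdadm --assemble --scan")  # assemble them (older mdadm may want the
                                # ARRAY lines copied into /etc/mdadm.conf first)
run("vgscan")                   # find the volume groups living on the array
run("vgchange -ay")             # activate the logical volumes
run("mount /dev/vg_storage/lv_export /mnt/restore")
</pre>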
Instead we need to focus on the data at the data level more than we do: we need to use checksums and data-comparison strategies while backing up and restoring data, and this is often completely overlooked at many organizations. Corrupt data being mirrored is no more valuable than no data at all.<br>
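As a trivial example of what I mean, a sketch that builds a checksum manifest at backup time and verifies the restored copy against it (paths are hypothetical):
<pre>
#!/usr/bin/env python
# Sketch: build a SHA-1 manifest of a tree at backup time, then verify the
# restored copy against it -- a mirrored copy is worthless if the bits rot.
import hashlib, os

def manifest(root):
    sums = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            h = hashlib.sha1()
            with open(path, "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            sums[os.path.relpath(path, root)] = h.hexdigest()
    return sums

before = manifest("/srv/data")          # taken at backup time
after  = manifest("/mnt/restore/data")  # taken after the restore
for path in before:
    if after.get(path) != before[path]:
        print("MISMATCH: " + path)
</pre>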
And keep in mind that our great Linux kernel caches/buffers disk transactions - if you have 36 GB of memory on a server, the majority of it will be consumed by buffered data. This offsets many performance concerns and adds to the longevity of the hardware.<br>
<br><b>Preventative Maintenance:</b><br>These are the things I encounter most that amaze me about IT departments:<br>1.) The vendor's hardware monitoring software is not installed.<br>2.) The only way of detecting a disk failure at a remote site is for someone to notice there is a 'red' or 'amber' light on.<br>
3.) No one monitors system voltages.<br>4.) No one monitors system temperatures.<br>5.) No one performs read/access tests, write tests, or head tests - if this is done even just once a year, you can almost always predict a drive failure (see the sketch after this list).<br>6.) There is too much confidence in hardware.<br>
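<br>A bare-bones sketch of the kind of check I mean - assuming smartmontools is installed, with a hypothetical device list; cron it nightly and mail yourself the output:
<pre>
#!/usr/bin/env python
# Sketch: nightly SMART health check -- catches most drives before they die.
# Assumes smartmontools is installed; the device list is hypothetical.
# Extend with 'smartctl -A' (temperatures) and lm-sensors (voltages).
import subprocess

DISKS = ["/dev/sda", "/dev/sdb", "/dev/sdc"]

for disk in DISKS:
    p = subprocess.Popen(["smartctl", "-H", disk], stdout=subprocess.PIPE)
    out = p.communicate()[0].decode("ascii", "replace")
    # ATA drives report "PASSED", SCSI drives report "OK"
    if "PASSED" not in out and "OK" not in out:
        print("WARNING: " + disk + " failed its SMART health check")
</pre>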
<br><b>Managing Storage:</b><br>Storage isn't just about picking good hardware to put data on - it's about weighing the uses and cost of your data, and having a ready-made solution built around what that data is for.<br>
Is the data application data?<br>Is the data user data?<br>Is the data corporate data?<br>Is the data for a hardware system (an operating system)?<br><br>Each type of data has its own unique needs -<br>-archival importance (how far back to keep revisions; this type of data can be compressed, and FAST restoration matters less)<br>
-security importance (should the data be encrypted when transported to a remote site, and should you limit departmental access?)<br>-user data (this data can be split up and managed separately from other forms of data; an individual user being affected is less important than a server going down)<br>
-application data (availability and restorability matter more than archival)<br>-hardware data (once again, restorability and point-in-time backups / a hot standby matter more than continual backup and archival)<br>
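To make that concrete, the per-purpose policy table I have in mind might look something like this sketch (classes and values are purely illustrative):
<pre>
#!/usr/bin/env python
# Sketch: backup policy keyed by the *purpose* of the data, not the hardware.
# Every class and value here is illustrative -- tune to your organization.
POLICIES = {
    "archival":    {"compress": True,  "encrypt_offsite": False,
                    "retention_days": 3650, "restore_priority": "low"},
    "user":        {"compress": True,  "encrypt_offsite": True,
                    "retention_days": 90,   "restore_priority": "medium"},
    "application": {"compress": False, "encrypt_offsite": True,
                    "retention_days": 30,   "restore_priority": "high"},
    "system":      {"compress": False, "encrypt_offsite": False,
                    "retention_days": 14,   "restore_priority": "high"},
}

def plan_for(data_class):
    return POLICIES[data_class]

print(plan_for("user"))
</pre>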
<br>Many organizations don't recognize these different requirements of their data, and fail to develop different storage plans around the purpose of the data.<br><br>Now, I've pointed out all of the shortsightedness I've encountered in my journeys. I'm sure the Linux IT community, being as informed as it is, will not be among the organizations I speak of - so I'll be interested to hear about your experiences, setups, and uses of Linux technology to come up with unique and cost-effective storage environments.<br>
</div>
<a href="http://shinobu.grlug.org/cgi-bin/mailman/listinfo/grlug" target="_blank">http://shinobu.grlug.org/cgi-bin/mailman/listinfo/grlug</a><br></blockquote></div><br>Sun's Thumper - it's got ZFS!<br><a href="http://www.sun.com/servers/x64/x4540/index.xml">http://www.sun.com/servers/x64/x4540/index.xml</a><br>
<br>Jungle Disk for networks, or backups with no hardware to manage.<br><a href="http://jungledisk.com/workgroup/index.aspx">http://jungledisk.com/workgroup/index.aspx</a><br>Starts at $0.15/GB, and Amazon now has tiered pricing, so the more you store the cheaper it gets - at that rate, the 500 GB mentioned above would start at roughly 500 GB x $0.15/GB = $75 a month.<br>
<br><br><br></div>