[GRLUG] On the subject of version control, considering backups

Sun Dec 20 14:22:30 EST 2009

On Sun, Dec 20, 2009 at 1:55 PM, Ed Howland <ed.howland at gmail.com> wrote:
> This [1] might be of some help. It runs under Windows, though,
> and is not open source. But it might be a starting point.
>
> It sounds like you are trying to get whichever SCM (svn, git,
> cvs, etc.) to diff the insert statements for you between
> successive mysqldump outputs. The little bit of research I did
> on this echoed your lament about it using too much disk space.
> My theory is that the order of records changes too much between
> dumps for the SCM to store a reasonably sized difference.
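
A quick way to test that theory, sketched in Python: the snippet builds a hypothetical 100-row table dump with one INSERT per line and counts how many lines a line-based diff (Python's `difflib`, standing in for the SCM) marks as changed. git stores deltas differently, so this is only an illustration under that assumption. Editing one row touches two lines. The same rows in reverse order touch most of the file.

```python
import difflib
from typing import List


def changed_line_count(old: List[str], new: List[str]) -> int:
    """Count the lines a unified diff marks as removed or added."""
    diff = list(difflib.unified_diff(old, new, lineterm=""))
    # The first two lines are the ---/+++ file headers; skip them so a
    # removed row that itself starts with "--" is not miscounted.
    body = diff[2:]
    return sum(1 for line in body if line[:1] in ("+", "-"))


# Hypothetical dump: one INSERT per row, one row per line.
rows = [f"INSERT INTO `page` VALUES ({i},'title {i}');" for i in range(1, 101)]

edited = list(rows)
edited[49] = "INSERT INTO `page` VALUES (50,'edited title');"

reordered = list(reversed(rows))  # same data, different row order

print(changed_line_count(rows, edited))     # 2
print(changed_line_count(rows, reordered))  # same rows, far more changed lines
```

Sorting both sides first (`sorted(rows)` vs. `sorted(reordered)`) brings the reordered case back to zero changed lines, which is the motivation for a stable sort order.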

Hm. Then running the SQL dump through an SQL-aware sorting filter
could do the trick. (Or checking whether mysqldump supports stably sorted
output.)
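
A minimal sketch of such a filter in Python. It assumes the dump has one INSERT per row per line (mysqldump's `--skip-extended-insert`) and that table names are backtick-quoted without spaces, as mysqldump normally writes them. It sorts each contiguous run of INSERTs for the same table and leaves everything else (DDL, `LOCK TABLES`, comments) in place. The function names are illustrative. Before writing a filter, it may be worth trying mysqldump's `--order-by-primary` option, which dumps each table's rows sorted by primary key and may already give stable output.

```python
from typing import Iterable, List, Optional

INSERT_PREFIX = "INSERT INTO "


def table_of(line: str) -> Optional[str]:
    """Return the table name of a one-row INSERT line, else None."""
    if not line.startswith(INSERT_PREFIX):
        return None
    # "INSERT INTO `t` VALUES (...);" -> "`t`" (assumes no spaces in the name)
    return line[len(INSERT_PREFIX):].split(" ", 1)[0]


def sort_insert_runs(lines: Iterable[str]) -> List[str]:
    """Sort each run of consecutive INSERT lines for one table.

    Non-INSERT lines and changes of table act as boundaries, so rows
    from different tables are never mixed and DDL stays where it was.
    """
    out: List[str] = []
    run: List[str] = []
    run_table: Optional[str] = None
    for line in lines:
        table = table_of(line)
        if table is not None and table == run_table:
            run.append(line)  # same table: keep collecting
            continue
        out.extend(sorted(run))  # boundary: flush the finished run
        run, run_table = [], table
        if table is not None:
            run.append(line)  # first row of a new table's run
        else:
            out.append(line)  # non-INSERT line passes through unchanged
    out.extend(sorted(run))
    return out
```

Used as a pipe, it would be wired up with something like `sys.stdout.writelines(sort_insert_runs(sys.stdin))`. The order is lexical rather than numeric (`(10,` sorts before `(2,`), which doesn't matter here: the filter only has to produce the same order for the same rows on every run.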

>
> I'm assuming you have already stored your DDL there (structure
> changes, views, sprocs, etc.) and now want to version the actual
> data. Your goal, I presume, is to be able to rebuild the database
> as of a given point in time from a set of deltas in a VCS, for
> disaster recovery after something corrupts it or leaves incorrect
> contents.

My goal was more to learn about git, play with it and abuse it.  And
toy with an idea I'd had. :)

For my purposes, restoring from a backup even a week old wouldn't be
much of a setback, so I don't necessarily need fine-grained versioning.
The geek in me would like to have that option available, though, in
case I wanted to play around with data analysis.

> I had the same thought at an earlier job, but most DBAs seem
> to prefer straight backups (daily, weekly, monthly
> rotations) or binary logging [2]. The latter is useful for
> bringing a database up to the present once it has been
> restored from a backup.
>
> What are your specific requirements for database backups?

My database isn't sensitive enough that a week-old backup would be a
real killer.  It'd be annoying, and people would complain, but my
users would just get back to filling in the missing data. Programming
enthusiast geeks are cool that way.

>
> Cheers,
> Ed
>
> [1] http://www.mysqldiff.org/
> [2] http://dev.mysql.com/doc/refman/5.0/en/binary-log.html
>
> On Fri, Dec 18, 2009 at 4:25 PM, Michael Mol <mikemol at gmail.com> wrote:
>> For a while, I was backing Rosetta Code's database up to a git repo
>> (with mysqldump options set so that each record got its own INSERT
>> statement on its own line). I saw this as a way to get finer-grained
>> backups. It worked rather nicely, for a bit.
>>
>> The problem I ran into, though, was that the git repo grew *fast*.
>> After only a few weeks' worth of runs, I believe the repo was twice the
>> size of a raw dump, which was itself on the order of 700-800MB. I
>> wouldn't be able to keep up that pace; disk isn't *that* cheap.
>>
>> Does anyone else have any thoughts on possible practical abuses of
>> version control systems for database and systems backups?
>>
>> --
>> :wq
>> _______________________________________________
>> grlug mailing list
>> grlug at grlug.org
>> http://shinobu.grlug.org/cgi-bin/mailman/listinfo/grlug
>>
>
>
>
> --
> Ed Howland
> http://greenprogrammer.wordpress.com
> http://twitter.com/ed_howland
>

-- 
:wq