[GRLUG] A data organization problem related to tags. Any database experts around?

Ben DeMott ben.demott at gmail.com
Wed Apr 14 13:44:56 EDT 2010


Once again mike... ->
http://www.mediawiki.org/wiki/Extension:Lucene-search


On Wed, Apr 14, 2010 at 1:43 PM, Ben DeMott <ben.demott at gmail.com> wrote:
> Mediawiki uses memcached and a Mysql Proxy to cache queries.
> They use about 60 servers in the US.
>
> On Wed, Apr 14, 2010 at 1:31 PM, Adam Tauno Williams
> <awilliam at whitemice.org> wrote:
>> On Wed, 2010-04-14 at 13:11 -0400, Michael Mol wrote:
>>> On Wed, Apr 14, 2010 at 12:46 PM, Adam Tauno Williams
>>> <awilliam at whitemice.org> wrote:
>>> > On Wed, 2010-04-14 at 09:55 -0400, mikemol at gmail.com wrote:
>>> >> I don't *modify* their quieries, or really even know what they look like.
>>> > Understood, but changing the queries isn't necessarily the point of
>>> > EXPLAINing a query.
>>> Ah, gotcha.
>>> >> >> 15-minute load averages of 0.11, and more IO waiting than I'm
>>> >> >> comfortable; I need to get in and tweek InnoDB's settings again.
>>> >> > Which as a DBA makes me think: (a) something is creating temporary sort
>>> >> > tables [and MySQL ***SUCKS*** at sorting], (b) something is creating
>>> >> > temporary join tables [a buffer space issue or really dumb join], or (c}
>>> >> > [worst case] the engine is falling back to a sequential scan due to bad
>>> >> > query design or incorrect indexing [often the issue of cardinality comes
>>> >> > into play here]
>>> >> > To determine any of those you have to look at the problem on the query
>>> >> > level.
>>> >> > Maybe current MySQL can autolog EXPLAIN of queries for you?
>>> >> I can enable profiling, sure, but it's not going to tell me anything I can
>>> >> or will change. The queries reside in a place not safe for me to touch.
>>> > They will tell you what queries are slow; and then possibly if the
>>> > indexing is wrong or ineffective - you can certainly change the indexing
>>> > - or what inserts/updates are slow, or what joins are slow.  Things can
>>> > be addressed involving all those things without changing the table
>>> > schema.  You can change indexing (modern RDBMS have many indexing
>>> > options), fragment data, and alter page and extent sizes.  All which can
>>> > have a very significant impact on performance.  Dealing with performance
>>> > issues in this manner, with existing applications [and thus *their*
>>> > schema] is what a "DBA" does;  it is an entire profession.
>>>
>>> Mm. I've had occasion and need to tune these things related to work,
>>> but it seemed absurd to me that MediaWiki, which already powers a site
>>> like Wikipedia, might need me make modifications to the tables that it
>>> created itself.
>>
>> Possibly [likely?] they've optimized how their database is structured
>> beyond out-of-the-box MediaWiki.  Having to re-index products from their
>> default installation is extremely common.  Most products I've seen
>> package a database with a whole slew of randomly chosen indexes: "Hey,
>> it might be a good idea to index that!"
>>
>>
>>
>> _______________________________________________
>> grlug mailing list
>> grlug at grlug.org
>> http://shinobu.grlug.org/cgi-bin/mailman/listinfo/grlug
>>
>


More information about the grlug mailing list