[GRLUG] A data organization problem related to tags. Any database experts around?

Mon Apr 12 18:57:19 EDT 2010

I know you didn't want a "use this database" but your question begs
it... Postgres supports Arrays ...

if array in array, do something -> this pretty much eliminates all
needs for extremely complex table, to table, to table relationships.
It also supports hashes (key-value pairs), (binary xml), and other
nifty data-types.
All of which are indexable and searchable.

If our current inventory system was written in a database that didn't
support arrays it would have exponentially more tables, I don't even
want to think about it.

On Mon, Apr 12, 2010 at 6:26 PM, Michael Mol <mikemol at gmail.com> wrote:
> I'm not just talking about relational databases, and I'm not looking
> for "Check out *some database server*", I'm wondering specifically
> about the elegance and performance of implementing an unknown solution
> to a specific data organization problem I'm dealing with (and that
> sites like TVTropes also apparently have).
>
> Let's take TVTropes as a simple, familiar example.
>
> TVTropes has two major axis by which you navigate their data--you have
> 'tropes' and 'works', between which there's a many-many relationship;
> a trope will have many works which use it, a work will have many
> tropes, and there may be multiple examples for each trope-work
> binding.  TVTropes allows you to go to a trope's page, and you'll see
> a listing of the works that use it, often multiple instance examples
> for occurrence in each work. You can also go to a work's page and see
> the tropes that it uses.
>
> There's a major breakdown in their consistency and normalization,
> though; many times, you might click on a work link on a trope's page,
> take a look at the work's page, and see that the trope you were just
> at isn't listed. Other times, you might be on a work's page, click on
> a particular trope listed there, and not be able to find the work
> listed on that trope's page. It's even very rare that the same wording
> that was used to describe the trope usage instance
>
> It seems obvious to me that the fundamental piece of information in
> this entire arrangement is the trope usage instance, a particular
> example of a particular trope being used in a particular work.
> Ideally, you should be able to look at a trope and see a listed
> intersection of that trope with all works.
>
> Now let's step the complexity up a dimension.
>
> Each trope is going be categorized. For example, "Badass Longcoat"
> sits with "Nice Hat" in the "The Utterly And Completely Definitive
> Guide To Cool" category. "Eject... Eject... Eject..." happens to share
> space with "If It Swims It Flies" in the "Vehicl Tropes" category.
>
> There's not a lot of difference between a trope being tied to a work,
> and a trope being tied to a category, with the possible constraint
> that a trope is only in a category once, where it might be in a work
> multiple times. (For the relational database crowd, that means one
> could optimize by making the paired key between the trope and category
> a unique key in a link table, but you couldn't make it unique between
> tropes and works, or tropes and actors, or necessarily even between
> tropes and characters.)
>
> The simplest way to implement it might be a tag table, or a simple
> key-value pair table that can associate every object with any other
> object. That seems problematic, though, because that table is going to
> get *huge*, and the larger it gets, the more expensive it will be to
> query and update it.
>
>
> I've been mulling this topic over for months, because I face a similar
> problem with Rosetta Code. I have programming langauges, tasks and
> libraries that should be browseably cross-linked, and someone recently
> asked me to extend the site to support Russian as a language. I'd love
> to do that, but I can't because of normalization on RC's fundamental
> object--the code example; I don't want the existing matrix of code
> examples to have to be copied or manually managed each time another
> human language is thrown in the mix. (And it gets trickier still,
> because while the source code should be common across languages, the
> surrounding descriptive text should *not* be--but it would be helpful
> to fall back to a secondary language in the case a primary language
> isn't available.)
>
>
> The Question:
> Using the different database systems, what would an implementation of
> a data structure given the described problems look like, tuned either
> for performance or elegance?
>
>
> --
> :wq
> _______________________________________________
> grlug mailing list
> grlug at grlug.org
> http://shinobu.grlug.org/cgi-bin/mailman/listinfo/grlug
>