[GRLUG] Unimplemented tasks on Rosetta Code

Thu Mar 19 11:50:29 EDT 2009

I run rosettacode.org, and one of the recurring questions over the
years has been "Is there a list of tasks that *aren't* implemented in a
given language?"

Fairly recently, I created ImplSearchBot, a MediaWiki bot written in
Perl, to fill that need.

results: http://rosettacode.org/wiki/Category:Unimplemented_tasks_by_language

It also posts these every time it runs:
code: http://rosettacode.org/wiki/User:ImplSearchBot/Code
stats: http://rosettacode.org/wiki/User:ImplSearchBot/Stats

To start with, ImplSearchBot gets a list of the languages on Rosetta
Code, then a list of all of the tasks on Rosetta Code. That's what it
did in its first incarnation, and that's what it still does; what it
does after that has changed over time.
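For anyone wondering what "gets a list" can look like in practice, here is a rough sketch of paging through a category with the MediaWiki API's list=categorymembers query. It is written in Python rather than the bot's Perl. It only builds URLs and parses decoded JSON replies, with no network code. The category name and the api.php URL in the example are assumptions for illustration only. It also understands only the newer "continue" style of continuation; older MediaWiki versions returned "query-continue" instead.

```python
from urllib.parse import urlencode


def category_members_url(api_url, category, cont=None):
    """Build a MediaWiki API URL asking for one batch of a category's members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        # Merge in the continuation values returned by the previous batch.
        params.update(cont)
    return api_url + "?" + urlencode(params)


def parse_batch(reply):
    """Split a decoded JSON reply into (member titles, continuation or None)."""
    titles = [member["title"] for member in reply["query"]["categorymembers"]]
    return titles, reply.get("continue")


if __name__ == "__main__":
    # Placeholder endpoint and category name; the real wiki's values may differ.
    print(category_members_url("https://example.org/w/api.php",
                               "Category:Programming Tasks"))
```

A caller would keep requesting batches, feeding each reply's continuation back into category_members_url, until parse_batch returns None for it.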

Originally, it then generated a complete matrix of which tasks had
implementations in which languages. The idea was to post the matrix on
a single wiki page as a sort of detailed progress report. However,
with 100+ languages and 200+ tasks, the result was large enough to
cause HTTP 500 errors when I tried to post it. So rather than
producing a complete listing of which implementations existed and
which didn't, I reduced it to only the implementations that were
missing.

Even with the reduced dataset, MediaWiki (with the current server
settings) couldn't handle that much data in one page, so I rewrote the
bot to write a separate "unimplemented tasks in language X" page for
each language X.
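Once the task list and each language's category contents are in hand, the "missing" data is just a set difference per language. Each language's result can then be rendered as its own page. Here is a sketch of that step. It assumes the members of a language's category are exactly the tasks implemented in that language. The page wording and the sample data are made up for illustration.

```python
def unimplemented_by_language(all_tasks, implemented):
    """Return {language: sorted list of tasks with no implementation}.

    implemented maps each language to the tasks assumed to be listed in
    that language's category.
    """
    tasks = set(all_tasks)
    return {
        lang: sorted(tasks - set(done))
        for lang, done in implemented.items()
    }


def render_page(lang, missing):
    """Render one small wiki-text page body for a single language."""
    lines = [f"Tasks not implemented in {lang}:"]
    lines += [f"* [[{task}]]" for task in missing]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data, not real Rosetta Code contents.
    tasks = ["Quine", "Man or boy test", "99 Bottles of Beer"]
    implemented = {
        "Perl": ["Quine", "99 Bottles of Beer"],
        "TeX": ["99 Bottles of Beer"],
    }
    for lang, missing in unimplemented_by_language(tasks, implemented).items():
        print(render_page(lang, missing))
        print()
```

Each page stays roughly as large as the number of tasks, rather than tasks times languages, which is what keeps a single post under the server's limits.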

That worked; no individual language's page had enough data to make
the server complain. Shortly thereafter, however, it was pointed out
that not every language is considered suitable for every listed task,
and such tasks shouldn't appear in that language's list of
unimplemented tasks. This is actually a very old, recurring problem
that was first brought up within days of our initial Slashdotting:
someone wanted to know whether RC was limited to procedural
programming languages, or whether presentation languages like TeX,
LaTeX, HTML, PDF, etc. would be supported, because all of the tasks on
the site up to that point were aimed at imperative programming
languages. To help presentation languages along, I added a few tasks
of that kind, but that aspect never really took off. Still, the issue
keeps coming back with object-oriented vs. non-object-oriented,
functional vs. procedural, and imperative vs. declarative. (If you
don't know what a declarative programming language is, I suggest you
check them out. Sweet stuff.) With tasks like Man-or-Boy, Quine and
First-Class Functions, the question will keep coming up.

The solution was to track a second list for each language: the tasks
deemed inappropriate for it ("Omitted"). I modified ImplSearchBot to
track that second list and published both lists. The hope was to keep
an overzealous language advocate or detractor from marking tasks as
inappropriate for a language when they really aren't, and also to
give supporters of a language the option of implementing a task in
their favored language, even if it would normally be considered
inappropriate.
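With the second list, a task counts as unimplemented for a language only if it has no implementation *and* isn't marked as omitted. Here is a sketch that produces both published lists for one language. It assumes the implemented and omitted tasks come from two separate per-language listings. One design choice here is my own, not necessarily the bot's: an omitted task that someone implements anyway drops off the omitted list as well.

```python
def listings(all_tasks, implemented, omitted):
    """Return the two published lists for one language.

    Assumed inputs: the full task list, the tasks implemented in this
    language, and the tasks marked as omitted for it.
    """
    tasks = set(all_tasks)
    done = set(implemented)
    # Ignore omission markers for names that aren't (or are no longer) tasks.
    skipped = set(omitted) & tasks
    return {
        # Neither implemented nor deliberately left out.
        "unimplemented": sorted(tasks - done - skipped),
        # Published separately so omissions stay visible and can be
        # challenged; an omitted task implemented anyway is not listed.
        "omitted": sorted(skipped - done),
    }


if __name__ == "__main__":
    # Hypothetical data for one language.
    tasks = ["Quine", "Man or boy test", "Hello world", "First-class functions"]
    print(listings(tasks,
                   implemented=["Hello world"],
                   omitted=["Man or boy test", "First-class functions"]))
```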

(slight diversion from ImplSearchBot's history)
Here we have another issue: for what kind of task is it inappropriate
to provide an example in a given language, and for what kind of task
is it impossible? Under what circumstances do we say, "Yeah, this
task? That language? Don't bother."

The part of the question that tends to lead to controversy is whether
the task is idiomatic for the language, or whether it can be done
idiomatically at all. A piece of code is "idiomatic" to a language if
it follows that language's accepted design patterns. You don't write a
"for" loop in Haskell to process a list, and you don't use functors in
a C program. That's not to say you can't do those things (er...I think
it's possible to use a for loop in Haskell to process list items one
at a time), but it's not something you should normally do. Either the
language has better built-in facilities for what you're trying to do,
or you're trying to do something that the language isn't supposed to
be able to do in the first place.

It's my personal position that if it's possible to implement a task in
a language, then we ought to have the code that shows how to do it,
regardless of how ugly, unidiomatic or apoplexy-inducing it may be.
(end diversion)

Now I had a problem: every time I ran ImplSearchBot, I was querying
the contents of a couple hundred MediaWiki categories and posting
content to a couple hundred pages, which consumes a lot of server
resources. So I modified ImplSearchBot to cache the contents of all of
those categories each time it ran, and to compare their current
contents to what they had been on the previous run. That way, while I
was still querying a couple hundred categories, I was posting fewer
than ten pages each day. (That was a significant enough savings that I
now let ImplSearchBot run every four hours rather than every 24.)
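The caching step can be sketched as comparing this run's category contents against a snapshot saved by the previous run, then reposting only the pages whose inputs changed. Some details are assumptions for this sketch, not the bot's actual cache format:

- the local JSON file
- the snapshot layout, {"tasks": [...], "languages": {lang: {"implemented": [...], "omitted": [...]}}}
- the helper names

```python
import json
from pathlib import Path

# The per-language listings assumed to feed each page.
CATEGORY_KINDS = ("implemented", "omitted")


def load_snapshot(path):
    """Return the snapshot saved by the previous run, or an empty one."""
    p = Path(path)
    if not p.exists():
        return {"tasks": [], "languages": {}}
    return json.loads(p.read_text())


def save_snapshot(path, snapshot):
    """Write this run's category contents for the next run to compare."""
    Path(path).write_text(json.dumps(snapshot, indent=2, sort_keys=True))


def pages_to_update(prev, curr):
    """Return, sorted, the languages whose page must be reposted."""
    langs = curr["languages"]
    if set(prev.get("tasks", [])) != set(curr["tasks"]):
        # The task list itself changed, so every language page is stale.
        return sorted(langs)
    old_langs = prev.get("languages", {})
    stale = []
    for lang, cats in langs.items():
        old = old_langs.get(lang)
        # A brand-new language always needs its first page.
        if old is None or any(
            set(old.get(kind, [])) != set(cats.get(kind, []))
            for kind in CATEGORY_KINDS
        ):
            stale.append(lang)
    return sorted(stale)


if __name__ == "__main__":
    # Hypothetical snapshots from two consecutive runs.
    prev = {"tasks": ["Quine", "Hello world"],
            "languages": {"Perl": {"implemented": ["Quine"], "omitted": []},
                          "TeX": {"implemented": ["Hello world"],
                                  "omitted": ["Quine"]}}}
    curr = {"tasks": ["Quine", "Hello world"],
            "languages": {"Perl": {"implemented": ["Quine", "Hello world"],
                                   "omitted": []},
                          "TeX": {"implemented": ["Hello world"],
                                  "omitted": ["Quine"]}}}
    print(pages_to_update(prev, curr))  # ['Perl']
```

A change to the task list itself invalidates every page, since a new task starts out unimplemented everywhere. Otherwise only the languages whose own listings moved get reposted.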

At this point, everything seems to work*, and I even have the bot
saving the cache data to Subversion, so that I can write other bots
that trigger on matrix changes without querying MediaWiki for category
contents. (The first such bot on the agenda is meant to produce a set
of RSS feeds of per-language metrics and some interesting overall
metadata.) I also plan on opening up read access to the Subversion
repository so that anyone can do the same.

* There is a known bug, but fixing it requires fixing a site structure issue.

Anyway, any observations, ruminations or critiques?

-- 
:wq