[GRLUG] Headerless website rendering

Ben DeMott ben.demott at gmail.com
Wed Jun 3 12:48:46 EDT 2009


For those that might be interested ...

Recently for work I told a customer who wanted headless (server-side)
execution of Javascript that I "couldn't do it".
Over the last few years I've looked into the subject a few times, and I've
always been quite pessimistic about it.

Over the last year I've been doing iPhone development and using the WebKit
API. Using the Cocoa API you can execute a web page "headlessly" by simply
not drawing the display buffer to the screen.
A while after this customer asked for completely programmatic headless
rendering I had some free time, so I looked back into the issue.  Knowing
that the WebKit API can be used this way, which made the whole thing
"theoretically" possible, I pushed on...

And that is when I came across Nokia's API - the Qt ('cute') API has WebKit
support among many other things, and after working with the iPhone SDK I
actually feel pretty at home using it.
It's an event-driven API, written in C++ ...

After researching the Qt library a bit, I found that it had Python bindings
(even better) and that its most recent version supported a very modern
version of WebKit, similar to what Safari uses.

I then started looking for a way of actually 'drawing' the web page. I knew
that if I could draw to a frame buffer I could probably render web pages
programmatically on a server and save them as images.

After quite a lot of digging I came across the work of several other
individuals who had used the same process involving Qt, although the code
was a bit poorly implemented.

I took concepts from several of the resources I found and wrote a Python
application that uses Xvfb (the X Virtual Frame Buffer) to render a web
page on a server.
All you need is Python, PyQt4, Xvfb, xvfb-run (a wrapper script that
supposedly comes with Xvfb, but which I had to install manually) and a Linux
distro with a 2.6 kernel.
I have it working very well on Fedora 9 with the apps listed above. If
anyone is interested in my code example or further instructions, let me
know and I'll throw it up on a website.
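
For anyone who wants a feel for the approach before I post anything, here is
a minimal sketch of the general technique. It is not my actual application:
the function names (xvfb_command, render), the screen size and the file
names are made up for illustration, and it assumes a PyQt4 recent enough to
support the new-style signal syntax (4.5 or later). The page is loaded into
a QWebPage that is never shown, and once loading finishes its main frame is
painted into an image.

```python
import sys


def xvfb_command(script, url, out_path):
    """Build the argv that runs `script` under a throwaway virtual X display."""
    # --auto-servernum lets xvfb-run pick a free display number;
    # --server-args sets the virtual screen size and color depth (assumed values).
    return [
        "xvfb-run", "--auto-servernum",
        "--server-args=-screen 0 1280x1024x24",
        sys.executable, script, url, out_path,
    ]


def render(url, out_path):
    """Load `url` in QtWebKit and save the rendered page as an image."""
    # Imported lazily so the rest of the module works without PyQt4 installed.
    from PyQt4.QtCore import QUrl
    from PyQt4.QtGui import QApplication, QImage, QPainter
    from PyQt4.QtWebKit import QWebPage

    app = QApplication(sys.argv[:1])
    page = QWebPage()  # never attached to a visible widget

    def on_load_finished(ok):
        if ok:
            frame = page.mainFrame()
            # Grow the viewport to the full page so nothing is clipped.
            page.setViewportSize(frame.contentsSize())
            image = QImage(page.viewportSize(), QImage.Format_ARGB32)
            painter = QPainter(image)
            frame.render(painter)
            painter.end()
            image.save(out_path)
        app.quit()

    page.loadFinished.connect(on_load_finished)
    page.mainFrame().load(QUrl(url))
    app.exec_()  # run the event loop until on_load_finished quits it
```

To use it, a small script (say render.py, a hypothetical name) would call
render(sys.argv[1], sys.argv[2]), and you would launch it with the argv
returned by xvfb_command("render.py", url, "out.png"), for example through
subprocess.call. Qt still needs an X display even though nothing is shown,
which is the only reason Xvfb is involved.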

And all of this was, sadly, to crawl Google results (Google re-orders and
dynamically controls results with client-side Javascript, including business
results).
If anyone wants the PHP functions that parse Google (using DOM) let me know
...
Along with this (sadly - I'm not proud of this) I wrote a Google Image
results parser...
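
To give an idea of what the DOM-style parsing mentioned above looks like,
here is a rough sketch in Python rather than PHP, using only the standard
library. It is not my PHP code: the class and function names are made up,
and the assumption that each result is a link inside an <h3> element is
only an illustration of the sort of markup you end up keying on.

```python
from html.parser import HTMLParser


class ResultLinkParser(HTMLParser):
    """Collect (href, text) pairs for anchors found inside <h3> elements."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_h3 = False
        self._href = None   # href of the anchor currently being read, if any
        self._text = []     # text fragments collected for that anchor

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._in_h3 = True
        elif tag == "a" and self._in_h3:
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.results.append((self._href, "".join(self._text).strip()))
            self._href = None
        elif tag == "h3":
            self._in_h3 = False


def parse_results(html):
    """Return the list of (href, text) pairs found in `html`."""
    parser = ResultLinkParser()
    parser.feed(html)
    return parser.results
```

You would feed it the HTML of the page after the Javascript has run. Anything
keyed on markup like this is exactly what breaks when the page layout
changes.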

Customers are mad, because the code will obviously break when Google makes
any changes, but it was quite the experiment in code :) ... and I got paid
for it.

The image code was to (help) find logos for companies / organizations.
Garmin's data provider (InfoUSA) tracks 13,000 franchises/organizations, and
the system had about an 80% success rate on those organizations - check out
an example here.

(type in a company name, like 'mcdonalds')
http://apginc.net:8380/binja/test_image_query.php