[GRLUG] Headerless website rendering

Ben Rousch brousch at gmail.com
Wed Jun 3 13:12:12 EDT 2009


On Wed, Jun 3, 2009 at 12:48 PM, Ben DeMott <ben.demott at gmail.com> wrote:

> For those that might be interested ...
>
> Recently at work I told a customer who wanted headerless (server-side)
> execution of JavaScript that I couldn't do it.
> Over the last few years I had looked into the subject a few times, and
> had always been quite pessimistic about it.
>
> Over the last year I've been doing iPhone development with the WebKit
> API. Using the Cocoa API you can "headerlessly" execute a web page by
> simply not drawing the display buffer to the screen.
> A while after this customer asked for completely programmatic headerless
> rendering I had some free time, so I looked back into the issue. Knowing
> that the WebKit API can be driven in a way that makes this
> "theoretically" possible, I pushed on...
>
> And that is when I came across Nokia's Qt ('cute') API. It has WebKit
> support among many other things, and after working with the iPhone SDK
> I feel pretty at home using it.
> It's an event-driven API, written in C++ ...
>
> After researching the Qt library a bit, I found that it has Python bindings
> (even better) and that its most recent version supports a very modern
> version of WebKit, similar to what Safari uses.
>
> I then started looking for a way of actually 'drawing' the web page. I knew
> that if I could draw to a frame buffer, I could probably render web pages
> and save them as images programmatically on a server.
>
> After quite a lot of work I came across the work of several other
> individuals who had used the same process involving Qt, although their code
> was a bit poorly implemented.
>
> I took concepts from several of the resources I found and wrote a Python
> application that uses Xvfb (the X Virtual Frame Buffer) to render a web
> page on a server.
> All you need is Python, PyQt4, Xvfb, xvfb-run (a script that supposedly
> comes with Xvfb, but which I had to install manually) and a Linux distro
> with a 2.6 kernel.
> I have it working very well on Fedora 9 with the apps listed above. If
> anyone is interested in my code example or further instructions, let me
> know and I'll throw it up on a website.
>
> Sadly, all of this was just to crawl Google results (Google re-orders and
> dynamically controls results, including business results, with client-side
> JavaScript).
> If anyone wants the PHP functions that parse Google (using DOM), let me know
> ...
> Along with this (sadly, I'm not proud of this) I wrote a Google Image
> results parser...
>
> Customers are mad, because the code will obviously break when Google makes
> any changes, but it was quite the experiment in code :) ... and I got paid
> for it.
>
> The image code was to (help) find logos for companies / organizations.
> Garmin's data provider (InfoUSA) tracks 13,000 franchises/organizations, and
> the system had about an 80% success rate on those organizations. Check out
> an example here.
>
> (type in a company name, like 'mcdonalds')
> http://apginc.net:8380/binja/test_image_query.php
>


I may be a bit dense here, but I'm not sure what you mean by
"headerless". There are several ways to execute JavaScript server-side:
http://en.wikipedia.org/wiki/Server-side_JavaScript

Or are you "viewing" web sites by saving them to an image or something?
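
If it is the latter, the Xvfb setup described above could presumably be
launched along these lines. This is only a sketch, not the original
poster's code: the helper name build_xvfb_command and the script name
render_page.py are made up for illustration, and the screen size is an
arbitrary assumption. It only builds the xvfb-run command line; the PyQt4
rendering itself would live in the (hypothetical) script being launched.

```python
import shlex


def build_xvfb_command(script, url, output, width=1024, height=768, depth=24):
    """Build an argv list that runs a rendering script under xvfb-run.

    `script` is a hypothetical Python program that loads `url` in WebKit
    and writes an image to `output`; it is not part of this sketch.
    """
    # Xvfb takes its virtual screen geometry as "-screen N WxHxDEPTH".
    screen = "-screen 0 {}x{}x{}".format(width, height, depth)
    return [
        "xvfb-run",
        "--auto-servernum",          # pick a free X display number
        "--server-args=" + screen,   # options passed through to Xvfb
        "python", script, url, output,
    ]


cmd = build_xvfb_command("render_page.py", "http://example.com/", "page.png")
# Print a shell-safe version of the command for copy and paste.
print(" ".join(shlex.quote(part) for part in cmd))
```

The list in cmd could be handed to subprocess.call() as is; keeping it as
a list avoids shell quoting problems with the embedded -screen argument.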