[GRLUG] Headerless website rendering

Wed Jun 3 14:45:50 EDT 2009

Thanks, that does clear it up. I can also see now how that would be useful.

On Wed, Jun 3, 2009 at 1:33 PM, Ben DeMott <ben.demott at gmail.com> wrote:

> Hey Ben,
> Imagine opening your browser, installing the Web Developer plugin, visiting a
> page, clicking "View Generated Source", and saving the result to a text file.
> This is what I am doing server-side, with no browser or client involved,
> entirely from a program.
>
> Let me be more specific: I'm not talking about server-side ECMAScript
> execution; I'm talking about emulating a browser engine programmatically on
> the server side.
> It is pretty difficult to simulate exactly how your browser renders,
> displays, arranges, and re-orders a page, listens for events, and parses the
> DOM it receives from a server, not to mention how it handles redirects,
> forwards, headers, meta tags, and on and on.
>
> In my opinion it would be a bad idea for me to try to replicate a browser's
> behavior in any fashion by executing JavaScript/ECMAScript server-side. If,
> for example, you could get a JavaScript library (jQuery) to act upon a DOM
> server-side, or you know of a way to do that, I would be very interested to
> hear about it.
>
> In addition, when you execute and interact with JavaScript through a browser
> engine (Gecko or WebKit), you have programmatic access to the execution,
> callbacks/events, the DOM itself, and lots of other things that you wouldn't
> have by just running server-side JavaScript.
>
> The more interesting part of this was actually rendering screenshots of web
> pages, the way a browser does, in a virtual framebuffer.
>
> Believe it or not, I'm now using this code for debugging in my web
> development work, and the possibilities are endless :)
>
> I hope this better explains what I am doing...
>
>
> On Wed, Jun 3, 2009 at 1:12 PM, Ben Rousch <brousch at gmail.com> wrote:
>
>> On Wed, Jun 3, 2009 at 12:48 PM, Ben DeMott <ben.demott at gmail.com> wrote:
>>
>>> For those that might be interested ...
>>>
>>> Recently, for work, I told a customer who wanted headerless (server-side)
>>> execution of JavaScript that I "couldn't do it".
>>> Over the last few years I've looked into the subject a few times, always
>>> quite pessimistically.
>>>
>>> Over the last year I've been doing iPhone development and using the
>>> WebKit API. Using the Cocoa API you can "headerlessly" execute a web page
>>> simply by not drawing the display buffer to the screen.
>>> A while after this customer asked for completely programmatic headerless
>>> rendering, I had some free time, so I looked back into the issue. Knowing
>>> that the WebKit API can be accessed in a way that makes this
>>> "theoretically" possible, I pushed on...
>>>
>>> And that is when I came across Nokia's API: the Qt ('cute') API has
>>> WebKit support among many other things, and after working with the
>>> iPhone SDK I actually feel pretty at home using it.
>>> It's an event-driven API written in C++...
>>>
>>> After researching the Qt library a bit, I found that it had Python
>>> bindings (even better), and its most recent version supported a very
>>> modern version of WebKit, similar to what Safari uses.
>>>
>>> I then started looking for a way to actually 'draw' the web page; I knew
>>> that if I could draw to a framebuffer, I could probably render and save
>>> images of web pages programmatically on a server.
>>>
>>> After quite a lot of work I came across several other people who had
>>> used the same Qt-based process, although their code was somewhat poorly
>>> implemented.
>>>
>>> I took concepts from several of the resources I found and wrote a Python
>>> application that uses Xvfb (the X virtual framebuffer) to render a web
>>> page on a server.
>>> All you need is Python, PyQt4, Xvfb, xvfb-run (a script that supposedly
>>> comes with Xvfb, but I had to install it manually), and a Linux distro
>>> running a 2.6-series kernel.
>>> I have it working very well on Fedora 9 with the packages listed above. If
>>> anyone is interested in my code example or further instructions, let me
>>> know and I'll put it up on a website.
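The thread doesn't include Ben's script, so what follows is only a minimal sketch of how the pieces listed above might be wired together from Python 3. It assumes a separate PyQt4 render script, here given the hypothetical name `render_page.py`, that takes a URL and an output PNG path. The wrapper only builds the `xvfb-run` command line, using its `-a` (pick a free display number) and `-s` (arguments passed through to Xvfb) options, and runs it with `subprocess`. The screen size and colour depth defaults are arbitrary assumptions.

```python
import shlex
import subprocess
from typing import List


def xvfb_render_command(url: str, output_png: str,
                        width: int = 1024, height: int = 768, depth: int = 24,
                        script: str = "render_page.py") -> List[str]:
    """Build the argv for running a (hypothetical) PyQt4 render script under xvfb-run."""
    # Xvfb screen spec: screen number 0, WIDTHxHEIGHTxDEPTH
    server_args = f"-screen 0 {width}x{height}x{depth}"
    return [
        "xvfb-run",
        "-a",               # --auto-servernum: use a free display number
        "-s", server_args,  # --server-args: passed through to Xvfb
        "python", script,   # ASSUMPTION: render_page.py is your own PyQt4 script
        url, output_png,
    ]


def render_page(url: str, output_png: str, **kwargs) -> None:
    """Run the render script inside a virtual framebuffer; raises on failure."""
    subprocess.run(xvfb_render_command(url, output_png, **kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without Xvfb installed.
    print(shlex.join(xvfb_render_command("http://example.com/", "example.png")))
```

Run directly, it prints `xvfb-run -a -s '-screen 0 1024x768x24' python render_page.py http://example.com/ example.png`. Keeping the command as a list (rather than one shell string) avoids quoting problems with URLs that contain `&` or spaces.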
>>>
>>> And all of this, sadly, was to crawl Google results (Google re-orders and
>>> dynamically controls results, including business results, with client-side
>>> JavaScript).
>>> If anyone wants the PHP functions that parse Google (using DOM), let me
>>> know...
>>> Along with this (sadly, and I'm not proud of it) I wrote a Google Image
>>> results parser...
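Ben's parsing functions were written in PHP and aren't part of the thread. As a rough Python illustration of the same idea (pulling result links out of the generated source after the browser engine has run the page's JavaScript), here is a sketch using the standard-library `html.parser`, which is an event-based parser rather than a DOM. The `result` class name is a hypothetical placeholder; real Google markup is different and can change at any time.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class ResultLinkParser(HTMLParser):
    """Collect (href, text) pairs for links inside result containers.

    ASSUMPTION: each result is wrapped in an element carrying a hypothetical
    class name (default "result"); real search-result markup differs.
    """

    def __init__(self, container_class: str = "result") -> None:
        super().__init__()
        self.container_class = container_class
        self.container_tag: Optional[str] = None  # tag name of the open container
        self.depth = 0                            # nesting level of container_tag
        self.href: Optional[str] = None           # href of the <a> being read
        self.text: List[str] = []                 # text chunks of that <a>
        self.links: List[Tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if self.depth == 0:
            # Outside any result: only look for an element with the marker class.
            if self.container_class in (attr.get("class") or "").split():
                self.container_tag, self.depth = tag, 1
            return
        if tag == self.container_tag:
            self.depth += 1  # nested element of the same tag inside the result
        elif tag == "a" and attr.get("href"):
            self.href, self.text = attr["href"], []

    def handle_data(self, data):
        if self.href is not None:
            self.text.append(data)

    def handle_endtag(self, tag):
        if self.depth == 0:
            return
        if tag == "a" and self.href is not None:
            self.links.append((self.href, "".join(self.text).strip()))
            self.href = None
        elif tag == self.container_tag:
            self.depth -= 1  # leaving the result once depth reaches 0


def extract_result_links(html: str, container_class: str = "result") -> List[Tuple[str, str]]:
    parser = ResultLinkParser(container_class)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = (
        '<div id="nav"><a href="/prefs">Preferences</a></div>'
        '<div class="result"><h3><a href="http://example.com/">Example <b>Domain</b></a></h3></div>'
    )
    print(extract_result_links(sample))  # [('http://example.com/', 'Example Domain')]
```

Because the parser only sees the HTML handed to it, it has to be fed the generated source (after scripts have run), not the raw server response; otherwise results inserted by client-side JavaScript are simply missing.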
>>>
>>> Customers are mad, because the code will obviously break whenever Google
>>> makes any changes, but it was quite the experiment in code :) ... and I got
>>> paid for it.
>>>
>>> The image code was meant to (help) find logos for companies/organizations.
>>> Garmin's data provider (InfoUSA) tracks 13,000 franchises/organizations,
>>> and the system had about an 80% success rate on those organizations. Check
>>> out an example here:
>>>
>>> (type in a company name, like 'mcdonalds')
>>> http://apginc.net:8380/binja/test_image_query.php
>>>
>>> _______________________________________________
>>> grlug mailing list
>>> grlug at grlug.org
>>> http://shinobu.grlug.org/cgi-bin/mailman/listinfo/grlug
>>
>>
>> I may be a bit dense here, but I'm not sure what you mean by
>> "headerless". There are several ways to execute JavaScript server-side:
>> http://en.wikipedia.org/wiki/Server-side_JavaScript
>>
>> Or are you "viewing" web sites by saving them to an image or something?
>>
>