One way I can think of offhand to do that sort of thing without metadata would be to compare statistically unlikely passages against those in a text database (<i>a la</i> turnitin) -- but there would be no way to do that entirely client-side, and it would have to be backed by an entity big enough to fight off the inevitable copyright claims.<br>
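A rough sketch of the comparison step, for illustration only. It simplifies "statistically unlikely passages" to hashed five-word shingles over the whole text and scores overlap with Jaccard similarity. The function names and the in-memory <code>database</code> dict are hypothetical; a real service would need a proper index and a far larger corpus.<br>

```python
import hashlib
import re


def shingles(text, size=5):
    """Return the set of hashed word n-grams (shingles) found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    grams = (" ".join(words[i:i + size]) for i in range(len(words) - size + 1))
    # Truncated SHA-1 keeps the fingerprints small; collisions are possible.
    return {hashlib.sha1(g.encode("utf-8")).hexdigest()[:16] for g in grams}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(unknown_text, database, size=5):
    """Return (title, score) for the database entry most similar to unknown_text.

    database is assumed to be a non-empty dict mapping title -> full text.
    """
    probe = shingles(unknown_text, size)
    scored = (
        (title, jaccard(probe, shingles(body, size)))
        for title, body in database.items()
    )
    return max(scored, key=lambda pair: pair[1])
```

Lowercasing and stripping punctuation before shingling is what lets the same passage match across differently formatted copies of a book.<br>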
<br><div class="gmail_quote">On Fri, Jun 24, 2011 at 10:40 AM, Benjamin Flanders <span dir="ltr"><<a href="mailto:flanderb@gmail.com">flanderb@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div class="im">On Fri, Jun 24, 2011 at 10:03 AM, David Pembrook <<a href="mailto:david@pembrook.net">david@pembrook.net</a>> wrote:<br>
> Check out calibre. It's cross-platform and looks interesting. I've just tried<br>
> it and it did a nice job cataloging my collection. It even grabs Amazon<br>
> descriptions and book covers. I have not delved into all its features. It<br>
> has a server component but I've only tried it as a desktop application. See<br>
> <a href="http://calibre-ebook.com/" target="_blank">http://calibre-ebook.com/</a>.<br>
><br>
> Quoting their about page:<br>
><br>
> calibre is a free and open source e-book library management application<br>
> developed by users of e-books for users of e-books. It has a cornucopia of<br>
> features divided into the following main categories:<br>
><br>
> Library Management<br>
> E-book conversion<br>
> Syncing to e-book reader devices<br>
> Downloading news from the web and converting it into e-book form<br>
> Comprehensive e-book viewer<br>
> Content server for online access to your book collection<br>
><br>
> Dave<br>
<br>
</div>I'm actually setting up a Calibre server, and this is what prompted my<br>
inquiry. Calibre can download book information if you already have<br>
some metadata on the book (author and/or title), but the matching is<br>
really finicky, as would be the case for most easily implemented text<br>
comparisons. On import, it can also grab metadata from file types that<br>
embed it, like PDFs, or from the file name for those that don't. But<br>
the file-name route is finicky too: I have to mess with the regex<br>
whenever the file name is in a different order than the previous book<br>
I imported.<br>
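<br>
To illustrate the ordering problem, here is a minimal sketch that tries several file-name layouts in turn instead of editing one regex per import. It is not Calibre's importer; the two layouts, the <code>PATTERNS</code> list, and <code>guess_metadata</code> are assumptions made up for this example.<br>

```python
import os
import re

# Hypothetical file-name layouts, tried in order. Each pattern names its
# fields with (?P<name>...) groups so the caller gets a dict back.
PATTERNS = [
    re.compile(r"^(?P<author>.+?) - (?P<title>.+)$"),                 # "Author - Title"
    re.compile(r"^(?P<title>.+?) by (?P<author>.+)$", re.IGNORECASE),  # "Title by Author"
]


def guess_metadata(filename):
    """Guess title/author from a file name using the first pattern that matches."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    for pattern in PATTERNS:
        match = pattern.match(stem)
        if match:
            return {key: value.strip() for key, value in match.groupdict().items()}
    # Nothing matched: fall back to treating the whole stem as the title.
    return {"title": stem.strip()}
```

A regex alone still cannot tell "Author - Title" from "Title - Author", which is why file-name guessing stays finicky without some content-based check.<br>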
<br>
Share and Enjoy<br>
Ben<br>
<div><div></div><div class="h5"><br>
><br>
> On 6/24/2011 9:33 AM, Benjamin Flanders wrote:<br>
><br>
> On Fri, Jun 24, 2011 at 9:17 AM, John-Thomas Richards <<a href="mailto:jtr@jrichards.org">jtr@jrichards.org</a>><br>
> wrote:<br>
><br>
> On Fri, Jun 24, 2011 at 06:50:32AM -0400, Benjamin Flanders wrote:<br>
><br>
> Not totally Linux related, but I thought one of you might know. Is<br>
> there a program for ebook identification? I'm thinking along the<br>
> lines of Musicbrainz PUID audio signature, but for books. I would<br>
> think it would be easier for ebooks than music since there is no<br>
> compression and a word is a word, but I am coming up with nothing on<br>
> Google. I keep coming up with e-books about fuzzy logic, ISBNs, tree<br>
> identification, signature analysis, and fingerprinting.<br>
><br>
> Wait. ebooks aren't compressed? Isn't plain text about the most<br>
> compressible thing around, and lossless at that? This surprises me.<br>
><br>
> I guess I should not have used the word "compressed". I was going for<br>
> the term lossless and had a brain bump. Sorry.<br>
><br>
> Anyway, I would have thought the application would have been out there<br>
> already.<br>
><br>
><br>
><br>
> --<br>
> john-thomas<br>
> ------<br>
> None are more hopelessly enslaved than those who falsely believe they are<br>
> free.<br>
> Johann Wolfgang von Goethe, novelist and philosopher (1749-1832)<br>
><br>
> --<br>
> This message has been scanned for viruses and<br>
> dangerous content by MailScanner, and is<br>
> believed to be clean.<br>
><br>
> _______________________________________________<br>
> grlug mailing list<br>
> <a href="mailto:grlug@grlug.org">grlug@grlug.org</a><br>
> <a href="http://shinobu.grlug.org/cgi-bin/mailman/listinfo/grlug" target="_blank">http://shinobu.grlug.org/cgi-bin/mailman/listinfo/grlug</a><br>
><br>
><br>
<br>
</div></div></blockquote></div><br>