Wikis


Copied from an email sent on Wed, 16 Mar 2005 12:33:33 -0600 by Barry Dmytro (badcherry@mailc.net)

* Tom Counsell (tamc2@cam.ac.uk) wrote:
> >As far as other libs I've borrowed (that aren't in the stdlib), I use
> >redcloth & diff/lcs, but I'm not sure whether I'll stick with that diff
> >library or just rewrite my use of it, as it generates massively large
> >YAML dumps.  zlib helps compensate for that somewhat, but with a large
> >document with hundreds of thousands of revisions, we could get revision
> >files in the megabytes or even larger.
> 
> How to deal with large documents is something I've been mulling over.
> Clearly you could save some space by using a diff library that doesn't
> do the nested array thing for changes, and possibly one that was clever
> enough to do word diffs rather than line diffs in situations where that
> would save space.  I've just been wondering whether it would be
> possible to write an algorithm that would periodically do some clever
> merging of revisions.
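
To make the size concern in this exchange concrete, here is a rough sketch using only Ruby's standard `yaml` and `zlib` libraries. It assumes a hypothetical revision log in which every revision stores a character-level delta as nested arrays of `[action, position, character]` triples. This only imitates the general shape of a nested-array diff; it is not the exact structure diff/lcs produces. The YAML dump repeats indentation and punctuation for every change, and deflating it shows how much of that redundancy zlib can recover.

```ruby
require 'yaml'
require 'zlib'

# Hypothetical history: revision n replaces the character at position n.
# Each delta is a list of hunks, and each hunk is a list of
# [action, position, character] triples (a nested-array layout).
def build_history(revision_count)
  Array.new(revision_count) do |n|
    [[['-', n, 'a'], ['+', n, 'b']]]
  end
end

history = build_history(10_000)

# Serialize the whole history as YAML, then deflate it with zlib.
yaml = history.to_yaml
packed = Zlib::Deflate.deflate(yaml, Zlib::BEST_COMPRESSION)

# Round-trip to confirm compression loses nothing. aliases: true tolerates
# any anchors Psych may emit for repeated objects.
restored = YAML.safe_load(Zlib::Inflate.inflate(packed), aliases: true)

puts "revisions:      #{history.size}"
puts "YAML bytes:     #{yaml.bytesize}"
puts "deflated bytes: #{packed.bytesize}"
puts "round-trip ok:  #{restored == history}"
```

This synthetic history is extremely repetitive, so its compression ratio is an optimistic illustration, not a benchmark. Real edit histories will compress less well.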

As of right now, I'm doing diffs by characters just as a test to see how
well it works, but I now think the best route is to diff by words.  That
will take a bit of work on my part, as the algorithm is a bit more
complex.
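
A word-level diff like the one described above can be sketched with a longest-common-subsequence (LCS) table built over word tokens instead of characters. This is a minimal, self-contained illustration in core Ruby, not the code from the project being discussed. The helper names `tokenize`, `lcs_table`, `word_diff` and `apply_script` are hypothetical. Runs of whitespace are kept as their own tokens, so joining the tokens reproduces the text exactly.

```ruby
# Split text into alternating word and whitespace tokens so that joining
# the tokens reproduces the original string exactly.
def tokenize(text)
  text.scan(/\S+|\s+/)
end

# table[i][j] holds the LCS length of a[i..] and b[j..] (O(n*m) table).
def lcs_table(a, b)
  table = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      table[i][j] =
        if a[i] == b[j]
          table[i + 1][j + 1] + 1
        else
          [table[i + 1][j], table[i][j + 1]].max
        end
    end
  end
  table
end

# Walk the table to build an edit script of [op, token] pairs, where op is
# :keep, :delete or :insert.
def word_diff(old_text, new_text)
  a = tokenize(old_text)
  b = tokenize(new_text)
  table = lcs_table(a, b)
  script = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      script << [:keep, a[i]]
      i += 1
      j += 1
    elsif table[i + 1][j] >= table[i][j + 1]
      script << [:delete, a[i]]
      i += 1
    else
      script << [:insert, b[j]]
      j += 1
    end
  end
  a[i..].each { |tok| script << [:delete, tok] }
  b[j..].each { |tok| script << [:insert, tok] }
  script
end

# Rebuild the new text: kept and inserted tokens survive, deletions drop out.
def apply_script(script)
  script.reject { |op, _| op == :delete }.map { |_, tok| tok }.join
end

script = word_diff("the quick brown fox", "the quick red fox")
p script.reject { |op, _| op == :keep }  # prints [[:delete, "brown"], [:insert, "red"]]
puts apply_script(script)                 # prints the quick red fox
```

A revision would only need to store the non-`:keep` entries, plus positions in a real format. A one-word change then costs two entries instead of one per changed character. Because the table grows with the product of the two token counts, very large documents might be diffed paragraph by paragraph first. That is a design suggestion, not part of the approach described here.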

> Anyway, I'm looking forward to nosing through your code for ideas
> (assuming it is open source, of course).

I plan on doing a CVS import today, but this is code that I've only been
working on for three days, so don't expect too much :)

> Tom
Kind Regards,
--------------------------------------------------------------------
Barry Dmytro
badcherry@mailc.net
http://badcherry.org/
--------------------------------------------------------------------
