Wikis
Copied from Email on Wed, 16 Mar 2005 12:33:33 -0600 from Barry Dmytro (badcherry@mailc.net)
* Tom Counsell (tamc2@cam.ac.uk) wrote:
> >As far as other libs I've borrowed (that aren't in the stdlib) I use
> >redcloth & diff/lcs, but am not too sure if I am going to stick with
> >that diff library or maybe just rewrite my use of it as it generates
> >massively large yaml dumps. zlib helps compensate for that somewhat,
> >but with a large document with hundreds of thousands of revisions, we
> >could get revision files in the megabytes or even larger.
>
> How to deal with large documents is something I've been mulling over.
> Clearly you could save some space by using a diff library that doesn't
> do the nested array thing for changes, and possibly that was clever
> enough to do word diffs rather than line diffs in situations where that
> would save space. I've just been wondering whether it would be
> possible to write an algorithm that would periodically do some clever
> merging of revisions

As of right now, I'm doing diffs by characters just as a test to see how
well it worked, but I now think that the best route is to go with using
diffs by words. That will require a bit of work on my part as the
algorithm will be a bit more complex.

> Anyway, I'm looking forward to nosing through your code for ideas
> (assuming it is open source of course)

I plan on doing a cvs import today, but this is code that I've only been
working on for 3 days so don't expect too much :)

> Tom

Kind Regards,
--------------------------------------------------------------------
Barry Dmytro
badcherry@mailc.net
http://badcherry.org/
--------------------------------------------------------------------
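
Note: the trade-off between character diffs and word diffs discussed in the email can be illustrated with a small sketch. This is not Barry's code and does not use the diff/lcs API; it is a minimal longest-common-subsequence diff written against the Ruby standard library only. The names `tokenize_words`, `diff_tokens`, `change_count` and `rebuild_new` are made up for this illustration. The same routine is run once over characters and once over words, so the number of recorded changes can be compared.

```ruby
# Illustrative sketch only: a tiny LCS-based diff over arbitrary tokens.
# All method names here are hypothetical; none of them come from diff/lcs.

# Split text into words and whitespace runs, so joining the tokens restores the text.
def tokenize_words(text)
  text.scan(/\s+|\S+/)
end

# Return a list of [op, token] pairs, where op is "=" (kept), "-" (removed) or "+" (added).
def diff_tokens(old_tokens, new_tokens)
  n = old_tokens.length
  m = new_tokens.length

  # lcs[i][j] = length of the longest common subsequence of old[i..-1] and new[j..-1]
  lcs = Array.new(n + 1) { Array.new(m + 1, 0) }
  (n - 1).downto(0) do |i|
    (m - 1).downto(0) do |j|
      lcs[i][j] = if old_tokens[i] == new_tokens[j]
                    lcs[i + 1][j + 1] + 1
                  else
                    [lcs[i + 1][j], lcs[i][j + 1]].max
                  end
    end
  end

  # Walk the table from the start, emitting one operation per token.
  ops = []
  i = 0
  j = 0
  while i < n && j < m
    if old_tokens[i] == new_tokens[j]
      ops << ["=", old_tokens[i]]
      i += 1
      j += 1
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      ops << ["-", old_tokens[i]]
      i += 1
    else
      ops << ["+", new_tokens[j]]
      j += 1
    end
  end
  old_tokens[i..-1].each { |t| ops << ["-", t] }
  new_tokens[j..-1].each { |t| ops << ["+", t] }
  ops
end

# Number of operations that are not plain "keep" entries.
def change_count(ops)
  ops.count { |op, _| op != "=" }
end

# Rebuild the new text from a diff by dropping removed tokens.
def rebuild_new(ops)
  ops.reject { |op, _| op == "-" }.map { |_, token| token }.join
end

old_text = "the quick brown fox"
new_text = "the quick red fox"

char_ops = diff_tokens(old_text.chars, new_text.chars)
word_ops = diff_tokens(tokenize_words(old_text), tokenize_words(new_text))

puts "character-level changes: #{change_count(char_ops)}"
puts "word-level changes:      #{change_count(word_ops)}"
```

When a whole word is replaced, the word-level diff records one removal and one addition, while the character-level diff records several single-character operations. Whether that makes the stored revision smaller depends on how the changes are serialized, so sizes would need measuring on real pages. The table built here is quadratic in the number of tokens, which is fine for a demonstration but would likely be too slow for very large documents.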