
Re: linkdups



By comparison, Steve's original linkdups (which I use every day) takes
almost eight hours to process 1,800,000 files totalling 9 GB. Most of
that time is spent walking the directory tree(s), computing checksums one
file at a time. If you are deduping a repository that changes maybe 5-10%
per day, then 90-95% of the files will be unchanged from the previous
day/run, and Steve's new version will already have their checksums cached.
This is a good example of "don't work hard, work smart."
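
If it helps picture the caching idea, here's a rough sketch in Python
(not Steve's actual script; the table layout and names here are my own
invention): checksums are stored in sqlite keyed by path, size, and
mtime, so a file is only rehashed if it has changed since the last run.

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a persistent checksum cache for duplicate
finding, in the spirit of the sqlite branch of linkdups.  Unchanged
files (same path, size, and mtime) hit the cache and are never reread."""
import hashlib
import os
import sqlite3


def cached_sha256(db, path):
    """Return the file's SHA-256, reusing a cached value when the
    file's size and mtime are unchanged from the previous run."""
    st = os.stat(path)
    row = db.execute(
        "SELECT sha256 FROM checksums WHERE path=? AND size=? AND mtime=?",
        (path, st.st_size, st.st_mtime)).fetchone()
    if row:
        return row[0]          # cache hit: no need to reread the file
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    digest = h.hexdigest()
    db.execute("INSERT OR REPLACE INTO checksums VALUES (?,?,?,?)",
               (path, st.st_size, st.st_mtime, digest))
    return digest


def find_dups(root, dbfile):
    """Walk root and group regular files by checksum; groups with more
    than one path are candidates for hardlinking."""
    db = sqlite3.connect(dbfile)
    db.execute("""CREATE TABLE IF NOT EXISTS checksums
                  (path TEXT PRIMARY KEY, size INTEGER,
                   mtime REAL, sha256 TEXT)""")
    by_digest = {}
    for dirpath, _, names in os.walk(root):
        for name in names:
            p = os.path.join(dirpath, name)
            if os.path.isfile(p) and not os.path.islink(p):
                by_digest.setdefault(cached_sha256(db, p), []).append(p)
    db.commit()
    db.close()
    return {d: ps for d, ps in by_digest.items() if len(ps) > 1}
```

On a second run over a mostly-unchanged tree, nearly every file takes
one cheap stat() plus an indexed sqlite lookup instead of a full read,
which is where the big win on a 5-10%-churn repository comes from.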

--Doc

On Wed, 2022-03-09 at 22:19 -0600, Steven Pritchard wrote:
> A note for Doc (and anyone else who might be vaguely interested)...
> 
> After many, many, *many* years of non-development, I *finally* have a
> working version of my linkdups script (basically my version of
> hardlink(1)) that moves most of the logic to SQL queries against a
> sqlite database.
> 
>   https://github.com/silug/linkdups/tree/sqlite
> 
> The idea here is that if you enable the persistent cache (with the -c
> option), a lot of the checksums required to find duplicate files only
> have to be done once, which should save a lot of time on subsequent
> runs.
> 
> Now that I have it working, I need to put some effort into making it
> work quickly.  At the moment it's pretty slow, but it should
> eventually be as fast or faster than the old version with some work.
> 
> -
> To unsubscribe, send email to majordomo@silug.org with
> "unsubscribe silug-discuss" in the body.

