Re: linkdups
By comparison, Steve's original linkdups (which I use every day) takes
almost eight hours to process 1,800,000 files totalling 9 GB. Most of
that time is spent walking the directory tree(s) computing checksums one
file at a time. If you are deduping a repository that changes maybe
5-10% per day, Steve's new version will already have cached checksums
for the 90-95% of files unchanged since the previous run. This is a
good example of working smart instead of working hard.
--Doc
On Wed, 2022-03-09 at 22:19 -0600, Steven Pritchard wrote:
> A note for Doc (and anyone else who might be vaguely interested)...
>
> After many, many, *many* years of non-development, I *finally* have a
> working version of my linkdups script (basically my version of
> hardlink(1)) that moves most of the logic to SQL queries against a
> sqlite database.
>
> https://github.com/silug/linkdups/tree/sqlite
>
> The idea here is that if you enable the persistent cache (with the -c
> option), a lot of the checksums required to find duplicate files only
> have to be done once, which should save a lot of time on subsequent
> runs.
>
> Now that I have it working, I need to put some effort into making it
> work quickly. At the moment it's pretty slow, but it should
> eventually
> be as fast or faster than the old version with some work.
>
> -
> To unsubscribe, send email to majordomo@silug.org with
> "unsubscribe silug-discuss" in the body.