Re: linkdups
By comparison, Steve's original linkdups (which I use every day) takes
almost eight hours to process 1,800,000 files totalling 9 GB. Most of
that time is spent walking the directory tree(s) computing checksums one
file at a time. If you are deduping a repository that changes maybe
5-10% per day, Steve's new version will already have cached checksums
for the 90-95% of files unchanged since the previous run. This is a
good example of working smart instead of working hard.
--Doc
On Wed, 2022-03-09 at 22:19 -0600, Steven Pritchard wrote:
> A note for Doc (and anyone else who might be vaguely interested)...
>
> After many, many, *many* years of non-development, I *finally* have a
> working version of my linkdups script (basically my version of
> hardlink(1)) that moves most of the logic to SQL queries against a
> sqlite database.
>
> https://github.com/silug/linkdups/tree/sqlite
>
> The idea here is that if you enable the persistent cache (with the -c
> option), a lot of the checksums required to find duplicate files only
> have to be done once, which should save a lot of time on subsequent
> runs.
>
> Now that I have it working, I need to put some effort into making it
> work quickly. At the moment it's pretty slow, but it should
> eventually
> be as fast or faster than the old version with some work.
>
> -
> To unsubscribe, send email to majordomo@silug.org with
> "unsubscribe silug-discuss" in the body.