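Hasher is a special overlay backend for creating remotes that handle checksums for other remotes. Its main functions include:
- emulating hash types that the base remote does not support;
- caching checksums so they do not have to be recalculated;
- filling the checksum cache from external SUM files.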
To use Hasher, first set up the underlying remote following the configuration instructions for that remote. You can also use a local pathname instead of a remote. Check that your base remote is working.
Let's call the base remote myRemote:path here. Note that anything inside myRemote:path will be handled by hasher and anything outside won't. This means that if you are using a bucket-based remote (S3, B2, Swift), you should put the bucket in the remote, e.g. s3:bucket.
Now proceed to interactive or manual configuration.
Run rclone config:
No remotes found, make a new one?
n) New remote
s) Set configuration password
q) Quit config
n/s/q> n
name> Hasher1
Type of storage to configure.
Choose a number from below, or type in your own value
[snip]
XX / Handle checksums for other remotes
\ "hasher"
[snip]
Storage> hasher
Remote to cache checksums for, like myremote:mypath.
Enter a string value. Press Enter for the default ("").
remote> myRemote:path
Comma separated list of supported checksum types.
Enter a string value. Press Enter for the default ("md5,sha1").
hashes> md5
Maximum time to keep checksums in cache. 0 = no cache, off = cache forever.
max_age> off
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[Hasher1]
type = hasher
remote = myRemote:path
hashes = md5
max_age = off
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
Run rclone config path to see the path of the currently active config file, usually YOURHOME/.config/rclone/rclone.conf.
Open it in your favorite text editor, find the section for the base remote and create a new section for hasher, as in the following examples:
[Hasher1]
type = hasher
remote = myRemote:path
hashes = md5
max_age = off
[Hasher2]
type = hasher
remote = /local/path
hashes = dropbox,sha1
max_age = 24h
Hasher takes the following parameters:
- remote is required;
- hashes is a comma-separated list of supported checksums (by default md5,sha1);
- max_age is the maximum time to keep a checksum value in the cache: 0 disables caching completely, and off caches "forever" (that is, until the files change).
Make sure the remote has a : (colon) in it. If you specify the remote without a colon, rclone will use a local directory of that name. So if you use a remote of /local/path, rclone will handle hashes for that directory.
If you literally use remote = name, rclone will put files in a directory called name under the current directory.
Now you can use it as Hasher2:subdir/file instead of the base remote.
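The parameter rules above can be checked with a short script. The Python sketch below is not part of rclone: describe_hasher_sections and parse_max_age are hypothetical helpers, the duration parser is deliberately simplified (a single number with an s, m, h or d suffix, whereas rclone's own parser accepts more forms), and treating a missing max_age as off is an assumption.

```python
import configparser

# Simplified unit table (assumption); rclone's real duration syntax is richer.
_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def parse_max_age(value):
    """Return None for "off" (cache forever), else max age in seconds (0 = no cache)."""
    value = value.strip().lower()
    if value == "off":
        return None
    if value == "0":
        return 0
    number, unit = value[:-1], value[-1]
    if unit not in _UNITS:
        raise ValueError(f"unsupported max_age: {value!r}")
    return float(number) * _UNITS[unit]


def describe_hasher_sections(config_text):
    """Summarise every section with type = hasher (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    result = {}
    for name in parser.sections():
        section = parser[name]
        if section.get("type") != "hasher":
            continue
        remote = section["remote"]  # required: raises KeyError if missing
        result[name] = {
            "remote": remote,
            # Without a colon, the remote is treated as a local directory.
            "base_is_local_dir": ":" not in remote,
            "hashes": [h.strip() for h in section.get("hashes", "md5,sha1").split(",")],
            # Assumption: a missing max_age is treated as "off".
            "max_age_seconds": parse_max_age(section.get("max_age", "off")),
        }
    return result
```

Fed the two example sections above, it reports Hasher1 as wrapping a remote with no cache expiry and Hasher2 as wrapping a local directory with a 24-hour (86400 s) expiry.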
Hasher will transparently update the cache with new checksums when a file is fully read or overwritten, for example:
rclone copy External:path/file Hasher:dest/path
rclone cat Hasher:path/to/file > /dev/null
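The "fully read" condition matters because a checksum is only trustworthy once every byte has passed through the hash function. The following Python sketch illustrates that rule with a hypothetical HashingReader wrapper built on hashlib; it is an illustration of the idea, not rclone's Go implementation.

```python
import hashlib
import io


class HashingReader:
    """Wrap a binary stream and hash everything read through it.

    Digests are only reported once the stream has been read to EOF,
    mirroring the rule that the cache is updated only after a full read.
    """

    def __init__(self, stream, hash_names=("md5", "sha1")):
        self._stream = stream
        self._hashers = {name: hashlib.new(name) for name in hash_names}
        self._complete = False

    def read(self, size=-1):
        chunk = self._stream.read(size)
        if chunk:
            for hasher in self._hashers.values():
                hasher.update(chunk)
        if not chunk or size < 0:
            # An empty read, or a read-everything call, means EOF was reached.
            self._complete = True
        return chunk

    def digests(self):
        """Return {hash_name: hexdigest} after a full read, otherwise None."""
        if not self._complete:
            return None
        return {name: h.hexdigest() for name, h in self._hashers.items()}


if __name__ == "__main__":
    reader = HashingReader(io.BytesIO(b"example data"))
    reader.read()  # read everything, so digests become available
    print(reader.digests())
```

A partial read (for example, stopping after the first chunk) leaves digests() returning None, which is why a cache fed this way would not record anything for it.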
To refresh all cached checksums for a subtree (even those unsupported by the base backend), re-download all files in the subtree. For example, use hashsum --download with any supported hash type on the command line (all that matters is that the files are re-read):
rclone hashsum MD5 --download Hasher:path/to/subtree > /dev/null
rclone backend dump Hasher:path/to/subtree
You can print or drop the hashsum cache using custom backend commands:
rclone backend dump Hasher:dir/subdir
rclone backend drop Hasher:
Hasher supports two backend commands for importing SUM files: the generic import and the faster but less consistent stickyimport.
rclone backend import Hasher:dir/subdir SHA1 /path/to/SHA1SUM [--checkers 4]
Instead of SHA1, you can use any hash supported by the remote. The last argument can point to either a local file or an other-remote:path text file in SUM format.
The command will parse the SUM file, then walk down the path given by the
first argument, snapshot current fingerprints and fill in the cache entries
correspondingly.
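The import steps described above (parse the SUM file, walk the tree, snapshot fingerprints, fill in entries) can be sketched locally in Python. parse_sum_file and import_sums are hypothetical helpers: they work on a local directory rather than a remote, keep entries in a plain dict instead of the bolt database, and reduce the fingerprint to size and modification time. Treating SUM paths as relative to the walked directory is an assumption of the sketch.

```python
import hashlib
import os
import tempfile


def parse_sum_file(text):
    """Parse md5sum/sha1sum-style lines into {relative_path: hex_digest}.

    Accepts text-mode lines ("<hex>  <path>") and binary-mode lines
    ("<hex> *<path>"); blank lines and lines starting with '#' are skipped.
    """
    sums = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        digest, _, name = line.partition(" ")
        if name.startswith(("*", " ")):
            name = name[1:]  # drop the binary-mode '*' or the second space
        sums[name] = digest.lower()
    return sums


def import_sums(root, sums):
    """Walk `root` and bind each listed checksum to the file's current
    fingerprint (size and modification time). Listed files that do not
    exist under `root` are skipped."""
    cache = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if rel in sums:
                st = os.stat(full)  # the "snapshot current fingerprint" step
                cache[rel] = {"hash": sums[rel], "size": st.st_size, "mtime": st.st_mtime}
    return cache


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "a.txt"), "wb") as f:
            f.write(b"abc")
        sum_text = hashlib.sha1(b"abc").hexdigest() + "  a.txt\n"
        print(import_sums(root, parse_sum_file(sum_text)))
```

Note that, like the description above, the sketch does not verify that the supplied digests match the file contents; it only records them against the current fingerprint.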
Paths in the SUM file are treated as relative to hasher:dir/subdir.
You can increase --checkers to make the tree walk faster. Or use stickyimport if you don't care about fingerprints and consistency:
rclone backend stickyimport hasher:path/to/data sha1 remote:/path/to/sum.sha1
stickyimport is similar to import but works much faster because it does not need to stat existing files and skips the initial tree walk.
Instead of binding cache entries to file fingerprints, it creates sticky entries bound to the file name alone, ignoring size, modification time etc.
Such hash entries can be replaced only by purge, delete, backend drop or by a full re-read/re-write of the files.
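The difference between the two kinds of entries comes down to what a lookup checks. In this Python sketch, CacheEntry and lookup are hypothetical names, the fingerprint is reduced to a (size, modtime) pair, and a sticky entry is modelled as an entry with no fingerprint at all.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Fingerprint = Tuple[int, float]  # (size, modtime): a simplification


@dataclass
class CacheEntry:
    digest: str
    fingerprint: Optional[Fingerprint]  # None marks a sticky entry


def lookup(entry: Optional[CacheEntry], current: Fingerprint) -> Optional[str]:
    """Return the cached digest if it may be trusted for a file whose
    current fingerprint is `current`, otherwise None."""
    if entry is None:
        return None
    if entry.fingerprint is None:
        # Sticky entry: bound to the name alone, so size/modtime are ignored.
        return entry.digest
    # Fingerprint-bound entry: valid only while size and modtime are unchanged.
    return entry.digest if entry.fingerprint == current else None
```

This is why sticky entries survive a file changing on the base remote: nothing in the lookup notices the change, so only an explicit purge, delete, drop or full re-read/re-write replaces them.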
Here are the Standard options specific to hasher (Better checksums for other remotes).
Remote to cache checksums for (e.g. myRemote:path).
Properties: Config: remote; Required: true.
Comma separated list of supported checksum types.
Properties: Config: hashes; Default: md5,sha1.
Maximum time to keep checksums in cache (0 = no cache, off = cache forever).
Properties: Config: max_age.
Here are the Advanced options specific to hasher (Better checksums for other remotes).
Auto-update checksum for files smaller than this size (disabled by default).
Properties: Config: auto_size.
Description of the remote.
Properties: Config: description.
Any metadata supported by the underlying remote is read and written.
See the metadata docs for more info.
Here are the commands specific to the hasher backend.
Run them with
rclone backend COMMAND remote:
The help below will explain what arguments each command takes.
See the backend command for more info on how to pass options and arguments.
These can be run on a running backend using the rc command backend/command.
Drop cache
rclone backend drop remote: [options] [<arguments>+]
Completely drop checksum cache. Usage Example: rclone backend drop hasher:
Dump the database
rclone backend dump remote: [options] [<arguments>+]
Dump cache records covered by the current remote
Full dump of the database
rclone backend fulldump remote: [options] [<arguments>+]
Dump all cache records in the database
Import a SUM file
rclone backend import remote: [options] [<arguments>+]
Amend hash cache from a SUM file and bind checksums to files by size/time. Usage Example: rclone backend import hasher:subdir md5 /path/to/sum.md5
Perform fast import of a SUM file
rclone backend stickyimport remote: [options] [<arguments>+]
Fill hash cache from a SUM file without verifying file fingerprints. Usage Example: rclone backend stickyimport hasher:subdir md5 remote:path/to/sum.md5
This section explains how various rclone operations work on a hasher remote.
Disclaimer: this section describes the current implementation, which can change in future rclone versions!
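The rclone hashsum (or md5sum or sha1sum) command will:
- if the object size is below auto_size, download the object and calculate the requested hashes on the fly;
- otherwise, build an object fingerprint (including size, modtime if supported, first-found other hash if any) and return the cached hash if an entry with a matching fingerprint exists; if not, download the object, calculate the hashes on the fly and store them in the cache.
Other operations:
- fully reading or overwriting a file updates the cache in the same way as hashsum above;
- move will update keys of existing cache entries;
- deletefile will remove a single cache entry;
- purge will remove all cache entries under the purged path.
Note that setting max_age = 0 will disable checksum caching completely.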
If you set max_age = off, checksums in cache will never age, unless you fully rewrite or delete the file.
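The hashsum steps and the max_age and auto_size settings can be pulled together in a toy model. The Python sketch below is a simplified, hypothetical model (Obj and HashCache are invented names), not rclone's code: it models only the steps listed above, reduces the fingerprint to size and modtime, and counts downloads so the effect of caching is visible. Whether rclone also stores hashes computed through the auto_size shortcut is deliberately not modelled.

```python
import hashlib
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Obj:
    """Hypothetical stand-in for an object on the base remote."""
    name: str
    data: bytes
    mtime: float

    def fingerprint(self) -> Tuple[int, float]:
        # Simplified fingerprint: size and modtime only (no "other hash").
        return (len(self.data), self.mtime)


@dataclass
class HashCache:
    """Toy model of the hashsum steps; all names here are hypothetical."""
    max_age: Optional[float]  # None models "off", 0 disables caching
    auto_size: int = 0  # 0 means the small-file shortcut is disabled
    supported: Tuple[str, ...] = ("md5", "sha1")
    entries: Dict[str, dict] = field(default_factory=dict)
    downloads: int = 0  # how often object data had to be read

    def hashsum(self, obj: Obj, hash_name: str, now: Optional[float] = None) -> str:
        now = time.time() if now is None else now
        if len(obj.data) < self.auto_size:
            # Small object: download it and hash it on the fly.
            self.downloads += 1
            return hashlib.new(hash_name, obj.data).hexdigest()

        fp = obj.fingerprint()
        entry = self.entries.get(obj.name)
        if entry is not None:
            fresh = self.max_age is None or now - entry["stored_at"] < self.max_age
            if fresh and entry["fingerprint"] == fp and hash_name in entry["hashes"]:
                return entry["hashes"][hash_name]  # matching entry: no download
            del self.entries[obj.name]  # mismatched, expired or incomplete

        # Download once and compute every supported hash.
        self.downloads += 1
        hashes = {name: hashlib.new(name, obj.data).hexdigest() for name in self.supported}
        if self.max_age != 0:  # max_age = 0 disables caching completely
            self.entries[obj.name] = {"fingerprint": fp, "stored_at": now, "hashes": hashes}
        return hashes[hash_name]
```

With max_age modelled as off, asking for md5 and then sha1 of an unchanged object costs one download; changing the object's size or modtime forces a second one, and with max_age = 0 every call downloads.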
Cached checksums are stored as bolt database files under the rclone cache directory, usually ~/.cache/rclone/kv/. Databases are maintained one per base backend, named like BaseRemote~hasher.bolt.
Checksums for multiple aliases into a single base backend will be stored in a single database. All local paths are treated as aliases into the local backend (unless encrypted or chunked) and stored in ~/.cache/rclone/kv/local~hasher.bolt.
Databases can be shared between multiple rclone processes.
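The naming convention above can be illustrated with a small Python helper. hasher_db_path is hypothetical and only approximates the rule: a remote of the form name:path maps to name, anything without a colon is a local path and maps to local, and the default cache location is the usual ~/.cache/rclone. It does not resolve aliases to their base backend, does not handle crypt or chunker, and ignores Windows drive letters.

```python
from pathlib import Path
from typing import Optional, Union


def hasher_db_path(remote: str, cache_dir: Optional[Union[str, Path]] = None) -> Path:
    """Guess which bolt file holds the checksums for a hasher `remote` value.

    Simplified, hypothetical rule: "name:path" maps to "name", and anything
    without a colon is a local path and maps to "local".
    """
    if cache_dir is None:
        cache_dir = Path.home() / ".cache" / "rclone"  # the usual location
    base = remote.split(":", 1)[0] if ":" in remote else "local"
    return Path(cache_dir) / "kv" / f"{base}~hasher.bolt"


if __name__ == "__main__":
    print(hasher_db_path("myRemote:path"))
    print(hasher_db_path("/local/path"))
```

For example, both /local/path and any other local directory map to the same local~hasher.bolt file, matching the statement that all local paths share one database.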