rclone dedupe


rclone dedupe

Interactively find duplicate files and delete/rename them.

Synopsis

By default dedupe interactively finds duplicate files and offers to delete all but one or rename them to be different. Only useful with Google Drive which can have duplicate file names.

In the first pass it will merge directories with the same name. It will do this iteratively until all the identical directories have been merged.

The dedupe command will delete all but one of any identical (same md5sum) files it finds without confirmation. This means that for most duplicated files the dedupe command will not be interactive. You can use --dry-run to see what would happen without doing anything.

Here is an example run.

Before - with duplicates

$ rclone lsl drive:dupes
  6048320 2016-03-05 16:23:16.798000000 one.txt
  6048320 2016-03-05 16:23:11.775000000 one.txt
   564374 2016-03-05 16:23:06.731000000 one.txt
  6048320 2016-03-05 16:18:26.092000000 one.txt
  6048320 2016-03-05 16:22:46.185000000 two.txt
  1744073 2016-03-05 16:22:38.104000000 two.txt
   564374 2016-03-05 16:22:52.118000000 two.txt

Now the dedupe session

$ rclone dedupe drive:dupes
2016/03/05 16:24:37 Google drive root 'dupes': Looking for duplicates using interactive mode.
one.txt: Found 4 duplicates - deleting identical copies
one.txt: Deleting 2/3 identical duplicates (md5sum "1eedaa9fe86fd4b8632e2ac549403b36")
one.txt: 2 duplicates remain
  1:      6048320 bytes, 2016-03-05 16:23:16.798000000, md5sum 1eedaa9fe86fd4b8632e2ac549403b36
  2:       564374 bytes, 2016-03-05 16:23:06.731000000, md5sum 7594e7dc9fc28f727c42ee3e0749de81
s) Skip and do nothing
k) Keep just one (choose which in next step)
r) Rename all to be different (by changing file.jpg to file-1.jpg)
s/k/r> k
Enter the number of the file to keep> 1
one.txt: Deleted 1 extra copies
two.txt: Found 3 duplicates - deleting identical copies
two.txt: 3 duplicates remain
  1:       564374 bytes, 2016-03-05 16:22:52.118000000, md5sum 7594e7dc9fc28f727c42ee3e0749de81
  2:      6048320 bytes, 2016-03-05 16:22:46.185000000, md5sum 1eedaa9fe86fd4b8632e2ac549403b36
  3:      1744073 bytes, 2016-03-05 16:22:38.104000000, md5sum 851957f7fb6f0bc4ce76be966d336802
s) Skip and do nothing
k) Keep just one (choose which in next step)
r) Rename all to be different (by changing file.jpg to file-1.jpg)
s/k/r> r
two-1.txt: renamed from: two.txt
two-2.txt: renamed from: two.txt
two-3.txt: renamed from: two.txt

The result being

$ rclone lsl drive:dupes
  6048320 2016-03-05 16:23:16.798000000 one.txt
   564374 2016-03-05 16:22:52.118000000 two-1.txt
  6048320 2016-03-05 16:22:46.185000000 two-2.txt
  1744073 2016-03-05 16:22:38.104000000 two-3.txt

Dedupe can be run non interactively using the --dedupe-mode flag or by using an extra parameter with the same value

  • --dedupe-mode interactive - interactive as above.
  • --dedupe-mode skip - removes identical files then skips anything left.
  • --dedupe-mode first - removes identical files then keeps the first one.
  • --dedupe-mode newest - removes identical files then keeps the newest one.
  • --dedupe-mode oldest - removes identical files then keeps the oldest one.
  • --dedupe-mode largest - removes identical files then keeps the largest one.
  • --dedupe-mode rename - removes identical files then renames the rest to be different.

For example to rename all the identically named photos in your Google Photos directory, do

rclone dedupe --dedupe-mode rename "drive:Google Photos"

Or

rclone dedupe rename "drive:Google Photos"
rclone dedupe [mode] remote:path [flags]

Options

      --dedupe-mode string   Dedupe mode interactive|skip|first|newest|oldest|rename. (default "interactive")
  -h, --help                 help for dedupe

Options inherited from parent commands

      --acd-auth-url string                        Auth server URL.
      --acd-client-id string                       Amazon Application Client ID.
      --acd-client-secret string                   Amazon Application Client Secret.
      --acd-templink-threshold SizeSuffix          Files >= this size will be downloaded via their tempLink. (default 9G)
      --acd-token-url string                       Token server url.
      --acd-upload-wait-per-gb Duration            Additional time per GB to wait after a failed complete upload to see if it appears. (default 3m0s)
      --alias-remote string                        Remote or path to alias.
      --ask-password                               Allow prompt for password for encrypted configuration. (default true)
      --auto-confirm                               If enabled, do not request console confirmation.
      --azureblob-access-tier string               Access tier of blob: hot, cool or archive.
      --azureblob-account string                   Storage Account Name (leave blank to use connection string or SAS URL)
      --azureblob-chunk-size SizeSuffix            Upload chunk size (<= 100MB). (default 4M)
      --azureblob-endpoint string                  Endpoint for the service
      --azureblob-key string                       Storage Account Key (leave blank to use connection string or SAS URL)
      --azureblob-list-chunk int                   Size of blob list. (default 5000)
      --azureblob-sas-url string                   SAS URL for container level access only
      --azureblob-upload-cutoff SizeSuffix         Cutoff for switching to chunked upload (<= 256MB). (default 256M)
      --b2-account string                          Account ID or Application Key ID
      --b2-chunk-size SizeSuffix                   Upload chunk size. Must fit in memory. (default 96M)
      --b2-endpoint string                         Endpoint for the service.
      --b2-hard-delete                             Permanently delete files on remote removal, otherwise hide files.
      --b2-key string                              Application Key
      --b2-test-mode string                        A flag string for X-Bz-Test-Mode header for debugging.
      --b2-upload-cutoff SizeSuffix                Cutoff for switching to chunked upload. (default 200M)
      --b2-versions                                Include old versions in directory listings.
      --backup-dir string                          Make backups into hierarchy based in DIR.
      --bind string                                Local address to bind to for outgoing connections, IPv4, IPv6 or name.
      --box-client-id string                       Box App Client Id.
      --box-client-secret string                   Box App Client Secret
      --box-commit-retries int                     Max number of times to try committing a multipart file. (default 100)
      --box-upload-cutoff SizeSuffix               Cutoff for switching to multipart upload (>= 50MB). (default 50M)
      --buffer-size int                            In memory buffer size when reading files for each --transfer. (default 16M)
      --bwlimit BwTimetable                        Bandwidth limit in kBytes/s, or use suffix b|k|M|G or a full timetable.
      --cache-chunk-clean-interval Duration        How often should the cache perform cleanups of the chunk storage. (default 1m0s)
      --cache-chunk-no-memory                      Disable the in-memory cache for storing chunks during streaming.
      --cache-chunk-path string                    Directory to cache chunk files. (default "/home/ncw/.cache/rclone/cache-backend")
      --cache-chunk-size SizeSuffix                The size of a chunk (partial file data). (default 5M)
      --cache-chunk-total-size SizeSuffix          The total size that the chunks can take up on the local disk. (default 10G)
      --cache-db-path string                       Directory to store file structure metadata DB. (default "/home/ncw/.cache/rclone/cache-backend")
      --cache-db-purge                             Clear all the cached data for this remote on start.
      --cache-db-wait-time Duration                How long to wait for the DB to be available - 0 is unlimited (default 1s)
      --cache-dir string                           Directory rclone will use for caching. (default "/home/ncw/.cache/rclone")
      --cache-info-age Duration                    How long to cache file structure information (directory listings, file size, times etc). (default 6h0m0s)
      --cache-plex-insecure string                 Skip all certificate verifications when connecting to the Plex server
      --cache-plex-password string                 The password of the Plex user
      --cache-plex-url string                      The URL of the Plex server
      --cache-plex-username string                 The username of the Plex user
      --cache-read-retries int                     How many times to retry a read from a cache storage. (default 10)
      --cache-remote string                        Remote to cache.
      --cache-rps int                              Limits the number of requests per second to the source FS (-1 to disable) (default -1)
      --cache-tmp-upload-path string               Directory to keep temporary files until they are uploaded.
      --cache-tmp-wait-time Duration               How long should files be stored in local cache before being uploaded (default 15s)
      --cache-workers int                          How many workers should run in parallel to download chunks. (default 4)
      --cache-writes                               Cache file data on writes through the FS
      --checkers int                               Number of checkers to run in parallel. (default 8)
  -c, --checksum                                   Skip based on checksum & size, not mod-time & size
      --config string                              Config file. (default "/home/ncw/.rclone.conf")
      --contimeout duration                        Connect timeout (default 1m0s)
  -L, --copy-links                                 Follow symlinks and copy the pointed to item.
      --cpuprofile string                          Write cpu profile to file
      --crypt-directory-name-encryption            Option to either encrypt directory names or leave them intact. (default true)
      --crypt-filename-encryption string           How to encrypt the filenames. (default "standard")
      --crypt-password string                      Password or pass phrase for encryption.
      --crypt-password2 string                     Password or pass phrase for salt. Optional but recommended.
      --crypt-remote string                        Remote to encrypt/decrypt.
      --crypt-show-mapping                         For all files listed show how the names encrypt.
      --delete-after                               When synchronizing, delete files on destination after transfering (default)
      --delete-before                              When synchronizing, delete files on destination before transfering
      --delete-during                              When synchronizing, delete files during transfer
      --delete-excluded                            Delete files on dest excluded from sync
      --disable string                             Disable a comma separated list of features.  Use help to see a list.
      --drive-acknowledge-abuse                    Set to allow files which return cannotDownloadAbusiveFile to be downloaded.
      --drive-allow-import-name-change             Allow the filetype to change when uploading Google docs (e.g. file.doc to file.docx). This will confuse sync and reupload every time.
      --drive-alternate-export                     Use alternate export URLs for google documents export.,
      --drive-auth-owner-only                      Only consider files owned by the authenticated user.
      --drive-chunk-size SizeSuffix                Upload chunk size. Must a power of 2 >= 256k. (default 8M)
      --drive-client-id string                     Google Application Client Id
      --drive-client-secret string                 Google Application Client Secret
      --drive-export-formats string                Comma separated list of preferred formats for downloading Google docs. (default "docx,xlsx,pptx,svg")
      --drive-formats string                       Deprecated: see export_formats
      --drive-impersonate string                   Impersonate this user when using a service account.
      --drive-import-formats string                Comma separated list of preferred formats for uploading Google docs.
      --drive-keep-revision-forever                Keep new head revision of each file forever.
      --drive-list-chunk int                       Size of listing chunk 100-1000. 0 to disable. (default 1000)
      --drive-root-folder-id string                ID of the root folder
      --drive-scope string                         Scope that rclone should use when requesting access from drive.
      --drive-service-account-credentials string   Service Account Credentials JSON blob
      --drive-service-account-file string          Service Account Credentials JSON file path
      --drive-shared-with-me                       Only show files that are shared with me.
      --drive-skip-gdocs                           Skip google documents in all listings.
      --drive-team-drive string                    ID of the Team Drive
      --drive-trashed-only                         Only show files that are in the trash.
      --drive-upload-cutoff SizeSuffix             Cutoff for switching to chunked upload (default 8M)
      --drive-use-created-date                     Use file created date instead of modified date.,
      --drive-use-trash                            Send files to the trash instead of deleting permanently. (default true)
      --drive-v2-download-min-size SizeSuffix      If Object's are greater, use drive v2 API to download. (default off)
      --dropbox-chunk-size SizeSuffix              Upload chunk size. (< 150M). (default 48M)
      --dropbox-client-id string                   Dropbox App Client Id
      --dropbox-client-secret string               Dropbox App Client Secret
  -n, --dry-run                                    Do a trial run with no permanent changes
      --dump string                                List of items to dump from: headers,bodies,requests,responses,auth,filters,goroutines,openfiles
      --dump-bodies                                Dump HTTP headers and bodies - may contain sensitive info
      --dump-headers                               Dump HTTP bodies - may contain sensitive info
      --exclude stringArray                        Exclude files matching pattern
      --exclude-from stringArray                   Read exclude patterns from file
      --exclude-if-present string                  Exclude directories if filename is present
      --fast-list                                  Use recursive list if available. Uses more memory but fewer transactions.
      --files-from stringArray                     Read list of source-file names from file
  -f, --filter stringArray                         Add a file-filtering rule
      --filter-from stringArray                    Read filtering patterns from a file
      --ftp-host string                            FTP host to connect to
      --ftp-pass string                            FTP password
      --ftp-port string                            FTP port, leave blank to use default (21)
      --ftp-user string                            FTP username, leave blank for current username, ncw
      --gcs-bucket-acl string                      Access Control List for new buckets.
      --gcs-client-id string                       Google Application Client Id
      --gcs-client-secret string                   Google Application Client Secret
      --gcs-location string                        Location for the newly created buckets.
      --gcs-object-acl string                      Access Control List for new objects.
      --gcs-project-number string                  Project number.
      --gcs-service-account-file string            Service Account Credentials JSON file path
      --gcs-storage-class string                   The storage class to use when storing objects in Google Cloud Storage.
      --http-url string                            URL of http host to connect to
      --hubic-chunk-size SizeSuffix                Above this size files will be chunked into a _segments container. (default 5G)
      --hubic-client-id string                     Hubic Client Id
      --hubic-client-secret string                 Hubic Client Secret
      --ignore-checksum                            Skip post copy check of checksums.
      --ignore-errors                              delete even if there are I/O errors
      --ignore-existing                            Skip all files that exist on destination
      --ignore-size                                Ignore size when skipping use mod-time or checksum.
  -I, --ignore-times                               Don't skip files that match size and time - transfer all files
      --immutable                                  Do not modify files. Fail if existing files have been modified.
      --include stringArray                        Include files matching pattern
      --include-from stringArray                   Read include patterns from file
      --jottacloud-hard-delete                     Delete files permanently rather than putting them into the trash.
      --jottacloud-md5-memory-limit SizeSuffix     Files bigger than this will be cached on disk to calculate the MD5 if required. (default 10M)
      --jottacloud-mountpoint string               The mountpoint to use.
      --jottacloud-pass string                     Password.
      --jottacloud-unlink                          Remove existing public link to file/folder with link command rather than creating.
      --jottacloud-user string                     User Name
      --local-no-check-updated                     Don't check to see if the files change during upload
      --local-no-unicode-normalization             Don't apply unicode normalization to paths and filenames (Deprecated)
      --local-nounc string                         Disable UNC (long path names) conversion on Windows
      --log-file string                            Log everything to this file
      --log-format string                          Comma separated list of log format options (default "date,time")
      --log-level string                           Log level DEBUG|INFO|NOTICE|ERROR (default "NOTICE")
      --low-level-retries int                      Number of low level retries to do. (default 10)
      --max-age duration                           Only transfer files younger than this in s or suffix ms|s|m|h|d|w|M|y (default off)
      --max-backlog int                            Maximum number of objects in sync or check backlog. (default 10000)
      --max-delete int                             When synchronizing, limit the number of deletes (default -1)
      --max-depth int                              If set limits the recursion depth to this. (default -1)
      --max-size int                               Only transfer files smaller than this in k or suffix b|k|M|G (default off)
      --max-transfer int                           Maximum size of data to transfer. (default off)
      --mega-debug                                 Output more debug from Mega.
      --mega-hard-delete                           Delete files permanently rather than putting them into the trash.
      --mega-pass string                           Password.
      --mega-user string                           User name
      --memprofile string                          Write memory profile to file
      --min-age duration                           Only transfer files older than this in s or suffix ms|s|m|h|d|w|M|y (default off)
      --min-size int                               Only transfer files bigger than this in k or suffix b|k|M|G (default off)
      --modify-window duration                     Max time diff to be considered the same (default 1ns)
      --no-check-certificate                       Do not verify the server SSL certificate. Insecure.
      --no-gzip-encoding                           Don't set Accept-Encoding: gzip.
      --no-traverse                                Obsolete - does nothing.
      --no-update-modtime                          Don't update destination mod-time if files identical.
  -x, --one-file-system                            Don't cross filesystem boundaries (unix/macOS only).
      --onedrive-chunk-size SizeSuffix             Chunk size to upload files with - must be multiple of 320k. (default 10M)
      --onedrive-client-id string                  Microsoft App Client Id
      --onedrive-client-secret string              Microsoft App Client Secret
      --onedrive-drive-id string                   The ID of the drive to use
      --onedrive-drive-type string                 The type of the drive ( personal | business | documentLibrary )
      --onedrive-expose-onenote-files              Set to make OneNote files show up in directory listings.
      --opendrive-password string                  Password.
      --opendrive-username string                  Username
      --pcloud-client-id string                    Pcloud App Client Id
      --pcloud-client-secret string                Pcloud App Client Secret
  -P, --progress                                   Show progress during transfer.
      --qingstor-access-key-id string              QingStor Access Key ID
      --qingstor-connection-retries int            Number of connnection retries. (default 3)
      --qingstor-endpoint string                   Enter a endpoint URL to connection QingStor API.
      --qingstor-env-auth                          Get QingStor credentials from runtime. Only applies if access_key_id and secret_access_key is blank.
      --qingstor-secret-access-key string          QingStor Secret Access Key (password)
      --qingstor-zone string                       Zone to connect to.
  -q, --quiet                                      Print as little stuff as possible
      --rc                                         Enable the remote control server.
      --rc-addr string                             IPaddress:Port or :Port to bind server to. (default "localhost:5572")
      --rc-cert string                             SSL PEM key (concatenation of certificate and CA certificate)
      --rc-client-ca string                        Client certificate authority to verify clients with
      --rc-htpasswd string                         htpasswd file - if not provided no authentication is done
      --rc-key string                              SSL PEM Private key
      --rc-max-header-bytes int                    Maximum size of request header (default 4096)
      --rc-pass string                             Password for authentication.
      --rc-realm string                            realm for authentication (default "rclone")
      --rc-server-read-timeout duration            Timeout for server reading data (default 1h0m0s)
      --rc-server-write-timeout duration           Timeout for server writing data (default 1h0m0s)
      --rc-user string                             User name for authentication.
      --retries int                                Retry operations this many times if they fail (default 3)
      --retries-sleep duration                     Interval between retrying operations if they fail, e.g 500ms, 60s, 5m. (0 to disable)
      --s3-access-key-id string                    AWS Access Key ID.
      --s3-acl string                              Canned ACL used when creating buckets and/or storing objects in S3.
      --s3-chunk-size SizeSuffix                   Chunk size to use for uploading. (default 5M)
      --s3-disable-checksum                        Don't store MD5 checksum with object metadata
      --s3-endpoint string                         Endpoint for S3 API.
      --s3-env-auth                                Get AWS credentials from runtime (environment variables or EC2/ECS meta data if no env vars).
      --s3-force-path-style                        If true use path style access if false use virtual hosted style. (default true)
      --s3-location-constraint string              Location constraint - must be set to match the Region.
      --s3-provider string                         Choose your S3 provider.
      --s3-region string                           Region to connect to.
      --s3-secret-access-key string                AWS Secret Access Key (password)
      --s3-server-side-encryption string           The server-side encryption algorithm used when storing this object in S3.
      --s3-session-token string                    An AWS session token
      --s3-sse-kms-key-id string                   If using KMS ID you must provide the ARN of Key.
      --s3-storage-class string                    The storage class to use when storing new objects in S3.
      --s3-upload-concurrency int                  Concurrency for multipart uploads. (default 2)
      --s3-v2-auth                                 If true use v2 authentication.
      --sftp-ask-password                          Allow asking for SFTP password when needed.
      --sftp-disable-hashcheck                     Disable the execution of SSH commands to determine if remote file hashing is available.
      --sftp-host string                           SSH host to connect to
      --sftp-key-file string                       Path to unencrypted PEM-encoded private key file, leave blank to use ssh-agent.
      --sftp-pass string                           SSH password, leave blank to use ssh-agent.
      --sftp-path-override string                  Override path used by SSH connection.
      --sftp-port string                           SSH port, leave blank to use default (22)
      --sftp-set-modtime                           Set the modified time on the remote if set. (default true)
      --sftp-use-insecure-cipher                   Enable the use of the aes128-cbc cipher. This cipher is insecure and may allow plaintext data to be recovered by an attacker.
      --sftp-user string                           SSH username, leave blank for current username, ncw
      --size-only                                  Skip based on size only, not mod-time or checksum
      --skip-links                                 Don't warn about skipped symlinks.
      --stats duration                             Interval between printing stats, e.g 500ms, 60s, 5m. (0 to disable) (default 1m0s)
      --stats-file-name-length int                 Max file name length in stats. 0 for no limit (default 40)
      --stats-log-level string                     Log level to show --stats output DEBUG|INFO|NOTICE|ERROR (default "INFO")
      --stats-one-line                             Make the stats fit on one line.
      --stats-unit string                          Show data rate in stats as either 'bits' or 'bytes'/s (default "bytes")
      --streaming-upload-cutoff int                Cutoff for switching to chunked upload if file size is unknown. Upload starts after reaching cutoff or when file ends. (default 100k)
      --suffix string                              Suffix for use with --backup-dir.
      --swift-auth string                          Authentication URL for server (OS_AUTH_URL).
      --swift-auth-token string                    Auth Token from alternate authentication - optional (OS_AUTH_TOKEN)
      --swift-auth-version int                     AuthVersion - optional - set to (1,2,3) if your auth URL has no version (ST_AUTH_VERSION)
      --swift-chunk-size SizeSuffix                Above this size files will be chunked into a _segments container. (default 5G)
      --swift-domain string                        User domain - optional (v3 auth) (OS_USER_DOMAIN_NAME)
      --swift-endpoint-type string                 Endpoint type to choose from the service catalogue (OS_ENDPOINT_TYPE) (default "public")
      --swift-env-auth                             Get swift credentials from environment variables in standard OpenStack form.
      --swift-key string                           API key or password (OS_PASSWORD).
      --swift-region string                        Region name - optional (OS_REGION_NAME)
      --swift-storage-policy string                The storage policy to use when creating a new container
      --swift-storage-url string                   Storage URL - optional (OS_STORAGE_URL)
      --swift-tenant string                        Tenant name - optional for v1 auth, this or tenant_id required otherwise (OS_TENANT_NAME or OS_PROJECT_NAME)
      --swift-tenant-domain string                 Tenant domain - optional (v3 auth) (OS_PROJECT_DOMAIN_NAME)
      --swift-tenant-id string                     Tenant ID - optional for v1 auth, this or tenant required otherwise (OS_TENANT_ID)
      --swift-user string                          User name to log in (OS_USERNAME).
      --swift-user-id string                       User ID to log in - optional - most swift systems use user and leave this blank (v3 auth) (OS_USER_ID).
      --syslog                                     Use Syslog for logging
      --syslog-facility string                     Facility for syslog, eg KERN,USER,... (default "DAEMON")
      --timeout duration                           IO idle timeout (default 5m0s)
      --tpslimit float                             Limit HTTP transactions per second to this.
      --tpslimit-burst int                         Max burst of transactions for --tpslimit. (default 1)
      --track-renames                              When synchronizing, track file renames and do a server side move if possible
      --transfers int                              Number of file transfers to run in parallel. (default 4)
      --union-remotes string                       List of space separated remotes.
  -u, --update                                     Skip files that are newer on the destination.
      --use-server-modtime                         Use server modified time instead of object metadata
      --user-agent string                          Set the user-agent to a specified string. The default is rclone/ version (default "rclone/v1.44")
  -v, --verbose count                              Print lots more stuff (repeat for more)
      --webdav-bearer-token string                 Bearer token instead of user/pass (eg a Macaroon)
      --webdav-pass string                         Password.
      --webdav-url string                          URL of http host to connect to
      --webdav-user string                         User name
      --webdav-vendor string                       Name of the Webdav site/service/software you are using
      --yandex-client-id string                    Yandex Client Id
      --yandex-client-secret string                Yandex Client Secret

SEE ALSO

  • rclone - Show help for rclone commands, flags and backends.
Auto generated by spf13/cobra on 15-Oct-2018

Share and Enjoy.