Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Comparison Strategies

rsdedup supports three strategies for determining whether files are duplicates. Choose with --compare <METHOD>.

size-hash (default)

The default strategy uses a multi-stage pipeline for best performance:

  1. Size grouping — files with unique sizes are immediately excluded (they can’t be duplicates)
  2. Partial hash — hash only the first 4KB of each file; files with unique partial hashes are excluded
  3. Full hash — hash the entire file for remaining candidates

This avoids reading entire files when a quick check can rule out matches. For most workloads, the vast majority of files are eliminated in stages 1 and 2.

rsdedup dedup report --compare size-hash  # default, same as omitting

hash

Skip the partial hash stage and compute the full hash for all files in each size group.

This is simpler but slower for large files where the first 4KB would have been enough to distinguish them.

rsdedup dedup report --compare hash

byte-for-byte

Compare files byte-by-byte without hashing. This guarantees zero false positives (no hash collisions possible) but is slower because every pair of candidate files must be read and compared.

rsdedup dedup report --compare byte-for-byte

Which should I use?

StrategySpeedFalse positivesBest for
size-hashFastestTheoretically possible (cryptographic hash)General use
hashFastTheoretically possibleWhen files differ early and late but not in the first 4KB
byte-for-byteSlowestZeroWhen absolute certainty is required

For virtually all practical use cases, size-hash is the right choice.