Comparison Strategies

rsdedup supports three strategies for determining whether files are duplicates. Choose with --compare <METHOD>.

size-hash (default)

The default strategy uses a multi-stage pipeline for best performance:

Size grouping — files with unique sizes are immediately excluded (they can’t be duplicates)
Partial hash — hash only the first 4KB of each file; files with unique partial hashes are excluded
Full hash — hash the entire file for remaining candidates

This avoids reading entire files when a quick check can rule out matches. For most workloads, the vast majority of files are eliminated in stages 1 and 2.

rsdedup dedup report --compare size-hash  # default, same as omitting

hash

Skip the partial hash stage and compute the full hash for all files in each size group.

This is simpler but slower for large files where the first 4KB would have been enough to distinguish them.

rsdedup dedup report --compare hash

byte-for-byte

Compare files byte-by-byte without hashing. This guarantees zero false positives (no hash collisions possible) but is slower because every pair of candidate files must be read and compared.

rsdedup dedup report --compare byte-for-byte

Which should I use?

Strategy	Speed	False positives	Best for
`size-hash`	Fastest	Theoretically possible (cryptographic hash)	General use
`hash`	Fast	Theoretically possible	When files differ early and late but not in the first 4KB
`byte-for-byte`	Slowest	Zero	When absolute certainty is required

For virtually all practical use cases, size-hash is the right choice.

Keyboard shortcuts

rsdedup - A Fast File Deduplication Tool

Comparison Strategies

size-hash (default)

hash

byte-for-byte

Which should I use?