
RSConstruct - Rust Build Tool

A fast, incremental build tool written in Rust with C/C++ compilation, template support, Python linting, and parallel execution.

Features

  • Incremental builds using SHA-256 checksums to detect changes
  • C/C++ compilation with automatic header dependency tracking
  • Parallel execution of independent build products with -j flag
  • Template processing via the Tera templating engine
  • Python linting with ruff and pylint
  • Documentation spell checking using hunspell dictionaries
  • Make integration — run make in directories containing Makefiles
  • .gitignore support — respects .gitignore and .rsconstructignore patterns
  • Deterministic builds — same input always produces same build order
  • Graceful interrupt — Ctrl+C saves progress, next build resumes where it left off
  • Config-aware caching — changing compiler flags or linter config triggers rebuilds
  • Convention over configuration — simple naming conventions, minimal config needed
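The checksum-based incremental scheme in the first bullet can be sketched in a few lines. This is an illustrative model only, not RSConstruct's code: rebuild a product only when the hash of its inputs differs from the hash recorded on the previous successful build.

```python
import hashlib

# Minimal sketch of checksum-based change detection (illustrative only).
def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

# Hashes recorded on the previous successful build.
previous = {"src/hello.py": checksum(b"print('hi')\n")}

def needs_rebuild(path: str, data: bytes) -> bool:
    # Unknown path or changed content -> rebuild.
    return previous.get(path) != checksum(data)

print(needs_rebuild("src/hello.py", b"print('hi')\n"))   # False: unchanged
print(needs_rebuild("src/hello.py", b"print('bye')\n"))  # True: content changed
```

Because the check keys on content rather than timestamps, touching a file without changing it does not trigger a rebuild.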

Philosophy

Convention over configuration — simple naming conventions, explicit config loading, incremental builds by default.

Nomenclature

This page defines the terminology used throughout RSConstruct’s code, configuration, CLI, and documentation.

Core concepts

pname
    Processor name. The type name of a processor as registered by its plugin (e.g., ruff, pip, tera, creator). Unique across all plugins. Used in [processor.PNAME] config sections and in processors defconfig PNAME.

iname
    Instance name. The name of a specific processor instance as declared in rsconstruct.toml. For single-instance processors, the iname equals the pname (e.g., [processor.ruff] → iname is ruff). For multi-instance processors, the iname is the sub-key (e.g., [processor.creator.venv] → iname is creator.venv). Used in processors config INAME.

processor
    A configured instance that discovers products and executes builds. Created from a plugin + TOML config. Immutable after creation.

plugin
    A factory registered at compile time via inventory::submit!. Knows how to create processors from TOML config. Has a pname, a processor type, and config metadata.

product
    A single build unit with inputs, outputs, and a processor. The atomic unit of incremental building.

processor type
    One of four categories: checker, generator, creator, explicit. Determines how inputs are discovered, how outputs are declared, and how results are cached. See Processor Types.

analyzer
    A dependency scanner that runs after product discovery to add extra input edges to existing products (e.g., the cpp analyzer adds every #included header as an extra input of a C/C++ product). Analyzers never create products of their own. Declared with [analyzer.NAME] sections in rsconstruct.toml. Unlike processors, only analyzers explicitly declared in config run — there is no “auto-enable” default. See Dependency Analyzers.

analyzer plugin
    A factory registered at compile time via inventory::submit! in the analyzer registry. Knows how to construct an analyzer from its [analyzer.NAME] TOML section. Each plugin declares its name, description, and whether it is native (pure Rust) or external (may invoke subprocesses).

native analyzer
    An analyzer whose default configuration runs entirely in-process (no subprocesses). Example: icpp uses a pure-Rust regex scanner for #include directives. Some native analyzers become external in non-default configurations (e.g., icpp with pkg_config set shells out to pkg-config for include paths).

external analyzer
    An analyzer that shells out to another program to do its work. Example: cpp always runs gcc -MM for exact compiler-accurate header scanning.
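To make the native-analyzer idea concrete, here is an illustrative Python transliteration of what a pure regex scanner for #include directives does. The real icpp analyzer is written in Rust; this sketch only shows the technique.

```python
import re

# Matches `#include <header>` and `#include "header"` at line start,
# capturing the header path (illustrative, not icpp's actual pattern).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

def scan_includes(source: str) -> list[str]:
    """Return the paths named by #include directives in a C/C++ source."""
    return INCLUDE_RE.findall(source)

src = '#include <stdio.h>\n#include "utils.h"\nint main() { return 0; }\n'
print(scan_includes(src))  # ['stdio.h', 'utils.h']
```

Each discovered header becomes an extra input edge on the product, so editing a header invalidates every product that includes it.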

Configuration

output_files
    List of individual output files declared in creator/explicit config. Cached as blobs.

output_dirs
    List of output directories declared in creator/explicit config. All files inside are walked and cached as a tree.

src_dirs
    Directories to scan for input files.

src_extensions
    File extensions to match during scanning.

dep_inputs
    Extra files that trigger a rebuild when their content changes.

dep_auto
    Config files silently added as dep_inputs when they exist on disk (e.g., .eslintrc).
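The dep_auto rule can be modeled as a simple existence filter. This is an illustrative sketch, not RSConstruct's implementation; the function name is made up.

```python
import tempfile
from pathlib import Path

# Sketch: dep_auto candidates only become inputs when the file exists.
def resolve_dep_inputs(dep_inputs, dep_auto, root):
    root = Path(root)
    resolved = list(dep_inputs)          # explicit inputs are always included
    for candidate in dep_auto:
        if (root / candidate).exists():  # silently added only if present
            resolved.append(candidate)
    return resolved

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".eslintrc").write_text("{}")
    print(resolve_dep_inputs(["setup.cfg"], [".eslintrc", ".missingrc"], tmp))
    # ['setup.cfg', '.eslintrc']
```

This is why creating or deleting a tool's config file can change what a product depends on between runs.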

Cache

blob
    A file’s raw content stored in the object store, addressed by SHA-256 hash. Blobs have no path — the consumer knows where to restore them.

tree
    A serialized list of (path, mode, blob_checksum) entries describing a set of output files. Stored in the descriptor store.

marker
    A zero-byte descriptor indicating a checker passed. Its presence is the cached result.

descriptor
    A cache entry (blob reference, tree, or marker) stored in .rsconstruct/descriptors/, keyed by the descriptor key.

descriptor key
    A content-addressed hash of (pname, config_hash, variant, input_checksum). Changes when processor config or input content changes. Does NOT include file paths — renaming a file with identical content produces the same key.

input checksum
    Combined SHA-256 hash of all input file contents for a product.
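The keying scheme above can be sketched as follows. The exact field encoding is an assumption here; only the tuple (pname, config_hash, variant, input_checksum) and the SHA-256 addressing come from the definitions above.

```python
import hashlib

# Illustrative sketch of content-addressed cache keying (not RSConstruct's
# actual serialization format).
def input_checksum(input_contents: list[bytes]) -> str:
    h = hashlib.sha256()
    for content in input_contents:        # combined hash over all inputs
        h.update(hashlib.sha256(content).digest())
    return h.hexdigest()

def descriptor_key(pname, config_hash, variant, checksum) -> str:
    h = hashlib.sha256()
    for part in (pname, config_hash, variant, checksum):
        h.update(part.encode())
        h.update(b"\0")                   # separator so fields cannot run together
    return h.hexdigest()

# Paths are not part of the key: identical content gives an identical key,
# so a renamed-but-unchanged file still hits the cache.
a = descriptor_key("ruff", "cfg1", "default", input_checksum([b"print('hi')"]))
b = descriptor_key("ruff", "cfg1", "default", input_checksum([b"print('hi')"]))
print(a == b)  # True
```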

Build pipeline

discover
    Phase where processors scan the file index and register products in the build graph.

classify
    Phase where each product is classified as skip, restore, or build based on its cache state.

execute
    Phase where products are built in dependency order.

anchor file
    A file whose presence triggers a creator processor to run (e.g., Cargo.toml for cargo, requirements.txt for pip).
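The classify phase described above can be sketched as a small decision function. Function and field names are illustrative stand-ins, not RSConstruct's API.

```python
# Sketch of the skip / restore / build decision from cache state.
def classify(product, cache, outputs_on_disk):
    key = product["descriptor_key"]
    if key not in cache:
        return "build"    # no cached result for this input/config combination
    if outputs_on_disk:
        return "skip"     # cached and outputs already present
    return "restore"      # cached result exists; re-materialize the outputs

cache = {"abc123"}
print(classify({"descriptor_key": "abc123"}, cache, outputs_on_disk=True))   # skip
print(classify({"descriptor_key": "abc123"}, cache, outputs_on_disk=False))  # restore
print(classify({"descriptor_key": "zzz999"}, cache, outputs_on_disk=True))   # build
```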

CLI conventions

Command                      Name parameter   Meaning
processors defconfig PNAME   pname            Processor type name — shows factory defaults
processors config [INAME]    iname            Instance name from config — shows resolved config
processors files [INAME]     iname            Instance name from config — shows discovered files
analyzers defconfig [NAME]   analyzer name    Analyzer name from the analyzer registry — shows factory defaults
analyzers config [NAME]      analyzer name    Analyzer name as declared in [analyzer.NAME] — shows resolved config

Installation

Download pre-built binary (Linux)

Pre-built binaries are available for x86_64 and aarch64 (arm64).

Using the GitHub CLI:

# x86_64
gh release download latest --repo veltzer/rsconstruct --pattern 'rsconstruct-x86_64-unknown-linux-gnu' --output rsconstruct --clobber

# aarch64 / arm64
gh release download latest --repo veltzer/rsconstruct --pattern 'rsconstruct-aarch64-unknown-linux-gnu' --output rsconstruct --clobber

chmod +x rsconstruct
sudo mv rsconstruct /usr/local/bin/

Or with curl:

# x86_64
curl -Lo rsconstruct https://github.com/veltzer/rsconstruct/releases/download/latest/rsconstruct-x86_64-unknown-linux-gnu

# aarch64 / arm64
curl -Lo rsconstruct https://github.com/veltzer/rsconstruct/releases/download/latest/rsconstruct-aarch64-unknown-linux-gnu

chmod +x rsconstruct
sudo mv rsconstruct /usr/local/bin/

Install from crates.io

cargo install rsconstruct

This downloads, compiles, and installs the latest published version into ~/.cargo/bin/.

Build from source

cargo build --release

The binary will be at target/release/rsconstruct.

Release profile

The release build is configured in Cargo.toml for maximum performance with a small binary:

[profile.release]
strip = true        # Remove debug symbols
lto = true          # Link-time optimization across all crates
codegen-units = 1   # Single codegen unit for better optimization

For an even smaller binary at the cost of some runtime speed, add opt-level = "z" (optimize for size) or opt-level = "s" (balance size and speed).
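Putting that together, a size-focused profile might look like this (illustrative; tune to your own size/speed trade-off):

```toml
[profile.release]
strip = true        # Remove debug symbols
lto = true          # Link-time optimization across all crates
codegen-units = 1   # Single codegen unit for better optimization
opt-level = "z"     # Optimize for size; use "s" for a size/speed balance
```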

Getting Started

This guide walks through setting up an rsconstruct project for the two primary supported languages: Python and C++.

Python

Prerequisites

Setup

Create a project directory and configuration:

mkdir myproject && cd myproject
# rsconstruct.toml
[processor.ruff]

Create a Python source file:

mkdir -p src
# src/hello.py
def greet(name: str) -> str:
    return f"Hello, {name}!"

if __name__ == "__main__":
    print(greet("world"))

Run the build:

rsconstruct build

Expected output:

Processing ruff (1 product)
  hello.py

Run again — nothing has changed, so rsconstruct skips the check:

Processing ruff (1 product)
  Up to date

Adding pylint

Install pylint and add a section for it:

# rsconstruct.toml
[processor.ruff]

[processor.pylint]

Pass extra arguments via processor config:

[processor.pylint]
args = ["--disable=C0114,C0115,C0116"]

Adding zspell for docs

If your project has markdown documentation, add a section for the zspell processor:

[processor.ruff]

[processor.pylint]

[processor.zspell]

Create a .zspell-words file in the project root with any custom words (one per line) that the spell checker should accept.

C++

Prerequisites

Setup

Create a project directory and configuration:

mkdir myproject && cd myproject
# rsconstruct.toml
[processor.cc_single_file]

Create a source file under src/:

mkdir -p src
// src/hello.c
#include <stdio.h>

int main() {
    printf("Hello, world!\n");
    return 0;
}

Run the build:

rsconstruct build

Expected output:

Processing cc_single_file (1 product)
  hello.elf

The compiled executable is at out/cc_single_file/hello.elf.

Run again — the source hasn’t changed, so rsconstruct restores from cache:

Processing cc_single_file (1 product)
  Up to date

Customizing compiler flags

Pass flags via processor config:

[processor.cc_single_file]
cflags = ["-Wall", "-Wextra", "-O2"]
cxxflags = ["-Wall", "-Wextra", "-O2"]
include_paths = ["include"]

See the CC Single File processor docs for the full configuration reference.

Adding static analysis

Install cppcheck and add a section for it:

[processor.cc_single_file]

[processor.cppcheck]

Both processors run on the same source files — rsconstruct handles them independently.

Next Steps

Binary Releases

RSConstruct publishes pre-built binaries as GitHub releases when a version tag (v*) is pushed.

Supported Platforms

Platform                        Binary name
Linux x86_64                    rsconstruct-linux-x86_64
Linux aarch64 (arm64)           rsconstruct-linux-aarch64
macOS x86_64                    rsconstruct-macos-x86_64
macOS aarch64 (Apple Silicon)   rsconstruct-macos-aarch64
Windows x86_64                  rsconstruct-windows-x86_64.exe

How It Works

The release workflow (.github/workflows/release.yml) has two jobs:

  1. build — a matrix job that builds the release binary for each platform and uploads it as a GitHub Actions artifact.
  2. release — waits for all builds to finish, downloads the artifacts, and creates a GitHub release with auto-generated release notes and all binaries attached.

Creating a Release

  1. Update version in Cargo.toml
  2. Commit and push
  3. Tag and push: git tag v0.2.2 && git push origin v0.2.2
  4. The workflow creates the GitHub release automatically

Release Profile

The binary is optimized for size and performance:

[profile.release]
strip = true        # Remove debug symbols
lto = true          # Link-time optimization across all crates
codegen-units = 1   # Single codegen unit for better optimization

Command Reference

Global Flags

These flags can be used with any command:

Flag                     Description
--verbose, -v            Show skip/restore/cache messages during build
--output-display, -O     What to show for output files (none, basename, path; default: none)
--input-display, -I      What to show for input files (none, source, all; default: source)
--path-format, -P        Path format for displayed files (basename, path; default: path)
--show-child-processes   Print each child process command before execution
--show-output            Show tool output even on success (default: only show on failure)
--json                   Output in JSON Lines format (machine-readable)
--quiet, -q              Suppress all output except errors (useful for CI)
--phases                 Show build phase messages (discover, add_dependencies, etc.)

Example:

rsconstruct --phases build                    # Show phase messages during build
rsconstruct --show-child-processes build      # Show each command being executed
rsconstruct --show-output build               # Show compiler/linter output even on success
rsconstruct --phases --show-child-processes build # Show both phases and commands
rsconstruct -O path build                     # Show output file paths in build messages
rsconstruct -I all build                      # Show all input files (including headers)

rsconstruct build

Requires config. (no subcommands)

Incremental build — only rebuilds products whose inputs have changed.

rsconstruct build                              # Incremental build
rsconstruct build --force                      # Force full rebuild
rsconstruct build -j4                          # Build with 4 parallel jobs
rsconstruct build --dry-run                    # Show what would be built without executing
rsconstruct build --keep-going                 # Continue after errors
rsconstruct build --timings                    # Show per-product and total timing info
rsconstruct build --stop-after discover        # Stop after product discovery
rsconstruct build --stop-after add-dependencies # Stop after dependency scanning
rsconstruct build --stop-after resolve         # Stop after graph resolution
rsconstruct build --stop-after classify        # Stop after classifying products
rsconstruct build --show-output                # Show compiler/linter output even on success
rsconstruct build --auto-add-words             # Add misspelled words to .zspell-words instead of failing
rsconstruct build --auto-add-words -p zspell   # Run only zspell and auto-add words
rsconstruct build -p ruff,pylint               # Run only specific processors
rsconstruct build --explain                    # Show why each product is skipped/restored/rebuilt
rsconstruct build --retry 3                    # Retry failed products up to 3 times
rsconstruct build --no-mtime                   # Disable mtime pre-check, always compute checksums
rsconstruct build --no-summary                 # Suppress the build summary
rsconstruct build --batch-size 10              # Limit batch size for batch-capable processors
rsconstruct build --verify-tool-versions       # Verify tool versions against .tools.versions
rsconstruct build -t "src/*.c"                 # Only build products matching this glob pattern
rsconstruct build -d src                       # Only build products under this directory
rsconstruct build --show-all-config-changes    # Show all config changes, not just output-affecting

By default, tool output (compiler messages, linter output) is only shown when a command fails. Use --show-output to see all output.

Incremental recovery and batch behavior

By default (fail-fast mode), rsconstruct executes each product independently, even for batch-capable processors. Successfully completed products are cached immediately, so if a build fails or is interrupted, the next run only rebuilds what wasn’t completed.

With --keep-going, batch-capable processors group all their products into a single tool invocation. If the tool fails, all products in the batch are marked failed and must be rebuilt. Use --batch-size N to limit batch chunks and improve recovery granularity.
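The --batch-size behavior reduces to simple list chunking. This sketch is illustrative only; the function name is made up.

```python
# Sketch of --batch-size chunking for a batch-capable processor: smaller
# chunks mean a failing tool invocation marks fewer products as failed,
# improving recovery granularity on the next run.
def chunk_products(products, batch_size=None):
    if not batch_size:
        return [products]                 # one big batch (--keep-going default)
    return [products[i:i + batch_size]
            for i in range(0, len(products), batch_size)]

files = [f"src/f{i}.py" for i in range(7)]
print(len(chunk_products(files)))                  # 1
print([len(b) for b in chunk_products(files, 3)])  # [3, 3, 1]
```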

Processor Shortcuts (@ aliases)

The -p flag supports @-prefixed shortcuts that expand to groups of processors:

By type:

  • @checkers — all checker processors (ruff, pylint, shellcheck, etc.)
  • @generators — all generator processors (tera, cc_single_file, etc.)
  • @creators — all creator processors (pip, npm, cargo, etc.)

By tool:

  • @python3 — all processors that require python3
  • @node — all processors that require node
  • Any tool name works (matched against each processor’s required_tools())

By processor name:

  • @ruff — equivalent to ruff (strips the @ prefix)
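The three expansion rules above can be sketched as a lookup cascade: type group first, then required tools, then a plain name. The registry contents here are made-up stand-ins for RSConstruct's plugin registry.

```python
# Illustrative registry: pname -> processor type and required tools.
REGISTRY = {
    "ruff":   {"type": "checker",   "tools": ["ruff"]},
    "pylint": {"type": "checker",   "tools": ["python3", "pylint"]},
    "tera":   {"type": "generator", "tools": []},
    "pip":    {"type": "creator",   "tools": ["python3", "pip"]},
}

def expand(selector: str) -> list[str]:
    if not selector.startswith("@"):
        return [selector]
    name = selector[1:]
    by_type = {"checkers": "checker", "generators": "generator", "creators": "creator"}
    if name in by_type:                       # @checkers, @generators, @creators
        return [p for p, m in REGISTRY.items() if m["type"] == by_type[name]]
    by_tool = [p for p, m in REGISTRY.items() if name in m["tools"]]
    if by_tool:                               # @python3, @node, any tool name
        return by_tool
    return [name]                             # @ruff -> ruff (strip the prefix)

print(expand("@checkers"))  # ['ruff', 'pylint']
print(expand("@python3"))   # ['pylint', 'pip']
```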

Examples:

rsconstruct build -p @checkers              # Run only checker processors
rsconstruct build -p @generators            # Run only generator processors
rsconstruct build -p @python3               # Run all Python-based processors
rsconstruct build -p @checkers,tera         # Mix shortcuts with processor names

The --stop-after flag allows stopping the build at a specific phase:

  • discover — stop after discovering products (before dependency scanning)
  • add-dependencies — stop after adding dependencies (before resolving graph)
  • resolve — stop after resolving the dependency graph (before execution)
  • classify — stop after classifying products (show skip/restore/build counts)
  • build — run the full build (default)

rsconstruct clean

Clean build artifacts. When run without a subcommand, removes build output files (same as rsconstruct clean outputs).

Subcommand   Config required?
outputs      Yes
all          Yes
git          Yes
unknown      Yes

rsconstruct clean                # Remove build output files (preserves cache) [default]
rsconstruct clean outputs        # Remove build output files (preserves cache)
rsconstruct clean all            # Remove out/ and .rsconstruct/ directories
rsconstruct clean git            # Hard clean using git clean -qffxd (requires git repository)
rsconstruct clean unknown        # Remove files not tracked by git and not known as build outputs
rsconstruct clean unknown --dry-run      # Show what would be removed without deleting
rsconstruct clean unknown --no-gitignore # Include gitignored files as unknown

rsconstruct status

Requires config. (no subcommands)

Show product status — whether each product is up-to-date, stale, or restorable from cache.

rsconstruct status                     # Show per-processor and total summary
rsconstruct status -v                  # Show per-product status
rsconstruct status --breakdown         # Show source file counts by processor and extension

rsconstruct smart auto

Auto-detect relevant processors and add them to rsconstruct.toml. Scans the project for files matching each processor’s conventions and checks that the required tools are installed. Only adds new sections — existing processor sections are preserved. Requires config.

rsconstruct smart auto

Example output:

Added 3 processor(s): pylint, ruff, shellcheck

rsconstruct init

No config needed. (no subcommands)

Initialize a new rsconstruct project in the current directory.

rsconstruct init

rsconstruct watch

Requires config. (no subcommands)

Watch source files and auto-rebuild on changes.

rsconstruct watch                              # Watch and rebuild on changes
rsconstruct watch --auto-add-words             # Watch with zspell auto-add words
rsconstruct watch -j4                          # Watch with 4 parallel jobs
rsconstruct watch -p ruff                      # Watch and only run the ruff processor

The watch command accepts the same build flags as rsconstruct build (e.g., --jobs, --keep-going, --timings, --processors, --batch-size, --explain, --retry, --no-mtime, --no-summary).

rsconstruct graph

Display the build dependency graph.

Subcommand   Config required?
show         Yes
view         Yes
stats        Yes

rsconstruct graph show                    # Default SVG format
rsconstruct graph show --format dot       # Graphviz DOT format
rsconstruct graph show --format mermaid   # Mermaid format
rsconstruct graph show --format json      # JSON format
rsconstruct graph show --format text      # Plain text hierarchical view
rsconstruct graph show --format svg       # SVG format (requires Graphviz dot)
rsconstruct graph view                    # Open as SVG (default viewer)
rsconstruct graph view --viewer mermaid   # Open as HTML with Mermaid in browser
rsconstruct graph view --viewer svg       # Generate and open SVG using Graphviz dot
rsconstruct graph stats                   # Show graph statistics (products, processors, dependencies)

rsconstruct cache

Manage the build cache.

Subcommand     Config required?
clear          No
size           Yes
trim           Yes
list           Yes
stale          Yes
stats          Yes
remove-stale   Yes

rsconstruct cache clear         # Clear the entire cache
rsconstruct cache size          # Show cache size
rsconstruct cache trim          # Remove unreferenced objects
rsconstruct cache list          # List all cache entries and their status
rsconstruct cache stale         # Show which cache entries are stale vs current
rsconstruct cache stats         # Show per-processor cache statistics
rsconstruct cache remove-stale  # Remove stale index entries not matching any current product

rsconstruct webcache

Manage the web request cache. Schemas fetched by iyamlschema (and any future processors that fetch URLs) are cached in .rsconstruct/webcache.redb.

Subcommand   Config required?
clear        No
stats        No
list         No

rsconstruct webcache clear   # Clear all cached web responses
rsconstruct webcache stats   # Show cache size and entry count
rsconstruct webcache list    # List all cached URLs and their sizes

rsconstruct deps

Show or manage source file dependencies from the dependency cache. The cache is populated during builds when dependency analyzers scan source files (e.g., C/C++ headers, Python imports).

Subcommand   Config required?
list         No
used         Yes
build        Yes
config       Yes
show         Yes
stats        Yes
clean        Yes

rsconstruct deps list                          # List all available dependency analyzers
rsconstruct deps build                         # Run dependency analysis without building
rsconstruct deps show all                    # Show all cached dependencies
rsconstruct deps show files src/main.c       # Show dependencies for a specific file
rsconstruct deps show files src/a.c src/b.c  # Show dependencies for multiple files
rsconstruct deps show analyzers cpp          # Show dependencies from the C/C++ analyzer
rsconstruct deps show analyzers cpp python   # Show dependencies from multiple analyzers
rsconstruct deps stats                       # Show statistics by analyzer
rsconstruct deps clean                       # Clear the entire dependency cache
rsconstruct deps clean --analyzer cpp        # Clear only C/C++ dependencies
rsconstruct deps clean --analyzer python     # Clear only Python dependencies

Example output for rsconstruct deps show all:

src/main.c: [cpp] (no dependencies)
src/test.c: [cpp]
  src/utils.h
  src/config.h
config/settings.py: [python]
  config/base.py

Example output for rsconstruct deps stats:

cpp: 15 files, 42 dependencies
python: 8 files, 12 dependencies

Total: 23 files, 54 dependencies

Note: This command reads directly from the dependency cache (.rsconstruct/deps.redb). If the cache is empty, run a build first to populate it.

This command is useful for:

  • Debugging why a file is being rebuilt
  • Understanding the include/import structure of your project
  • Verifying that dependency analyzers are finding the right files
  • Viewing statistics about cached dependencies by analyzer
  • Clearing dependencies for a specific analyzer without affecting others

rsconstruct smart

Smart config manipulation commands for managing processor sections in rsconstruct.toml.

Subcommand                  Config required?
disable-all                 No
enable-all                  No
enable                      No
disable                     No
only                        No
reset                       No
enable-detected             Yes
enable-if-available         Yes
minimal                     Yes
auto                        Yes
remove-no-file-processors   Yes

rsconstruct smart enable pylint          # Add [processor.pylint] section
rsconstruct smart disable pylint         # Remove [processor.pylint] section
rsconstruct smart enable-all             # Add sections for all builtin processors
rsconstruct smart disable-all            # Remove all processor sections
rsconstruct smart enable-detected        # Add sections for auto-detected processors
rsconstruct smart enable-if-available    # Add sections for detected processors with tools installed
rsconstruct smart minimal                # Remove all, then add only detected processors
rsconstruct smart only ruff pylint       # Remove all, then add only listed processors
rsconstruct smart reset                  # Remove all processor sections
rsconstruct smart remove-no-file-processors  # Remove processors that don't match any files

rsconstruct processors

Subcommand   Config required?
list --all   No
list         Yes (without --all)
defconfig    No
config       Uses config if available
used         Yes
files        Yes
allowlist    Yes
graph        Yes

rsconstruct processors list              # List declared processors and descriptions
rsconstruct processors list -a           # Show all built-in processors
rsconstruct processors files             # Show source and target files for each declared processor
rsconstruct processors files ruff        # Show files for a specific processor
rsconstruct processors files              # Show files for enabled processors
rsconstruct processors config ruff       # Show resolved configuration for a processor
rsconstruct processors config --diff     # Show only fields that differ from defaults
rsconstruct processors defconfig ruff    # Show default configuration for a processor
rsconstruct processors add ruff          # Append [processor.ruff] to rsconstruct.toml (fields pre-populated + comments)
rsconstruct processors add ruff --dry-run  # Preview the snippet without writing
rsconstruct processors allowlist         # Show the current processor allowlist
rsconstruct processors graph             # Show inter-processor dependencies
rsconstruct processors graph --format dot    # Graphviz DOT format
rsconstruct processors graph --format mermaid # Mermaid format
rsconstruct processors files --headers   # Show files with processor headers

rsconstruct tools

List or check external tools required by declared processors. All subcommands use config if available; without config, they operate on all built-in processors.

Subcommand     Config required?
list           Uses config if available
check          Uses config if available
lock           Uses config if available
install        Uses config if available
install-deps   Uses config if available
stats          Uses config if available
graph          Uses config if available

rsconstruct tools list              # List required tools and which processor needs them
rsconstruct tools list -a           # Include tools from disabled processors
rsconstruct tools check             # Verify tool versions against .tools.versions lock file
rsconstruct tools lock              # Lock tool versions to .tools.versions
rsconstruct tools install           # Install all missing external tools
rsconstruct tools install ruff      # Install a specific tool by name
rsconstruct tools install -y        # Skip confirmation prompt
rsconstruct tools install-deps      # Install declared [dependencies] (pip, npm, gem)
rsconstruct tools install-deps -y   # Skip confirmation prompt
rsconstruct tools stats             # Show tool availability and language runtime breakdown
rsconstruct tools stats --json      # Show tool stats in JSON format
rsconstruct tools graph             # Show tool-to-processor dependency graph (DOT format)
rsconstruct tools graph --format mermaid  # Mermaid format
rsconstruct tools graph --view      # Open tool graph in browser

rsconstruct tags

Search and query frontmatter tags from markdown files.

Subcommand    Config required?
list          Yes
count         Yes
tree          Yes
stats         Yes
files         Yes
grep          Yes
for-file      Yes
frontmatter   Yes
unused        Yes
validate      Yes
matrix        Yes
coverage      Yes
orphans       Yes
check         Yes
suggest       Yes
merge         Yes
collect       Yes

rsconstruct tags list                        # List all unique tags
rsconstruct tags count                       # Show each tag with file count, sorted by frequency
rsconstruct tags tree                        # Show tags grouped by prefix/category
rsconstruct tags stats                       # Show statistics about the tags database
rsconstruct tags files docker                # List files matching a tag (AND semantics)
rsconstruct tags files docker --or k8s       # List files matching any tag (OR semantics)
rsconstruct tags files level:advanced        # Match key:value tags
rsconstruct tags grep deploy                 # Search for tags containing a substring
rsconstruct tags grep deploy -i              # Case-insensitive tag search
rsconstruct tags for-file src/main.md        # List all tags for a specific file
rsconstruct tags frontmatter src/main.md     # Show raw frontmatter for a file
rsconstruct tags validate                    # Validate tags against tags_dir allowlist
rsconstruct tags unused                      # List tags in tags_dir not used by any file
rsconstruct tags unused --strict             # Exit with error if unused tags found (CI)
rsconstruct tags check                       # Run all tag validations without building
rsconstruct tags suggest src/main.md         # Suggest tags for a file based on similarity
rsconstruct tags coverage                    # Show percentage of files with each tag category
rsconstruct tags matrix                      # Show coverage matrix of tag categories per file
rsconstruct tags orphans                     # Find markdown files with no tags
rsconstruct tags merge ../other/tags         # Merge tags from another project
rsconstruct tags collect                     # Add missing tags from source files to tag collection

rsconstruct complete

Generate shell completions. No config is needed when a shell is given as an argument; with no argument, the default shells are read from config.

rsconstruct complete bash    # Generate bash completions
rsconstruct complete zsh     # Generate zsh completions
rsconstruct complete fish    # Generate fish completions

rsconstruct terms

Manage term checking and fixing in markdown files.

Subcommand   Config required?
fix          Yes
merge        Yes
stats        Yes

rsconstruct terms fix

Add backticks around terms from the terms directory that appear unquoted in markdown files.

rsconstruct terms fix
rsconstruct terms fix --remove-non-terms   # also remove backticks from non-terms

rsconstruct terms merge

Merge terms from another project’s terms directory. Unions matching files and copies missing files in both directions.

rsconstruct terms merge ../other-project/terms

rsconstruct doctor

Requires config. (no subcommands)

Diagnose build environment — checks config, tools, and versions.

rsconstruct doctor

rsconstruct info

Show project information.

Subcommand   Config required?
source       Yes

rsconstruct info source          # Show source file counts by extension

rsconstruct sloc

No config needed. (no subcommands)

Count source lines of code (SLOC) by language, with optional COCOMO effort/cost estimation.

rsconstruct sloc                 # Show SLOC by language
rsconstruct sloc --cocomo        # Include COCOMO effort/cost estimation
rsconstruct sloc --cocomo --salary 80000  # Custom annual salary for COCOMO

rsconstruct version

No config needed. (no subcommands)

Print version information.

rsconstruct version

Shell Completions

RSConstruct generates shell completion scripts that provide tab-completion for commands, subcommands, flags, and argument values.

Generating Completions

# Generate for the default shell (configured in rsconstruct.toml)
rsconstruct complete

# Generate for a specific shell
rsconstruct complete bash
rsconstruct complete zsh
rsconstruct complete fish

To install, source the output in your shell profile:

# Bash (~/.bashrc)
eval "$(rsconstruct complete bash)"

# Zsh (~/.zshrc)
eval "$(rsconstruct complete zsh)"

# Fish (~/.config/fish/config.fish)
rsconstruct complete fish | source

Configuration

The default shell(s) for rsconstruct complete (with no argument) are configured in rsconstruct.toml:

[completions]
shells = ["bash"]

What Gets Completed

Commands and subcommands

All top-level commands (build, processors, analyzers, config, etc.) and their subcommands complete automatically via clap.

Processor type names (pnames)

These commands complete with processor type names from the plugin registry (e.g., ruff, pylint, cc_single_file):

  • rsconstruct processors defconfig <TAB>
  • rsconstruct build --processors <TAB> / rsconstruct build -p <TAB>
  • rsconstruct watch --processors <TAB> / rsconstruct watch -p <TAB>

The list is drawn from the plugin registry at compile time.

Processor instance names (inames)

These commands complete with instance names declared in the current project’s rsconstruct.toml (e.g., pylint, pylint.tests, cc_single_file):

  • rsconstruct processors config <TAB>
  • rsconstruct processors files <TAB>

Instance names are extracted from [processor.NAME] and [processor.NAME.SUBNAME] headings in rsconstruct.toml at tab-completion time. Requires a project config in the current directory. Bash only.
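The heading extraction described above amounts to matching [processor.…] lines in the config file. Here is an illustrative Python transliteration; the actual completion hook is implemented differently.

```python
import re

# Matches `[processor.NAME]` and `[processor.NAME.SUBNAME]` headings,
# capturing the iname (illustrative pattern, not the shipped one).
HEADING_RE = re.compile(r'^\[processor\.([A-Za-z0-9_.-]+)\]\s*$', re.MULTILINE)

def inames(toml_text: str) -> list[str]:
    return HEADING_RE.findall(toml_text)

config = """\
[processor.ruff]

[processor.pylint.tests]
args = ["--disable=C0114"]

[processor.cc_single_file]
"""
print(inames(config))  # ['ruff', 'pylint.tests', 'cc_single_file']
```

Note that multi-instance headings like [processor.pylint.tests] yield the full dotted iname, matching the iname definition in the Nomenclature section.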

Analyzer names

These commands complete analyzer names (cpp, markdown, python, tera):

  • rsconstruct analyzers config <TAB>
  • rsconstruct analyzers clean --analyzer <TAB>

Analyzer names are specified via clap’s value_parser attribute, so they work in all shells without post-processing.

Flags and options

All --flags and -f short flags complete in all shells via clap’s built-in generation.

Implementation

Completions are generated by clap_complete in src/cli.rs. Two mechanisms provide argument-value completions:

1. clap value_parser (preferred)

For arguments with a small, fixed set of values, use #[arg(value_parser = [...])] on the field. This works in all shells automatically because clap embeds the values in the generated script.

Example from AnalyzersAction::Config:

#![allow(unused)]
fn main() {
#[arg(value_parser = ["cpp", "markdown", "python", "tera"])]
name: Option<String>,
}

2. Bash post-processing (processor names)

Processor names are not known to clap at derive time because they come from the inventory plugin registry. The function inject_bash_processor_completions() post-processes the generated bash script to inject processor names into the opts variable for specific command sections.

This only works for bash. Other shells get the base clap completions without processor name injection.

The targets for injection are identified by their case labels in the generated bash script:

  • rsconstruct__processors__config)
  • rsconstruct__processors__defconfig)
  • rsconstruct__processors__files)

The function also patches --processors / -p flag completions in build and watch commands to suggest processor names instead of file paths.
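The post-processing step can be sketched as a line-oriented rewrite. This is a hypothetical illustration (the function name `inject_processor_names` and the exact script layout are assumptions, not the real implementation): after a matching case label, the next quoted `opts="..."` assignment gets the processor names appended before its closing quote.

```rust
// Hypothetical sketch (not the real implementation): after each target case
// label, splice processor names into the next `opts="..."` assignment.
fn inject_processor_names(script: &str, targets: &[&str], pnames: &[&str]) -> String {
    let joined = pnames.join(" ");
    let mut out = String::new();
    let mut in_target = false;
    for line in script.lines() {
        let trimmed = line.trim();
        if targets.contains(&trimmed) {
            in_target = true;
        } else if in_target && trimmed.starts_with("opts=\"") && trimmed.ends_with('"') {
            // Re-emit the line with the names inserted before the closing quote.
            out.push_str(line.trim_end().trim_end_matches('"'));
            out.push(' ');
            out.push_str(&joined);
            out.push_str("\"\n");
            in_target = false;
            continue;
        }
        out.push_str(line);
        out.push('\n');
    }
    out
}
```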

Adding Completions for New Arguments

  • Fixed set of values (analyzer names, enum variants): Use #[arg(value_parser = [...])]. Works in all shells.
  • Dynamic set from registry (processor names): Add the case label to inject_bash_processor_completions() targets. Only works in bash.
  • Enum types: Use #[arg(value_enum)] on a clap-derived enum. Works in all shells.

Configuration

RSConstruct is configured via an rsconstruct.toml file in the project root.

Full reference

[build]
parallel = 1          # Number of parallel jobs (1 = sequential, 0 = auto-detect CPU cores)
                      # Also settable via RSCONSTRUCT_THREADS env var (CLI -j takes precedence)
batch_size = 0        # Max files per batch for batch-capable processors (0 = no limit, omit to disable)
output_dir = "out"    # Global output directory prefix for generator processors

# Declare processors by adding [processor.NAME] sections.
# Only declared processors run — no processors are enabled by default.
# Use `rsconstruct smart auto` to auto-detect and add relevant processors.

[processor.ruff]
# args = []

[processor.pylint]
# args = ["--disable=C0114"]

[processor.cc_single_file]
# cc = "gcc"
# cflags = ["-Wall", "-O2"]

[vars]
my_excludes = ["/vendor/", "/third_party/"]  # Define variables for reuse with ${var_name}

[cache]
restore_method = "auto"  # auto (default: copy in CI, hardlink otherwise), hardlink, or copy
compression = false      # Compress cached objects with zstd (requires restore_method = "copy")
remote = "s3://my-bucket/rsconstruct-cache"  # Optional: remote cache URL
remote_push = true       # Push local builds to remote (default: true)
remote_pull = true       # Pull from remote cache on cache miss (default: true)
mtime_check = true       # Use mtime pre-check to skip unchanged file checksums (default: true)

[analyzer]
auto_detect = true
enabled = ["cpp", "python"]

[graph]
viewer = "google-chrome"  # Command to open graph files (default: platform-specific)

[plugins]
dir = "plugins"  # Directory containing .lua processor plugins

[completions]
shells = ["bash"]

[dependencies]
pip = ["pyyaml", "jinja2"]    # Python packages
npm = ["eslint", "prettier"]  # Node.js packages
gem = ["mdl"]                 # Ruby gems
system = ["pandoc", "graphviz"]  # System packages (checked but not auto-installed)

Per-processor configuration is documented on each processor’s page under Processors. Lua plugin configuration is documented under Lua Plugins.

Processor instances

Processors are declared by adding a [processor.NAME] section to rsconstruct.toml. An empty section enables the processor with default settings:

[processor.pylint]

Customize with config fields:

[processor.pylint]
args = ["--disable=C0114,C0116"]
src_dirs = ["src"]

Remove the section to disable the processor.

Multiple instances

Run the same processor multiple times with different configurations by adding named sub-sections:

[processor.pylint.core]
src_dirs = ["src/core"]
args = ["--disable=C0114"]

[processor.pylint.tests]
src_dirs = ["tests"]
args = ["--disable=C0114,C0116"]

Each instance runs independently with its own config and cache.

You cannot mix single-instance and multi-instance formats for the same processor type — use either [processor.pylint] or [processor.pylint.NAME], not both.

Instance naming

A single instance declared as [processor.pylint] has the instance name pylint. Named instances declared as [processor.pylint.core] and [processor.pylint.tests] have instance names pylint.core and pylint.tests.

The instance name is used everywhere a processor is identified:

  • Build output and progress: [pylint.core] src/core/main.py
  • Error messages: Error: [pylint.tests] tests/test_foo.py: ...
  • Build statistics: each instance reports its own file counts and durations
  • Cache keys: instances have separate caches, so changing one config does not invalidate the other
  • Output directories: generator processors default to out/{instance_name} (e.g., out/marp.slides and out/marp.docs for two marp instances), ensuring outputs do not collide
  • The --processors filter: use the full instance name, e.g., rsconstruct build -p pylint.core

For single instances, the instance name equals the processor type name (e.g., pylint), so there is no visible difference from previous behavior.

Auto-detection

Run rsconstruct smart auto to scan the project and automatically add [processor.NAME] sections for all processors whose files are detected and whose tools are installed. It does not remove existing sections.

Variable substitution

Define variables in a [vars] section and reference them using ${var_name} syntax:

[vars]
kernel_excludes = ["/kernel/", "/kernel_standalone/", "/examples_standalone/"]

[processor.cppcheck]
src_exclude_dirs = "${kernel_excludes}"

[processor.cc_single_file]
src_exclude_dirs = "${kernel_excludes}"

Variables are substituted before TOML parsing. The "${var_name}" (including quotes) is replaced with the TOML-serialized value, preserving types (arrays stay arrays, strings stay strings). Undefined variable references produce an error.
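The substitution rule above can be sketched as a plain textual replace over the raw config. This is a hypothetical illustration (the function name `substitute_vars` and the representation of `vars` as pre-serialized strings are assumptions): because the quoted placeholder is replaced with an already-TOML-serialized value, arrays stay arrays and strings stay strings.

```rust
// Hypothetical sketch (assumed names): replace quoted "${var}" placeholders
// with TOML-serialized values before the config is parsed.
fn substitute_vars(raw: &str, vars: &[(&str, &str)]) -> Result<String, String> {
    // `vars` maps variable names to already-TOML-serialized values,
    // e.g. ("kernel_excludes", r#"["/kernel/", "/vendor/"]"#).
    let mut out = raw.to_string();
    for (name, serialized) in vars {
        // The placeholder includes the surrounding quotes: "${name}"
        let placeholder = format!("\"${{{}}}\"", name);
        out = out.replace(&placeholder, serialized);
    }
    // Any placeholder left over is an undefined variable reference -> error.
    if let Some(pos) = out.find("\"${") {
        return Err(format!("undefined variable reference at byte {pos}"));
    }
    Ok(out)
}
```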

Section details

[build]

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| parallel | integer | 1 | Number of parallel jobs. 1 = sequential, 0 = auto-detect CPU cores. Can also be set via the RSCONSTRUCT_THREADS environment variable (CLI -j takes precedence). |
| batch_size | integer | 0 | Maximum files per batch for batch-capable processors. 0 = no limit (all files in one batch). Omit to disable batching entirely. |
| output_dir | string | "out" | Global output directory prefix. Processor output_dir defaults that start with out/ are remapped to use this prefix (e.g., setting "build" changes out/marp to build/marp). Individual processors can still override their output_dir explicitly. |

[processor.NAME]

Each [processor.NAME] section declares a processor instance. The section name must match a builtin processor type (e.g., ruff, pylint, cc_single_file) or a Lua plugin name.

Common fields available to all processors:

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| args | array of strings | [] | Extra command-line arguments passed to the tool. |
| dep_inputs | array of strings | [] | Additional input files that trigger a rebuild when changed. |
| dep_auto | array of strings | varies | Config files auto-detected as inputs (e.g., .pylintrc). |
| batch | boolean | true | Whether to batch multiple files into a single tool invocation. Note: in fail-fast mode (default), chunk size is 1 regardless of this setting — batch mode only groups files with --keep-going or --batch-size. For external tools, a batch failure marks all products in the chunk as failed. Internal processors (i-prefixed) return per-file results, so partial failure is handled correctly. |
| max_jobs | integer | none | Maximum concurrent jobs for this processor. When set, limits how many instances of this processor run in parallel, regardless of the global -j setting. Useful for heavyweight processors (e.g., marp spawns Chromium). Omit to use the global parallelism. |
| src_dirs | array of strings | varies | Directories to scan for source files. Required for most processors (defaults to []). Processors with a specific default (e.g., tera defaults to "tera.templates", cc_single_file defaults to "src") do not require this. Not required when src_files is set. Use rsconstruct processors defconfig <name> to see a processor's defaults. |
| src_extensions | array of strings | varies | File extensions to match. |
| src_exclude_dirs | array of strings | varies | Directory path segments to exclude from scanning. |
| src_exclude_files | array of strings | [] | File names to exclude. |
| src_exclude_paths | array of strings | [] | Paths (relative to project root) to exclude. |
| src_files | array of strings | [] | When non-empty, only these exact paths are matched — src_dirs, src_extensions, and exclude filters are bypassed. Useful for processors that operate on specific files rather than scanning directories. |

Processor-specific fields are documented on each processor’s page under Processors.

[cache]

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| restore_method | string | "auto" | How to restore cached outputs. "auto" (default) uses "copy" in CI environments (CI=true) and "hardlink" otherwise. "hardlink" is faster but requires the same filesystem; "copy" works everywhere. |
| compression | boolean | false | Compress cached objects with zstd. Incompatible with restore_method = "hardlink" — requires "copy". |
| remote | string | none | Remote cache URL. See Remote Caching. |
| remote_push | boolean | true | Push locally built artifacts to the remote cache. |
| remote_pull | boolean | true | Pull from the remote cache on local cache miss. |
| mtime_check | boolean | true | Persist file checksums across builds using an mtime database. Set to false in CI/CD environments where the cache won't survive the build and the write overhead isn't worth it. Can also be disabled via the --no-mtime-cache flag. See Checksum Cache. |

[analyzer]

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| auto_detect | boolean | true | When true, only run enabled analyzers that auto-detect relevant files. |
| enabled | array of strings | ["cpp", "python"] | List of dependency analyzers to enable. |

[graph]

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| viewer | string | platform-specific | Command to open graph files. |

[plugins]

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| dir | string | "plugins" | Directory containing .lua processor plugins. |

[completions]

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| shells | array | ["bash"] | Shells to generate completions for. |

[dependencies]

Declare project dependencies by package manager. Used by rsconstruct doctor to verify availability and rsconstruct tools install-deps to install missing packages.

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| pip | array of strings | [] | Python packages to install via pip install. Supports version specifiers (e.g., "ruff>=0.4"). |
| npm | array of strings | [] | Node.js packages to install via npm install. |
| gem | array of strings | [] | Ruby gems to install via gem install. |
| system | array of strings | [] | System packages installed via the detected package manager (apt-get, dnf, pacman, or brew). |

Remote Caching

RSConstruct supports sharing build artifacts across machines via remote caching. When enabled, build outputs are pushed to a remote store and can be pulled by other machines, avoiding redundant rebuilds.

Configuration

Add a remote URL to your [cache] section in rsconstruct.toml:

[cache]
remote = "s3://my-bucket/rsconstruct-cache"

Supported Backends

Amazon S3

[cache]
remote = "s3://bucket-name/optional/prefix"

Requires:

  • AWS CLI installed (aws command)
  • AWS credentials configured (~/.aws/credentials or environment variables)

The S3 backend uses aws s3 cp and aws s3 ls commands.

HTTP/HTTPS

[cache]
remote = "http://cache-server.example.com:8080/rsconstruct"
# or
remote = "https://cache-server.example.com/rsconstruct"

Requires:

  • curl command
  • Server that supports GET and PUT requests

The HTTP backend expects:

  • GET /path to return the object or 404
  • PUT /path to store the object
  • HEAD /path to check existence (returns 200 or 404)

Local Filesystem

[cache]
remote = "file:///shared/cache/rsconstruct"

Useful for:

  • Network-mounted filesystems (NFS, CIFS)
  • Testing remote cache behavior locally

Control Options

You can control push and pull separately:

[cache]
remote = "s3://my-bucket/rsconstruct-cache"
remote_push = true   # Push local builds to remote (default: true)
remote_pull = true   # Pull from remote on cache miss (default: true)

Pull-only mode

To share a read-only cache (e.g., from CI):

[cache]
remote = "s3://ci-cache/rsconstruct"
remote_push = false
remote_pull = true

Push-only mode

To populate a cache without using it (e.g., in CI):

[cache]
remote = "s3://ci-cache/rsconstruct"
remote_push = true
remote_pull = false

How It Works

Cache Structure

Remote cache stores two types of objects:

  1. Index entries at index/{cache_key}

    • JSON mapping input checksums to output checksums
    • One entry per product (source file + processor + config)
  2. Objects at objects/{xx}/{rest_of_checksum}

    • Content-addressed storage (like git)
    • Actual file contents identified by SHA-256
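The object layout can be sketched in a few lines. This is a hypothetical illustration (the function name is assumed): the first two hex characters of the SHA-256 checksum become a fan-out directory, as in git's object store.

```rust
// Hypothetical sketch: derive a blob's remote key from its SHA-256 checksum,
// using the first two hex characters as a fan-out directory (git-style).
fn object_key(checksum: &str) -> String {
    let (fanout, rest) = checksum.split_at(2);
    format!("objects/{fanout}/{rest}")
}
```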

On Build

  1. RSConstruct computes the cache key and input checksum
  2. Checks local cache first
  3. If local miss and remote_pull = true:
    • Fetches index entry from remote
    • Fetches required objects from remote
    • Restores outputs locally
  4. If rebuild required:
    • Executes the processor
    • Stores outputs in local cache
    • If remote_push = true, pushes to remote

Cache Hit Flow

Local cache hit → Restore from local → Done
       ↓ miss
Remote cache hit → Download index + objects → Restore → Done
       ↓ miss
Execute processor → Cache locally → Push to remote → Done
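The flow above reduces to a three-way decision. A minimal sketch, with illustrative names (not the real internal types):

```rust
// Hypothetical sketch of the lookup order above.
#[derive(Debug, PartialEq)]
enum Outcome { LocalHit, RemoteHit, Rebuilt }

fn resolve(local_hit: bool, remote_pull: bool, remote_hit: bool) -> Outcome {
    if local_hit {
        Outcome::LocalHit   // restore from local cache
    } else if remote_pull && remote_hit {
        Outcome::RemoteHit  // download index + objects, restore
    } else {
        Outcome::Rebuilt    // execute processor, cache locally, maybe push
    }
}
```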

Best Practices

CI/CD Integration

In your CI pipeline:

# .github/workflows/build.yml
env:
  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}

steps:
  - run: rsconstruct build

Separate CI and Developer Caches

Keep CI and developer builds from conflicting by using different prefixes, or by making developers read-only consumers of the CI cache:

# CI: rsconstruct.toml.ci
[cache]
remote = "s3://cache/rsconstruct/ci"
remote_push = true
remote_pull = true

# Developers: rsconstruct.toml
[cache]
remote = "s3://cache/rsconstruct/ci"
remote_push = false  # Read from CI cache only
remote_pull = true

Cache Invalidation

Cache entries are keyed by:

  • Processor name
  • Source file path
  • Processor configuration hash

To force a full rebuild ignoring caches:

rsconstruct build --force

To clear only the local cache:

rsconstruct cache clear

Troubleshooting

S3 Access Denied

Check your AWS credentials:

aws s3 ls s3://your-bucket/

HTTP Upload Failures

Ensure your server accepts PUT requests. Many static file servers are read-only.

Slow Remote Cache

Consider:

  • Using a closer region for S3
  • Enabling S3 Transfer Acceleration
  • Using a caching proxy

Debug Mode

Use verbose output to see cache operations:

rsconstruct build -v

This shows which products are restored from local cache, remote cache, or rebuilt.

Project Structure

RSConstruct follows a convention-over-configuration approach. The directory layout determines how files are processed.

Directory layout

project/
├── rsconstruct.toml       # Configuration file
├── .rsconstructignore     # Glob patterns for files to exclude
├── config/                # Python config files (loaded by templates)
├── tera.templates/        # .tera template files
├── templates.mako/        # .mako template files
├── src/                   # C/C++ source files
├── plugins/               # Lua processor plugins (.lua files)
├── out/
│   ├── cc_single_file/    # Compiled executables
│   ├── ruff/              # Ruff lint stub files
│   ├── pylint/            # Pylint lint stub files
│   ├── cppcheck/          # C/C++ lint stub files
│   ├── zspell/            # Zspell stub files
│   └── make/              # Make stub files
└── .rsconstruct/          # Cache directory
    ├── index.json         # Cache index
    ├── objects/           # Cached build artifacts
    └── deps/              # Dependency files

Conventions

Templates

Files in tera.templates/ with configured extensions (default .tera) are rendered to the project root:

  • tera.templates/Makefile.tera produces Makefile
  • tera.templates/config.toml.tera produces config.toml

Similarly, files in templates.mako/ with .mako extensions are rendered via the Mako processor:

  • templates.mako/Makefile.mako produces Makefile
  • templates.mako/config.toml.mako produces config.toml

C/C++ sources

Files in the source directory (default src/) are compiled to executables under out/cc_single_file/, preserving the directory structure:

  • src/main.c produces out/cc_single_file/main.elf
  • src/utils/helper.cc produces out/cc_single_file/utils/helper.elf

Python files

Python files are linted and stub outputs are written to out/ruff/ (ruff processor) or out/pylint/ (pylint processor).

Build artifacts

All build outputs go into out/. The cache lives in .rsconstruct/. Use rsconstruct clean to remove out/ (preserving cache) or rsconstruct clean all to remove both.

Dependency Analyzers

rsconstruct uses dependency analyzers to scan source files and discover dependencies between files. Analyzers run after processors discover products and add dependency information to the build graph.

How analyzers work

  1. Product discovery: Processors discover products (source → output mappings).
  2. Dependency analysis: Analyzers scan source files to find dependencies.
  3. Graph resolution: Dependencies are added to products for correct build ordering.

Analyzers are decoupled from processors — they operate on any product with matching source files, regardless of which processor created it.

Built-in analyzers

Per-analyzer reference pages:

  • cpp — C/C++ #include scanning (invokes gcc/pkg-config)
  • icpp — C/C++ #include scanning, pure Rust (no subprocess)
  • python — Python import / from ... import resolution
  • markdown — Markdown image and link references
  • tera — Tera {% include %}, {% import %}, {% extends %} references

Configuration

Analyzers are configured in rsconstruct.toml:

[analyzer]
auto_detect = true                                  # default: true
enabled     = ["cpp", "markdown", "python", "tera"] # instances to run

[analyzer.cpp]
include_paths = ["include", "src"]

Only analyzers listed under [analyzer.X] (or enabled) are instantiated — there is no global “all analyzers always run” mode.

Auto-detection

An analyzer runs if:

  1. It is declared (listed in enabled or configured via [analyzer.X]).
  2. AND either auto_detect = false, OR the analyzer detects relevant files in the project.

This mirrors how processors work.

Caching

Analyzer results are cached in the dependency cache (.rsconstruct/deps.redb). On subsequent builds:

  • If a source file hasn’t changed, its cached dependencies are used.
  • If a source file has changed, dependencies are re-scanned.
  • The cache is shared across all analyzers.

Use the analyzers and deps commands to inspect the cache:

rsconstruct analyzers list            # list available analyzers
rsconstruct analyzers defconfig cpp   # show default config for an analyzer
rsconstruct analyzers add cpp         # append [analyzer.cpp] to rsconstruct.toml with comments
rsconstruct analyzers add cpp --dry-run  # preview without writing
rsconstruct deps all                  # show all cached dependencies
rsconstruct deps for src/main.c       # show dependencies for specific files
rsconstruct deps clean                # clear the dependency cache

Build phases

With --phases, you can see when analyzers run:

rsconstruct --phases build

Output:

Phase: Building dependency graph...
  Phase: discover
  Phase: add_dependencies    # Analyzers run here
  Phase: apply_tool_version_hashes
  Phase: resolve_dependencies

Use --stop-after add-dependencies to stop after dependency analysis:

rsconstruct build --stop-after add-dependencies

Adding a custom analyzer

Analyzers implement the DepAnalyzer trait:

#![allow(unused)]
fn main() {
pub trait DepAnalyzer: Sync + Send {
    fn description(&self) -> &str;
    fn auto_detect(&self, file_index: &FileIndex) -> bool;
    fn analyze(
        &self,
        graph: &mut BuildGraph,
        deps_cache: &mut DepsCache,
        file_index: &FileIndex,
        verbose: bool,
    ) -> Result<()>;
}
}

The analyze method should:

  1. Find products with relevant source files.
  2. Scan each source file for dependencies (using the cache when available).
  3. Add discovered dependencies to the product’s inputs.

cpp

Scans C/C++ source files for #include directives and adds header file dependencies to the build graph.

Native: No (may invoke gcc, pkg-config).

Auto-detects: Projects with .c, .cc, .cpp, .cxx, .h, .hh, .hpp, or .hxx files.

Features

  • Recursive header scanning (follows includes in header files)
  • Queries compiler for system include paths (only tracks project-local headers)
  • Handles both #include "file" (relative to source) and #include <file> (searches include paths)
  • Supports native regex scanning and compiler-based scanning (gcc -MM)
  • Uses the dependency cache for incremental builds

System header detection

The cpp analyzer queries the compiler for its include search paths using gcc -E -Wp,-v -xc /dev/null. This allows it to properly identify which headers are system headers vs project-local headers. Only headers within the project directory are tracked as dependencies.

Configuration

[analyzer.cpp]
include_scanner       = "native"          # or "compiler" for gcc -MM
include_paths         = ["include", "src"]
pkg_config            = ["gtk+-3.0", "libcurl"]
include_path_commands = ["gcc -print-file-name=plugin"]
src_exclude_dirs      = ["/kernel/", "/vendor/"]
cc                    = "gcc"
cxx                   = "g++"
cflags                = ["-I/usr/local/include"]
cxxflags              = ["-std=c++17"]

include_path_commands

Shell commands whose stdout (trimmed) is added to the include search paths. Useful for compiler-specific include directories:

[analyzer.cpp]
include_path_commands = [
    "gcc -print-file-name=plugin",  # GCC plugin development headers
    "llvm-config --includedir",     # LLVM headers
]

pkg_config integration

Runs pkg-config --cflags-only-I for each package and adds the resulting include paths to the search path. Useful when your code includes headers from system libraries:

[analyzer.cpp]
pkg_config = ["gtk+-3.0", "glib-2.0"]

This automatically finds headers like <gtk/gtk.h> and <glib.h> without manually specifying their include paths.

See also

  • icpp — native (no-subprocess) C/C++ dependency analyzer

icpp

Native (no-subprocess) C/C++ dependency analyzer. Scans #include directives by parsing source files directly in Rust, without invoking gcc or pkg-config.

Native: Yes.

Auto-detects: Projects with .c, .cc, .cpp, .cxx, .h, .hh, .hpp, or .hxx files.

When to use

  • You want faster analysis without the overhead of launching gcc per file.
  • You don’t need compiler-driven include path discovery.
  • You’re happy to enumerate include paths explicitly in rsconstruct.toml.

Prefer cpp if you need compiler-discovered system include paths or pkg-config integration.

Configuration

[analyzer.icpp]
include_paths          = ["include", "src"]
src_exclude_dirs       = ["/kernel/", "/vendor/"]
follow_angle_brackets  = false
skip_not_found         = false

follow_angle_brackets (default: false)

Controls whether #include <foo.h> directives are followed.

  • false (default) — angle-bracket includes are skipped entirely. System headers never enter the dependency graph, even when they resolve through configured include paths.
  • true — angle-bracket includes are resolved and followed the same way as quoted includes. Unresolved angles are still tolerated (not an error), so missing system headers don’t break analysis.

Quoted includes (#include "foo.h") are always followed and must resolve; this setting does not affect them (see skip_not_found below).

skip_not_found (default: false)

Controls what happens when an include cannot be resolved.

  • false (default) — a quoted include (#include "foo.h") that cannot be resolved is a hard error. Unresolved angle-bracket includes are silently ignored (when follow_angle_brackets = true).
  • true — unresolved includes of any kind are silently skipped.

Use true for partial / work-in-progress codebases where some headers aren’t generated yet.
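The interaction of the two settings can be summarized as a decision table. A minimal sketch (names are illustrative, not the analyzer's real API):

```rust
// Hypothetical sketch of the resolution rules described above.
enum Include { Quoted, Angled }

/// Ok(true) = follow the include, Ok(false) = skip it, Err = hard error.
fn handle(kind: Include, resolved: bool, follow_angles: bool, skip_not_found: bool)
    -> Result<bool, &'static str>
{
    match kind {
        Include::Angled if !follow_angles => Ok(false), // never followed
        Include::Angled => Ok(resolved),                // unresolved: tolerated
        Include::Quoted if resolved => Ok(true),
        Include::Quoted if skip_not_found => Ok(false), // silently skipped
        Include::Quoted => Err("unresolved quoted include"),
    }
}
```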

See also

  • cpp — compiler-aware (external) C/C++ dependency analyzer

python

Scans Python source files for import and from ... import statements and adds dependencies on local Python modules.

Native: Yes.

Auto-detects: Projects with .py files.

Features

  • Resolves imports to local files (ignores stdlib / external packages)
  • Supports both import foo and from foo import bar syntax
  • Searches relative to the source file and project root
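Import resolution amounts to mapping a dotted module name to candidate file paths and keeping only those that exist in the project. A minimal sketch (the function name is an assumption):

```rust
// Hypothetical sketch: a dotted module name maps to candidate local paths;
// only candidates that actually exist in the project become dependencies.
fn module_candidates(module: &str) -> [String; 2] {
    let base = module.replace('.', "/");
    [format!("{base}.py"), format!("{base}/__init__.py")]
}
```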

Configuration

[analyzer.python]
# currently no tunables

markdown

Scans Markdown source files for image and link references (![alt](path), [text](path)) and adds referenced local files as dependencies.

Native: Yes.

Auto-detects: Projects with .md files.

Features

  • Extracts ![alt](path) image references and [text](path) link references
  • Resolves paths relative to the source file’s directory
  • Skips URLs (http://, https://, ftp://), data URIs, and anchor-only links
  • Strips title text and anchor fragments from paths

This ensures that when an image or linked file changes, any Markdown product that references it is rebuilt.

Configuration

[analyzer.markdown]
# currently no tunables

tera

Scans Tera template files for {% include %}, {% import %}, and {% extends %} directives and adds referenced template files as dependencies.

Native: Yes.

Auto-detects: Projects with .tera files.

Features

  • Extracts paths from {% include "path" %}, {% import "path" %}, and {% extends "path" %}
  • Handles both double- and single-quoted paths
  • Resolves paths relative to the source file’s directory and the project root

This ensures that when an included template changes, any template that includes it is rebuilt.
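Directive extraction can be sketched with plain string scanning. This is a naive hypothetical illustration (the real analyzer may tokenize differently): for each directive keyword, take the next quoted string, accepting either quote style.

```rust
// Hypothetical sketch: pull quoted paths out of include/import/extends
// directives using plain string scanning (no regex).
fn template_refs(src: &str) -> Vec<String> {
    let mut refs = Vec::new();
    for kw in ["{% include", "{% import", "{% extends"] {
        let mut rest = src;
        while let Some(i) = rest.find(kw) {
            rest = &rest[i + kw.len()..];
            // Accept either double- or single-quoted paths.
            let Some(q) = rest.find(|c| c == '"' || c == '\'') else { break };
            let quote = rest[q..].chars().next().unwrap();
            let body = &rest[q + 1..];
            let Some(end) = body.find(quote) else { break };
            refs.push(body[..end].to_string());
            rest = &body[end..];
        }
    }
    refs
}
```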

Configuration

[analyzer.tera]
# currently no tunables

Processors

RSConstruct uses processors to discover and build products. Each processor scans for source files matching its conventions and produces output files.

Processor Types

There are four processor types: checker, generator, creator, and explicit. They differ in how inputs are discovered, how outputs are declared, and how results are cached.

See Processor Types for full descriptions, examples, and a comparison table.

Configuration

Declare processors by adding [processor.NAME] sections to rsconstruct.toml:

[processor.ruff]

[processor.pylint]
args = ["--disable=C0114"]

[processor.cc_single_file]

Only declared processors run — no processors are enabled by default. Use rsconstruct smart auto to auto-detect and add relevant processors.

Use rsconstruct processors list to see declared processors and descriptions. Use rsconstruct processors list --all to show all built-in processors, not just those enabled in the project. Use rsconstruct processors files to see which files each processor discovers.

Available Processors

  • Tera — renders Tera templates into output files
  • Ruff — lints Python files with ruff
  • Pylint — lints Python files with pylint
  • Mypy — type-checks Python files with mypy
  • Pyrefly — type-checks Python files with pyrefly
  • CC — builds full C/C++ projects from cc.yaml manifests
  • CC Single File — compiles C/C++ source files into executables (single-file)
  • Linux Module — builds Linux kernel modules from linux-module.yaml manifests
  • Cppcheck — runs static analysis on C/C++ source files
  • Clang-Tidy — runs clang-tidy static analysis on C/C++ source files
  • Shellcheck — lints shell scripts using shellcheck
  • Zspell — checks documentation files for spelling errors
  • Rumdl — lints Markdown files with rumdl
  • Make — runs make in directories containing Makefiles
  • Cargo — builds Rust projects using Cargo
  • Yamllint — lints YAML files with yamllint
  • Jq — validates JSON files with jq
  • Jsonlint — lints JSON files with jsonlint
  • Taplo — checks TOML files with taplo
  • Terms — checks that technical terms are backtick-quoted in Markdown files
  • Json Schema — validates JSON schema propertyOrdering
  • Iyamlschema — validates YAML files against JSON schemas (native)
  • Yaml2json — converts YAML files to JSON (native)
  • Markdown2html — converts Markdown to HTML using markdown CLI
  • Imarkdown2html — converts Markdown to HTML (native)

Output Directory Caching

Creator processors (cargo, sphinx, mdbook, pip, npm, gem, and user-defined creators) produce output in directories rather than individual files. RSConstruct caches these entire directories so that after rsconstruct clean && rsconstruct build, the output is restored from cache instead of being regenerated.

After a successful build, RSConstruct walks the output directories, stores every file as a content-addressed blob, and records a tree (manifest of paths, checksums, and Unix permissions). On restore, the entire directory tree is recreated from cached blobs with permissions preserved. See Cache System for details.

For user-defined creators, output directories are declared via output_dirs:

[processor.creator.venv]
command = "pip"
args = ["install", "-r", "requirements.txt"]
src_extensions = ["requirements.txt"]
output_dirs = [".venv"]

For built-in creators, this is controlled by the cache_output_dir config option (default true):

[processor.cargo]
cache_output_dir = false   # Disable for large target/ directories

Custom Processors

You can define custom processors in Lua. See Lua Plugins for details.

Processor Types

Every processor in RSConstruct has a type that determines how it discovers inputs, produces outputs, and interacts with the cache. There are four types.

Run rsconstruct processors types to list them.

Checker

A checker validates input files without producing any output. If the check passes, the result is cached — if the inputs haven’t changed on the next build, the check is skipped entirely.

How it works

  1. Scans for files matching src_extensions in src_dirs
  2. Creates one product per input file
  3. Runs the tool on each file (or batch of files)
  4. If the tool exits successfully, records a marker in the cache
  5. On the next build, if inputs are unchanged, the check is skipped

What gets cached

A marker entry — no files, no blobs. The marker’s presence means “this check passed with these inputs.”

Examples

Lint Python files with ruff:

[processor.ruff]

Scans for .py files, runs ruff check on each. No output files produced.

src/main.py → (checker)
src/utils.py → (checker)

Lint shell scripts:

[processor.shellcheck]

Scans for .sh and .bash files, runs shellcheck on each.

Validate YAML files:

[processor.yamllint]

Scans for .yml and .yaml files, runs yamllint on each.

Validate JSON files with jq:

[processor.jq]

Scans for .json files, validates each with jq.

Spell check Markdown files:

[processor.zspell]

Scans for .md files, checks spelling with the built-in zspell engine.

Built-in checkers

ruff, pylint, mypy, pyrefly, black, pytest, doctest, shellcheck, luacheck, yamllint, jq, jsonlint, taplo, cppcheck, clang_tidy, cpplint, checkpatch, mdl, markdownlint, rumdl, aspell, zspell, ascii, encoding, duplicate_files, terms, eslint, jshint, standard, htmlhint, htmllint, tidy, stylelint, jslint, svglint, svgo, perlcritic, xmllint, checkstyle, php_lint, yq, hadolint, slidev, json_schema, iyamlschema, ijq, ijsonlint, iyamllint, itaplo, marp_images, license_header

Generator

A generator transforms each input file into one or more output files. It creates one product per input file (or one per input × format pair for multi-format generators like pandoc).

How it works

  1. Scans for files matching src_extensions in src_dirs
  2. For each input file, computes the output path from the input path, output directory, and format
  3. Creates one product per input × format pair
  4. Runs the tool to produce the output file
  5. Stores the output as a content-addressed blob in the cache

What gets cached

One blob per output file. The blob is the raw file content, stored by its SHA-256 hash. On restore, the blob is hardlinked (or copied) to the output path.
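
Conceptually, content-addressed blob storage works like this (a simplified Python sketch of the idea, not RSConstruct's actual implementation; the flat cache layout is an assumption):

```python
import hashlib
from pathlib import Path

def store_blob(cache_dir: Path, data: bytes) -> str:
    """Store raw content under its SHA-256 hex digest; return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    blob = cache_dir / digest
    if not blob.exists():  # identical content is stored only once
        blob.write_bytes(data)
    return digest

def restore_blob(cache_dir: Path, digest: str, dest: Path) -> None:
    """Recreate an output file from its cached blob (a plain copy here;
    hardlinking is an optimization the real tool may use)."""
    dest.write_bytes((cache_dir / digest).read_bytes())
```

Because the key is the content hash, unchanged outputs dedupe automatically across builds.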

Examples

Render Tera templates:

[processor.tera]

Scans tera.templates/ for .tera files, renders each template. The output path is the template path with the .tera extension stripped:

tera.templates/config.py.tera → config.py
tera.templates/README.md.tera → README.md

Convert Marp slides to PDF:

[processor.marp]

Scans marp/ for .md files, converts each to PDF (and optionally other formats):

marp/slides.md → out/marp/slides.pdf
marp/intro.md → out/marp/intro.pdf

Convert documents with pandoc (multi-format):

[processor.pandoc]

Scans pandoc/ for .md files, converts each to PDF, HTML, and DOCX. Each format is a separate product with its own cache entry:

pandoc/syllabus.md → out/pandoc/syllabus.pdf
pandoc/syllabus.md → out/pandoc/syllabus.html
pandoc/syllabus.md → out/pandoc/syllabus.docx

Compile single-file C programs:

[processor.cc_single_file]

Scans src/ for .c and .cc files, compiles each into an executable:

src/main.c → out/cc_single_file/src/main.elf
src/test.c → out/cc_single_file/src/test.elf

Convert Mermaid diagrams:

[processor.mermaid]

Scans for .mmd files, converts each to PNG (configurable formats):

diagrams/flow.mmd → out/mermaid/diagrams/flow.png

Compile SCSS to CSS:

[processor.sass]

Scans sass/ for .scss and .sass files, compiles each to CSS:

sass/styles.scss → out/sass/styles.css

Built-in generators

tera, mako, jinja2, cc_single_file, pandoc, marp, mermaid, drawio, chromium, libreoffice, protobuf, sass, markdown2html, pdflatex, a2x, objdump, rust_single_file, tags, pdfunite, ipdfunite, imarkdown2html, isass, yaml2json, generator, script

Creator

A creator runs a command and caches declared output files and directories. It scans for anchor files — files whose presence means “run this tool here.” One product is created per anchor file found, and the command runs in the anchor file’s directory.

Unlike generators (where outputs are derived from input paths), creator outputs are declared explicitly in the config via output_dirs and output_files.

How it works

  1. Scans for anchor files matching src_extensions in src_dirs
  2. Creates one product per anchor file
  3. Runs the command in the anchor file’s directory
  4. Walks all declared output_dirs and collects output_files
  5. Stores each file as a content-addressed blob
  6. Records a tree in the cache — a manifest listing every output file with its path, blob checksum, and Unix permissions

What gets cached

A tree entry listing all output files. On restore, the directory tree is recreated from cached blobs with permissions preserved. Individual files within the tree that already exist with the correct checksum are skipped.
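
The manifest step can be pictured as a walk that records path, checksum, and permission bits per file (a simplified Python sketch, not the actual implementation):

```python
import hashlib
import os
from pathlib import Path

def tree_manifest(output_dir: Path) -> list:
    """Walk a declared output directory and record each file's relative
    path, content checksum, and Unix permission bits."""
    entries = []
    for root, _dirs, files in os.walk(output_dir):
        for name in sorted(files):  # sorted for deterministic ordering
            path = Path(root) / name
            entries.append({
                "path": str(path.relative_to(output_dir)),
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
                "mode": path.stat().st_mode & 0o777,
            })
    return entries
```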

Examples

Install Python dependencies with pip:

[processor.creator.venv]
command = "pip"
args = ["install", "-r", "requirements.txt"]
src_extensions = ["requirements.txt"]
output_dirs = [".venv"]

Scans for requirements.txt files. For each one, runs pip install and caches the entire .venv/ directory. After rsconstruct clean, the venv is restored from cache instead of reinstalling.

Build a Node.js project:

[processor.creator.npm_build]
command = "npm"
args = ["run", "build"]
src_extensions = ["package.json"]
output_dirs = ["dist"]

Scans for package.json files, runs npm run build, caches the dist/ directory.

Build documentation with Sphinx:

[processor.sphinx]

Scans for conf.py files, runs sphinx-build, caches the output directory.

docs/conf.py → (creator)

Build a Rust project with Cargo:

[processor.cargo]

Scans for Cargo.toml files, runs cargo build, optionally caches the target/ directory.

Cargo.toml → (creator)

Run a custom build script:

[processor.creator.assets]
command = "./build_assets.sh"
src_extensions = [".manifest"]
src_dirs = ["."]
output_dirs = ["assets/compiled", "assets/sprites"]
output_files = ["assets/manifest.json"]

Scans for .manifest files, runs the build script, caches two output directories and one output file.

Built-in creators

cargo, pip, npm, gem, sphinx, mdbook, jekyll, cc (full C/C++ projects)

User-defined creators use the creator processor type directly via [processor.creator.NAME].

Explicit

An explicit processor aggregates many inputs into (possibly) many output files and/or directories. Unlike the other types, which create one product per discovered file, an explicit processor creates a single product covering all declared inputs and outputs.

How it works

  1. Inputs are listed explicitly via inputs and input_globs in the config
  2. Creates a single product with all inputs and all outputs
  3. Runs the command, passing --inputs and --outputs on the command line
  4. Stores each output file as a content-addressed blob
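
A command driven by an explicit processor might parse those arguments like this (a hypothetical skeleton for a script such as `build_site.py`; the exact flag shape — space-separated values after each flag — is an assumption):

```python
import argparse

def parse_build_args(argv: list) -> argparse.Namespace:
    """Parse the --inputs/--outputs lists an explicit processor passes.
    Adjust nargs handling to whatever rsconstruct actually sends."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--inputs", nargs="*", default=[])
    parser.add_argument("--outputs", nargs="*", default=[])
    return parser.parse_args(argv)
```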

What gets cached

One blob per output file (like generator).

Examples

Build a static site from generated HTML:

[processor.explicit.site]
command = "python3"
args = ["build_site.py"]
input_globs = ["out/pandoc/*.html", "templates/*.html"]
inputs = ["site.yaml"]
outputs = ["out/site/index.html", "out/site/style.css"]

Waits for pandoc to produce HTML files, then combines them with templates into a site. All inputs are aggregated into one product:

out/pandoc/page1.html, out/pandoc/page2.html, templates/base.html, site.yaml → out/site/index.html, out/site/style.css

Merge PDFs into a course bundle:

[processor.explicit.course]
command = "pdfunite"
input_globs = ["out/pdflatex/*.pdf"]
outputs = ["out/course/full-course.pdf"]

Aggregates all PDF outputs from pdflatex into a single merged PDF.

Built-in explicit processors

explicit, pdfunite, ipdfunite

Comparison

| | Checker | Generator | Creator | Explicit |
|---|---|---|---|---|
| Purpose | Validate | Transform | Build/install | Aggregate |
| Inputs | Scanned | Scanned | Scanned (anchor files) | Declared in config |
| Products | One per input | One per input (× format) | One per anchor | One total |
| Outputs | None | Derived from input path | Declared dirs + files | Declared files |
| Cache type | Marker | Blob | Tree | Blob |
| Runs in | Project root | Project root | Anchor file’s directory | Project root |
| Command args | Input files | Input + output | User-defined args | `--inputs` + `--outputs` |

A2x Processor

Purpose

Converts AsciiDoc files to PDF (or other formats) using a2x.

How It Works

Discovers .txt (AsciiDoc) files in the project and runs a2x on each file, producing output in the configured format.

Source Files

  • Input: **/*.txt
  • Output: out/a2x/{relative_path}.pdf

Configuration

[processor.a2x]
a2x = "a2x"                           # The a2x command to run
format = "pdf"                         # Output format (pdf, xhtml, dvi, ps, epub, mobi)
args = []                              # Additional arguments to pass to a2x
output_dir = "out/a2x"                # Output directory
dep_inputs = []                      # Additional files that trigger rebuilds when changed

| Key | Type | Default | Description |
|---|---|---|---|
| `a2x` | string | `"a2x"` | The a2x executable to run |
| `format` | string | `"pdf"` | Output format |
| `args` | string[] | `[]` | Extra arguments passed to a2x |
| `output_dir` | string | `"out/a2x"` | Output directory |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Ascii Check Processor

Purpose

Validates that files contain only ASCII characters.

How It Works

Discovers .md files in the project and checks each for non-ASCII characters. Files containing non-ASCII bytes fail the check. This is a built-in processor that does not require any external tools.

This processor supports batch mode, allowing multiple files to be checked in a single invocation.
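
The underlying check amounts to finding bytes above 0x7F (a minimal sketch of what the built-in checker verifies, not its actual code):

```python
def non_ascii_positions(data: bytes) -> list:
    """Return byte offsets of non-ASCII bytes in the file content."""
    return [i for i, b in enumerate(data) if b > 0x7F]
```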

Source Files

  • Input: **/*.md
  • Output: none (checker)

Configuration

[processor.ascii]
args = []                              # Additional arguments (unused, for consistency)
dep_inputs = []                      # Additional files that trigger rebuilds when changed

| Key | Type | Default | Description |
|---|---|---|---|
| `args` | string[] | `[]` | Extra arguments (reserved for future use) |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Aspell Processor

Purpose

Checks spelling in Markdown files using aspell.

How It Works

Discovers .md files in the project and runs aspell on each file using the configured aspell configuration file. A non-zero exit code fails the product.

Source Files

  • Input: **/*.md
  • Output: none (checker)

Configuration

[processor.aspell]
command = "aspell"                     # The aspell command to run
conf = ".aspell.conf"                  # Aspell configuration file
args = []                              # Additional arguments to pass to aspell
dep_inputs = []                      # Additional files that trigger rebuilds when changed

| Key | Type | Default | Description |
|---|---|---|---|
| `command` | string | `"aspell"` | The aspell executable to run |
| `conf` | string | `".aspell.conf"` | Aspell configuration file |
| `args` | string[] | `[]` | Extra arguments passed to aspell |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Black Processor

Purpose

Checks Python file formatting using Black, the uncompromising code formatter. Runs black --check which verifies files are already formatted without modifying them.

How It Works

Python files matching configured extensions are checked via black --check. The command exits with a non-zero status if any file would be reformatted, causing the build to fail.

Source Files

  • Input: **/*{src_extensions} (default: *.py)

Configuration

[processor.black]
src_extensions = [".py"]                      # File extensions to check (default: [".py"])
dep_inputs = []                         # Additional files that trigger rechecks when changed
args = []                                 # Extra arguments passed to black

| Key | Type | Default | Description |
|---|---|---|---|
| `src_extensions` | string[] | `[".py"]` | File extensions to discover |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rechecks |
| `dep_auto` | string[] | `["pyproject.toml"]` | Config files that auto-trigger rechecks |
| `args` | string[] | `[]` | Additional arguments passed to black |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Cargo Processor

Purpose

Builds Rust projects using Cargo. Each Cargo.toml produces a cached success marker, allowing RSConstruct to skip rebuilds when source files haven’t changed.

How It Works

Discovers files named Cargo.toml in the project. For each Cargo.toml found, the processor runs cargo build (or a configured command) in that directory.

Input Tracking

The cargo processor tracks all .rs and .toml files in the Cargo.toml’s directory tree as inputs. This includes:

  • Cargo.toml and Cargo.lock
  • All Rust source files (src/**/*.rs)
  • Test files, examples, benches
  • Workspace member Cargo.toml files

When any tracked file changes, rsconstruct will re-run cargo.

Workspaces

For Cargo workspaces, each Cargo.toml (root and members) is discovered as a separate product. To build only the workspace root, use src_exclude_paths to skip member directories, or configure src_dirs to limit discovery.

Source Files

  • Input: Cargo.toml plus all .rs and .toml files in the project tree
  • Output: None (creator — produces output in target directory)

Configuration

[processor.cargo]
cargo = "cargo"          # Cargo binary to use
command = "build"        # Cargo command (build, check, test, clippy, etc.)
args = []                # Extra arguments passed to cargo
profiles = ["dev", "release"]  # Cargo profiles to build
src_dirs = [""]            # Directory to scan ("" = project root)
src_extensions = ["Cargo.toml"]
dep_inputs = []        # Additional files that trigger rebuilds
cache_output_dir = true  # Cache the target/ directory for fast restore after clean

| Key | Type | Default | Description |
|---|---|---|---|
| `cargo` | string | `"cargo"` | Path or name of the cargo binary |
| `command` | string | `"build"` | Cargo subcommand to run |
| `args` | string[] | `[]` | Extra arguments passed to cargo |
| `profiles` | string[] | `["dev", "release"]` | Cargo profiles to build (creates one product per profile) |
| `src_dirs` | string[] | `[""]` | Directory to scan for Cargo.toml files |
| `src_extensions` | string[] | `["Cargo.toml"]` | File names to match |
| `src_exclude_dirs` | string[] | `["/.git/", "/target/", ...]` | Directory patterns to exclude |
| `src_exclude_paths` | string[] | `[]` | Paths (relative to project root) to exclude |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |
| `cache_output_dir` | boolean | `true` | Cache the target/ directory so `rsconstruct clean && rsconstruct build` restores from cache. Consider disabling for large projects. |

Batch support

Runs as a single whole-project operation (e.g., cargo build, npm install).

Examples

Basic Usage

[processor.cargo]

Release Only

[processor.cargo]
profiles = ["release"]

Dev Only

[processor.cargo]
profiles = ["dev"]

Use cargo check Instead of build

[processor.cargo]
command = "check"

Run clippy

[processor.cargo]
command = "clippy"
args = ["--", "-D", "warnings"]

Workspace Root Only

[processor.cargo]
src_exclude_paths = ["crates/"]

Notes

  • Cargo has its own incremental compilation, so rsconstruct’s caching mainly avoids invoking cargo at all when nothing changed
  • The target/ directory is automatically excluded from input scanning
  • For monorepos with multiple Rust projects, each Cargo.toml is built separately

CC Project Processor

Purpose

Builds full C/C++ projects with multiple targets (libraries and executables) defined in a cc.yaml manifest file. Unlike the CC Single File processor which compiles each source file into a standalone executable, this processor supports multi-file targets with dependency linking.

How It Works

The processor scans for cc.yaml files. Each manifest defines libraries and programs to build. All paths in the manifest (sources, include directories) are relative to the cc.yaml file’s location and are automatically resolved to project-root-relative paths before compilation. All commands run from the project root.

Output goes under out/cc/<path-to-cc.yaml-dir>/, so a manifest at src/exercises/foo/cc.yaml produces output in out/cc/src/exercises/foo/. A manifest at the project root produces output in out/cc/.

Source files are compiled to object files, then linked into the final targets:

src/exercises/foo/cc.yaml defines:
  library "mymath" (static) from math.c, utils.c
  program "main" from main.c, links mymath

Build produces:
  out/cc/src/exercises/foo/obj/mymath/math.o
  out/cc/src/exercises/foo/obj/mymath/utils.o
  out/cc/src/exercises/foo/lib/libmymath.a
  out/cc/src/exercises/foo/obj/main/main.o
  out/cc/src/exercises/foo/bin/main

cc.yaml Format

All paths in the manifest are relative to the cc.yaml file’s location.

# Global settings (all optional)
cc: gcc               # C compiler (default: gcc)
cxx: g++              # C++ compiler (default: g++)
cflags: [-Wall]       # Global C flags
cxxflags: [-Wall]     # Global C++ flags
ldflags: []           # Global linker flags
include_dirs: [include]  # Global -I paths (relative to cc.yaml location)

# Library definitions
libraries:
  - name: mymath
    lib_type: shared   # shared (.so) | static (.a) | both
    sources: [src/math.c, src/utils.c]
    include_dirs: [include]  # Additional -I for this library
    cflags: []               # Additional C flags
    cxxflags: []             # Additional C++ flags
    ldflags: [-lm]           # Linker flags for shared lib

  - name: myhelper
    lib_type: static
    sources: [src/helper.c]

# Program definitions
programs:
  - name: main
    sources: [src/main.c]
    link: [mymath, myhelper]  # Libraries defined above to link against
    ldflags: [-lpthread]      # Additional linker flags

  - name: tool
    sources: [src/tool.cc]    # .cc -> uses C++ compiler
    link: [mymath]

Library Types

| Type | Output | Description |
|---|---|---|
| `shared` | `lib/lib<name>.so` | Shared library (default). Sources compiled with `-fPIC`. |
| `static` | `lib/lib<name>.a` | Static library via `ar rcs`. |
| `both` | Both `.so` and `.a` | Builds both shared and static variants. |

Language Detection

The compiler is chosen per source file based on extension:

| Extensions | Compiler |
|---|---|
| `.c` | C compiler (`cc` field) |
| `.cc`, `.cpp`, `.cxx`, `.C` | C++ compiler (`cxx` field) |

Global cflags are used for C files and cxxflags for C++ files.

Output Layout

Output is placed under out/cc/<cc.yaml-relative-dir>/:

out/cc/<cc.yaml-dir>/
  obj/<target_name>/    # Object files per target
    file.o
  lib/                  # Libraries
    lib<name>.a
    lib<name>.so
  bin/                  # Executables
    <program_name>

Build Modes

Each source is compiled to a .o file, then targets are linked from objects. This provides incremental rebuilds — only changed sources are recompiled.

Single Invocation

When single_invocation = true in rsconstruct.toml, programs are built by passing all sources directly to the compiler in one command. Libraries still use compile+link since ar requires object files.
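
A minimal fragment to enable it:

```toml
[processor.cc]
single_invocation = true   # pass all program sources to the compiler in one command
```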

Configuration

[processor.cc]
enabled = true            # Enable/disable (default: true)
cc = "gcc"                # Default C compiler (default: "gcc")
cxx = "g++"               # Default C++ compiler (default: "g++")
cflags = []               # Additional global C flags
cxxflags = []             # Additional global C++ flags
ldflags = []              # Additional global linker flags
include_dirs = []         # Additional global -I paths
single_invocation = false # Use single-invocation mode (default: false)
dep_inputs = []         # Extra files that trigger rebuilds
cache_output_dir = true   # Cache entire output directory (default: true)

Note: The cc.yaml manifest settings override the rsconstruct.toml defaults for compiler and flags.

Configuration Reference

| Key | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Enable/disable the processor |
| `cc` | string | `"gcc"` | Default C compiler |
| `cxx` | string | `"g++"` | Default C++ compiler |
| `cflags` | string[] | `[]` | Global C compiler flags |
| `cxxflags` | string[] | `[]` | Global C++ compiler flags |
| `ldflags` | string[] | `[]` | Global linker flags |
| `include_dirs` | string[] | `[]` | Global include directories |
| `single_invocation` | bool | `false` | Build programs in a single compiler invocation |
| `dep_inputs` | string[] | `[]` | Extra files that trigger rebuilds when changed |
| `cache_output_dir` | bool | `true` | Cache the entire output directory |
| `src_dirs` | string[] | `[""]` | Directory to scan for cc.yaml files |
| `src_extensions` | string[] | `["cc.yaml"]` | File patterns to scan for |

Batch support

Runs as a single whole-project operation (e.g., cargo build, npm install).

Example

Given this project layout:

myproject/
  rsconstruct.toml
  exercises/
    math/
      cc.yaml
      include/
        math.h
      math.c
      main.c

With exercises/math/cc.yaml:

include_dirs: [include]

libraries:
  - name: math
    lib_type: static
    sources: [math.c]

programs:
  - name: main
    sources: [main.c]
    link: [math]

Running rsconstruct build produces:

out/cc/exercises/math/obj/math/math.o
out/cc/exercises/math/lib/libmath.a
out/cc/exercises/math/obj/main/main.o
out/cc/exercises/math/bin/main

CC Single File Processor

Purpose

Compiles C (.c) and C++ (.cc) source files into executables, one source file per executable.

How It Works

Source files under the configured source directory are compiled into executables under out/cc_single_file/, mirroring the directory structure:

src/main.c       →  out/cc_single_file/main.elf
src/a/b.c        →  out/cc_single_file/a/b.elf
src/app.cc       →  out/cc_single_file/app.elf

Header dependencies are automatically tracked via compiler-generated .d files (-MMD -MF). When a header changes, all source files that include it are rebuilt.
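
A generated dependency file uses Make syntax; for a hypothetical `src/main.c` including `src/util.h` it might look like (illustrative):

```make
out/cc_single_file/main.elf: src/main.c src/util.h
```

When any file on the right-hand side changes, the target is recompiled.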

Source Files

  • Input: {source_dir}/**/*.c, {source_dir}/**/*.cc
  • Output: out/cc_single_file/{relative_path}{output_suffix}

Per-File Flags

Per-file compile and link flags can be set via special comments in source files. This allows individual files to require specific libraries or compiler options without affecting the entire project.

Flag directives

// EXTRA_COMPILE_FLAGS_BEFORE=-pthread
// EXTRA_COMPILE_FLAGS_AFTER=-O2 -DNDEBUG
// EXTRA_LINK_FLAGS_BEFORE=-L/usr/local/lib
// EXTRA_LINK_FLAGS_AFTER=-lX11

Command directives

Execute a command and use its stdout as flags (no shell):

// EXTRA_COMPILE_CMD=pkg-config --cflags gtk+-3.0
// EXTRA_LINK_CMD=pkg-config --libs gtk+-3.0

Shell directives

Execute via sh -c (full shell syntax):

// EXTRA_COMPILE_SHELL=echo -DLEVEL2_CACHE_LINESIZE=$(getconf LEVEL2_CACHE_LINESIZE)
// EXTRA_LINK_SHELL=echo -L$(brew --prefix openssl)/lib

Backtick substitution

Flag directives also support backtick substitution for inline command execution:

// EXTRA_COMPILE_FLAGS_AFTER=`pkg-config --cflags gtk+-3.0`
// EXTRA_LINK_FLAGS_AFTER=`pkg-config --libs gtk+-3.0`

Command caching

All command and shell directives (EXTRA_*_CMD, EXTRA_*_SHELL, and backtick substitutions) are cached in memory during a build. If multiple source files use the same command (e.g., pkg-config --cflags gtk+-3.0), it is executed only once. This improves build performance when many files share common dependencies.
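
The caching behaviour is essentially memoisation keyed by the command line (a simplified Python sketch, not the actual implementation):

```python
def make_cached_runner(run_command):
    """Wrap a flag-producing command runner so each distinct command
    line executes at most once per build."""
    cache = {}
    def cached(argv: list) -> str:
        key = tuple(argv)
        if key not in cache:
            cache[key] = run_command(argv)
        return cache[key]
    return cached
```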

Compiler profile-specific flags

When using multiple compiler profiles, you can specify flags that only apply to a specific compiler by adding [profile_name] after the directive name:

// EXTRA_COMPILE_FLAGS_BEFORE=-g
// EXTRA_COMPILE_FLAGS_BEFORE[gcc]=-femit-struct-debug-baseonly
// EXTRA_COMPILE_FLAGS_BEFORE[clang]=-gline-tables-only

In this example:

  • -g is applied to all compilers
  • -femit-struct-debug-baseonly is only applied when compiling with the “gcc” profile
  • -gline-tables-only is only applied when compiling with the “clang” profile

The profile name matches the name field in your [[processor.cc_single_file.compilers]] configuration:

[[processor.cc_single_file.compilers]]
name = "gcc"      # Matches [gcc] suffix
cc = "gcc"

[[processor.cc_single_file.compilers]]
name = "clang"    # Matches [clang] suffix
cc = "clang"

This works with all directive types:

  • EXTRA_COMPILE_FLAGS_BEFORE[profile]
  • EXTRA_COMPILE_FLAGS_AFTER[profile]
  • EXTRA_LINK_FLAGS_BEFORE[profile]
  • EXTRA_LINK_FLAGS_AFTER[profile]
  • EXTRA_COMPILE_CMD[profile]
  • EXTRA_LINK_CMD[profile]
  • EXTRA_COMPILE_SHELL[profile]
  • EXTRA_LINK_SHELL[profile]

Excluding files from specific profiles

To exclude a source file from being compiled with specific compiler profiles, use EXCLUDE_PROFILE:

// EXCLUDE_PROFILE=clang

This is useful when a file uses compiler-specific features that aren’t available in other compilers. For example, a file using GCC-only builtins like __builtin_va_arg_pack_len():

// EXCLUDE_PROFILE=clang
// This file uses GCC-specific builtins
#include <stdarg.h>

void example(int first, ...) {
    int count = __builtin_va_arg_pack_len();  // GCC-only
    // ...
}

You can exclude multiple profiles by listing them space-separated:

// EXCLUDE_PROFILE=clang icc

Directive summary

| Directive | Execution | Use case |
|---|---|---|
| `EXTRA_COMPILE_FLAGS_BEFORE` | Literal flags | Flags before default cflags |
| `EXTRA_COMPILE_FLAGS_AFTER` | Literal flags | Flags after default cflags |
| `EXTRA_LINK_FLAGS_BEFORE` | Literal flags | Flags before default ldflags |
| `EXTRA_LINK_FLAGS_AFTER` | Literal flags | Flags after default ldflags |
| `EXTRA_COMPILE_CMD` | Subprocess (no shell) | Dynamic compile flags via command |
| `EXTRA_LINK_CMD` | Subprocess (no shell) | Dynamic link flags via command |
| `EXTRA_COMPILE_SHELL` | `sh -c` (full shell) | Dynamic compile flags needing shell features |
| `EXTRA_LINK_SHELL` | `sh -c` (full shell) | Dynamic link flags needing shell features |

Supported comment styles

Directives can appear in any of these comment styles:

C++ style:

// EXTRA_LINK_FLAGS_AFTER=-lX11

C block comment (single line):

/* EXTRA_LINK_FLAGS_AFTER=-lX11 */

C block comment (multi-line, star-prefixed):

/*
 * EXTRA_LINK_FLAGS_AFTER=-lX11
 */
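
Parsing these directives amounts to matching a `NAME[profile]=value` pattern inside comments (a simplified Python sketch; the real parser also handles the block-comment styles above and the `EXCLUDE_PROFILE` directive):

```python
import re

# Matches e.g. "// EXTRA_COMPILE_FLAGS_BEFORE[gcc]=-g -O0"
DIRECTIVE = re.compile(
    r"(EXTRA_(?:COMPILE|LINK)_(?:FLAGS_BEFORE|FLAGS_AFTER|CMD|SHELL))"
    r"(?:\[(\w+)\])?=(.*)"
)

def parse_directives(source: str) -> list:
    """Return (directive, profile-or-None, value) for each match."""
    found = []
    for line in source.splitlines():
        m = DIRECTIVE.search(line)
        if m:
            found.append((m.group(1), m.group(2), m.group(3).strip()))
    return found
```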

Command Line Ordering

The compiler command is constructed in this order:

compiler -MMD -MF deps -I... [compile_before] [cflags/cxxflags] [compile_after] -o output source [link_before] [ldflags] [link_after]

Link flags come after the source file so the linker can resolve symbols correctly.
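
For a source file declaring `EXTRA_COMPILE_FLAGS_AFTER=-O2` and `EXTRA_LINK_FLAGS_AFTER=-lX11`, the assembled command might look like (illustrative; paths abbreviated):

```
gcc -MMD -MF out/cc_single_file/main.d -Iinclude -O2 -o out/cc_single_file/main.elf src/main.c -lX11
```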

| Position | Source |
|---|---|
| compile_before | `EXTRA_COMPILE_FLAGS_BEFORE` + `EXTRA_COMPILE_CMD` + `EXTRA_COMPILE_SHELL` |
| cflags/cxxflags | `[processor.cc_single_file]` config `cflags` or `cxxflags` |
| compile_after | `EXTRA_COMPILE_FLAGS_AFTER` |
| link_before | `EXTRA_LINK_FLAGS_BEFORE` + `EXTRA_LINK_CMD` + `EXTRA_LINK_SHELL` |
| ldflags | `[processor.cc_single_file]` config `ldflags` |
| link_after | `EXTRA_LINK_FLAGS_AFTER` |

Verbosity Levels (--processor-verbose N)

| Level | Output |
|---|---|
| 0 (default) | Target basename: `main.elf` |
| 1 | Target path + compiler commands: `out/cc_single_file/main.elf` |
| 2 | Adds source path: `out/cc_single_file/main.elf <- src/main.c` |
| 3 | Adds all inputs: `out/cc_single_file/main.elf <- src/main.c, src/utils.h` |

Configuration

Single Compiler (Legacy)

[processor.cc_single_file]
cc = "gcc"                # C compiler (default: "gcc")
cxx = "g++"               # C++ compiler (default: "g++")
cflags = []               # C compiler flags
cxxflags = []             # C++ compiler flags
ldflags = []              # Linker flags
include_paths = []        # Additional -I paths (relative to project root)
src_dirs = ["src"]          # Source directory (default: "src")
output_suffix = ".elf"    # Suffix for output executables (default: ".elf")
dep_inputs = []         # Additional files that trigger rebuilds when changed
include_scanner = "native" # Method for scanning header dependencies (default: "native")

Multiple Compilers

To compile with multiple compilers (e.g., both GCC and Clang), use the compilers array:

[processor.cc_single_file]
src_dirs = ["src"]
include_paths = ["include"]  # Shared across all compilers

[[processor.cc_single_file.compilers]]
name = "gcc"
cc = "gcc"
cxx = "g++"
cflags = ["-Wall", "-Wextra"]
cxxflags = ["-Wall", "-Wextra"]
ldflags = []
output_suffix = ".elf"

[[processor.cc_single_file.compilers]]
name = "clang"
cc = "clang"
cxx = "clang++"
cflags = ["-Wall", "-Wextra", "-Weverything"]
cxxflags = ["-Wall", "-Wextra"]
ldflags = []
output_suffix = ".elf"

When using multiple compilers, outputs are organized by compiler name:

src/main.c  →  out/cc_single_file/gcc/main.elf
            →  out/cc_single_file/clang/main.elf

Each source file is compiled once per compiler profile, allowing you to:

  • Test code with multiple compilers to catch different warnings
  • Compare output between compilers
  • Build for different targets (cross-compilation)

Configuration Reference

| Key | Type | Default | Description |
|---|---|---|---|
| `cc` | string | `"gcc"` | C compiler command |
| `cxx` | string | `"g++"` | C++ compiler command |
| `cflags` | string[] | `[]` | Flags passed to the C compiler |
| `cxxflags` | string[] | `[]` | Flags passed to the C++ compiler |
| `ldflags` | string[] | `[]` | Flags passed to the linker |
| `include_paths` | string[] | `[]` | Additional `-I` include paths (shared) |
| `src_dirs` | string[] | `["src"]` | Directory to scan for source files |
| `output_suffix` | string | `".elf"` | Suffix appended to output executables |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |
| `include_scanner` | string | `"native"` | Method for scanning header dependencies |
| `compilers` | array | `[]` | Multiple compiler profiles (overrides single-compiler fields) |

Compiler Profile Fields

Each entry in the compilers array can have:

| Key | Type | Required | Description |
|---|---|---|---|
| `name` | string | Yes | Profile name (used in output path) |
| `cc` | string | No | C compiler (default: `"gcc"`) |
| `cxx` | string | No | C++ compiler (default: `"g++"`) |
| `cflags` | string[] | No | C compiler flags |
| `cxxflags` | string[] | No | C++ compiler flags |
| `ldflags` | string[] | No | Linker flags |
| `output_suffix` | string | No | Output suffix (default: `".elf"`) |

Batch support

Each input file is processed individually, producing its own output file.

Include Scanner

The include_scanner option controls how header dependencies are discovered:

| Value | Description |
|---|---|
| `native` | Fast regex-based scanner (default). Parses `#include` directives directly without spawning external processes. Handles `#include "file"` and `#include <file>` forms. |
| `compiler` | Uses `gcc -MM` / `g++ -MM` to scan dependencies. More accurate for complex cases (computed includes, conditional compilation) but slower, as it spawns a compiler process per source file. |

Native scanner behavior

The native scanner:

  • Recursively follows #include directives
  • Searches include paths in order: source file directory, configured include_paths, project root
  • Skips system headers (/usr/..., /lib/...)
  • Only tracks project-local headers (relative paths)
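
The core of such a scanner is a single regex over the source text (a one-level Python sketch; the real scanner recurses and resolves paths against the search order above):

```python
import re

# Matches both #include "file" and #include <file> forms.
INCLUDE = re.compile(r'^\s*#\s*include\s+["<]([^">]+)[">]', re.MULTILINE)

def scan_includes(source: str) -> list:
    """Collect header names from #include directives in one file."""
    return INCLUDE.findall(source)
```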

When to use compiler scanner

Use include_scanner = "compiler" if you have:

  • Computed includes: #include MACRO_THAT_EXPANDS_TO_FILENAME
  • Complex conditional compilation affecting which headers are included
  • Headers outside the standard search paths that the native scanner misses

The native scanner may occasionally report extra dependencies (false positives), which is safe—it just means some files might rebuild unnecessarily. It will not miss dependencies (false negatives) for standard #include patterns.

Checkpatch Processor

Purpose

Checks C source files using the Linux kernel’s checkpatch.pl script.

How It Works

Discovers .c and .h files under src/ (excluding common C/C++ build directories), runs checkpatch.pl on each file, and records success in the cache. A non-zero exit code from checkpatch fails the product.

This processor supports batch mode.

Source Files

  • Input: src/**/*.c, src/**/*.h
  • Output: none (checker)

Configuration

[processor.checkpatch]
args = []
dep_inputs = []

| Key | Type | Default | Description |
|---|---|---|---|
| `args` | string[] | `[]` | Extra arguments passed to checkpatch.pl |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

The tool processes one file at a time. Each file is checked in a separate invocation.

Checkstyle Processor

Purpose

Checks Java code style using Checkstyle.

How It Works

Discovers .java files in the project (excluding common build tool directories), runs checkstyle on each file, and records success in the cache. A non-zero exit code from checkstyle fails the product.

This processor supports batch mode.

If a checkstyle.xml file exists, it is automatically added as an extra input so that configuration changes trigger rebuilds.

Source Files

  • Input: **/*.java
  • Output: none (checker)

Configuration

[processor.checkstyle]
args = []
dep_inputs = []

| Key | Type | Default | Description |
|---|---|---|---|
| `args` | string[] | `[]` | Extra arguments passed to checkstyle |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Chromium Processor

Purpose

Converts HTML files to PDF using headless Chromium (Google Chrome).

How It Works

Discovers .html files in the configured scan directory (default: out/marp) and runs headless Chromium with --print-to-pdf on each file, producing a PDF output.

This is typically used as a post-processing step after another processor (e.g., Marp) generates HTML files.

Source Files

  • Input: out/marp/**/*.html (default scan directory)
  • Output: out/chromium/{relative_path}.pdf

Configuration

[processor.chromium]
chromium_bin = "google-chrome"            # The Chromium/Chrome executable to run
args = []                                 # Additional arguments to pass to Chromium
output_dir = "out/chromium"               # Output directory for PDFs
dep_inputs = []                         # Additional files that trigger rebuilds when changed

| Key | Type | Default | Description |
|---|---|---|---|
| `chromium_bin` | string | `"google-chrome"` | The Chromium or Google Chrome executable |
| `args` | string[] | `[]` | Extra arguments passed to Chromium |
| `output_dir` | string | `"out/chromium"` | Base output directory for PDF files |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Clang-Tidy Processor

Purpose

Runs clang-tidy static analysis on C/C++ source files.

How It Works

Discovers .c and .cc files under the configured source directory, runs clang-tidy on each file individually, and creates a stub file on success. A non-zero exit code from clang-tidy fails the product.

Note: This processor does not support batch mode. Each file is checked separately to avoid cross-file analysis issues with unrelated files.

Source Files

  • Input: {source_dir}/**/*.c, {source_dir}/**/*.cc
  • Output: out/clang_tidy/{flat_name}.clang_tidy

Configuration

[processor.clang_tidy]
args = ["-checks=*"]                        # Arguments passed to clang-tidy
compiler_args = ["-std=c++17"]              # Arguments passed after -- to the compiler
dep_inputs = [".clang-tidy"]              # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| args | string[] | [] | Arguments passed to clang-tidy |
| compiler_args | string[] | [] | Compiler arguments passed after the -- separator |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool processes one file at a time. Each file is checked in a separate invocation.

Compiler Arguments

Clang-tidy requires knowing compiler flags to properly parse the source files. Use compiler_args to specify include paths, defines, and language standards:

[processor.clang_tidy]
compiler_args = ["-std=c++17", "-I/usr/include/mylib", "-DDEBUG"]

Using .clang-tidy File

Clang-tidy automatically reads configuration from a .clang-tidy file in the project root. Add it to dep_inputs so changes trigger rebuilds:

[processor.clang_tidy]
dep_inputs = [".clang-tidy"]

Clippy Processor

Purpose

Lints Rust projects using Cargo Clippy. Each Cargo.toml produces a cached success marker, allowing RSConstruct to skip re-linting when source files haven’t changed.

How It Works

Discovers files named Cargo.toml in the project. For each Cargo.toml found, the processor runs cargo clippy in that directory. A non-zero exit code fails the product.

Input Tracking

The clippy processor tracks all .rs and .toml files in the Cargo.toml’s directory tree as inputs. This includes:

  • Cargo.toml and Cargo.lock
  • All Rust source files (src/**/*.rs)
  • Test files, examples, benches
  • Workspace member Cargo.toml files

When any tracked file changes, rsconstruct will re-run clippy.

Source Files

  • Input: Cargo.toml plus all .rs and .toml files in the project tree
  • Output: None (checker-style caching)

Configuration

[processor.clippy]
cargo = "cargo"          # Cargo binary to use
command = "clippy"       # Cargo command (usually "clippy")
args = []                # Extra arguments passed to cargo clippy
src_dirs = [""]            # Directory to scan ("" = project root)
src_extensions = ["Cargo.toml"]
dep_inputs = []        # Additional files that trigger rebuilds
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| cargo | string | "cargo" | Path or name of the cargo binary |
| command | string | "clippy" | Cargo subcommand to run |
| args | string[] | [] | Extra arguments passed to cargo clippy |
| src_dirs | string[] | [""] | Directory to scan for Cargo.toml files |
| src_extensions | string[] | ["Cargo.toml"] | File names to match |
| src_exclude_dirs | string[] | ["/.git/", "/target/", ...] | Directory patterns to exclude |
| src_exclude_paths | string[] | [] | Paths (relative to project root) to exclude |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The processor runs cargo clippy once per discovered Cargo.toml; projects are not batched together.

Examples

Basic Usage

[processor.clippy]

Deny All Warnings

[processor.clippy]
args = ["--", "-D", "warnings"]

Use Both Cargo Build and Clippy

[processor.cargo]

[processor.clippy]

Notes

  • Clippy uses the cargo binary which is shared with the cargo processor
  • The target/ directory is automatically excluded from input scanning
  • For monorepos with multiple Rust projects, each Cargo.toml is linted separately

CMake Processor

Purpose

Lints CMake files using cmake --lint.

How It Works

Discovers CMakeLists.txt files in the project (excluding common build tool directories), runs cmake --lint on each file, and records success in the cache. A non-zero exit code from cmake fails the product.

This processor supports batch mode.

Source Files

  • Input: **/CMakeLists.txt
  • Output: none (checker)

Configuration

[processor.cmake]
args = []
dep_inputs = []
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| args | string[] | [] | Extra arguments passed to cmake |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Cppcheck Processor

Purpose

Runs cppcheck static analysis on C/C++ source files.

How It Works

Discovers .c and .cc files under the configured source directory, runs cppcheck on each file individually, and creates a stub file on success. A non-zero exit code from cppcheck fails the product.

Note: This processor does not support batch mode. Each file is checked separately because cppcheck performs cross-file analysis (CTU - Cross Translation Unit) which produces false positives when unrelated files are checked together. For example, standalone example programs that define classes with the same name will trigger ctuOneDefinitionRuleViolation errors even though the files are never linked together. Cppcheck has no flag to disable this cross-file analysis (--max-ctu-depth=0 does not help), so files must be checked individually.

Source Files

  • Input: {source_dir}/**/*.c, {source_dir}/**/*.cc
  • Output: out/cppcheck/{flat_name}.cppcheck

Configuration

[processor.cppcheck]
args = ["--error-exitcode=1", "--enable=warning,style,performance,portability"]
dep_inputs = [".cppcheck-suppressions"]   # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| args | string[] | ["--error-exitcode=1", "--enable=warning,style,performance,portability"] | Arguments passed to cppcheck |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

To use a suppressions file, add "--suppressions-list=.cppcheck-suppressions" to args.
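
Combining the two, a configuration with a suppressions file might look like this (keeping the default checks):

```toml
[processor.cppcheck]
args = [
    "--error-exitcode=1",
    "--enable=warning,style,performance,portability",
    "--suppressions-list=.cppcheck-suppressions",
]
dep_inputs = [".cppcheck-suppressions"]   # rebuild when suppressions change
```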

Batch support

The tool processes one file at a time. Each file is checked in a separate invocation.

Cpplint Processor

Purpose

Lints C/C++ files using cpplint (Google C++ style checker).

How It Works

Discovers .c, .cc, .h, and .hh files under src/ (excluding common C/C++ build directories), runs cpplint on each file, and records success in the cache. A non-zero exit code from cpplint fails the product.

This processor supports batch mode.

Source Files

  • Input: src/**/*.c, src/**/*.cc, src/**/*.h, src/**/*.hh
  • Output: none (checker)

Configuration

[processor.cpplint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| args | string[] | [] | Extra arguments passed to cpplint |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool processes one file at a time. Each file is checked in a separate invocation.

Doctest Processor

Purpose

Runs Python doctests embedded in .py files using python3 -m doctest.

How It Works

Python files (.py) are checked for embedded doctests. Each file is run through python3 -m doctest — failing doctests cause the build to fail.

Source Files

  • Input: **/*.py
  • Output: none (checker — pass/fail only)

Configuration

[processor.doctest]
src_extensions = [".py"]                      # File extensions to process (default: [".py"])
dep_inputs = []                         # Additional files that trigger rebuilds
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| src_extensions | string[] | [".py"] | File extensions to discover |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Drawio Processor

Purpose

Converts Draw.io diagram files to PNG, SVG, or PDF.

How It Works

Discovers .drawio files in the project and runs drawio in export mode on each file, generating output in the configured formats.

Source Files

  • Input: **/*.drawio
  • Output: out/drawio/{format}/{relative_path}.{format}

Configuration

[processor.drawio]
drawio_bin = "drawio"                  # The drawio command to run
formats = ["png"]                      # Output formats (png, svg, pdf)
args = []                              # Additional arguments to pass to drawio
output_dir = "out/drawio"              # Output directory
dep_inputs = []                      # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| drawio_bin | string | "drawio" | The drawio executable to run |
| formats | string[] | ["png"] | Output formats to generate (png, svg, pdf) |
| args | string[] | [] | Extra arguments passed to drawio |
| output_dir | string | "out/drawio" | Base output directory |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

ESLint Processor

Purpose

Lints JavaScript and TypeScript files using ESLint.

How It Works

Discovers .js, .jsx, .ts, .tsx, .mjs, and .cjs files in the project (excluding common build tool directories), runs eslint on each file, and records success in the cache. A non-zero exit code from eslint fails the product.

This processor supports batch mode, allowing multiple files to be checked in a single eslint invocation for better performance.

If an ESLint config file exists (.eslintrc* or eslint.config.*), it is automatically added as an extra input so that configuration changes trigger rebuilds.

Source Files

  • Input: **/*.js, **/*.jsx, **/*.ts, **/*.tsx, **/*.mjs, **/*.cjs
  • Output: none (checker)

Configuration

[processor.eslint]
command = "eslint"
args = []
dep_inputs = []
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| command | string | "eslint" | The eslint executable to run |
| args | string[] | [] | Extra arguments passed to eslint |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
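
For example, to treat every warning as a build failure (using eslint's --max-warnings flag):

```toml
[processor.eslint]
args = ["--max-warnings", "0"]
```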

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Explicit Processor

Why “explicit”?

Other processor types discover their inputs by scanning directories for files matching certain extensions. The explicit processor is different: the user declares exactly which files are inputs and which are outputs. Nothing is discovered or inferred.

Names considered:

  • explicit — chosen. Directly communicates the key difference: everything is declared rather than discovered.
  • custom — too generic. Doesn’t say what makes it different from the existing generator processor (which is also “custom”).
  • rule — precise (Bazel/Make terminology for a build rule with explicit inputs/outputs), but carries baggage from other build systems and doesn’t fit the rsconstruct naming convention (processors, not rules).
  • aggregate — describes the many-inputs-to-few-outputs pattern, but not all uses are aggregations.
  • task — too generic. Could mean anything.

Purpose

Runs a user-configured script or command with explicitly declared inputs and outputs. Unlike scan-based processors (which discover one product per source file), the explicit processor creates a single product with all declared inputs feeding into all declared outputs.

This is ideal for build steps that aggregate many files into one or a few outputs, such as:

  • Generating an index page from all HTML files in a directory
  • Building a bundle from multiple source files
  • Creating a report from multiple data files

How It Works

The processor resolves all inputs (literal paths) and input_globs (glob patterns) into a flat file list. It creates a single product with these files as inputs and the outputs list as outputs.

Rsconstruct uses this information for:

  • Rebuild detection: if any input changes, the product is rebuilt
  • Dependency ordering: if an input is an output of another processor, that processor runs first (automatic via resolve_dependencies())
  • Caching: outputs are cached and restored on cache hit

Invocation

The command is invoked as:

command [args...] --inputs <input1> <input2> ... --outputs <output1> <output2> ...
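
A minimal sketch of how a script invoked this way might partition its argument list (the function name is hypothetical; rsconstruct only defines the --inputs/--outputs calling convention shown above):

```python
def split_invocation(argv):
    """Partition argv into (args, inputs, outputs) at the marker flags."""
    args, inputs, outputs = [], [], []
    current = args
    for token in argv:
        if token == "--inputs":
            current = inputs
        elif token == "--outputs":
            current = outputs
        else:
            current.append(token)
    return args, inputs, outputs
```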

Input ordering

Inputs are passed in a deterministic order:

  1. inputs entries first, in config file order
  2. input_globs results second, one glob at a time in config file order, files within each glob sorted alphabetically

This ordering is stable across builds (assuming the same set of files exists).

Configuration

[processor.explicit.site]
command = "scripts/build_site.py"
args = ["--verbose"]
inputs = [
    "resources/index.html",
    "resources/index.css",
    "resources/index.js",
    "tags/level.txt",
    "tags/category.txt",
    "tags/audiences.txt",
]
input_globs = [
    "docs/courses/**/*.html",
    "docs/tracks/*.html",
]
outputs = [
    "docs/index.html",
]

Fields

| Key | Type | Required | Description |
| --- | --- | --- | --- |
| command | string | yes | Script or binary to execute |
| args | array of strings | no | Extra arguments passed before --inputs |
| inputs | array of strings | no | Literal input file paths |
| input_globs | array of strings | no | Glob patterns resolved to input files |
| outputs | array of strings | yes | Output file paths produced by the command |

At least one of inputs or input_globs must be specified.

Glob patterns

input_globs supports standard glob syntax:

  • * matches any sequence of characters within a path component
  • ** matches any number of path components (recursive)
  • ? matches a single character
  • [abc] matches one of the listed characters

Glob results that match no files are silently ignored (the set of matching files may grow as upstream generators produce outputs via the fixed-point discovery loop).

Cross-Processor Dependencies

The explicit processor works naturally with the fixed-point discovery loop. If input_globs matches files that are outputs of other processors (e.g., pandoc-generated HTML files), rsconstruct automatically:

  1. Injects those declared outputs as virtual files during discovery
  2. Resolves dependency edges so upstream processors run first
  3. Rebuilds the explicit processor when upstream outputs change

This means you do not need to manually order processors or wait for a second build — everything is handled in a single build invocation.

Comparison with Other Processor Types

|  | Checker | Generator | Explicit |
| --- | --- | --- | --- |
| Products | one per input file | one per input file | one total |
| Outputs | none (pass/fail) | one per input | explicitly listed |
| Discovery | src_dirs + src_extensions | src_dirs + src_extensions | declared inputs/globs |
| Use case | lint/validate files | transform files 1:1 | aggregate many → few |

Gem Processor

Purpose

Installs Ruby dependencies from Gemfile files using Bundler.

How It Works

Discovers Gemfile files in the project, runs bundle install in each directory, and creates a stamp file on success. Sibling .rb and .gemspec files are tracked as inputs.

Source Files

  • Input: **/Gemfile (plus sibling .rb, .gemspec files)
  • Output: out/gem/{flat_name}.stamp

Configuration

[processor.gem]
command = "bundle"                     # The bundler command to run
args = []                              # Additional arguments to pass to bundler install
dep_inputs = []                      # Additional files that trigger rebuilds when changed
cache_output_dir = true                # Cache the vendor/bundle directory for fast restore after clean
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| command | string | "bundle" | The bundler executable to run |
| args | string[] | [] | Extra arguments passed to bundler install |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
| cache_output_dir | boolean | true | Cache the vendor/bundle/ directory so rsconstruct clean && rsconstruct build restores from cache |

Batch support

Runs as a whole-project operation: one bundle install per Gemfile directory, not one invocation per file.

Generator Processor

Purpose

Runs a user-configured script or command as a generator, producing output files from input files. The script receives input/output path pairs on the command line.

How It Works

Discovers files matching the configured extensions, computes output paths under output_dir with the configured output_extension, and invokes the command with path pairs.

In single mode: command [args...] <input> <output>

In batch mode: command [args...] <input1> <output1> <input2> <output2> ...
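
A sketch of what such a generator script might look like (the uppercase transform is a placeholder; only the alternating input/output argument convention comes from rsconstruct):

```python
import sys


def pair_up(paths):
    """Group a flat [in1, out1, in2, out2, ...] list into (in, out) pairs."""
    return list(zip(paths[0::2], paths[1::2]))


def convert(src, dst):
    # Placeholder transform: copy the input to the output, uppercased.
    with open(src) as f:
        text = f.read()
    with open(dst, "w") as f:
        f.write(text.upper())


if __name__ == "__main__":
    for src, dst in pair_up(sys.argv[1:]):
        convert(src, dst)
```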

The processor is auto-detected (enabled) when the configured scan directories contain matching files.

Source Files

  • Input: files matching src_extensions in src_dirs
  • Output: {output_dir}/{relative_path}.{output_extension}

Configuration

[processor.generator]
command = "scripts/convert.py"
output_dir = "out/converted"
output_extension = "html"
src_dirs = ["syllabi"]
src_extensions = [".md"]
batch = true
args = []
dep_inputs = []
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| command | string | "true" | Script or command to run |
| output_dir | string | "out/generator" | Directory for output files |
| output_extension | string | "out" | Extension for output files |
| batch | bool | true | Pass all pairs in one invocation |
| args | string[] | [] | Extra arguments prepended before file pairs |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Configurable via batch = true (default). In batch mode, the script receives all input/output pairs in a single invocation. Set batch = false to invoke the script once per file.

Hadolint Processor

Purpose

Lints Dockerfiles using Hadolint.

How It Works

Discovers Dockerfile files in the project (excluding common build tool directories), runs hadolint on each file, and records success in the cache. A non-zero exit code from hadolint fails the product.

This processor supports batch mode.

Source Files

  • Input: **/Dockerfile
  • Output: none (checker)

Configuration

[processor.hadolint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| args | string[] | [] | Extra arguments passed to hadolint |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

HTMLHint Processor

Purpose

Lints HTML files using HTMLHint.

How It Works

Discovers .html and .htm files in the project (excluding common build tool directories), runs htmlhint on each file, and records success in the cache. A non-zero exit code from htmlhint fails the product.

This processor supports batch mode.

If a .htmlhintrc file exists, it is automatically added as an extra input so that configuration changes trigger rebuilds.

Source Files

  • Input: **/*.html, **/*.htm
  • Output: none (checker)

Configuration

[processor.htmlhint]
command = "htmlhint"
args = []
dep_inputs = []
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| command | string | "htmlhint" | The htmlhint executable to run |
| args | string[] | [] | Extra arguments passed to htmlhint |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

HTMLLint Processor

Purpose

Lints HTML files using htmllint.

How It Works

Discovers .html and .htm files in the project (excluding common build tool directories), runs htmllint on each file, and records success in the cache. A non-zero exit code from htmllint fails the product.

This processor supports batch mode.

Source Files

  • Input: **/*.html, **/*.htm
  • Output: none (checker)

Configuration

[processor.htmllint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| args | string[] | [] | Extra arguments passed to htmllint |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Imarkdown2html Processor

Purpose

Converts Markdown files to HTML using the pulldown-cmark Rust crate. Native (in-process, no external tools required).

This is the native equivalent of markdown2html, which uses the external markdown Perl script.

Source Files

  • Input: **/*.md
  • Output: out/imarkdown2html/{relative_path}.html

Configuration

[processor.imarkdown2html]
src_dirs = ["docs"]
output_dir = "out/imarkdown2html"    # Output directory (default)
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| output_dir | string | "out/imarkdown2html" | Output directory for HTML files |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch Support

Each input file is processed individually, producing its own output file.

Iyamlschema Processor

Purpose

Validates YAML files against JSON schemas referenced by a $schema URL field in each file. Checks both schema conformance and property ordering. Native (in-process, no external tools required).

How It Works

For each YAML file:

  1. Parses the YAML content
  2. Reads the $schema field to get the schema URL
  3. Fetches the schema (cached in .rsconstruct/webcache.redb)
  4. Validates the data against the schema (including resolving remote $ref references)
  5. Checks that object keys appear in the order specified by propertyOrdering fields in the schema

A file fails validation if it is missing $schema, does not conform to the schema, or has keys in the wrong order.

Configuration

[processor.iyamlschema]
src_dirs = ["yaml"]
check_ordering = true    # Check propertyOrdering (default: true)
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| check_ordering | boolean | true | Whether to check property ordering against propertyOrdering in the schema |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Schema Requirements

Each YAML file must contain a $schema field with a URL pointing to a JSON schema:

$schema: "https://example.com/schemas/mydata.json"
name: Alice
age: 30

The schema is fetched via HTTP and cached locally. Subsequent builds use the cached version. Use rsconstruct webcache clear to force re-fetching.

Property Ordering

If the schema contains propertyOrdering arrays, the processor checks that data keys appear in the specified order:

{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "age": { "type": "integer" }
  },
  "propertyOrdering": ["name", "age"]
}

Set check_ordering = false to disable this check.

Batch Support

Files are validated individually within a batch; a failure in one file does not block validation of the others.

Jekyll Processor

Purpose

Builds Jekyll static sites by running jekyll build in directories containing a _config.yml file.

How It Works

Discovers _config.yml files in the project (excluding common build tool directories). For each one, runs jekyll build in that directory.

Source Files

  • Input: **/_config.yml
  • Output: none (creator)

Configuration

[processor.jekyll]
args = []
dep_inputs = []
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| args | string[] | [] | Extra arguments passed to jekyll build |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Runs as a whole-project operation: one jekyll build per _config.yml directory, not one invocation per file.

Jinja2 Processor

Purpose

Renders Jinja2 template files into output files using the Python Jinja2 template library.

How It Works

Files matching configured extensions in templates.jinja2/ are rendered via python3 using the jinja2 Python library. Output is written with the extension stripped and the templates.jinja2/ prefix removed:

templates.jinja2/app.config.j2  →  app.config
templates.jinja2/sub/readme.txt.j2  →  sub/readme.txt

Templates use the Jinja2 templating engine. A FileSystemLoader is configured with the project root as the search directory, so templates can include or extend other templates using relative paths. Environment variables are passed to the template context.
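
A minimal sketch: a template templates.jinja2/app.config.j2 that reads a hypothetical DB_HOST environment variable from the context:

```jinja
# templates.jinja2/app.config.j2 — DB_HOST is a hypothetical variable
host = {{ DB_HOST }}
```

Rendering with DB_HOST=localhost set in the environment writes app.config containing host = localhost at the project root.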

Source Files

  • Input: templates.jinja2/**/*{src_extensions}
  • Output: project root, mirroring the template path (minus templates.jinja2/ prefix) with the extension removed

Configuration

[processor.jinja2]
src_extensions = [".j2"]                      # File extensions to process (default: [".j2"])
dep_inputs = ["config/settings.py"]     # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| src_extensions | string[] | [".j2"] | File extensions to discover |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Jq Processor

Purpose

Validates JSON files using jq.

How It Works

Discovers .json files in the project (excluding common build tool directories), runs jq empty on each file, and records success in the cache. The empty filter validates JSON syntax without producing output — a non-zero exit code from jq fails the product.

This processor supports batch mode — multiple files are checked in a single jq invocation.

Source Files

  • Input: **/*.json
  • Output: none (checker)

Configuration

[processor.jq]
command = "jq"                               # The jq command to run
args = []                                    # Additional arguments to pass to jq (after "empty")
dep_inputs = []                            # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| command | string | "jq" | The jq executable to run |
| args | string[] | [] | Extra arguments passed to jq (after the empty filter) |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

JSHint Processor

Purpose

Lints JavaScript files using JSHint.

How It Works

Discovers .js, .jsx, .mjs, and .cjs files in the project (excluding common build tool directories), runs jshint on each file, and records success in the cache. A non-zero exit code from jshint fails the product.

This processor supports batch mode.

If a .jshintrc file exists, it is automatically added as an extra input so that configuration changes trigger rebuilds.

Source Files

  • Input: **/*.js, **/*.jsx, **/*.mjs, **/*.cjs
  • Output: none (checker)

Configuration

[processor.jshint]
command = "jshint"
args = []
dep_inputs = []
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| command | string | "jshint" | The jshint executable to run |
| args | string[] | [] | Extra arguments passed to jshint |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

JSLint Processor

Purpose

Lints JavaScript files using JSLint.

How It Works

Discovers .js files in the project (excluding common build tool directories), runs jslint on each file, and records success in the cache. A non-zero exit code from jslint fails the product.

This processor supports batch mode.

Source Files

  • Input: **/*.js
  • Output: none (checker)

Configuration

[processor.jslint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| args | string[] | [] | Extra arguments passed to jslint |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Json Schema Processor

Purpose

Validates JSON schema files by checking that every object’s propertyOrdering array exactly matches its properties keys.

How It Works

Discovers .json files in the project (excluding common build tool directories), parses each as JSON, and recursively walks the structure. At every object node with "type": "object", if both properties and propertyOrdering exist, it verifies that the two key sets match exactly.

Mismatches (keys missing from propertyOrdering or extra keys in propertyOrdering) are reported with their JSON path. Files that contain no propertyOrdering at all pass silently.
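
For example, this schema would be reported, because email appears under properties but is missing from propertyOrdering:

```json
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "email": { "type": "string" }
  },
  "propertyOrdering": ["name"]
}
```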

This is a pure-Rust checker — no external tool is required.

Source Files

  • Input: **/*.json
  • Output: none (checker)

Configuration

[processor.json_schema]
args = []                                    # Reserved for future use
dep_inputs = []                            # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| args | string[] | [] | Reserved for future use |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool processes one file at a time. Each file is checked in a separate invocation.

Jsonlint Processor

Purpose

Lints JSON files using jsonlint.

How It Works

Discovers .json files in the project (excluding common build tool directories), runs jsonlint on each file, and records success in the cache. A non-zero exit code from jsonlint fails the product.

This processor does not support batch mode — each file is checked individually.

Source Files

  • Input: **/*.json
  • Output: none (checker)

Configuration

[processor.jsonlint]
command = "jsonlint"                          # The jsonlint command to run
args = []                                    # Additional arguments to pass to jsonlint
dep_inputs = []                            # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| command | string | "jsonlint" | The jsonlint executable to run |
| args | string[] | [] | Extra arguments passed to jsonlint |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool processes one file at a time. Each file is checked in a separate invocation.

Libreoffice Processor

Purpose

Converts LibreOffice documents (e.g., .odp presentations) to PDF or other formats.

How It Works

Discovers .odp files in the project and runs libreoffice in headless mode to convert each file to the configured output formats. Uses flock to serialize invocations since LibreOffice only supports a single running instance.

Source Files

  • Input: **/*.odp
  • Output: out/libreoffice/{format}/{relative_path}.{format}

Configuration

[processor.libreoffice]
libreoffice_bin = "libreoffice"        # The libreoffice command to run
formats = ["pdf"]                      # Output formats (pdf, pptx)
args = []                              # Additional arguments to pass to libreoffice
output_dir = "out/libreoffice"         # Output directory
dep_inputs = []                      # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
| --- | --- | --- | --- |
| libreoffice_bin | string | "libreoffice" | The libreoffice executable to run |
| formats | string[] | ["pdf"] | Output formats to generate (pdf, pptx) |
| args | string[] | [] | Extra arguments passed to libreoffice |
| output_dir | string | "out/libreoffice" | Base output directory |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Linux Module Processor

Purpose

Builds Linux kernel modules (.ko files) from source, driven by a linux-module.yaml manifest. The processor generates a temporary Kbuild file, invokes the kernel build system (make -C <kdir> M=<src> modules), copies the resulting .ko to the output directory, and cleans up build artifacts from the source tree.

How It Works

The processor scans for linux-module.yaml files. Each manifest lists one or more kernel modules to build. For each module the processor:

  1. Generates a Kbuild file in the source directory (next to the yaml).
  2. Runs make -C <kdir> M=<absolute-source-dir> modules to compile.
  3. Copies the .ko file to out/linux-module/<yaml-relative-dir>/.
  4. Runs make ... clean and removes the generated Kbuild so the source directory stays clean.

Because the kernel build system requires M= to point at an absolute path containing the sources and Kbuild, the make command runs in the yaml file’s directory — not the project root.

The processor is a generator: it knows exactly which .ko files it produces. Outputs are tracked in the build graph, cached in the object store, and can be restored from cache after rsconstruct clean without recompiling.

linux-module.yaml Format

All source paths are relative to the yaml file’s directory.

# Global settings (all optional)
make: make                    # Make binary (default: "make")
kdir: /lib/modules/6.8.0-generic/build  # Kernel build dir (default: running kernel)
arch: x86_64                  # ARCH= value (optional, omitted if unset)
cross_compile: x86_64-linux-gnu-  # CROSS_COMPILE= value (optional)
v: 0                          # Verbosity V= (default: 0)
w: 1                          # Warning level W= (default: 1)

# Module definitions
modules:
  - name: hello               # Module name -> produces hello.ko
    sources: [main.c]         # Source files (relative to yaml dir)
    extra_cflags: [-DDEBUG]   # Extra CFLAGS (optional, becomes ccflags-y)

  - name: mydriver
    sources: [mydriver.c, utils.c]

Minimal Example

A single module with one source file:

modules:
  - name: hello
    sources: [main.c]

Output Layout

Output is placed under out/linux-module/<yaml-relative-dir>/:

out/linux-module/<yaml-dir>/
  <module_name>.ko

For example, a manifest at src/kernel/hello/linux-module.yaml defining module hello produces:

out/linux-module/src/kernel/hello/hello.ko

KDIR Detection

If kdir is not set in the manifest, the processor runs uname -r to detect the running kernel and uses /lib/modules/<release>/build. This requires the linux-headers-* package to be installed (e.g., linux-headers-generic on Ubuntu).

Generated Kbuild

The processor writes a Kbuild file with the standard kernel module variables:

obj-m := hello.o
hello-objs := main.o
ccflags-y := -DDEBUG       # only if extra_cflags is non-empty

This file is removed after building (whether the build succeeds or fails).

Configuration

[processor.linux_module]
enabled = true           # Enable/disable (default: true)
dep_inputs = []        # Extra files that trigger rebuilds

Configuration Reference

| Key | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable/disable the processor |
| dep_inputs | string[] | [] | Extra files that trigger rebuilds when changed |
| src_dirs | string[] | [""] | Directories to scan for linux-module.yaml files |
| src_extensions | string[] | ["linux-module.yaml"] | File patterns to scan for |
| src_exclude_dirs | string[] | common excludes | Directories to skip during scanning |

Batch support

Each input file is processed individually, producing its own output file.

Caching

The .ko outputs are cached in the rsconstruct object store. After rsconstruct clean, a subsequent rsconstruct build restores .ko files from cache (via hardlink or copy) without invoking the kernel build system. A rebuild is triggered when any source file or the yaml manifest changes.

Prerequisites

  • make must be installed
  • Kernel headers must be installed for the target kernel version (apt install linux-headers-generic on Ubuntu)
  • For cross-compilation, the appropriate cross-compiler toolchain must be available and specified via cross_compile and arch in the manifest

Example

Given this project layout:

myproject/
  rsconstruct.toml
  drivers/
    hello/
      linux-module.yaml
      main.c

With drivers/hello/linux-module.yaml:

modules:
  - name: hello
    sources: [main.c]

And drivers/hello/main.c:

#include <linux/module.h>
#include <linux/init.h>

MODULE_LICENSE("GPL");

static int __init hello_init(void) {
    pr_info("hello: loaded\n");
    return 0;
}

static void __exit hello_exit(void) {
    pr_info("hello: unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);

Running rsconstruct build produces:

out/linux-module/drivers/hello/hello.ko

The module can then be loaded with sudo insmod out/linux-module/drivers/hello/hello.ko.

Luacheck Processor

Purpose

Lints Lua scripts using luacheck.

How It Works

Discovers .lua files in the project (excluding common build tool directories), runs luacheck on each file, and records success in the cache. A non-zero exit code from luacheck fails the product.

This processor supports batch mode, allowing multiple files to be checked in a single luacheck invocation for better performance.

Source Files

  • Input: **/*.lua
  • Output: none (linter)

Configuration

[processor.luacheck]
command = "luacheck"                         # The luacheck command to run
args = []                                    # Additional arguments to pass to luacheck
dep_inputs = []                            # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
| command | string | "luacheck" | The luacheck executable to run |
| args | string[] | [] | Extra arguments passed to luacheck |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Make Processor

Purpose

Runs make in directories containing Makefiles. Each Makefile produces a stub file on success, allowing RSConstruct to track incremental rebuilds.

How It Works

Discovers files named Makefile in the project. For each Makefile found, the processor runs make (or a configured alternative) in the Makefile’s directory. A stub file is created on success.

Directory-Level Inputs

The make processor treats all files in the Makefile’s directory (and subdirectories) as inputs. This means that if any file alongside the Makefile changes — source files, headers, scripts, included makefiles — rsconstruct will re-run make.

This is slightly conservative: a change to a file that the Makefile does not actually depend on will trigger a rebuild. In practice this is the right trade-off because Makefiles can depend on arbitrary files and there is no reliable way to know which ones without running make itself.

Source Files

  • Input: **/Makefile plus all files in the Makefile’s directory tree
  • Output: out/make/{relative_path}.done

Dependency Tracking Approaches

RSConstruct uses the directory-scan approach described above. Here is why, and what the alternatives are.

1. Directory scan (current)

Track every file under the Makefile’s directory as an input. Any change triggers a rebuild.

Pros: simple, correct, zero configuration. Cons: over-conservative — a change to an unrelated file in the same directory triggers a needless rebuild.

2. User-declared extra inputs

The user lists specific files or globs in dep_inputs. Only those files (plus the Makefile itself) are tracked.

Pros: precise, no unnecessary rebuilds. Cons: requires the user to manually maintain the list. Easy to forget a file and get stale builds.

This is available today via the dep_inputs config key, but on its own it would miss source files that the Makefile compiles.

3. Parse make --dry-run --print-data-base

Ask make to dump its dependency database and extract the real inputs.

Pros: exact dependency information, no over-building. Cons: fragile — output format varies across make implementations (GNU Make, BSD Make, nmake). Some Makefiles behave differently in dry-run mode. Complex to implement and maintain.

4. Hash the directory tree

Instead of listing individual files, compute a single hash over every file in the directory. Functionally equivalent to option 1 but with a different internal representation.

Pros: compact cache key. Cons: same over-conservatism as option 1, and no ability to report which file changed.
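For comparison, option 4 can be sketched in a few lines (illustrative only; RSConstruct uses the directory scan of option 1):

```python
import hashlib
from pathlib import Path

def hash_tree(root: Path) -> str:
    """Compute one SHA-256 over every file under `root`, in sorted order.

    Sorting the relative paths makes the digest deterministic, and
    hashing each path alongside its contents means renames change
    the result too.
    """
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h.update(str(path.relative_to(root)).encode())
        h.update(path.read_bytes())
    return h.hexdigest()
```

Note the drawback mentioned above in action: the digest says *that* something changed, never *which* file.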

Configuration

[processor.make]
command = "make"     # Make binary to use
args = []            # Extra arguments passed to make
target = ""          # Make target (empty = default target)
src_dirs = [""]        # Directory to scan ("" = project root)
src_extensions = ["Makefile"]
dep_inputs = []    # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
| command | string | "make" | Path or name of the make binary |
| args | string[] | [] | Extra arguments passed to every make invocation |
| target | string | "" | Make target to build (empty = default target) |
| src_dirs | string[] | [""] | Directories to scan for Makefiles |
| src_extensions | string[] | ["Makefile"] | File names to match |
| src_exclude_paths | string[] | [] | Paths (relative to project root) to exclude |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds (in addition to directory contents) |

Batch support

The tool processes one file at a time. Each file is checked in a separate invocation.

Mako Processor

Purpose

Renders Mako template files into output files using the Python Mako template library.

How It Works

Files matching configured extensions in templates.mako/ are rendered via python3 using the mako Python library. Output is written with the extension stripped and the templates.mako/ prefix removed:

templates.mako/app.config.mako  →  app.config
templates.mako/sub/readme.txt.mako  →  sub/readme.txt

Templates use the Mako templating engine. A TemplateLookup is configured with the project root as the lookup directory, so templates can include or inherit from other templates using relative paths.

Source Files

  • Input: templates.mako/**/*{src_extensions}
  • Output: project root, mirroring the template path (minus templates.mako/ prefix) with the extension removed
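The path mapping above can be sketched as a small function (a hypothetical helper; the processor itself is written in Rust):

```python
def mako_output_path(template_path: str, prefix: str = "templates.mako/") -> str:
    """Map a Mako template path to its rendered output path:
    strip the templates.mako/ prefix and the final extension."""
    if not template_path.startswith(prefix):
        raise ValueError(f"not under {prefix}: {template_path}")
    relative = template_path[len(prefix):]
    # Drop the trailing template extension (e.g. ".mako").
    return relative.rsplit(".", 1)[0]
```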

Configuration

[processor.mako]
src_extensions = [".mako"]                    # File extensions to process (default: [".mako"])
dep_inputs = ["config/settings.py"]     # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
| src_extensions | string[] | [".mako"] | File extensions to discover |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Markdown2html Processor

Purpose

Converts Markdown files to HTML using the markdown Perl script.

How It Works

Discovers .md files in the project and runs markdown on each file, producing an HTML output file.

Source Files

  • Input: **/*.md
  • Output: out/markdown2html/{relative_path}.html

Configuration

[processor.markdown2html]
markdown_bin = "markdown"              # The markdown command to run
args = []                              # Additional arguments to pass to markdown
output_dir = "out/markdown2html"       # Output directory
dep_inputs = []                      # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
| markdown_bin | string | "markdown" | The markdown executable to run |
| args | string[] | [] | Extra arguments passed to markdown |
| output_dir | string | "out/markdown2html" | Output directory for HTML files |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

MassGenerator Processor

Status

Designed, not yet implemented. This document describes the intended user-facing contract for the MassGenerator processor type. The full design rationale is in Output Prediction.

Why “mass generator”?

Existing processor types cover a matrix of “how many outputs” and “are they known in advance”:

| Type | Outputs known? | Example |
|---|---|---|
| Generator | Yes, 1 per input | tera: template → file |
| Explicit | Yes, user-declared | custom build step |
| Checker | None (pass/fail) | ruff |
| Creator | No, opaque (output_dirs) | mkdocs → _site/ |
| MassGenerator | Yes — tool enumerates them | rssite → _site/* |

MassGenerator is the “transparent Creator”: it produces many output files (like a Creator), but the tool itself answers the question “what will you produce?” before running. Each predicted file becomes a declared product with its own inputs, cache entry, and dependency edges.

Names considered:

  • mass_generator — chosen. Says what it does: “generator” (per-file outputs like the Generator type), “mass” (many products from one tool invocation).
  • transparent_creator — accurate but awkward.
  • predicting_creator — describes the mechanism, not the result.
  • site_generator — too narrow; the type is useful beyond static sites.

Purpose

Wraps a tool that:

  1. Produces many output files from a set of source files (e.g., a static site generator).
  2. Can enumerate its outputs in advance via a separate “plan” command.
  3. Normally builds all its outputs in a single invocation.

Once wired as a MassGenerator, the tool gets per-file cache entries, plays cleanly with other processors sharing its output directory, and allows downstream processors to depend on its outputs.

How it works

1. The tool provides two modes

The wrapped tool must expose:

  • Build mode: runs the actual generation. Produces all output files in one invocation.
  • Plan mode: prints a JSON manifest to stdout listing every output it will produce, with per-output source dependencies. Does not produce any output files.

Both modes must be driven by the same internal function that enumerates outputs — otherwise the plan and the build diverge, and the cache is corrupted. This is a discipline the tool author must uphold.

2. Plan phase (at graph-build time)

rsconstruct runs predict_command and parses its output. For each entry in the manifest, a product is added to the build graph with:

  • inputs = the entry’s sources (files whose changes should trigger this output’s rebuild)
  • outputs = [entry.path]
  • processor = the MassGenerator instance name

3. Build phase

rsconstruct groups all dirty products for a MassGenerator instance into a single batch. The tool’s command is invoked once per batch; it produces all predicted files. Each product caches its own file as a blob, independently of the others.

In strict mode (default), after the tool exits rsconstruct verifies that every predicted file was produced and no unexpected files appeared in output_dirs. Mismatches are build-breaking errors.

4. Restore phase

When all products for a MassGenerator instance are clean, each is restored from its blob cache — the tool is not invoked at all. Partial cleanliness (some products clean, some dirty) triggers a single tool invocation, and clean products are cached/re-cached afterward.

Manifest format

{
  "version": 1,
  "outputs": [
    {
      "path": "_site/index.html",
      "sources": ["docs/index.md", "templates/default.html", "mysite.toml"]
    },
    {
      "path": "_site/about/index.html",
      "sources": ["docs/about.md", "templates/default.html", "mysite.toml"]
    }
  ]
}
  • version — integer. Schema version. Current: 1.
  • outputs[].path — relative path. Must fall within one of the processor’s output_dirs.
  • outputs[].sources — minimal set of input files whose changes invalidate this output.
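A minimal validator for this manifest shape (a sketch under the schema above; the function name is hypothetical and rsconstruct's own parser is in Rust):

```python
import json
from pathlib import PurePosixPath

def validate_manifest(text: str, output_dirs: list[str]) -> list[dict]:
    """Parse a MassGenerator plan manifest from the tool's stdout.

    Checks the schema version and that every predicted path falls
    inside one of the processor's declared output_dirs.
    """
    manifest = json.loads(text)
    if manifest.get("version") != 1:
        raise ValueError(f"unsupported manifest version: {manifest.get('version')}")
    outputs = manifest["outputs"]
    for entry in outputs:
        path = PurePosixPath(entry["path"])
        if not any(path.is_relative_to(d) for d in output_dirs):
            raise ValueError(f"output outside output_dirs: {entry['path']}")
    return outputs
```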

Configuration

[processor.mass_generator.site]
command         = "rssite build"
predict_command = "rssite plan"
output_dirs     = ["_site"]
src_dirs        = ["docs", "templates"]
src_extensions  = [".md", ".html", ".yaml"]
# loose_manifest = false   # optional; set to true to downgrade verification mismatches to warnings

Fields

| Key | Type | Required | Description |
|---|---|---|---|
| command | string | yes | Tool's build command. Invoked once per batch of dirty products. |
| predict_command | string | yes | Tool's plan command. Must print the JSON manifest to stdout. |
| output_dirs | array of strings | yes | Directories the tool produces files in. Used for verification. |
| loose_manifest | bool | no | Default false. If true, plan/actual mismatches are warnings only. |
| src_dirs | array of strings | no | Bounds which source changes trigger a replan. |
| src_extensions | array of strings | no | As above. |
| src_exclude_* | array of strings | no | Standard scan exclusions apply. |
| dep_inputs | array of strings | no | Extra files that invalidate the whole instance when changed. |

Cross-processor dependencies

Because every output file is a declared product, downstream processors wire up naturally:

[processor.mass_generator.site]
command         = "rssite build"
predict_command = "rssite plan"
output_dirs     = ["_site"]

[processor.markdownlint]
# Depends on rssite's outputs automatically via file-scan:
# any _site/*.html file is a discovered virtual file in the graph.
src_dirs       = ["_site"]
src_extensions = [".html"]

No ordering hacks needed. The graph’s topological sort handles it.

Tool author contract

For a tool to be compatible with MassGenerator, its plan command must uphold these invariants:

  1. Pure function of config + source tree. Same inputs → same manifest, bit for bit. No network, no timestamps, no env-var peeking (unless declared as a source).
  2. Cheap or cached. rsconstruct invokes it on every graph build. Slow plan → slow rsconstruct.
  3. Exact match with build output. Predicted paths must equal actual paths produced by command. Violations are errors in strict mode.
  4. Deterministic variable outputs. Content-derived outputs (tag pages, archive indices, RSS) must be enumerable from the same parsing pass that plan does.

See rssite for a reference tool being built to this contract.

Comparison with other processor types

| | Creator (opaque) | MassGenerator (transparent) | Generator (1:1) |
|---|---|---|---|
| Outputs known in advance? | No | Yes | Yes |
| Tool invocations per build | 1 if dirty | 1 if any product is dirty | N (one per dirty input) |
| Cache unit | Whole tree | Per file | Per file |
| Downstream deps | Only on declared files | On every predicted file | On every produced file |
| Shared-folder safety | Via path_owner filter | Via declared outputs (normal) | Via declared outputs |
| Use case | mkdocs, Sphinx | rssite, cooperative tools | tera, mako, compilers |

Migration story

If a tool exists first as a Creator (output_dirs only) and later adds plan support, the migration is config-only:

# Before
[processor.creator.mysite]
command     = "mysite build"
output_dirs = ["_site"]

# After
[processor.mass_generator.mysite]
command         = "mysite build"
predict_command = "mysite plan"
output_dirs     = ["_site"]

No code changes; existing downstream processors start getting precise dependencies automatically.

Markdownlint Processor

Purpose

Lints Markdown files using markdownlint (Node.js).

How It Works

Discovers .md files in the project and runs markdownlint on each file. A non-zero exit code fails the product.

Depends on the npm processor — uses the markdownlint binary installed by npm.

Source Files

  • Input: **/*.md
  • Output: none (checker)

Configuration

[processor.markdownlint]
command = "node_modules/.bin/markdownlint"  # Path to the markdownlint binary
args = []                              # Additional arguments to pass to markdownlint
npm_stamp = "out/npm/root.stamp"       # Stamp file from npm processor (dependency)
dep_inputs = []                      # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
| command | string | "node_modules/.bin/markdownlint" | Path to the markdownlint executable |
| args | string[] | [] | Extra arguments passed to markdownlint |
| npm_stamp | string | "out/npm/root.stamp" | Stamp file from npm processor (ensures npm packages are installed first) |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool processes one file at a time. Each file is checked in a separate invocation.

Marp Processor

Purpose

Converts Markdown slides to PDF, PPTX, or HTML using Marp.

How It Works

Discovers .md files in the project and runs marp on each file, generating output in the configured formats. Each format produces a separate output file.

Each marp invocation spawns a headless Chromium browser instance via Puppeteer to render the slides. This makes marp significantly more resource-intensive than typical processors — see Concurrency limiting below.

Source Files

  • Input: **/*.md
  • Output: out/marp/{format}/{relative_path}.{format}

Configuration

[processor.marp]
marp_bin = "marp"                      # The marp command to run
formats = ["pdf"]                      # Output formats (pdf, pptx, html)
args = ["--html", "--allow-local-files"]  # Additional arguments to pass to marp
output_dir = "out/marp"                # Output directory
dep_inputs = []                      # Additional files that trigger rebuilds when changed
max_jobs = 2                           # Limit concurrent marp instances (each spawns Chromium)
| Key | Type | Default | Description |
|---|---|---|---|
| marp_bin | string | "marp" | The marp executable to run |
| formats | string[] | ["pdf"] | Output formats to generate (pdf, pptx, html) |
| args | string[] | ["--html", "--allow-local-files"] | Extra arguments passed to marp |
| output_dir | string | "out/marp" | Base output directory |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
| max_jobs | integer | none | Max concurrent marp processes. See Concurrency limiting. |

Concurrency Limiting

Each marp invocation launches a full headless Chromium browser process, which consumes hundreds of megabytes of RAM. When running parallel builds with -j N, too many simultaneous Chromium instances cause resource exhaustion and non-deterministic crashes:

TargetCloseError: Protocol error (Target.setDiscoverTargets): Target closed

Use max_jobs to limit how many marp processes run concurrently, independent of the global -j setting. For example, with -j 20 and max_jobs = 2, at most 2 Chromium instances will be alive at once while other processors still use the full 20 threads:

[processor.marp]
formats = ["pdf"]
max_jobs = 2

Recommended value: 2. A value of 4 may work on machines with plenty of RAM but has been observed to produce occasional failures on large projects (700+ slides). Without max_jobs, the global -j value applies, which typically causes crashes at higher parallelism levels.
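The per-processor cap amounts to a semaphore around tool invocations, independent of the global worker pool. A minimal sketch of the idea (illustrative Python; rsconstruct implements this in Rust):

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_MARP_JOBS = 2  # corresponds to max_jobs in [processor.marp]
marp_slots = threading.BoundedSemaphore(MAX_MARP_JOBS)

def run_marp(slide_file: str) -> str:
    # Any worker thread may pick up this task, but at most
    # MAX_MARP_JOBS of them hold a slot (i.e. a Chromium) at once.
    with marp_slots:
        return f"rendered {slide_file}"  # placeholder for spawning marp

# Global -j 20: twenty workers, yet never more than two "marp" slots in use.
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(run_marp, [f"deck{i}.md" for i in range(8)]))
```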

Batch support

Each input file is processed individually, producing its own output file.

Temporary Files

Marp creates temporary Chromium profile directories (marp-cli-*) in /tmp for each invocation. RSConstruct automatically cleans these up after each marp process completes, since marp itself does not delete them.

Mdbook Processor

Purpose

Builds mdbook documentation projects.

How It Works

Discovers book.toml files indicating mdbook projects, collects sibling .md and .toml files as inputs, and runs mdbook build. A non-zero exit code fails the product.

Source Files

  • Input: **/book.toml (plus sibling .md, .toml files)
  • Output: none (creator — produces output in book directory)

Configuration

[processor.mdbook]
command = "mdbook"                     # The mdbook command to run
output_dir = "book"                    # Output directory for generated docs
args = []                              # Additional arguments to pass to mdbook
dep_inputs = []                      # Additional files that trigger rebuilds when changed
cache_output_dir = true                # Cache the output directory for fast restore after clean
| Key | Type | Default | Description |
|---|---|---|---|
| command | string | "mdbook" | The mdbook executable to run |
| output_dir | string | "book" | Output directory for generated documentation |
| args | string[] | [] | Extra arguments passed to mdbook |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
| cache_output_dir | boolean | true | Cache the book/ directory so rsconstruct clean && rsconstruct build restores from cache |

Batch support

Runs as a single whole-project operation (e.g., cargo build, npm install).

Mdl Processor

Purpose

Lints Markdown files using mdl (Ruby markdownlint).

How It Works

Discovers .md files in the project and runs mdl on each file. A non-zero exit code fails the product.

Depends on the gem processor — uses the mdl binary installed by Bundler.

Source Files

  • Input: **/*.md
  • Output: none (checker)

Configuration

[processor.mdl]
gem_home = "gems"                      # GEM_HOME directory
command = "gems/bin/mdl"              # Path to the mdl binary
args = []                              # Additional arguments to pass to mdl
gem_stamp = "out/gem/root.stamp"       # Stamp file from gem processor (dependency)
dep_inputs = []                      # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
| gem_home | string | "gems" | GEM_HOME directory for Ruby gems |
| command | string | "gems/bin/mdl" | Path to the mdl executable |
| args | string[] | [] | Extra arguments passed to mdl |
| gem_stamp | string | "out/gem/root.stamp" | Stamp file from gem processor (ensures gems are installed first) |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool processes one file at a time. Each file is checked in a separate invocation.

Mermaid Processor

Purpose

Converts Mermaid diagram files to PNG, SVG, or PDF using mmdc (mermaid-cli).

How It Works

Discovers .mmd files in the project and runs mmdc on each file, generating output in the configured formats. Each format produces a separate output file.

Source Files

  • Input: **/*.mmd
  • Output: out/mermaid/{format}/{relative_path}.{format}

Configuration

[processor.mermaid]
mmdc_bin = "mmdc"                      # The mmdc command to run
formats = ["png"]                      # Output formats (png, svg, pdf)
args = []                              # Additional arguments to pass to mmdc
output_dir = "out/mermaid"             # Output directory
dep_inputs = []                      # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
| mmdc_bin | string | "mmdc" | The mermaid-cli executable to run |
| formats | string[] | ["png"] | Output formats to generate (png, svg, pdf) |
| args | string[] | [] | Extra arguments passed to mmdc |
| output_dir | string | "out/mermaid" | Base output directory |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Mypy Processor

Purpose

Type-checks Python source files using mypy.

How It Works

Discovers .py files in the project (excluding common non-source directories), runs mypy on each file, and creates a stub file on success. A non-zero exit code from mypy fails the product.

This processor supports batch mode, allowing multiple files to be checked in a single mypy invocation for better performance.

If a mypy.ini file exists in the project root, it is automatically added as an extra input so that configuration changes trigger rebuilds.

Source Files

  • Input: **/*.py
  • Output: out/mypy/{flat_name}.mypy

Configuration

[processor.mypy]
command = "mypy"                             # The mypy command to run
args = []                                    # Additional arguments to pass to mypy
dep_inputs = []                            # Additional files that trigger rebuilds (e.g. ["pyproject.toml"])
| Key | Type | Default | Description |
|---|---|---|---|
| command | string | "mypy" | The mypy executable to run |
| args | string[] | [] | Extra arguments passed to mypy |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Using mypy.ini

Mypy automatically reads configuration from a mypy.ini file in the project root. This file is detected automatically and added as an extra input, so changes to it will trigger rebuilds without manual configuration.

Npm Processor

Purpose

Installs Node.js dependencies from package.json files using npm.

How It Works

Discovers package.json files in the project, runs npm install in each directory, and creates a stamp file on success. Sibling .json, .js, and .ts files are tracked as inputs so changes trigger reinstallation.

Source Files

  • Input: **/package.json (plus sibling .json, .js, .ts files)
  • Output: out/npm/{flat_name}.stamp

Configuration

[processor.npm]
command = "npm"                        # The npm command to run
args = []                              # Additional arguments to pass to npm install
dep_inputs = []                      # Additional files that trigger rebuilds when changed
cache_output_dir = true                # Cache the node_modules directory for fast restore after clean
| Key | Type | Default | Description |
|---|---|---|---|
| command | string | "npm" | The npm executable to run |
| args | string[] | [] | Extra arguments passed to npm install |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
| cache_output_dir | boolean | true | Cache the node_modules/ directory so rsconstruct clean && rsconstruct build restores from cache |

Batch support

Runs as a single whole-project operation (e.g., cargo build, npm install).

Objdump Processor

Purpose

Disassembles ELF binaries using objdump.

How It Works

Discovers .elf files under out/cc_single_file/, runs objdump to produce disassembly output, and writes the result to the configured output directory.

Source Files

  • Input: out/cc_single_file/**/*.elf
  • Output: disassembly files in output directory

Configuration

[processor.objdump]
args = []
dep_inputs = []
output_dir = "out/objdump"
| Key | Type | Default | Description |
|---|---|---|---|
| args | string[] | [] | Extra arguments passed to objdump |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
| output_dir | string | "out/objdump" | Directory for disassembly output |

Batch support

Each input file is processed individually, producing its own output file.

Pandoc Processor

Purpose

Converts documents between formats using pandoc.

How It Works

Discovers .md files in the project and runs pandoc on each file, converting from the configured source format to the configured output formats.

Source Files

  • Input: **/*.md
  • Output: out/pandoc/{format}/{relative_path}.{format}

Configuration

[processor.pandoc]
pandoc = "pandoc"                      # The pandoc command to run
from = "markdown"                      # Source format
formats = ["pdf"]                      # Output formats (pdf, docx, html, etc.)
args = []                              # Additional arguments to pass to pandoc
output_dir = "out/pandoc"              # Output directory
dep_inputs = []                      # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
| pandoc | string | "pandoc" | The pandoc executable to run |
| from | string | "markdown" | Source format |
| formats | string[] | ["pdf"] | Output formats to generate |
| args | string[] | [] | Extra arguments passed to pandoc |
| output_dir | string | "out/pandoc" | Base output directory |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Pdflatex Processor

Purpose

Compiles LaTeX documents to PDF using pdflatex.

How It Works

Discovers .tex files in the project and runs pdflatex on each file. Runs multiple compilation passes (configurable) to resolve cross-references and table of contents. Optionally uses qpdf to linearize the output PDF.

Source Files

  • Input: **/*.tex
  • Output: out/pdflatex/{relative_path}.pdf

Configuration

[processor.pdflatex]
command = "pdflatex"                   # The pdflatex command to run
runs = 2                               # Number of compilation passes
qpdf = true                           # Use qpdf to linearize output PDF
args = []                              # Additional arguments to pass to pdflatex
output_dir = "out/pdflatex"            # Output directory
dep_inputs = []                      # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
| command | string | "pdflatex" | The pdflatex executable to run |
| runs | integer | 2 | Number of compilation passes (for cross-references) |
| qpdf | bool | true | Use qpdf to linearize the output PDF |
| args | string[] | [] | Extra arguments passed to pdflatex |
| output_dir | string | "out/pdflatex" | Output directory for PDF files |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Pdfunite Processor

Purpose

Merges PDF files from subdirectories into single combined PDFs using pdfunite.

How It Works

Scans subdirectories of the configured source directory for files matching the configured extension. For each subdirectory, it locates the corresponding PDFs (generated by an upstream processor such as marp) and merges them into a single output PDF.

This processor is designed for course/module workflows where slide decks in subdirectories are combined into course bundles.

Source Files

  • Input: PDFs from upstream processor (e.g., out/marp/pdf/{subdir}/*.pdf)
  • Output: out/courses/{subdir}.pdf
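pdfunite's CLI takes the input PDFs first and the merged output path as the last argument, so the per-subdirectory merge command can be sketched as (hypothetical helper; sorting the decks is an assumption made here for deterministic page order):

```python
def pdfunite_cmd(course: str, deck_pdfs: list[str],
                 output_dir: str = "out/courses") -> list[str]:
    """Build the pdfunite command merging one course's slide-deck PDFs.

    Invocation shape: pdfunite in1.pdf in2.pdf ... out.pdf
    """
    return ["pdfunite", *sorted(deck_pdfs), f"{output_dir}/{course}.pdf"]
```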

Configuration

[processor.pdfunite]
command = "pdfunite"                   # The pdfunite command to run
source_dir = "marp/courses"           # Base directory containing course subdirectories
source_ext = ".md"                     # Extension of source files in subdirectories
source_output_dir = "out/marp/pdf"     # Where the upstream processor puts PDFs
args = []                              # Additional arguments to pass to pdfunite
output_dir = "out/courses"             # Output directory for merged PDFs
dep_inputs = []                      # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| command | string | "pdfunite" | The pdfunite executable to run |
| source_dir | string | "marp/courses" | Directory containing course subdirectories |
| source_ext | string | ".md" | Extension of source files to look for |
| source_output_dir | string | "out/marp/pdf" | Directory where the upstream processor outputs PDFs |
| args | string[] | [] | Extra arguments passed to pdfunite |
| output_dir | string | "out/courses" | Output directory for merged PDFs |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Perlcritic Processor

Purpose

Analyzes Perl code using Perl::Critic.

How It Works

Discovers .pl and .pm files in the project (excluding common build tool directories), runs perlcritic on each file, and records success in the cache. A non-zero exit code from perlcritic fails the product.

This processor supports batch mode.

If a .perlcriticrc file exists, it is automatically added as an extra input so that configuration changes trigger rebuilds.

Source Files

  • Input: **/*.pl, **/*.pm
  • Output: none (checker)

Configuration

[processor.perlcritic]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| args | string[] | [] | Extra arguments passed to perlcritic |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

PHP Lint Processor

Purpose

Checks PHP syntax using php -l.

How It Works

Discovers .php files in the project (excluding common build tool directories), runs php -l on each file, and records success in the cache. A non-zero exit code fails the product.

This processor supports batch mode.

Source Files

  • Input: **/*.php
  • Output: none (checker)

Configuration

[processor.php_lint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| args | string[] | [] | Extra arguments passed to php |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Pip Processor

Purpose

Installs Python dependencies from requirements.txt files using pip.

How It Works

Discovers requirements.txt files in the project, runs pip install -r on each, and creates a stamp file on success. The stamp file tracks the install state so dependencies are only reinstalled when requirements.txt changes.
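RSConstruct tracks changes with SHA-256 checksums (per the feature list); a minimal sketch of the stamp idea, assuming the stamp simply records the checksum of requirements.txt at the last successful install:

```python
# Hypothetical sketch, not the actual implementation: reinstall only
# when requirements.txt's checksum differs from the one recorded in
# the stamp file at the last successful "pip install -r".
import hashlib
from pathlib import Path

def needs_install(requirements: Path, stamp: Path) -> bool:
    """True when the stamp is missing or stale."""
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()
    return not stamp.exists() or stamp.read_text().strip() != digest

def write_stamp(requirements: Path, stamp: Path) -> None:
    """Record the current checksum after a successful install."""
    digest = hashlib.sha256(requirements.read_bytes()).hexdigest()
    stamp.parent.mkdir(parents=True, exist_ok=True)
    stamp.write_text(digest + "\n")
```

With this scheme, `rsconstruct build` skips the install until requirements.txt changes, then refreshes the stamp.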

Source Files

  • Input: **/requirements.txt
  • Output: out/pip/{flat_name}.stamp

Configuration

[processor.pip]
command = "pip"                        # The pip command to run
args = []                              # Additional arguments to pass to pip
dep_inputs = []                      # Additional files that trigger rebuilds when changed
cache_output_dir = true                # Cache the stamp directory for fast restore after clean
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| command | string | "pip" | The pip executable to run |
| args | string[] | [] | Extra arguments passed to pip |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
| cache_output_dir | boolean | true | Cache the out/pip/ directory so rsconstruct clean && rsconstruct build restores from cache |

Batch support

Runs as a single whole-project operation rather than batching individual files.

Protobuf Processor

Purpose

Compiles Protocol Buffer (.proto) files to generated source code using protoc.

How It Works

Files matching configured extensions in the proto/ directory are compiled using the Protocol Buffer compiler. Output is written to out/protobuf/:

proto/hello.proto  →  out/protobuf/hello.pb.cc

The --proto_path is automatically set to the parent directory of each input file.
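A sketch (assumed, not the actual implementation) of how the protoc command line could be assembled for one input file, with --proto_path set to the input's parent directory as described above and --cpp_out producing the .pb.cc output:

```python
# Hypothetical helper: build the protoc argv for a single .proto file.
# --proto_path is the input's parent directory; --cpp_out is the
# configured output directory.
from pathlib import Path

def protoc_argv(proto_file: str, output_dir: str = "out/protobuf",
                protoc_bin: str = "protoc") -> list[str]:
    parent = str(Path(proto_file).parent)
    return [protoc_bin,
            f"--proto_path={parent}",
            f"--cpp_out={output_dir}",
            proto_file]
```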

Source Files

  • Input: proto/**/*.proto
  • Output: out/protobuf/ with .pb.cc extension

Configuration

[processor.protobuf]
protoc_bin = "protoc"                     # Protoc binary (default: "protoc")
src_extensions = [".proto"]                   # File extensions to process
output_dir = "out/protobuf"              # Output directory (default: "out/protobuf")
dep_inputs = []                         # Additional files that trigger rebuilds
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| protoc_bin | string | "protoc" | Path to protoc compiler |
| src_extensions | string[] | [".proto"] | File extensions to discover |
| output_dir | string | "out/protobuf" | Output directory |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Pylint Processor

Purpose

Lints Python source files using pylint.

How It Works

Discovers .py files in the project (excluding common non-source directories), runs pylint on each file, and creates a stub file on success. A non-zero exit code from pylint fails the product.

This processor supports batch mode, allowing multiple files to be checked in a single pylint invocation for better performance.

If a .pylintrc file exists in the project root, it is automatically added as an extra input so that configuration changes trigger rebuilds.

Source Files

  • Input: **/*.py
  • Output: out/pylint/{flat_name}.pylint

Configuration

[processor.pylint]
args = []                                  # Additional arguments to pass to pylint
dep_inputs = []                          # Additional files that trigger rebuilds (e.g. ["pyproject.toml"])
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| args | string[] | [] | Extra arguments passed to pylint |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Pyrefly Processor

Purpose

Type-checks Python source files using pyrefly.

How It Works

Discovers .py files in the project (excluding common non-source directories), runs pyrefly check on each file, and records success in the cache. A non-zero exit code from pyrefly fails the product.

This processor supports batch mode, allowing multiple files to be checked in a single pyrefly invocation for better performance.

Source Files

  • Input: **/*.py
  • Output: none (linter)

Configuration

[processor.pyrefly]
command = "pyrefly"                          # The pyrefly command to run
args = []                                    # Additional arguments to pass to pyrefly
dep_inputs = []                            # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| command | string | "pyrefly" | The pyrefly executable to run |
| args | string[] | [] | Extra arguments passed to pyrefly |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Pytest Processor

Purpose

Runs Python test files using pytest to verify they pass.

How It Works

Python test files (.py) in the tests/ directory are run using pytest. Each test file is checked individually — a failing test causes the build to fail.

Source Files

  • Input: tests/**/*.py
  • Output: none (checker — pass/fail only)

Configuration

[processor.pytest]
src_extensions = [".py"]                      # File extensions to process (default: [".py"])
src_dirs = ["tests"]                     # Directories to scan (default: ["tests"])
dep_inputs = []                         # Additional files that trigger rebuilds
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| src_extensions | string[] | [".py"] | File extensions to discover |
| src_dirs | string[] | ["tests"] | Directories to scan for test files |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Requirements Processor

Purpose

Generates a requirements.txt file for a Python project by scanning the project’s .py source files for import statements and listing the third-party PyPI distributions they reference.

How It Works

  1. Scans every .py file in the project’s source directories.
  2. Extracts the top-level module name from each import / from statement.
  3. Drops imports that resolve to a local project file (intra-project imports).
  4. Drops imports that are part of the Python standard library.
  5. Drops imports listed in exclude.
  6. Maps each remaining import name to its PyPI distribution name using the built-in curated table (e.g. cv2 → opencv-python, yaml → PyYAML). User-supplied mapping entries win over the built-in table.
  7. Writes the deduplicated result to requirements.txt.

Import → Distribution Mapping

Most Python packages publish under the same name as their top-level import, so the default is identity (import requests → requests). A curated table handles the common exceptions:

| Import | Distribution |
|--------|--------------|
| cv2 | opencv-python |
| yaml | PyYAML |
| PIL | Pillow |
| sklearn | scikit-learn |
| bs4 | beautifulsoup4 |
| dateutil | python-dateutil |
| dotenv | python-dotenv |
| jwt | PyJWT |

Projects that import an unusual name should add an override:

[processor.requirements.mapping]
internal_tools = "acme-internal-tools"

Limitations

  • No version pinning. The generated file lists bare distribution names. Running pip freeze > requirements.txt is the right tool if you need pinned versions.
  • Static analysis only. Conditional imports inside try blocks, runtime __import__ calls, and string-based imports are not detected.
  • Curated mapping is finite. Packages with import/distribution name mismatches not in the built-in table default to identity; add them to mapping when needed.

Source Files

  • Input: **/*.py (configurable via src_dirs / src_extensions)
  • Output: requirements.txt (configurable via output)

Configuration

[processor.requirements]
output = "requirements.txt"    # Output file path
exclude = []                   # Import names to never emit
sorted = true                  # Sort entries alphabetically
header = true                  # Include a "# Generated by rsconstruct" header

[processor.requirements.mapping]
# Per-project overrides: import_name = "pypi-distribution-name"
# These win over the built-in curated table.
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| output | string | "requirements.txt" | Output file path |
| exclude | string[] | [] | Import names to never emit |
| sorted | bool | true | Sort entries alphabetically (false preserves first-seen order) |
| header | bool | true | Include a comment header line |
| mapping | map | {} | Per-project import→distribution overrides |

Batch support

Runs as a single whole-project operation — all .py files feed into one requirements.txt output.

Ruff Processor

Purpose

Lints Python source files using ruff.

How It Works

Discovers .py files in the project (excluding common non-source directories), runs ruff check on each file, and creates a stub file on success. A non-zero exit code from ruff fails the product.

This processor supports batch mode, allowing multiple files to be checked in a single ruff invocation for better performance.

Source Files

  • Input: **/*.py
  • Output: out/ruff/{flat_name}.ruff

Configuration

[processor.ruff]
command = "ruff"                            # The ruff command to run
args = []                                  # Additional arguments to pass to ruff
dep_inputs = []                          # Additional files that trigger rebuilds (e.g. ["pyproject.toml"])
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| command | string | "ruff" | The ruff executable to run |
| args | string[] | [] | Extra arguments passed to ruff |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Rumdl Processor

Purpose

Lints Markdown files using rumdl.

How It Works

Discovers .md files in the project (excluding common non-source directories), runs rumdl check on each file, and creates a stub file on success. A non-zero exit code from rumdl fails the product.

This processor supports batch mode, allowing multiple files to be checked in a single rumdl invocation for better performance.

Source Files

  • Input: **/*.md
  • Output: out/rumdl/{flat_name}.rumdl

Configuration

[processor.rumdl]
command = "rumdl"                             # The rumdl command to run
args = []                                    # Additional arguments to pass to rumdl
dep_inputs = []                            # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| command | string | "rumdl" | The rumdl executable to run |
| args | string[] | [] | Extra arguments passed to rumdl |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Rust Single File Processor

Purpose

Compiles single-file Rust programs (.rs) into executables, similar to the cc_single_file processor but for Rust.

How It Works

Rust source files in the src/ directory are compiled directly to executables using rustc. This is useful for exercise, example, or utility repositories where each .rs file is a standalone program.

Output is written to out/rust_single_file/ preserving the directory structure:

src/hello.rs  →  out/rust_single_file/hello.elf
src/exercises/ex1.rs  →  out/rust_single_file/exercises/ex1.elf

Source Files

  • Input: src/**/*.rs
  • Output: out/rust_single_file/ with configured suffix (default: .elf)

Configuration

[processor.rust_single_file]
command = "rustc"                         # Rust compiler (default: "rustc")
flags = []                                # Additional compiler flags
output_suffix = ".elf"                    # Output file suffix (default: ".elf")
output_dir = "out/rust_single_file"       # Output directory
dep_inputs = []                         # Additional files that trigger rebuilds
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| command | string | "rustc" | Path to Rust compiler |
| flags | string[] | [] | Additional compiler flags |
| output_suffix | string | ".elf" | Suffix for output executables |
| output_dir | string | "out/rust_single_file" | Output directory |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Sass Processor

Purpose

Compiles SCSS and SASS files into CSS using the Sass compiler.

How It Works

Files matching configured extensions in the sass/ directory are compiled to CSS. Output is written to out/sass/ preserving the directory structure:

sass/style.scss  →  out/sass/style.css
sass/components/button.scss  →  out/sass/components/button.css

Source Files

  • Input: sass/**/*{src_extensions}
  • Output: out/sass/ mirroring the source structure with .css extension

Configuration

[processor.sass]
sass_bin = "sass"                         # Sass compiler binary (default: "sass")
src_extensions = [".scss", ".sass"]           # File extensions to process
output_dir = "out/sass"                   # Output directory (default: "out/sass")
dep_inputs = []                         # Additional files that trigger rebuilds
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| sass_bin | string | "sass" | Path to sass compiler |
| src_extensions | string[] | [".scss", ".sass"] | File extensions to discover |
| output_dir | string | "out/sass" | Output directory |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Script Processor

Purpose

Runs a user-configured script or command as a linter on discovered files. This is a generic linter that lets you plug in any script without writing a custom processor.

How It Works

Discovers files matching the configured extensions in the configured scan directory, then runs the configured linter command on each file (or batch of files). A non-zero exit code from the script fails the product.

This processor is disabled by default — you must set enabled = true and provide a command in your rsconstruct.toml.

This processor supports batch mode, allowing multiple files to be checked in a single invocation for better performance.

Source Files

  • Input: configured via src_extensions and src_dirs
  • Output: none (checker)

Configuration

[processor.script]
enabled = true
command = "python"
args = ["scripts/md_lint.py", "-q"]
src_extensions = [".md"]
src_dirs = ["marp"]
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| enabled | bool | false | Must be set to true to activate |
| command | string | (required) | The command to run |
| args | string[] | [] | Extra arguments passed before file paths |
| src_extensions | string[] | [] | File extensions to scan for |
| src_dirs | string[] | [""] | Directories to scan (an empty entry means the project root) |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
| dep_auto | string[] | [] | Auto-detected input files |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
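A minimal example of a script that fits this contract (hypothetical — any command that prints findings and exits non-zero on failure works; the scripts/md_lint.py referenced above could look like this). It flags lines longer than 100 characters in the files passed on the command line:

```python
#!/usr/bin/env python3
# Hypothetical linter for the script processor: receives file paths
# as arguments, prints findings, and exits non-zero when any exist.
import sys

def lint(paths: list[str], max_len: int = 100) -> int:
    failures = 0
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            for lineno, line in enumerate(fh, start=1):
                if len(line.rstrip("\n")) > max_len:
                    print(f"{path}:{lineno}: line longer than {max_len} chars")
                    failures += 1
    return 1 if failures else 0

if __name__ == "__main__":
    sys.exit(lint(sys.argv[1:]))
```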

Shellcheck Processor

Purpose

Lints shell scripts using shellcheck.

How It Works

Discovers .sh and .bash files in the project (excluding common build tool directories), runs shellcheck on each file, and records success in the cache. A non-zero exit code from shellcheck fails the product.

This processor supports batch mode, allowing multiple files to be checked in a single shellcheck invocation for better performance.

Source Files

  • Input: **/*.sh, **/*.bash
  • Output: none (linter)

Configuration

[processor.shellcheck]
command = "shellcheck"                       # The shellcheck command to run
args = []                                    # Additional arguments to pass to shellcheck
dep_inputs = []                            # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| command | string | "shellcheck" | The shellcheck executable to run |
| args | string[] | [] | Extra arguments passed to shellcheck |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Slidev Processor

Purpose

Builds Slidev presentations.

How It Works

Discovers .md files in the project (excluding common build tool directories), runs slidev build on each file, and records success in the cache. A non-zero exit code from slidev fails the product.

This processor supports batch mode.

Source Files

  • Input: **/*.md
  • Output: none (checker)

Configuration

[processor.slidev]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| args | string[] | [] | Extra arguments passed to slidev build |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Zspell Processor

Purpose

Checks documentation files for spelling errors using Hunspell-compatible dictionaries (via the zspell crate, pure Rust).

How It Works

Discovers files matching the configured extensions, extracts words from markdown content (stripping code blocks, inline code, URLs, and HTML tags), and checks each word against the system Hunspell dictionary and a custom words file (if it exists). Fails with a list of misspelled words on error.

Dictionaries are read from /usr/share/hunspell/.
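A rough sketch (not zspell's actual implementation) of the markdown stripping described above — drop fenced code blocks, inline code, URLs, and HTML tags, then split the remainder into candidate words:

```python
# Assumed helper: strip non-prose markdown constructs before
# spell-checking, as described for the zspell processor.
import re

def extract_words(markdown: str) -> list[str]:
    text = re.sub(r"```.*?```", " ", markdown, flags=re.DOTALL)  # fenced code blocks
    text = re.sub(r"`[^`]*`", " ", text)                         # inline code
    text = re.sub(r"https?://\S+", " ", text)                    # URLs
    text = re.sub(r"<[^>]+>", " ", text)                         # HTML tags
    return re.findall(r"[A-Za-z']+", text)
```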

This processor supports batch mode when auto_add_words is enabled, collecting all misspelled words across files and writing them to the words file at the end.

Source Files

  • Input: **/*{src_extensions} (default: **/*.md)
  • Output: none (checker)

Custom Words File

The processor loads custom words from the file specified by words_file (default: .zspell-words) if the file exists. Format: one word per line, # comments supported, blank lines ignored.

The words file is also auto-detected as an input via dep_auto, so changes to it invalidate all zspell products. To disable words file detection, set dep_auto = [].

Configuration

[processor.zspell]
src_extensions = [".md"]                  # File extensions to check (default: [".md"])
language = "en_US"                    # Hunspell dictionary language (default: "en_US")
words_file = ".zspell-words"          # Path to custom words file (default: ".zspell-words")
auto_add_words = false                # Auto-add misspelled words to words_file (default: false)
dep_auto = [".zspell-words"]       # Auto-detected config files (default: [".zspell-words"])
dep_inputs = []                     # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| src_extensions | string[] | [".md"] | File extensions to discover |
| language | string | "en_US" | Hunspell dictionary language (requires system package) |
| words_file | string | ".zspell-words" | Path to custom words file (relative to project root) |
| auto_add_words | bool | false | Auto-add misspelled words to words_file instead of failing (also available as --auto-add-words CLI flag) |
| dep_auto | string[] | [".zspell-words"] | Config files auto-detected as inputs |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Sphinx Processor

Purpose

Builds Sphinx documentation projects.

How It Works

Discovers conf.py files indicating Sphinx projects, collects sibling .rst, .py, and .md files as inputs, and runs sphinx-build to generate output. A non-zero exit code fails the product.

Source Files

  • Input: **/conf.py (plus sibling .rst, .py, .md files)
  • Output: generated documentation in the _build directory (creator)

Configuration

[processor.sphinx]
command = "sphinx-build"               # The sphinx-build command to run
output_dir = "_build"                  # Output directory for generated docs
args = []                              # Additional arguments to pass to sphinx-build
dep_inputs = []                      # Additional files that trigger rebuilds when changed
cache_output_dir = true                # Cache the output directory for fast restore after clean
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| command | string | "sphinx-build" | The sphinx-build executable to run |
| output_dir | string | "_build" | Output directory for generated documentation |
| args | string[] | [] | Extra arguments passed to sphinx-build |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
| cache_output_dir | boolean | true | Cache the _build/ directory so rsconstruct clean && rsconstruct build restores from cache |

Batch support

Runs as a single whole-project operation rather than batching individual files.

Standard Processor

Purpose

Checks JavaScript code style using standard.

How It Works

Discovers .js files in the project (excluding common build tool directories), runs standard on each file, and records success in the cache. A non-zero exit code from standard fails the product.

This processor supports batch mode.

Source Files

  • Input: **/*.js
  • Output: none (checker)

Configuration

[processor.standard]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| args | string[] | [] | Extra arguments passed to standard |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Stylelint Processor

Purpose

Lints CSS, SCSS, Sass, and Less files using stylelint.

How It Works

Discovers .css, .scss, .sass, and .less files in the project (excluding common build tool directories), runs stylelint on each file, and records success in the cache. A non-zero exit code from stylelint fails the product.

This processor supports batch mode.

If a stylelint config file exists (.stylelintrc* or stylelint.config.*), it is automatically added as an extra input so that configuration changes trigger rebuilds.

Source Files

  • Input: **/*.css, **/*.scss, **/*.sass, **/*.less
  • Output: none (checker)

Configuration

[processor.stylelint]
command = "stylelint"
args = []
dep_inputs = []
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| command | string | "stylelint" | The stylelint executable to run |
| args | string[] | [] | Extra arguments passed to stylelint |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Tags Processor

Purpose

Extracts YAML frontmatter tags from markdown files into a searchable database with comprehensive validation.

How It Works

Scans .md files for YAML frontmatter blocks (delimited by ---), parses tag metadata, and builds a redb database. The database enables querying files by tags via rsconstruct tags subcommands.

Tag Indexing

Two kinds of frontmatter fields are indexed:

  • List fields — each item becomes a tag as-is.

    tags:
      - tools:docker
      - tools:python
    

    Produces tags: tools:docker, tools:python.

  • Scalar fields — indexed as key:value (colon separator).

    level: beginner
    category: big-data
    duration_hours: 24
    

    Produces tags: level:beginner, category:big-data, duration_hours:24.

Both inline YAML lists (tags: [a, b, c]) and multi-line lists are supported.
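The indexing rule above can be sketched as a small helper (assumed names, not the real code): list items become tags as-is, scalars become key:value tags.

```python
# Sketch of tag indexing: given parsed frontmatter fields, produce
# the set of tags the database would record for that file.
def frontmatter_tags(fields: dict) -> set[str]:
    tags = set()
    for key, value in fields.items():
        if isinstance(value, list):
            tags.update(str(item) for item in value)   # list items as-is
        else:
            tags.add(f"{key}:{value}")                 # scalar -> key:value
    return tags
```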

The tags_dir Allowlist

The tags_dir directory (default: tags/) contains .txt files that define the allowed tags. Each file <name>.txt contributes tags as <name>:<line> pairs. For example:

tags/
├── level.txt        # Contains: beginner, intermediate, advanced
├── languages.txt    # Contains: python, rust, go, ...
├── tools.txt        # Contains: docker, ansible, ...
└── audiences.txt    # Contains: developers, architects, ...

level.txt with content beginner produces the allowed tag level:beginner.

The tags processor is only auto-detected when tags_dir contains .txt files.

Build-Time Validation

During every build, the tags processor runs the following checks. Any failure stops the build with a descriptive error message.

Required Frontmatter Fields

When required_fields is configured, every .md file must contain those frontmatter fields. Empty lists ([]) and empty strings are treated as missing. Files with no frontmatter block at all also fail:

[processor.tags]
required_fields = ["tags", "level", "category", "duration_hours", "audiences"]
Missing required frontmatter fields:
  syllabi/courses/intro.md: category, duration_hours
  syllabi/courses/advanced.md: audiences

Required Field Groups

When required_field_groups is configured, every file must satisfy at least one group (all fields in that group present). This handles cases where files may have alternative sets of fields:

[processor.tags]
required_field_groups = [
    ["duration_hours"],
    ["duration_hours_long", "duration_hours_short"],
]

A file with duration_hours passes. A file with both duration_hours_long and duration_hours_short passes. A file with only duration_hours_short (partial group) or none of these fields fails:

Files missing required field groups (must satisfy at least one):
  syllabi/courses/intro.md: none of [duration_hours] or [duration_hours_long, duration_hours_short]

Required Values

When required_values is configured, scalar fields must contain a value that exists in the corresponding tags/<field>.txt file. This catches typos in scalar values:

[processor.tags]
required_values = ["level", "category"]
Invalid values for validated fields:
  syllabi/courses/intro.md: level=begginer (not in tags/level.txt)

Field Types

When field_types is configured, frontmatter fields must have the expected type. Supported types: "list", "scalar", "number".

[processor.tags.field_types]
tags = "list"
level = "scalar"
duration_hours = "number"
Field type mismatches:
  syllabi/courses/intro.md: 'level' expected list, got scalar

Unique Fields

When unique_fields is configured, no two files may share the same value for that field:

[processor.tags]
unique_fields = ["title"]
Duplicate values for unique fields:
  title='Intro to Docker' in:
    - syllabi/courses/docker_intro.md
    - syllabi/courses/containers/docker_intro.md

Sorted Tags

When sorted_tags = true, list-type frontmatter fields must have their items in lexicographic sorted order. This reduces diff noise in version control:

[processor.tags]
sorted_tags = true
List tags are not in sorted order:
  syllabi/courses/intro.md field 'tags': 'tools:alpha' should come before 'tools:beta'

Duplicate Tags Within a File

The same tag cannot appear twice in a single file’s frontmatter:

Duplicate tags found within files:
  tools:docker in syllabi/courses/containers/intro.md

Duplicate Tags Across Tag Lists

The same category:value tag cannot be defined in multiple tags_dir/*.txt files. Note that the same value in different categories is fine (tools:docker and infra:docker are distinct tags):

Duplicate tags found across tags files:
  tools:docker in tools.txt and infra.txt

Unknown Tags

Every tag found in frontmatter must exist in tags_dir. Unknown tags cause an error with a typo suggestion (Levenshtein distance):

Unknown tags found (not in tags):
  tools:dockker (did you mean 'tools:docker'?)
    - syllabi/courses/containers/intro.md

Unused Tags

Every tag defined in tags_dir/*.txt must be used by at least one .md file. This catches stale entries that should be cleaned up:

Unused tags in tags (not used by any file):
  tools:vagrant
  languages:fortran

Source Files

  • Input: **/*.md (configurable via src_dirs / src_extensions)
  • Output: out/tags/tags.db

Configuration

[processor.tags]
output = "out/tags/tags.db"                                       # Output database path
tags_dir = "tags"                                            # Directory containing tag list files
required_fields = ["tags", "level", "category"]                   # Fields every .md file must have
required_field_groups = [                                         # At least one group must be fully present
    ["duration_hours"],
    ["duration_hours_long", "duration_hours_short"],
]
required_values = ["level", "category"]                           # Scalar fields validated against tags
unique_fields = ["title"]                                         # Fields that must be unique across files
sorted_tags = true                                                # Require list items in sorted order
dep_inputs = []                                                 # Additional files that trigger rebuilds

[processor.tags.field_types]
tags = "list"                                                     # Must be a YAML list
level = "scalar"                                                  # Must be a string
duration_hours = "number"                                         # Must be numeric
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| output | string | "out/tags/tags.db" | Path to the tags database file |
| tags_dir | string | "tags" | Directory containing .txt tag list files |
| required_fields | string[] | [] | Frontmatter fields that every .md file must have |
| required_field_groups | string[][] | [] | Alternative field groups; at least one group must be fully present |
| required_values | string[] | [] | Scalar fields whose values must exist in tags/<field>.txt |
| unique_fields | string[] | [] | Fields whose values must be unique across all files |
| field_types | map | {} | Expected types per field: "list", "scalar", or "number" |
| sorted_tags | bool | false | Require list items in sorted order within each file |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Subcommands

All subcommands require a prior rsconstruct build to populate the database (except check which reads files directly). All support --json for machine-readable output.

Querying

| Command | Description |
|---------|-------------|
| rsconstruct tags list | List all unique tags (sorted) |
| rsconstruct tags files TAG [TAG...] | List files matching all given tags (AND) |
| rsconstruct tags files --or TAG [TAG...] | List files matching any given tag (OR) |
| rsconstruct tags grep TEXT | Search for tags containing a substring |
| rsconstruct tags grep -i TEXT | Case-insensitive tag search |
| rsconstruct tags for-file PATH | List all tags for a specific file (supports suffix matching) |
| rsconstruct tags frontmatter PATH | Show raw parsed frontmatter for a file |
| rsconstruct tags count | Show each tag with its file count, sorted by frequency |
| rsconstruct tags tree | Show tags grouped by key (e.g. level= group) vs bare tags |
| rsconstruct tags stats | Show database statistics (file count, unique tags, associations) |

Reporting

| Command | Description |
|---------|-------------|
| `rsconstruct tags matrix` | Show a coverage matrix of tag categories per file |
| `rsconstruct tags coverage` | Show percentage of files that have each tag category |
| `rsconstruct tags orphans` | Find files with no tags at all |
| `rsconstruct tags suggest PATH` | Suggest tags for a file based on similarity to other tagged files |

Validation

| Command | Description |
|---------|-------------|
| `rsconstruct tags check` | Run all validations without building (fast lint pass) |
| `rsconstruct tags unused` | List tags in `tags_dir` that no file uses |
| `rsconstruct tags unused --strict` | Same, but exit with error if any unused tags exist (for CI) |
| `rsconstruct tags validate` | Validate indexed tags against `tags_dir` without rebuilding |

Terms Processor

Purpose

Checks that technical terms from a terms directory are backtick-quoted in Markdown files, and provides commands to auto-fix and merge term lists across projects.

How It Works

Loads terms from terms/*.txt files (one term per line, organized by category). For each .md file, simulates what rsconstruct terms fix would produce. If the result differs from the current content, the product fails.

The processor skips YAML frontmatter and fenced code blocks. Terms are matched case-insensitively with word-boundary detection, longest-first to avoid partial matches (e.g., “Android Studio” matches before “Android”).
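The matching strategy described above (case-insensitive, word boundaries, longest term first) can be sketched by sorting the term list by length before building a regex. This is an illustration of the technique, not the actual Rust implementation, and it assumes terms that start and end with word characters:

```python
import re

def find_terms(text: str, terms: list[str]) -> list[str]:
    # Sort longest-first so that "Android Studio" is tried before
    # "Android" at each position, avoiding partial matches.
    ordered = sorted(terms, key=len, reverse=True)
    alternatives = "|".join(re.escape(t) for t in ordered)
    # \b gives word-boundary matching; IGNORECASE makes it case-insensitive.
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return [m.group(1) for m in pattern.finditer(text)]
```

Because regex alternation tries alternatives in order, placing longer terms first is what prevents "Android" from consuming the start of "Android Studio".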

Auto-detected when a terms/ directory exists and .md files are present.

Source Files

  • Input: **/*.md
  • Output: none (checker)

Configuration

[processor.terms]
terms_dir = "terms"       # Directory containing term list .txt files
batch = true              # Enable batch execution
dep_inputs = []           # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `terms_dir` | string | `"terms"` | Directory containing `.txt` term list files |
| `batch` | bool | `true` | Enable batch execution |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
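Conceptually, batching only changes how discovered files are grouped into command lines; a minimal sketch (`plan_invocations` is a hypothetical helper, not part of RSConstruct):

```python
def plan_invocations(tool: str, files: list[str], batch: bool = True) -> list[list[str]]:
    """Build command lines for a checker: one invocation covering all files
    when batching is on, one invocation per file otherwise (sketch only)."""
    if batch:
        return [[tool, *files]]
    return [[tool, f] for f in files]
```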

Term List Format

Each .txt file in the terms directory contains one term per line. Files are typically organized by category:

terms/
  programming_languages.txt
  frameworks_and_libraries.txt
  databases_and_storage.txt
  devops_and_cicd.txt
  ...

Example programming_languages.txt:

Python
JavaScript
TypeScript
Rust
C++
Go

Commands

rsconstruct terms fix

Add backticks around unquoted terms in all markdown files.

rsconstruct terms fix
rsconstruct terms fix --remove-non-terms   # also remove backticks from non-terms

The fix is idempotent: running it twice produces the same result.

rsconstruct terms merge <path>

Merge terms from another project’s terms directory into the current one. For matching filenames, new terms are added (union). Missing files are copied in both directions.

rsconstruct terms merge ../other-project/terms

Taplo Processor

Purpose

Checks TOML files using taplo.

How It Works

Discovers .toml files in the project (excluding common build tool directories), runs taplo check on each file, and records success in the cache. A non-zero exit code from taplo fails the product.

This processor supports batch mode, allowing multiple files to be checked in a single taplo invocation for better performance.

Source Files

  • Input: **/*.toml
  • Output: none (checker)

Configuration

[processor.taplo]
command = "taplo"    # The taplo command to run
args = []            # Additional arguments to pass to taplo
dep_inputs = []      # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `command` | string | `"taplo"` | The taplo executable to run |
| `args` | string[] | `[]` | Extra arguments passed to taplo |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Tera Processor

Purpose

Renders Tera template files into output files, with support for loading configuration variables from Python or Lua files.

How It Works

Files matching configured extensions in tera.templates/ are rendered and written to the project root with the extension stripped:

tera.templates/app.config.tera  →  app.config
tera.templates/sub/readme.txt.tera  →  sub/readme.txt
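That mapping amounts to dropping the tera.templates/ prefix and the template extension; a small Python sketch of the path arithmetic (illustrative only):

```python
from pathlib import PurePosixPath

def output_path(template: str, templates_dir: str = "tera.templates",
                src_extension: str = ".tera") -> str:
    """Map a template path to its output path by stripping the templates
    directory prefix and the template extension (sketch of the documented
    mapping, not the actual implementation)."""
    rel = PurePosixPath(template).relative_to(templates_dir)
    return str(rel)[: -len(src_extension)]
```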

Templates use the Tera templating engine and can call load_python(path="...") or load_lua(path="...") to load variables from config files.

Loading Lua config

{% set config = load_lua(path="config/settings.lua") %}
[app]
name = "{{ config.project_name }}"
version = "{{ config.version }}"

Lua configs are executed via the embedded Lua 5.4 interpreter (no external dependency). All user-defined globals (strings, numbers, booleans, tables) are exported. Built-in Lua globals and functions are automatically filtered out. dofile() and require() work relative to the config file’s directory.

Loading Python config

{% set config = load_python(path="config/settings.py") %}
[app]
name = "{{ config.project_name }}"
version = "{{ config.version }}"
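As a rough analogue of what a load_python-style loader could do, the sketch below executes a config file's source in a fresh namespace and exports only plain data globals. This is an assumed scheme for illustration; RSConstruct's actual loader may behave differently:

```python
def load_python_config(source: str) -> dict:
    """Execute a Python config file's source and export its user-defined,
    JSON-like globals (hypothetical analogue, not RSConstruct's loader)."""
    env: dict = {}
    exec(source, env)  # run the config in an isolated namespace
    return {
        name: value
        for name, value in env.items()
        if not name.startswith("__")                       # drop __builtins__ etc.
        and isinstance(value, (str, int, float, bool, list, dict))
    }
```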

Source Files

  • Input: tera.templates/**/*{src_extensions}
  • Output: project root, mirroring the template path with the extension removed

Configuration

[processor.tera]
src_extensions = [".tera"]               # File extensions to process (default: [".tera"])
dep_inputs = ["config/settings.py"]      # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `src_extensions` | string[] | `[".tera"]` | File extensions to discover |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

Each input file is processed individually, producing its own output file.

Tidy Processor

Purpose

Validates HTML files using HTML Tidy.

How It Works

Discovers .html and .htm files in the project (excluding common build tool directories), runs tidy -errors on each file, and records success in the cache. A non-zero exit code from tidy fails the product.

This processor supports batch mode.

Source Files

  • Input: **/*.html, **/*.htm
  • Output: none (checker)

Configuration

[processor.tidy]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `args` | string[] | `[]` | Extra arguments passed to tidy |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

XMLLint Processor

Purpose

Validates XML files using xmllint.

How It Works

Discovers .xml files in the project (excluding common build tool directories), runs xmllint --noout on each file, and records success in the cache. A non-zero exit code from xmllint fails the product.

This processor supports batch mode.

Source Files

  • Input: **/*.xml
  • Output: none (checker)

Configuration

[processor.xmllint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `args` | string[] | `[]` | Extra arguments passed to xmllint |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Yaml2json Processor

Purpose

Converts YAML files to JSON. The conversion is native: it runs in-process and requires no external tools.

How It Works

Discovers YAML files in the configured directories and converts each to a pretty-printed JSON file.

Source Files

  • Input: **/*.yml, **/*.yaml
  • Output: out/yaml2json/{relative_path}.json

Configuration

[processor.yaml2json]
src_dirs = ["yaml"]
output_dir = "out/yaml2json"    # Output directory (default)
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `output_dir` | string | `"out/yaml2json"` | Output directory for JSON files |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch Support

Each input file is processed individually, producing its own output file.

Yamllint Processor

Purpose

Lints YAML files using yamllint.

How It Works

Discovers .yml and .yaml files in the project (excluding common build tool directories), runs yamllint on each file, and records success in the cache. A non-zero exit code from yamllint fails the product.

This processor supports batch mode, allowing multiple files to be checked in a single yamllint invocation for better performance.

Source Files

  • Input: **/*.yml, **/*.yaml
  • Output: none (checker)

Configuration

[processor.yamllint]
command = "yamllint"    # The yamllint command to run
args = []               # Additional arguments to pass to yamllint
dep_inputs = []         # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `command` | string | `"yamllint"` | The yamllint executable to run |
| `args` | string[] | `[]` | Extra arguments passed to yamllint |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

Yq Processor

Purpose

Validates YAML files using yq.

How It Works

Discovers .yml and .yaml files in the project (excluding common build tool directories), runs yq . on each file to validate syntax, and records success in the cache. A non-zero exit code from yq fails the product.

This processor supports batch mode.

Source Files

  • Input: **/*.yml, **/*.yaml
  • Output: none (checker)

Configuration

[processor.yq]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `args` | string[] | `[]` | Extra arguments passed to yq |
| `dep_inputs` | string[] | `[]` | Extra files whose changes trigger rebuilds |

Batch support

The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.

GitHub Actions

How to run rsconstruct in a GitHub Actions workflow.

- name: Build
  run: rsconstruct build -q -j0
| Flag | Why |
|------|-----|
| `-q` (quiet) | Suppresses the progress bar and status messages. The progress bar uses terminal escape codes that produce garbage in CI logs. Only errors are shown. |
| `-j0` | Auto-detect CPU cores. GitHub-hosted runners have 4 cores (`ubuntu-latest`) — using them all speeds up the build significantly vs the default of `-j1`. |

Full workflow example

name: Build
on: [push, pull_request]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install rsconstruct
        run: cargo install rsconstruct

      - name: Install tools
        run: rsconstruct tools install --yes

      - name: Build
        run: rsconstruct build -q -j0

Runner sizing

| Runner | Cores | RAM | Notes |
|--------|-------|-----|-------|
| `ubuntu-latest` | 4 | 16 GB | Good for most projects. Use `-j0` or `-j4`. |
| `ubuntu-latest` (private repo) | 4 | 16 GB | Same hardware as public repos. |
| Large runners | 8-64 | 32-256 GB | For large projects. `-j0` scales automatically. |

-j0 always does the right thing — it detects the available cores at runtime. There is no benefit to setting -j higher than the core count.

Caching

Cache the .rsconstruct/ directory between runs to skip unchanged products:

      - uses: actions/cache@v4
        with:
          path: .rsconstruct
          key: rsconstruct-${{ hashFiles('rsconstruct.toml') }}-${{ github.sha }}
          restore-keys: |
            rsconstruct-${{ hashFiles('rsconstruct.toml') }}-
            rsconstruct-

This restores cached build products from previous runs. Only products whose inputs changed will be rebuilt.

Tips

  • Don’t use --timings in CI unless you need the data. It adds overhead.
  • Use --json instead of -q if you want machine-readable output for downstream processing.
  • Use -k (keep-going) to see all failures at once instead of stopping at the first one.
  • Use --verify-tool-versions to catch tool version drift between local and CI environments.

Lua Plugins

RSConstruct supports custom processors written in Lua. Drop a .lua file in the plugins/ directory and add a [processor.NAME] section in rsconstruct.toml. The plugin participates in discovery, execution, caching, cleaning, tool listing, and auto-detection just like a built-in processor.

Quick Start

1. Create the plugin file:

plugins/eslint.lua
function description()
    return "Lint JavaScript/TypeScript with ESLint"
end

function required_tools()
    return {"eslint"}
end

function discover(project_root, config, files)
    local products = {}
    for _, file in ipairs(files) do
        local stub = rsconstruct.stub_path(project_root, file, "eslint")
        table.insert(products, {
            inputs = {file},
            outputs = {stub},
        })
    end
    return products
end

function execute(product)
    rsconstruct.run_command("eslint", {product.inputs[1]})
    rsconstruct.write_stub(product.outputs[1], "linted")
end

2. Enable it in rsconstruct.toml:

[processor.eslint]
src_dirs = ["src"]
src_extensions = [".js", ".ts"]

3. Run it:

rsconstruct build             # builds including the plugin
rsconstruct processors list   # shows the plugin
rsconstruct processors files  # shows files discovered by the plugin

Lua API Contract

Each .lua file defines global functions. Three are required; the rest have sensible defaults.

Required Functions

description()

Returns a human-readable string describing what the processor does. Called once when the plugin is loaded.

function description()
    return "Lint JavaScript files with ESLint"
end

discover(project_root, config, files)

Called during product discovery. Receives:

  • project_root (string) — absolute path to the project root
  • config (table) — the [processor.NAME] TOML section as a Lua table
  • files (table) — list of absolute file paths matching the scan configuration

Must return a table of products. Each product is a table with inputs and outputs keys, both containing tables of absolute file paths.

function discover(project_root, config, files)
    local products = {}
    for _, file in ipairs(files) do
        local stub = rsconstruct.stub_path(project_root, file, "myplugin")
        table.insert(products, {
            inputs = {file},
            outputs = {stub},
        })
    end
    return products
end

execute(product)

Called to build a single product. Receives a table with inputs and outputs keys (both tables of absolute path strings). Must create the output files on success or error on failure.

function execute(product)
    rsconstruct.run_command("mytool", {product.inputs[1]})
    rsconstruct.write_stub(product.outputs[1], "done")
end

Optional Functions

clean(product)

Called when running rsconstruct clean. Receives the same product table as execute(). Default behavior: removes all output files.

function clean(product)
    for _, output in ipairs(product.outputs) do
        rsconstruct.remove_file(output)
    end
end

auto_detect(files)

Called to determine whether this processor is relevant for the project (when auto_detect = true in config). Receives the list of matching files. Default: returns true if the files list is non-empty.

function auto_detect(files)
    return #files > 0
end

required_tools()

Returns a table of external tool names required by this processor. Used by rsconstruct tools list and rsconstruct tools check. Default: empty table.

function required_tools()
    return {"eslint", "node"}
end

processor_type()

Returns the type of processor: "generator" or "checker". Generators create real output files (e.g., compilers, transpilers). Checkers validate input files and may or may not produce stub files. Default: "checker".

Option 1: Checker with stub files

function processor_type()
    return "checker"
end

When using stub files, return outputs = {stub} from discover() and call rsconstruct.write_stub() in execute().

Option 2: Checker without stub files

function processor_type()
    return "checker"
end

Return outputs = {} from discover() and don’t write stubs in execute(). The cache database entry itself serves as the success record.

The rsconstruct Global Table

Lua plugins have access to an rsconstruct global table with helper functions.

| Function | Description |
|----------|-------------|
| `rsconstruct.stub_path(project_root, source, suffix)` | Compute the stub output path for a source file. Maps `project_root/a/b/file.ext` to `out/suffix/a_b_file.ext.suffix`. |
| `rsconstruct.run_command(program, args)` | Run an external command. Errors if the command fails (non-zero exit). |
| `rsconstruct.run_command_cwd(program, args, cwd)` | Run an external command with a working directory. |
| `rsconstruct.write_stub(path, content)` | Write a stub file (creates parent directories as needed). |
| `rsconstruct.remove_file(path)` | Remove a file if it exists. No error if the file is missing. |
| `rsconstruct.file_exists(path)` | Returns true if the file exists. |
| `rsconstruct.read_file(path)` | Read a file and return its contents as a string. |
| `rsconstruct.path_join(parts)` | Join path components. Takes a table: `rsconstruct.path_join({"a", "b", "c"})` returns `"a/b/c"`. |
| `rsconstruct.log(message)` | Print a message prefixed with the plugin name. |
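The stub_path mapping (project_root/a/b/file.ext to out/suffix/a_b_file.ext.suffix) can be sketched in Python; treating the out/ directory as living under the project root is an assumption made for this illustration:

```python
from pathlib import PurePosixPath

def stub_path(project_root: str, source: str, suffix: str) -> str:
    """Flatten a source path into a stub path under out/<suffix>/ by
    replacing path separators with underscores (sketch of the documented
    mapping, not the actual implementation)."""
    rel = PurePosixPath(source).relative_to(project_root)
    flat = "_".join(rel.parts)
    return f"{project_root}/out/{suffix}/{flat}.{suffix}"
```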

Configuration

Plugins use the standard scan configuration fields. Any [processor.NAME] section in rsconstruct.toml is passed to the plugin’s discover() function as the config table.

Scan Configuration

These fields control which files are passed to discover():

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `src_dirs` | string[] | `[""]` | Directories to scan (`""` = project root) |
| `src_extensions` | string[] | `[]` | File extensions to match |
| `src_exclude_dirs` | string[] | `[]` | Directory path segments to skip |
| `src_exclude_files` | string[] | `[]` | File names to skip |
| `src_exclude_paths` | string[] | `[]` | Paths relative to the project root to skip |

Custom Configuration

Any additional keys in the [processor.NAME] section are passed through to the Lua config table:

[processor.eslint]
src_dirs = ["src"]
src_extensions = [".js", ".ts"]
max_warnings = 0          # custom key, accessible as config.max_warnings in Lua
fix = false               # custom key, accessible as config.fix in Lua

function execute(product)
    local args = {product.inputs[1]}
    if config.max_warnings then
        table.insert(args, "--max-warnings")
        table.insert(args, tostring(config.max_warnings))
    end
    rsconstruct.run_command("eslint", args)
    rsconstruct.write_stub(product.outputs[1], "linted")
end

Plugins Directory

The directory where RSConstruct looks for .lua files is configurable:

[plugins]
dir = "plugins"  # default

Plugin Name Resolution

The plugin name is derived from the .lua filename (without extension). This name is used for:

  • The [processor.NAME] config section in rsconstruct.toml
  • The out/NAME/ stub directory
  • Display in rsconstruct processors list and build output

A plugin name must not conflict with a built-in processor name (tera, ruff, pylint, cc_single_file, cppcheck, shellcheck, zspell, make). RSConstruct will error if a conflict is detected.

Incremental Builds

Lua plugins participate in RSConstruct’s incremental build system automatically:

  • Products are identified by their inputs, outputs, and a config hash
  • If none of the declared inputs have changed since the last build, the product is skipped
  • If the [processor.NAME] config section changes, all products are rebuilt
  • Outputs are cached and can be restored from cache

For correct incrementality, make sure discover() declares all files that affect the output. If your tool reads additional configuration files, include them in the inputs list.

Examples

Cache-Only Checker

A checker that validates files without producing stub files. Success is recorded in the cache database.

function description()
    return "Lint YAML files with yamllint"
end

function processor_type()
    return "checker"
end

function required_tools()
    return {"yamllint"}
end

function discover(project_root, config, files)
    local products = {}
    for _, file in ipairs(files) do
        table.insert(products, {
            inputs = {file},
            outputs = {},  -- No output files
        })
    end
    return products
end

function execute(product)
    rsconstruct.run_command("yamllint", {"-s", product.inputs[1]})
    -- No stub to write; cache entry = success
end

function clean(product)
    -- Nothing to clean
end

[processor.yamllint]
src_extensions = [".yml", ".yaml"]

Stub-Based Linter (Legacy)

A linter that creates stub files. Use this if you need the stub file for some reason.

function description()
    return "Lint YAML files with yamllint"
end

function processor_type()
    return "checker"
end

function required_tools()
    return {"yamllint"}
end

function discover(project_root, config, files)
    local products = {}
    for _, file in ipairs(files) do
        table.insert(products, {
            inputs = {file},
            outputs = {rsconstruct.stub_path(project_root, file, "yamllint")},
        })
    end
    return products
end

function execute(product)
    rsconstruct.run_command("yamllint", {"-s", product.inputs[1]})
    rsconstruct.write_stub(product.outputs[1], "linted")
end

[processor.yamllint]
src_extensions = [".yml", ".yaml"]

File Transformer (Generator)

A plugin that transforms input files into output files (not stubs). This is a “generator” processor.

function description()
    return "Compile Sass to CSS"
end

function processor_type()
    return "generator"
end

function required_tools()
    return {"sass"}
end

function discover(project_root, config, files)
    local products = {}
    for _, file in ipairs(files) do
        local out = file:gsub("%.scss$", ".css"):gsub("^" .. project_root .. "/src/", project_root .. "/out/sass/")
        table.insert(products, {
            inputs = {file},
            outputs = {out},
        })
    end
    return products
end

function execute(product)
    rsconstruct.run_command("sass", {product.inputs[1], product.outputs[1]})
end

[processor.sass]
src_dirs = ["src"]
src_extensions = [".scss"]

Advanced Usage

Parallel builds

RSConstruct can build independent products concurrently. Set the number of parallel jobs:

rsconstruct build -j4       # 4 parallel jobs
rsconstruct build -j0       # Auto-detect CPU cores

Or configure it in rsconstruct.toml:

[build]
parallel = 4   # 0 = auto-detect

The -j flag on the command line overrides the config file setting.

Watch mode

Watch source files and automatically rebuild on changes:

rsconstruct watch

This monitors all source files and triggers an incremental build whenever a file is modified.

Dependency graph

Visualize the build dependency graph in multiple formats:

rsconstruct graph                    # Default text format
rsconstruct graph --format dot       # Graphviz DOT format
rsconstruct graph --format mermaid   # Mermaid diagram format
rsconstruct graph --format json      # JSON format
rsconstruct graph --view             # Open in browser or viewer

The --view flag opens the graph using the configured viewer (set in rsconstruct.toml):

[graph]
viewer = "google-chrome"

Ignoring files

RSConstruct respects .gitignore files automatically. Any file ignored by git is also ignored by all processors. Nested .gitignore files and negation patterns are supported.

For project-specific exclusions that should not go in .gitignore, create a .rsconstructignore file in the project root with glob patterns (one per line):

/src/experiments/**
*.bak

The syntax is the same as .gitignore: # for comments, / prefix to anchor to the project root, / suffix for directories, and */** for globs.

Processor verbosity levels

Control the detail level of build output with -v N:

| Level | Output |
|-------|--------|
| 0 (default) | Target basename only: `main.elf` |
| 1 | Target path: `out/cc_single_file/main.elf`; the `cc_single_file` processor also prints compiler commands |
| 2 | Adds source path: `out/cc_single_file/main.elf <- src/main.c` |
| 3 | Adds all inputs: `out/cc_single_file/main.elf <- src/main.c, src/utils.h` |

Dry run

Preview what would be built without executing anything:

rsconstruct build --dry-run

Keep going after errors

By default, RSConstruct stops on the first error. Use --keep-going to continue building other products:

rsconstruct build --keep-going

Build timings

Show per-product and total timing information:

rsconstruct build --timings

Shell completions

Generate shell completions for your shell:

rsconstruct complete bash    # Bash completions
rsconstruct complete zsh     # Zsh completions
rsconstruct complete fish    # Fish completions

Configure which shells to generate completions for:

[completions]
shells = ["bash"]

Extra inputs

By default, each processor only tracks its primary source files as inputs. If a product depends on additional files that aren’t automatically discovered (e.g., a config file read by a linter, a suppressions file used by a static analyzer, or a Python settings file loaded by a template), you can declare them with dep_inputs.

When any file listed in dep_inputs changes, all products from that processor are rebuilt.

[processor.template]
dep_inputs = ["config/settings.py", "config/database.py"]

[processor.ruff]
dep_inputs = ["pyproject.toml"]

[processor.pylint]
dep_inputs = ["pyproject.toml"]

[processor.cppcheck]
dep_inputs = [".cppcheck-suppressions"]

[processor.cc_single_file]
dep_inputs = ["Makefile.inc"]

[processor.zspell]
dep_inputs = ["custom-dictionary.txt"]

Paths are relative to the project root. Missing files cause a build error, so all listed files must exist.

The dep_inputs paths are included in the processor’s config hash, so adding or removing entries triggers a rebuild even if the files themselves haven’t changed. The file contents are also checksummed as part of the product’s input set, so any content change is detected by the incremental build system.
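The two layers can be sketched as follows; the hashing scheme shown is illustrative, not RSConstruct's actual key derivation:

```python
import hashlib
import json

def config_hash(config: dict) -> str:
    """Hash the processor config, dep_inputs paths included, so that adding
    or removing entries invalidates the cache (illustrative sketch)."""
    canonical = json.dumps(config, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def input_checksum(contents: list[bytes]) -> str:
    """Checksum the contents of all inputs (dep_inputs included), so any
    content change is detected by the incremental build system."""
    h = hashlib.sha256()
    for blob in contents:
        h.update(hashlib.sha256(blob).digest())
    return h.hexdigest()
```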

All processors support dep_inputs.

Graceful interrupt

Pressing Ctrl+C during a build stops execution promptly:

  • Subprocess termination — All external processes (compilers, linters, etc.) are spawned with a poll loop that checks for interrupts every 50ms. When Ctrl+C is detected, the running child process is killed immediately rather than waiting for it to finish. This keeps response time under 50ms regardless of how long the subprocess would otherwise run.
  • Progress preservation — Products that completed successfully before the interrupt are cached. The next build resumes from where it left off rather than starting over.
  • Parallel builds — In parallel mode, all in-flight subprocesses are killed when Ctrl+C is detected. Each thread’s poll loop independently checks the global interrupt flag.
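The 50ms poll loop can be sketched in Python; this is a simplified, single-process illustration of the described behavior, not the actual Rust implementation:

```python
import subprocess
import time

def run_interruptible(cmd: list[str], interrupted) -> int:
    """Spawn a child process and poll it every 50ms; kill it as soon as
    the interrupt flag reports true (sketch of the described behavior)."""
    proc = subprocess.Popen(cmd)
    while True:
        code = proc.poll()
        if code is not None:
            return code          # child finished on its own
        if interrupted():
            proc.kill()          # respond to Ctrl+C within ~50ms
            proc.wait()
            return -1
        time.sleep(0.05)
```

In the real tool, `interrupted` would correspond to the global interrupt flag that each worker thread's poll loop checks independently.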

Environment Variables

The problem

Build tools that inherit the user’s environment variables produce non-deterministic builds. Consider a C compiler invoked by a build tool:

  • If the user has CFLAGS=-O2 in their shell, the build produces optimized output.
  • If they unset it, the build produces debug output.
  • Two developers on the same project get different results from the same source files.

This breaks caching (the cache key doesn’t account for env vars), breaks reproducibility (builds differ across machines), and makes debugging harder (a build failure may depend on an env var the developer forgot they set).

Common examples of environment variables that silently affect build output:

| Variable | Effect |
|----------|--------|
| `CC`, `CXX` | Changes which compiler is used |
| `CFLAGS`, `CXXFLAGS`, `LDFLAGS` | Changes compiler/linker flags |
| `PATH` | Changes which tool versions are found |
| `PYTHONPATH` | Changes Python module resolution |
| `LANG`, `LC_ALL` | Changes locale-dependent output (sorting, error messages) |
| `HOME` | Changes where config files are read from |

RSConstruct’s approach

RSConstruct does not use environment variables from the user’s environment to control build behavior. All configuration comes from explicit, versioned sources:

  1. rsconstruct.toml — all processor configuration (compiler flags, linter args, scan dirs, etc.)
  2. Source file directives — per-file flags embedded in comments (e.g., // EXTRA_COMPILE_FLAGS_BEFORE=-pthread)
  3. Tool lock file — .tools.versions locks tool versions so changes are detected

This means:

  • The same source tree always produces the same build, regardless of the user’s shell environment.
  • Cache keys are computed from file contents and config values, not ambient env vars.
  • Remote cache sharing works because two machines with different environments still produce identical cache keys for identical inputs.

Rules for processor authors

When implementing a processor (built-in or Lua plugin):

  1. Never read std::env::var() to determine build behavior. If a value is configurable, add it to the processor’s config struct in rsconstruct.toml.

  2. Never call cmd.env() to pass environment variables to external tools, unless the variable is derived from explicit config (not from std::env). The user’s environment is inherited by default — the goal is to avoid adding env-based configuration on top.

  3. Tool paths come from PATH — RSConstruct does inherit the user’s PATH to find tools like gcc, ruff, etc. This is acceptable because the tool lock file (.tools.versions) detects when tool versions change and triggers rebuilds. Use rsconstruct tools lock to pin versions.

  4. Config values, not env vars — if a tool needs a flag that varies per project, put it in rsconstruct.toml under the processor’s config section. Config values are hashed into cache keys automatically.

What RSConstruct does inherit

RSConstruct inherits the full parent environment for subprocess execution. This is unavoidable — tools need PATH to be found, HOME to read their own config files, etc. The key design decision is that RSConstruct itself never reads env vars to make build decisions, and processors never add env vars derived from the user’s environment.

The exceptions are:

  • NO_COLOR — RSConstruct respects this standard env var to disable colored output, which is a display concern and does not affect build output.
  • RSCONSTRUCT_THREADS — Sets the number of parallel jobs (equivalent to -j). Priority: CLI -j flag > RSCONSTRUCT_THREADS env var > [build] parallel config. This is a performance tuning concern and does not affect build correctness or output.
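The stated priority, with 0 meaning auto-detect, can be sketched as follows; `resolve_jobs` is a hypothetical helper illustrating the documented resolution order:

```python
import os
from typing import Optional

def resolve_jobs(cli_j: Optional[int], env: dict, config_parallel: int = 1) -> int:
    """Resolve the parallel job count using the documented priority:
    CLI -j flag > RSCONSTRUCT_THREADS env var > [build] parallel config.
    A value of 0 means auto-detect CPU cores (illustrative sketch)."""
    if cli_j is not None:
        jobs = cli_j
    elif "RSCONSTRUCT_THREADS" in env:
        jobs = int(env["RSCONSTRUCT_THREADS"])
    else:
        jobs = config_parallel
    return jobs if jobs > 0 else (os.cpu_count() or 1)
```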

Internal Documentation

This section collects documentation aimed at rsconstruct’s contributors and maintainers — people who modify the codebase itself, not end users who configure rsconstruct for their projects.

If you are using rsconstruct to build a project, you can stop reading now. Everything below is about how rsconstruct works internally: data structures, design decisions, invariants, coding style, and the reasoning behind non-obvious choices.

What belongs here

A chapter belongs in “For Maintainers” if it answers at least one of these questions:

  • How is rsconstruct implemented? (Architecture, cache layout, execution model)
  • Why did we make this design choice? (Design notes, rejected alternatives, tradeoffs)
  • What contract must my code uphold? (Processor contract, invariants, coding standards)
  • What’s the right way to extend rsconstruct? (Adding processors, adding analyzers)
  • What’s the non-obvious implementation detail I need to know? (Checksum cache layers, descriptor keys, shared-output-directory semantics)

A chapter does NOT belong here if it answers:

  • How do I install rsconstruct?
  • How do I configure a processor for my project?
  • How do I use processor X on file type Y?

Those are user-facing and live in the main section above.

How to use this section

Read in roughly this order if you’re new to the codebase:

  1. Architecture — 10-minute tour of the major modules and their responsibilities.
  2. Coding Standards — conventions you’ll be held to in code review.
  3. Strictness — how the compiler is configured to reject lax code, and the rules for opting out.
  4. Processor Contract — the interface every processor must satisfy. Read before adding a new processor.
  5. Testing — how the test suite is structured and how to add new tests.
  6. Cache System and Checksum Cache — how incremental builds actually work.

After that, read topic-specific chapters as the work demands:

See the table of contents in the sidebar. Brief one-line summaries:

  • Architecture — module map and major data flows.
  • Design Notes — collected rationale for design decisions.
  • Coding Standards — naming, file layout, error handling conventions.
  • Strictness — crate-level #![deny(warnings)], rules for #[allow].
  • Testing — integration test structure and philosophy.
  • Parameter Naming — canonical names for the same concept in different places.
  • Processor Contract — what every processor must implement and uphold.
  • Cache System — content-addressed object store, descriptor keys.
  • Checksum Cache — mtime-based content hash caching.
  • Dependency Caching — caching of source-file dependency scans (e.g. C/C++ headers).
  • Processor Versioning — how processors invalidate caches when their behavior changes.
  • Cross-Processor Dependencies — how one processor’s outputs become another’s inputs.
  • Shared Output Directory — handling multiple processors that write into the same folder.
  • Processor Ordering — why rsconstruct does NOT have explicit ordering primitives.
  • Output Prediction — the MassGenerator design: tools that enumerate their outputs in advance.
  • Per-Processor Statistics — why cache stats can’t group by processor today, options for fixing it.
  • Profiling — recorded profiling runs with date + rsconstruct version, plus how-to for rerunning.
  • Unreferenced Files — detecting files on disk that no product references.
  • Internal Processors — pure-Rust processors that do not shell out.
  • Missing Processors — tools we don’t yet wrap but should.
  • Crates.io Publishing — release process.
  • Per-Processor max_jobs — design note for per-processor parallelism limits.
  • Rejected Audit Findings — audit issues deliberately rejected, kept to prevent re-flagging.
  • Suggestions — ideas for future work.
  • Suggestions Done — archive of completed suggestions.
  • TODO — ongoing and completed task list.

Architecture

This page describes RSConstruct’s internal design for contributors and those interested in how the tool works.

Core concepts

Processors

Processors implement the ProductDiscovery trait. Each processor:

  1. Auto-detects whether it is relevant for the current project
  2. Scans the project for source files matching its conventions
  3. Creates products describing what to build
  4. Executes the build for each product

Run rsconstruct processors list to see all available processors and their auto-detection results.

Auto-detection

Every processor implements auto_detect(), which returns true if the processor appears relevant for the current project based on filesystem heuristics. This allows RSConstruct to guess which processors a project needs without requiring manual configuration.

The ProductDiscovery trait requires four methods:

| Method | Purpose |
| --- | --- |
| auto_detect(file_index) | Return true if the project looks like it needs this processor |
| discover(graph, file_index) | Query the file index and add products to the build graph |
| execute(product) | Build a single product |
| clean(product) | Remove a product’s outputs |

Both auto_detect and discover receive a &FileIndex — a pre-built index of all non-ignored files in the project (see File indexing below).
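The four-method shape can be sketched as follows. This is a hedged, simplified illustration: the FileIndex, BuildGraph, and Product definitions here are placeholders, and PyChecker is an invented toy implementation, not real rsconstruct code.

```rust
use std::path::PathBuf;

// Simplified placeholder types standing in for the real ones.
struct FileIndex { files: Vec<PathBuf> }
struct Product { inputs: Vec<PathBuf>, outputs: Vec<PathBuf> }
#[derive(Default)]
struct BuildGraph { products: Vec<Product> }

trait ProductDiscovery {
    /// Return true if the project looks like it needs this processor.
    fn auto_detect(&self, file_index: &FileIndex) -> bool;
    /// Query the file index and add products to the build graph.
    fn discover(&self, graph: &mut BuildGraph, file_index: &FileIndex);
    /// Build a single product.
    fn execute(&self, product: &Product) -> Result<(), String>;
    /// Remove a product's outputs.
    fn clean(&self, product: &Product) -> Result<(), String>;
}

// A toy checker: relevant when the project contains .py files.
struct PyChecker;
impl ProductDiscovery for PyChecker {
    fn auto_detect(&self, idx: &FileIndex) -> bool {
        idx.files.iter().any(|p| p.extension().is_some_and(|e| e == "py"))
    }
    fn discover(&self, graph: &mut BuildGraph, idx: &FileIndex) {
        for f in idx.files.iter().filter(|p| p.extension().is_some_and(|e| e == "py")) {
            graph.products.push(Product { inputs: vec![f.clone()], outputs: vec![] });
        }
    }
    fn execute(&self, _product: &Product) -> Result<(), String> { Ok(()) }
    fn clean(&self, _product: &Product) -> Result<(), String> { Ok(()) }
}
```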

Detection heuristics per processor:

| Processor | Type | Detected when |
| --- | --- | --- |
| tera | Generator | templates/ directory contains files matching configured extensions |
| ruff | Checker | Project contains .py files |
| pylint | Checker | Project contains .py files |
| mypy | Checker | Project contains .py files |
| pyrefly | Checker | Project contains .py files |
| cc_single_file | Generator | Configured source directory contains .c or .cc files |
| cppcheck | Checker | Configured source directory contains .c or .cc files |
| clang_tidy | Checker | Configured source directory contains .c or .cc files |
| shellcheck | Checker | Project contains .sh or .bash files |
| zspell | Checker | Project contains files matching configured extensions (e.g., .md) |
| aspell | Checker | Project contains .md files |
| ascii | Checker | Project contains .md files |
| rumdl | Checker | Project contains .md files |
| mdl | Checker | Project contains .md files |
| markdownlint | Checker | Project contains .md files |
| make | Checker | Project contains Makefile files |
| cargo | Mass Generator | Project contains Cargo.toml files |
| sphinx | Mass Generator | Project contains conf.py files |
| mdbook | Mass Generator | Project contains book.toml files |
| yamllint | Checker | Project contains .yml or .yaml files |
| jq | Checker | Project contains .json files |
| jsonlint | Checker | Project contains .json files |
| json_schema | Checker | Project contains .json files |
| taplo | Checker | Project contains .toml files |
| pip | Mass Generator | Project contains requirements.txt files |
| npm | Mass Generator | Project contains package.json files |
| gem | Mass Generator | Project contains Gemfile files |
| pandoc | Generator | Project contains .md files |
| markdown2html | Generator | Project contains .md files |
| marp | Generator | Project contains .md files |
| mermaid | Generator | Project contains .mmd files |
| drawio | Generator | Project contains .drawio files |
| a2x | Generator | Project contains .txt (AsciiDoc) files |
| pdflatex | Generator | Project contains .tex files |
| libreoffice | Generator | Project contains .odp files |
| pdfunite | Generator | Source directory contains subdirectories with PDF-source files |
| iyamlschema | Checker | Project contains .yml or .yaml files |
| yaml2json | Generator | Project contains .yml or .yaml files |
| imarkdown2html | Generator | Project contains .md files |
| tags | Generator | Project contains .md files with YAML frontmatter |

Run rsconstruct processors list to see the auto-detection results for the current project.

Products

A product represents a single build unit with:

  • Inputs — source files that the product depends on
  • Outputs — files that the product generates
  • Output directory (optional) — for creators, the directory whose entire contents are cached and restored as a unit

BuildGraph

The BuildGraph manages dependencies between products. It performs a topological sort to determine the correct build order, ensuring that dependencies are built before the products that depend on them.
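The ordering step above can be illustrated with a Kahn-style topological sort (a simplified, std-only sketch over a name → dependencies map; not rsconstruct's actual graph code — BTree collections are used to keep iteration deterministic, matching the Determinism section):

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

// Returns the build order, or None if the graph contains a cycle.
fn topo_sort(deps: &BTreeMap<&str, BTreeSet<&str>>) -> Option<Vec<String>> {
    // indegree = number of unbuilt dependencies per product
    let mut indegree: BTreeMap<&str, usize> =
        deps.keys().map(|&k| (k, deps[k].len())).collect();
    let mut ready: VecDeque<&str> = indegree
        .iter().filter(|(_, &d)| d == 0).map(|(&k, _)| k).collect();
    let mut order = Vec::new();
    while let Some(n) = ready.pop_front() {
        order.push(n.to_string());
        // release every product that was waiting on n
        for (&m, ds) in deps {
            if ds.contains(n) {
                let d = indegree.get_mut(m).unwrap();
                *d -= 1;
                if *d == 0 { ready.push_back(m); }
            }
        }
    }
    (order.len() == deps.len()).then_some(order)
}
```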

Executor

The executor runs products in dependency order. It supports:

  • Sequential execution (default)
  • Parallel execution of independent products (with -j flag)
  • Dry-run mode (show what would be built)
  • Keep-going mode (continue after errors)
  • Batch execution (group multiple products into one tool invocation)

Incremental rebuild after partial failure

Each product is cached independently after successful execution. If a build is interrupted or fails partway through, the next run only rebuilds products that don’t have valid cache entries:

  • Non-batch mode (default fail-fast, chunk_size=1): Each product executes and is cached individually. If the build stops after 400 of 800 products, the next run skips the 400 cached successes and rebuilds the remaining 400.

  • Batch mode with external tools (--keep-going or explicit --batch-size): The external tool receives all files in the batch in one invocation. If the tool exits with an error, all products in that batch are marked failed — there is no way to determine which outputs are valid from a single exit code. On the next run, all products from the failed batch are rebuilt.

  • Batch mode with internal processors (e.g., imarkdown2html, isass, ipdfunite): These process files sequentially in-process and return per-file results, so partial failure is handled correctly even in batch mode — only the failed products are rebuilt.

Interrupt handling

All external subprocess execution goes through run_command() in src/processors/mod.rs. Instead of calling Command::output() (which blocks until the process finishes), run_command() uses Command::spawn() followed by a poll loop:

  1. Spawn the child process with piped stdout/stderr
  2. Every 50ms, call try_wait() to check if the process has exited
  3. Between polls, check the global INTERRUPTED flag (set by the Ctrl+C handler)
  4. If interrupted, kill the child process immediately and return an error

This ensures that pressing Ctrl+C terminates running subprocesses within 50ms, even for long-running compilations or linter invocations.

The global INTERRUPTED flag is an AtomicBool set once by the ctrlc handler in main.rs and checked by all threads.
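The spawn-and-poll loop can be sketched as below. This is a hedged illustration: the real run_command() in src/processors/mod.rs differs in signature and error handling, and a production version must also drain the stdout/stderr pipes concurrently so a chatty child cannot fill the pipe buffer and block.

```rust
use std::process::{Command, Output, Stdio};
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

static INTERRUPTED: AtomicBool = AtomicBool::new(false);

fn run_polled(cmd: &mut Command) -> Result<Output, String> {
    // 1. Spawn the child with piped stdout/stderr.
    let mut child = cmd
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(|e| e.to_string())?;
    loop {
        // 2. Check (non-blocking) whether the child has exited.
        if child.try_wait().map_err(|e| e.to_string())?.is_some() {
            // Collect the captured output (valid after exit).
            return child.wait_with_output().map_err(|e| e.to_string());
        }
        // 3. Between polls, honor the global Ctrl+C flag.
        if INTERRUPTED.load(Ordering::SeqCst) {
            // 4. Kill the child and report the interruption.
            let _ = child.kill();
            return Err("interrupted".into());
        }
        std::thread::sleep(Duration::from_millis(50));
    }
}
```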

File indexing

RSConstruct walks the project tree once at startup and builds a FileIndex — a sorted list of all non-ignored files. The walk is performed by the ignore crate (ignore::WalkBuilder), which natively handles:

  • .gitignore — standard git ignore rules, including nested .gitignore files and negation patterns
  • .rsconstructignore — project-specific ignore patterns using the same glob syntax as .gitignore

Processors never walk the filesystem themselves. Instead, auto_detect and discover receive a &FileIndex and query it with their scan configuration (src_extensions, exclude directories, exclude files). This replaces the previous design where each processor performed its own recursive walk.
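A minimal sketch of the query pattern — field and method names here are illustrative placeholders, not the real FileIndex API:

```rust
use std::path::{Path, PathBuf};

struct FileIndex {
    files: Vec<PathBuf>, // sorted, non-ignored, relative to project root
}

impl FileIndex {
    // Yield every indexed file whose extension is in `exts`.
    fn with_extensions<'a>(&'a self, exts: &'a [&'a str]) -> impl Iterator<Item = &'a Path> {
        self.files
            .iter()
            .filter(move |p| {
                p.extension()
                    .and_then(|e| e.to_str())
                    .is_some_and(|e| exts.contains(&e))
            })
            .map(PathBuf::as_path)
    }
}
```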

Build pipeline

This is the core algorithm — every rsconstruct build follows these phases in order. Use --phases to see timing for each phase.

Phase 1: File indexing

The project tree is walked once to build the FileIndex — a sorted list of all non-ignored files. This is the only filesystem walk; all subsequent file lookups go through the index. See File indexing above.

Phase 2: Discovery (fixed-point loop)

Each enabled processor queries the file index and adds products to the BuildGraph. Discovery runs in a fixed-point loop to handle cross-processor dependencies:

file_index = walk filesystem
loop (max 10 passes):
    for each processor:
        processor.discover(graph, file_index)
    if no new products were added → break
    collect outputs from new products
    inject them as virtual files into file_index

On each pass, processors may re-declare existing products (silently deduplicated) or discover new products whose inputs are virtual files from upstream generators. The loop converges when a full pass adds nothing new. Most projects converge in 1 pass; projects with generator → checker/generator chains converge in 2.

See Cross-Processor Dependencies for details on deduplication and the virtual file mechanism.

Phase 3: Dependency analysis

Dependency analyzers (e.g., the C/C++ header scanner) run against the graph to add additional input edges. For example, if main.c includes util.h, the analyzer adds util.h as an input to the main.c product. Results are cached in deps.redb for incremental builds.

Phase 4: Tool version hashing

For each processor with a tool lock entry (rsconstruct tools lock), the locked tool version hash is appended to the product’s config hash. This ensures that upgrading a tool (e.g., ruff 0.4 → 0.5) triggers rebuilds even if source files haven’t changed.

Phase 5: Dependency resolution

resolve_dependencies() scans the graph for products whose inputs match other products’ outputs. When found, it creates a dependency edge — the producer must complete before the consumer can start. This is how cross-processor ordering works automatically (e.g., pandoc runs before the explicit site generator because pandoc’s HTML outputs are the site generator’s inputs).

After resolution, the graph is topologically sorted to produce the execution order.
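The output→input matching can be sketched like this (a simplified illustration over placeholder product types — the real resolve_dependencies() operates on the BuildGraph):

```rust
use std::collections::HashMap;

struct Product { inputs: Vec<String>, outputs: Vec<String> }

// Returns (producer, consumer) index pairs: producer must build first.
fn resolve_dependencies(products: &[Product]) -> Vec<(usize, usize)> {
    // Map each output path to the index of the product that produces it.
    let producers: HashMap<&str, usize> = products
        .iter()
        .enumerate()
        .flat_map(|(i, p)| p.outputs.iter().map(move |o| (o.as_str(), i)))
        .collect();
    let mut edges = Vec::new();
    for (consumer, p) in products.iter().enumerate() {
        for input in &p.inputs {
            if let Some(&producer) = producers.get(input.as_str()) {
                if producer != consumer {
                    edges.push((producer, consumer));
                }
            }
        }
    }
    edges
}
```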

Phase 6: Classify

Each product is classified as one of:

  • Skip (up-to-date) — input checksum matches the cache entry and all outputs exist on disk. No work needed.
  • Restore — input checksum matches a cache entry but outputs are missing (e.g., after rsconstruct clean). Outputs are restored from cache via hardlink or copy.
  • Build (stale) — input checksum doesn’t match any cache entry. The product must be rebuilt.

Input checksums are computed by hashing all input files (SHA-256). The mtime pre-check (mtime_check = true, default) skips rehashing files whose mtime hasn’t changed since the last build.
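The mtime pre-check can be sketched as below. This is a dependency-free illustration: the real cache uses SHA-256 and persists to disk, whereas DefaultHasher and an in-memory map stand in here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::path::{Path, PathBuf};
use std::time::SystemTime;

struct ChecksumCache {
    entries: HashMap<PathBuf, (SystemTime, u64)>,
}

impl ChecksumCache {
    fn checksum(&mut self, path: &Path) -> std::io::Result<u64> {
        let mtime = fs::metadata(path)?.modified()?;
        if let Some(&(cached_mtime, hash)) = self.entries.get(path) {
            if cached_mtime == mtime {
                return Ok(hash); // mtime unchanged: skip rehashing
            }
        }
        // mtime changed (or file unknown): hash the contents and cache.
        let mut hasher = DefaultHasher::new();
        fs::read(path)?.hash(&mut hasher);
        let hash = hasher.finish();
        self.entries.insert(path.to_path_buf(), (mtime, hash));
        Ok(hash)
    }
}
```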

Phase 7: Execute

Products are executed in topological order, respecting dependency edges. Independent products at the same dependency level run in parallel (controlled by -j / RSCONSTRUCT_THREADS). Batch-capable processors group their products into a single tool invocation.

Batch chunk sizing: In fail-fast mode (default), batch chunk size is 1 — each product executes independently even for batch-capable processors. With --keep-going, all products are sent in one chunk. With --batch-size N, chunks are limited to N products. This means fail-fast mode gives the best incremental recovery after partial failure.
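The chunk-size rule above amounts to a small decision table (an illustrative sketch, not the real executor code):

```rust
fn batch_chunk_size(keep_going: bool, batch_size: Option<usize>, total: usize) -> usize {
    match batch_size {
        Some(n) => n.max(1),                // --batch-size N: chunks of N
        None if keep_going => total.max(1), // --keep-going: one big chunk
        None => 1,                          // fail-fast default: per-product
    }
}
```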

For each product:

  1. Compute input checksum (if not already done in classify)
  2. Check cache — skip or restore if possible
  3. Execute the processor’s command
  4. On success: store outputs in the cache (content-addressed under .rsconstruct/objects/)
  5. On failure: report error (or continue if --keep-going)

Processor source layout

All processor code lives under src/processors/. The folder structure mirrors processor type:

src/processors/
├── mod.rs          # Processor trait, shared helpers (run_command, run_checker,
│                   # SimpleChecker, SimpleGenerator, ProcessorBase, …)
├── checkers/       # One file per checker (ruff.rs, pylint.rs, cppcheck.rs, …)
│   └── mod.rs      # Re-exports
├── generators/     # One file per generator (generator.rs, marp.rs, sass.rs, …)
│   ├── mod.rs      # Shared helpers: find_templates, output_path, discover_single_format, …
│   └── tags/       # Tags generator (multi-file, has its own subfolder)
├── creators/       # One file per creator (cargo.rs, npm.rs, gem.rs, pip.rs, …)
│   ├── mod.rs      # Re-exports
│   └── creator.rs  # Generic creator processor
├── explicit/       # Explicit processor (user-defined command with declared outputs)
│   ├── mod.rs
│   └── explicit.rs
└── lua/            # Lua plugin host
    ├── mod.rs
    └── lua_processor.rs

Conventions

  • Every file in src/processors/ is a real processor — no utility-only files at the top level. Shared helpers live in mod.rs or generators/mod.rs.
  • Checkers use SimpleChecker (data-driven, no boilerplate) or implement Processor directly for checkers with custom discovery logic (e.g., clippy, script).
  • Generators use SimpleGenerator (data-driven with a custom execute_fn) or GeneratorProcessor for the generic pass-through generator.
  • Creators use CreatorProcessor for the generic case, or their own struct for creators with special discovery (cargo profiles, npm siblings, etc.).
  • Explicit is a singleton processor type with its own folder because it is neither a checker nor a generator.
  • Lua is the only processor type that hosts external scripts rather than wrapping a fixed external tool. It has its own folder because it carries significant runtime state (the Lua VM).
  • All processors self-register via inventory::submit! at the bottom of their file — no central registry table to update.

Determinism

Build order is deterministic:

  • File discovery is sorted
  • Processor iteration order is sorted
  • Topological sort produces a stable ordering

This ensures that the same project always builds in the same order, regardless of filesystem ordering.

Caching

See Cache System for full details on cache keys, storage format, rebuild classification, and per-processor caching behavior.

Subprocess execution

RSConstruct uses two internal functions to run external commands:

  • run_command() — by default captures stdout/stderr via OS pipes and only prints output on failure (quiet mode). Use the --show-output flag to print all tool output. Use it for compilers, linters, and any command whose errors should be shown.

  • run_command_capture() — always captures stdout/stderr via pipes. Use only when you need to parse the output (dependency analysis, version checks, Python config loading). Returns the output for processing.

Parallel safety

When running with -j, each thread spawns its own subprocess. Each subprocess gets its own OS-level pipes for stdout/stderr, so there is no interleaving of output between concurrent tools. On failure, the captured output for that specific tool is printed atomically. This design requires no shared buffers or cross-thread output coordination.

Path handling

All paths are relative to project root. RSConstruct assumes it is run from the project root directory (where rsconstruct.toml lives).

Internal paths (always relative)

  • Product.inputs and Product.outputs — stored as relative paths
  • FileIndex — returns relative paths from scan() and query()
  • Cache keys (Product.cache_key()) — use relative paths, enabling cache sharing across different checkout locations
  • Cache entries (CacheEntry.outputs[].path) — stored as relative paths

Processor execution

  • Processors pass relative paths directly to external tools
  • Processors set cmd.current_dir(project_root) to ensure tools resolve paths correctly
  • fs::read(), fs::write(), etc. work directly with relative paths since cwd is project root

Exception: Processors requiring absolute paths

If a processor absolutely must use absolute paths (e.g., for a tool that doesn’t respect current directory), it should:

  1. Store the project_root in the processor struct
  2. Join paths with project_root only at execution time
  3. Never store absolute paths in Product.inputs or Product.outputs

Why relative paths?

  • Cache portability — cache keys don’t include machine-specific absolute paths
  • Remote cache sharing — same project checked out to different paths can share cache
  • Simpler code — no need to strip prefixes for display or storage

Architecture Observations

Observations about rsconstruct’s high-level structure — the shapes that determine how the system behaves when you try to change or extend it. Kept separate from suggestions.md (which covers tactical features and bugs) because these entries are about how the code is put together, not what it does.

Each entry has:

  • A short title naming the pattern or tension.
  • What the current code does.
  • What that implies for changes / extensions / users.
  • Load-bearing: how much of the system this shape dictates. High = touching it ripples everywhere. Low = localized quirk.

The entries are roughly ordered by how much they shape the rest of the codebase.


The central four

1. The graph is the universal coupling point

Every phase — discovery, analysis, classification, execution — reads and/or mutates the BuildGraph. Processors receive &mut BuildGraph in their discover() method and are trusted to add products correctly. There’s no invariant enforcement at insertion time: empty inputs are allowed, bad dep references are allowed, duplicate outputs are caught but duplicate inputs aren’t. Cycles are only detected during topological sort, late.

The graph’s shape also leaks into the executor: the executor knows about output_dirs (creators), variant (multi-format generators), config_hash (cache keys), and product IDs. Adding a new product category (say, a “phantom” product that exists for scheduling but produces no outputs) requires touching both graph and executor.

Implication: the graph is the lingua franca. Any architectural change that touches the product model — adding fields, changing what counts as a dependency, supporting alternate execution orders — ripples into every consumer. A healthy graph layer would have validation (reject ill-formed products at insertion), opaque access (consumers see a trait-shaped view, not the struct), and observer hooks (something watching mutations so --graph-stats and graph show don’t duplicate traversal logic).

Load-bearing: very high.


2. Processor registration is link-time, via inventory
Every processor and analyzer submits an inventory::submit! entry. The registry is populated at binary link time, and enumeration is a runtime iteration over those entries. This is elegant for modularity — adding a processor means adding one file, no central list to update — but it has consequences:

  • No compile-time enumeration: you can’t write a match statement over all processor names, so the processor count is rediscovered on every run, and static checks (e.g. “every processor has a corresponding config struct”) have to be runtime assertions.
  • Lua plugins are second-class: they arrive at runtime after the static registry is frozen. The registry API has to tolerate two populations (static + dynamic) in parallel, which is why find_registry_entry and find_analyzer_plugin have to fall through both.
  • Ordering is alphabetical everywhere: because inventory doesn’t preserve submission order, every code path that touches plugins has to sort by name. This is a minor tax but it’s baked in everywhere.
  • Testing requires the whole binary: you can’t instantiate a stripped-down registry for tests; they pull the full set. Most tests don’t mind, but ones that want a controlled plugin set have to filter rather than inject.

Implication: the registration model favors modularity over introspectability. If rsconstruct ever wants a “declarative build” representation (think Bazel’s static action graph) the plugin layer will have to expose more schema information than it does today.

Load-bearing: high.


3. Config defaults are scattered, not composed — PARTIALLY ADDRESSED

Three sources of defaults apply in sequence:

  1. Per-processor defaults (e.g. for ruff, command = "ruff") in a giant match-or-registry lookup.
  2. Scan defaults (src_dirs, src_extensions) via a separate mechanism (ScanDefaultsData).
  3. User TOML overrides both.

The order matters, but it’s encoded across apply_processor_defaults, apply_scan_defaults, and the serde deserialization.

Update: config provenance tracking (src/config/provenance.rs) now records where each field came from (UserToml { line }, ProcessorDefault, ScanDefault, OutputDirDefault, SerdeDefault). rsconstruct config show annotates every field with its source. The defaults pipeline still applies layers across multiple functions, but the provenance map makes it possible to answer “where did this value come from?” without tracing the code.

The remaining gap: adding a new defaults layer (env-derived, user-global) still means inserting into the existing function chain rather than a declarative resolver.

Load-bearing: medium.


4. The executor owns too much policy — RESOLVED

Update: a BuildPolicy trait has been extracted to src/executor/policy.rs. classify_products now delegates per-product decisions to &dyn BuildPolicy. IncrementalPolicy implements the current skip/restore/rebuild logic. Alternate policies (dry-run, always-rebuild, time-windowed) are now a single trait implementation away — no executor changes needed.

Load-bearing: very high, but the tension is resolved.
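The extracted seam can be sketched as below — a hedged illustration in which ProductInfo and Decision are simplified stand-ins for the real types in src/executor/policy.rs:

```rust
#[derive(Debug, PartialEq)]
enum Decision { Skip, Restore, Build }

struct ProductInfo { cache_hit: bool, outputs_present: bool }

trait BuildPolicy {
    fn classify(&self, product: &ProductInfo) -> Decision;
}

// The default incremental skip/restore/rebuild logic (cf. Phase 6: Classify).
struct IncrementalPolicy;
impl BuildPolicy for IncrementalPolicy {
    fn classify(&self, p: &ProductInfo) -> Decision {
        match (p.cache_hit, p.outputs_present) {
            (true, true) => Decision::Skip,
            (true, false) => Decision::Restore,
            (false, _) => Decision::Build,
        }
    }
}

// An alternate policy needs no executor changes — just another impl.
struct AlwaysRebuild;
impl BuildPolicy for AlwaysRebuild {
    fn classify(&self, _p: &ProductInfo) -> Decision { Decision::Build }
}
```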


Structural tensions

5. Processor trait assumes StandardConfig, but allows bypass

The Processor trait has a scan_config() -> &StandardConfig method that every processor must implement. The default implementations of discover(), auto_detect(), and supports_batch() use this config. But processors with richer configs (e.g. ClippyConfig, CcConfig) don’t expose those richer fields through the trait — they store them privately and access them internally. The outside world only sees StandardConfig.

Implication: there’s no way to ask “what config does processor X accept?” through the trait. Introspection goes through the registry (known_fields, must_fields, field_descriptions) instead, which means the processor has to register the metadata separately from implementing the trait. The two representations can drift: someone adds a field to ClippyConfig and forgets to add it to known_fields.

A healthier shape would have one source of truth per processor — the config struct itself — with a derive macro or trait-based reflection generating the known_fields list. Or go the other direction: make the trait parameterized (Processor<Config>) so introspection goes through the type system.

Load-bearing: medium. Doesn’t break anything today but is the root cause of several “remembered to update both places?” bugs we’ve fixed.


6. Analyzers are inputs-only; they can’t add products

DepAnalyzer::analyze() walks existing products and adds inputs to them. It cannot:

  • Create new products (the cpp analyzer can’t spawn a product for a header it discovered).
  • Remove products.
  • Change processor assignments.

This is a deliberate simplification — analyzers run in a single pass after discovery and don’t need fixed-point semantics of their own. But it means the “dependency graph” isn’t really discovered by analyzers; it’s refined by them. The actual discovery of what exists lives entirely in processors.

Implication: if a use case arises where an analyzer legitimately needs to produce a product — e.g. “for every .proto import I find, ensure there’s a product for generating the .pb.cc” — the analyzer interface doesn’t support it. You’d have to turn the analyzer into a processor, or add a “synthesize” callback. The asymmetry between processors (can add products) and analyzers (can only add inputs) is currently invisible but will bite eventually.

Load-bearing: medium. Not a bug, but a limitation that shapes what kinds of features are easy vs. hard.


7. Processor instance ↔ typed processor mapping is one-way — PARTIALLY ADDRESSED

A ProcessorInstance in the config holds (type_name, instance_name, config_toml). Builder::create_processors() deserializes the TOML and produces a Box<dyn Processor>. Afterwards, the TOML blob is discarded.

Update: ProcessorInstance now carries a provenance: ProvenanceMap that records where each field came from (user TOML with line number, processor default, scan default, etc.). This means config show can annotate fields with their source without reparsing TOML, and smart commands can distinguish user-set from defaulted fields.

The remaining gap: a running Box<dyn Processor> still can’t navigate back to its ProcessorInstance or the originating TOML section. The provenance lives on the config side, not the runtime processor side.

Load-bearing: medium.


8. Global state in the processor runtime — RESOLVED

Update: all mutable process globals have been moved into BuildContext (src/build_context.rs):

  • The three processor globals (INTERRUPTED, RUNTIME, INTERRUPT_SENDER) are replaced and deleted. run_command takes &BuildContext explicitly.
  • The three checksum globals (CACHE, MTIME_DB, MTIME_ENABLED) are moved into BuildContext. combined_input_checksum and checksum_fast take &BuildContext.

Remaining process-wide state is all immutable or correctly scoped:

  • RuntimeFlags — immutable after startup, doesn’t vary between contexts.
  • DECLARED_TOOLS — thread_local!, debug-only.
  • Compiled regexes — LazyLock<Regex>, stateless.

Load-bearing: resolved. Multiple BuildContext instances can now run independently (daemon mode, LSP, testing).


Broader patterns

9. Supply-driven model everywhere

The whole pipeline — discover, classify, execute — walks every product unconditionally. There’s no demand-driven path (like make foo which visits only the subgraph producing foo). The --target <glob> flag filters after discovery; it doesn’t trim the work that discovery itself does.

This is a deliberate design — rsconstruct’s typical workload is “build everything incrementally,” and supply-driven matches that well. But it means a user asking “just build X” still pays the cost of discovering all 5000 other products.

Implication: for projects at a certain scale, or for tooling that wants to quickly answer “which products would I run for this file?” (IDE integration, pre-commit hooks), the supply-driven model becomes a bottleneck. A demand-driven shortcut would require either pre-built reverse indexes (input path → product) persisted between runs, or an analytical model of each processor’s output paths (hard — processor output is computed procedurally).

Load-bearing: very high. Changing this means a fundamentally different build-system shape.


10. “Run on every build” is the default stance

Every configured processor discovers and classifies on every invocation. There’s no concept of “processor X is slow, only run when asked.” The -p/-x mechanism works per-invocation but not as a declarative property. See suggestions.md for the proposed build_by_default = false pattern — that’s a tactical fix. The architectural observation is that rsconstruct’s model biases hard toward “all processors together,” whereas the user mental model often has lifecycle phases (lint vs. package vs. deploy).

Implication: adding a “goals” layer (cargo-style subcommands, or npm-style named scripts) is a natural extension direction. It would introduce a new concept — a goal is a named selection of processors — and likely requires CLI reorganization. Bigger than it sounds.

Load-bearing: medium. Shapes the CLI surface and user mental model.


11. Object store as a multi-responsibility module — RESOLVED

Update: ObjectStore has been decomposed into focused submodules:

  • blobs.rs — content-addressed blob storage (store, read, restore, checksum)
  • descriptors.rs — cache descriptor CRUD (store_marker, store_blob, store_tree)
  • restore.rs — cache query and restoration (restore_from_descriptor, needs_rebuild, can_restore, explain)
  • management.rs — cache management (size, trim, remove_stale, list, stats)
  • operations.rs — remote cache push/fetch
  • config_diff.rs — processor config change tracking

mod.rs went from ~664 to ~223 lines (struct definition, types, constructor). Each concern is now a focused 100–150 line file.

Load-bearing: very high, but the monolith is resolved.


What’s absent that one might expect

12. No abstraction for “tool invocation”

Every processor that shells out to a subprocess rolls its own Command building: env vars, arg construction, timeout, output capture, error classification. Shared helpers (run_command, check_command_output) exist but are minimal. Processor implementations still have to know about:

  • How to pass files (positional args vs. --file=X vs. stdin vs. response file when argv is too long).
  • How to interpret exit codes (some tools return 1 for “found issues”, some return 0 and print to stderr, some return 2 for config errors).
  • How to parse output for structured errors.

Implication: processor implementations carry roughly 30–80 lines of boilerplate each, and they’re inconsistent, which makes adding a new processor harder than it needs to be. A ToolInvocation abstraction with pluggable arg-passing strategies would shrink most processors to a few lines of declaration.

Load-bearing: medium.
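A hypothetical shape for the proposed abstraction — this is not existing rsconstruct code, and all names below are invented for illustration:

```rust
// How the tool wants its file list delivered.
enum ArgStyle {
    Positional,           // tool file1 file2 ...
    Flag(&'static str),   // tool --file=X for each file
    Stdin,                // file list written to the child's stdin
}

struct ToolInvocation {
    program: String,
    style: ArgStyle,
}

impl ToolInvocation {
    // Build the argv for one batch of files.
    fn argv(&self, files: &[&str]) -> Vec<String> {
        let mut argv = vec![self.program.clone()];
        match self.style {
            ArgStyle::Positional => argv.extend(files.iter().map(|f| f.to_string())),
            ArgStyle::Flag(flag) => {
                argv.extend(files.iter().map(|f| format!("{flag}={f}")))
            }
            ArgStyle::Stdin => {} // the file list bypasses argv entirely
        }
        argv
    }
}
```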


13. No pluggable reporting / event stream

Today reporting is hardcoded: println! during execution, colored summary at the end, --json mode emits structured events, --trace emits Chrome tracing format. Each reporting path is a separate code path threading through the executor.

Implication: adding a new output format (JUnit XML for CI, GitHub Actions annotations, custom Slack webhook) means threading another code path through the executor. A proper event-bus model — executor emits events, subscribers render them — would make this a two-file change (subscribe + format).

Load-bearing: medium.


14. No formal dry-run execution

There’s --stop-after classify, which halts after classification; there’s dry_run(), which is distinct from the --dry-run flag on build; and there’s --explain, which annotates per-product decisions. Three partially-overlapping mechanisms. The user-facing story is “to see what would happen, use X or Y or Z depending on what you want.”

Implication: these evolved separately. A unified “simulation mode” that fully runs the classify pipeline and outputs what would happen — including what cache entries would be produced — would subsume the three. Likely a small refactor, but requires aligning on the output shape.

Load-bearing: low-medium.


Summary of architectural recommendations

All four highest-leverage refactors are now complete:

  1. Extract a BuildPolicy trait from the executor — done. classify_products delegates per-product skip/restore/rebuild decisions to a &dyn BuildPolicy. IncrementalPolicy implements the current logic. Future policies (dry-run, always-rebuild, time-windowed) are a single trait impl. See src/executor/policy.rs.
  2. Decompose ObjectStore — done. mod.rs split from 664 → 223 lines into focused submodules: blobs.rs (content-addressed storage), descriptors.rs (cache descriptor CRUD), restore.rs (restore/needs_rebuild/can_restore/explain). Existing management.rs, operations.rs, config_diff.rs unchanged.
  3. Consolidate config resolution with provenance tracking — done. Config fields now carry FieldProvenance (user TOML with line number, processor default, scan default, serde default). config show annotates every field with its source. See src/config/provenance.rs.
  4. Introduce a BuildContext struct replacing process globals — done. The three process globals (INTERRUPTED, RUNTIME, INTERRUPT_SENDER) are replaced by a BuildContext struct threaded through the Processor trait, executor, analyzers, and remote cache. See src/build_context.rs.

Entries 3, 7, and 8 are partially addressed — the core issues are resolved but minor gaps remain (see individual entries above).

Entries 1, 2, 5, 6, 9, 10, 12, 13, 14 are observations about the code’s shape — not necessarily problems to fix, but constraints a new contributor should understand before making structural changes.

The technical observations (code duplication in discovery helpers, dead fields in ProcessorPlugin, scattered error handling) are recorded in suggestions.md as tactical items.

Design Notes

This page has been merged into Architecture. See that page for RSConstruct’s internal design, subprocess execution, path handling, and caching behavior.

Coding Standards

Rules that apply to the RSConstruct codebase and its documentation.

Always add context to errors

Every ? on an IO operation must have .with_context() from anyhow::Context. A bare ? on fs::read, fs::write, fs::create_dir_all, Command::spawn, or any other syscall-wrapping function is a bug. It produces error messages like “No such file or directory” with no indication of which file or which operation failed.

Good:

fs::read(&path)
    .with_context(|| format!("Failed to read config file: {}", path.display()))?;

Bad:

fs::read(&path)?;  // useless error message

The error chain should read like a stack trace of intent: “Failed to build project > Failed to execute ruff on src/main.py > Failed to spawn command: ruff > No such file or directory”.

Fail hard, never degrade gracefully

When something fails, it must fail the entire build. Do not try-and-fallback, do not silently substitute defaults for missing resources, do not swallow errors. If a processor is configured to use a file and that file does not exist, that is an error. The user must fix their configuration or their project, not the code.

Optional features must be opt-in via explicit configuration (default off). When the user enables a feature, all resources it requires must exist.

Processor naming conventions

Every processor has a single identity string (e.g. ruff, clang_tidy, mdbook). All artifacts derived from a processor must use that same string consistently:

| Artifact | Convention | Example (clang_tidy) |
|---|---|---|
| Name constant | pub const UPPER: &str = "name"; in processors::names | CLANG_TIDY: &str = "clang_tidy" |
| Source file | src/processors/checkers/{name}.rs or generators/{name}.rs | checkers/clang_tidy.rs |
| Processor struct | {PascalCase}Processor | ClangTidyProcessor |
| Config struct | {PascalCase}Config | ClangTidyConfig |
| Field on ProcessorConfig | pub {name}: {PascalCase}Config | pub clang_tidy: ClangTidyConfig |
| Match arm in processor_enabled_field() | "{name}" => self.{name}.enabled | "clang_tidy" => self.clang_tidy.enabled |
| Entry in default_processors() | names::UPPER.into() | names::CLANG_TIDY.into() |
| Entry in validate_processor_fields() | processor_names::UPPER => {PascalCase}Config::known_fields() | processor_names::CLANG_TIDY => ClangTidyConfig::known_fields() |
| Entry in expected_field_type() | ("{name}", "field") => Some(FieldType::...) | ("clang_tidy", "compiler_args") => ... |
| Entry in src_dirs() | &self.{name}.scan | &self.clang_tidy.scan |
| Entry in resolve_scan_defaults() | self.{name}.scan.resolve(...) | self.clang_tidy.scan.resolve(...) |
| Registration in create_builtin_processors() | Builder::register(..., proc_names::UPPER, {PascalCase}Processor::new(cfg.{name}.clone())) | Builder::register(..., proc_names::CLANG_TIDY, ClangTidyProcessor::new(cfg.clang_tidy.clone())) |
| Re-export in processors/mod.rs | pub use checkers::{PascalCase}Processor | pub use checkers::ClangTidyProcessor |
| Install command in tool_install_command() | "{tool}" => Some("...") | "clang-tidy" => Some("apt install clang-tidy") |

When adding a new processor, use the identity string everywhere. Do not abbreviate, rename, or add suffixes (Gen, Bin, etc.) to any of the derived names.

Never use a _check suffix in processor names. Name the processor after the tool or library it wraps — do not abstract or rename it (e.g. zspell not spellcheck, ruff not python_lint).

Processor new() must be infallible

Every processor’s fn new(config: XxxConfig) -> Self must return Self, not Result<Self>. This is enforced at compile time by the registry macro. If construction can fail, defer the failure to execute() or discover().
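A sketch of the pattern. The names (MyToolProcessor, MyToolConfig) are hypothetical and the error type is simplified to String instead of the crate's real error type:

```rust
struct MyToolConfig {
    command: String,
}

struct MyToolProcessor {
    config: MyToolConfig,
}

impl MyToolProcessor {
    /// Infallible: just stores the config. No validation, no IO.
    fn new(config: MyToolConfig) -> Self {
        Self { config }
    }

    /// Failure surfaces here, at execution time, not at construction.
    fn execute(&self) -> Result<(), String> {
        if self.config.command.is_empty() {
            return Err("mytool: `command` must not be empty".into());
        }
        Ok(())
    }
}
```

Keeping construction infallible means the registry can instantiate every processor unconditionally; only the processors that actually run can fail the build.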

Processor directory layout

Each processor category directory (src/processors/checkers/, src/processors/generators/, src/processors/creators/) must contain only processor implementation files — one processor per .rs file (plus mod.rs). Shared utilities, helpers, or supporting code used by multiple processors must live in src/processors/ directly, not inside a category subdirectory. This keeps each category directory a flat, scannable list of processors.

Test naming for processors

Test functions for a processor must be prefixed with the processor name. For example, tests for the cc_single_file processor must be named cc_single_file_compile, cc_single_file_incremental_skip, etc.

No indented output

All println! output must start at column 0. Never prefix output with spaces or tabs for visual indentation, except when printing structured data whose layout requires it.

Suppress tool output on success

External tool output (compilers, linters, etc.) must be captured and only shown when a command fails. On success, only rsconstruct’s own status messages appear. Users who want to always see tool output can use --show-output. This keeps build output clean while still showing errors when something goes wrong.

Never hard-code counts of dynamic sets

Documentation and code must never state the number of processors, commands, or any other set that changes as the project evolves. Use phrasing like “all processors” instead of “all seven processors”. Enumerating the members of a set is acceptable; stating the cardinality is not.

Use well-established crates

Prefer well-established crates over hand-rolled implementations for common functionality (date/time, parsing, hashing, etc.). The Rust ecosystem has mature, well-tested libraries for most tasks. Writing custom implementations introduces unnecessary bugs and maintenance burden. If a crate exists for it, use it.

No trailing newlines in output

Output strings passed to println!, pb.println(), or similar macros must not contain trailing newlines. These macros already append a newline. Adding \n inside the string produces unwanted blank lines in the output.

Include processor name in error messages

Error messages from processor execution must identify the processor so the user can immediately tell which processor failed. The executor’s record_failure() method automatically wraps every error with [processor_name] before printing or storing it, so processors do not need to manually prefix their bail! messages. Just write the error naturally (e.g. bail!("Misspelled words in {}", path)) and the executor will produce [aspell] Misspelled words in README.md.
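The wrapping itself reduces to a prefix format. A sketch (free-function signature hypothetical — the real logic is a method on the executor):

```rust
/// What record_failure's wrapping produces: "[iname] original message".
fn wrap_failure(processor_name: &str, message: &str) -> String {
    format!("[{processor_name}] {message}")
}
```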

Never silently ignore user configuration

Every field a user can write in rsconstruct.toml (or in any YAML/TOML manifest we load: cc.yaml, linux-module.yaml, etc.) must produce an observable effect in the engine. The two failure modes to prevent are:

  1. Schema-level silent-ignore — serde accepts an unknown field because the struct doesn’t reject it. A user typos enabeld = false, we accept it, nothing happens, they wonder why their setting had no effect.
  2. Runtime silent-ignore — serde stores the field in a struct, but no code in the engine ever reads it. This is exactly how the [analyzer.X] enabled = false bug shipped: the CLI subcommand wrote the field, the config loader happily deserialized it, and the analyzer runner ignored it. A half-wired feature is worse than no feature.

Rule 1: reject unknown fields at the schema level

Every struct that deserializes user input must use one of:

  • #[serde(deny_unknown_fields)] — preferred for plain structs (no #[serde(flatten)]). Serde enforces the reject at deserialize time.
  • KnownFields trait + validate_processor_fields() — for top-level processor configs that use #[serde(flatten)] to embed StandardConfig. Serde’s deny_unknown_fields doesn’t see through flatten (known limitation), so we implement the check ourselves in Config::load().

Nested structs inside a flattened parent (e.g. CcLibraryDef inside CcManifest) must use deny_unknown_fields — they don’t flatten, so the direct mechanism works.

The only legitimate exception: structs that intentionally capture unknown fields (ProcessorConfig.extra for Lua plugins). These are rare and must be documented at the field.
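Because serde cannot see through flatten, the manual check reduces to a key-set comparison over the raw parsed TOML table. A self-contained sketch (helper name hypothetical) of what validate_processor_fields() must accomplish:

```rust
use std::collections::HashSet;

/// Compare the raw keys of a parsed [processor.X] table against the
/// struct's known fields; anything left over is a typo or unknown option.
fn unknown_fields(raw_keys: &[&str], known: &[&str]) -> Vec<String> {
    let known: HashSet<&str> = known.iter().copied().collect();
    raw_keys
        .iter()
        .filter(|k| !known.contains(**k))
        .map(|k| k.to_string())
        .collect()
}
```

This is exactly how the enabeld typo from the example above gets caught: it survives deserialization but fails the key-set check.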

Rule 2: every accepted field must be read

When you add a field to any config struct, add the engine code that consumes it in the same change. Don’t ship the schema first and the behaviour “soon.” If the field is a toggle, the runner must check it. If it’s a path, something must open or scan that path. If it’s a value, a code path must branch on it.

When you add a CLI subcommand that writes a field (like analyzers disable writing enabled = false), verify the runtime reads it by writing an integration test that exercises the toggle end-to-end — config → build → observable effect. A passing write-the-config test is not enough; the effect must be asserted.

When you remove or rename a field, grep the codebase and docs to catch stragglers. A field that exists in defconfig_toml but no longer affects behaviour is a regression of Rule 2, even if no user reports it.

When reviewing

Reject a patch that adds a new Deserialize struct without either deny_unknown_fields or a KnownFields impl. Reject a patch that adds a config field without the runtime code that reads it. Both failure modes cost users time in exactly the same way — they write something sensible, get no feedback, and conclude the tool is broken.

Rule 3: validate before constructing

Schema validation must run inside Config::load(), before any processor or analyzer is instantiated. Builder::new() should never be the first place that surfaces an unknown-field or unknown-type error, because by the time Builder::new() runs it has already opened redb databases, walked the filesystem to build the FileIndex, and created CPU-bound infrastructure the user doesn’t need just to see “you typoed a field name.”

The validators are validate_processor_fields_raw and validate_analyzer_fields_raw in src/config/mod.rs. They return Vec<String> so Config::load() can surface errors from both validators together under a single Invalid config: header. If you add a new config surface (a new top-level section with its own registered plugins), add a matching validator and call it from Config::load() alongside the existing two.
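The aggregation contract can be sketched as follows (function name hypothetical; the real call site is inside Config::load()):

```rust
/// Merge the Vec<String> results of both validators and surface them
/// together under a single "Invalid config:" header.
fn merge_validation_errors(
    processor_errs: Vec<String>,
    analyzer_errs: Vec<String>,
) -> Option<String> {
    let all: Vec<String> = processor_errs.into_iter().chain(analyzer_errs).collect();
    if all.is_empty() {
        None
    } else {
        Some(format!("Invalid config:\n  {}", all.join("\n  ")))
    }
}
```

Returning Vec<String> from each validator (rather than bailing on the first error) is what makes this single-header aggregation possible.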

Unit-test the validators directly (see src/config/tests.rs) — not only through rsconstruct toml check. Direct tests pin down the contract that validation is a pure function of the parsed TOML, independent of filesystem or plugin instantiation.

No “latest” git tag

Never create a git tag named latest. Use only semver tags (e.g. v0.3.0). A latest tag causes confusion with container registries and package managers that use the word “latest” as a moving pointer, and it conflicts with GitHub’s release conventions.

Book layout mirrors the filesystem

The book (docs/src/) is divided into two sections by SUMMARY.md:

  1. A top-level user-facing section (introduction, commands, configuration, processors, etc.) — for people who use rsconstruct to build their projects.
  2. A “For Maintainers” section — for contributors modifying rsconstruct itself: architecture, design decisions, coding standards, cache internals, and so on.

The filesystem must mirror this split. A reader glancing at a path should be able to tell which audience the document is for:

  • User-facing chapters live at the top level of docs/src/ — e.g. docs/src/configuration.md, docs/src/commands.md.
  • Maintainer chapters live under docs/src/internal/ — e.g. docs/src/internal/architecture.md, docs/src/internal/cache.md.
  • Per-processor reference docs live under docs/src/processors/ — these are user-facing (they document how to configure each processor).

When adding a new doc, decide first whether it’s user-facing or internal, then place it accordingly. Moving a doc across the boundary requires moving the file too — don’t leave an internal document at the top level just because its links would break.

When cross-referencing:

  • Inside internal/ → link to sibling files directly ([X](other.md)).
  • From a top-level doc to an internal doc → [X](internal/other.md).
  • From processors/ to an internal doc → [X](../internal/other.md).
  • From internal/ to a user-facing doc → [X](../other.md).

This rule is enforced by convention, not by tooling. Reviewers should reject PRs that add a maintainer-only document at the top level (or vice versa).

Strictness

This project holds itself to a strict compiler baseline and treats every relaxation as a deliberate, documented choice. This chapter explains the baseline, the rules for opting out, and the history of the most recent strictness pass.

Crate-level baseline

src/main.rs starts with:

#![deny(clippy::all)]
#![deny(warnings)]

Effect:

  • Every warning is a compile error. Unused imports, dead code, unused variables, deprecated APIs — all stop the build. There is no “warning fatigue” because there are no warnings.
  • All of Clippy’s default lint group (clippy::all) is enforced at deny level. This covers ~500 lints spanning correctness, complexity, style, and perf.

This is one step short of forbid. forbid cannot be overridden per-item; deny allows a per-item #[allow(...)] escape hatch. We chose deny so that principled exceptions remain possible, but each one is an obvious, grep-able act.

The rule for #[allow(...)]

Every #[allow(...)] in the codebase MUST:

  1. Be necessary. If the compiler accepts the code without the allow, remove the allow. The compiler is cleverer than you think — dead code that looks dead to a human is sometimes reachable, and vice versa.
  2. Be scoped minimally. Attach the allow to the smallest item (a single field, a single function, a single import) that requires it — never to a whole struct or module when one member is the culprit.
  3. Carry a comment explaining why. The comment answers: “what feature/workflow keeps this thing around despite looking dead?” A silent #[allow(dead_code)] is a bug.
  4. Be periodically re-audited. Scaffolding becomes production code (allow removed) or is abandoned (code deleted). Long-lived allows are a code smell.

Current #[allow] attributes (at time of writing)

After the strictness pass, a small set of allows remains. Each is documented in the source and reproduced here with rationale.

src/object_store/mod.rsremote_pull field

/// Whether to pull from remote cache.
/// Wired into the constructor but not yet consulted by any read path —
/// remote-pull integration is scaffolded in `operations.rs` (the
/// `try_fetch_*` helpers) but not yet called from the executor.
#[allow(dead_code)]
remote_pull: bool,

Why kept: remote-pull is a real, partially-implemented feature. The try_fetch_* helpers exist; they’re just not wired into classify_products / the restore path yet. Removing the field now would mean re-adding it when we wire up the feature. Keeping it with a comment documents what’s missing.

When to remove: when remote-pull read paths are wired up, or when we formally abandon remote-pull.

src/object_store/operations.rs — three try_fetch_* / try_push_descriptor_* helpers

// Scaffolding for remote-pull: wired into the API surface but not yet
// called from any read path. Intentional; tracked under remote-pull WIP.
#[allow(dead_code)]
pub(super) fn try_fetch_object_from_remote(&self, checksum: &str) -> Result<bool> { ... }

// Scaffolding for remote-pull (for paired fetch-after-push semantics).
// Not yet called from any write path; tracked under remote-pull WIP.
#[allow(dead_code)]
pub(super) fn try_push_descriptor_to_remote(&self, descriptor_key: &str, data: &[u8]) -> Result<()> { ... }

/// Try to fetch a descriptor from remote cache.
/// Scaffolding for remote-pull; not yet called from any read path.
#[allow(dead_code)]
pub(super) fn try_fetch_descriptor_from_remote(&self, descriptor_key: &str) -> Result<Option<Vec<u8>>> { ... }

Why kept: same feature as above. These are the building blocks the eventual remote-pull implementation will call. They’re tested (implicitly via the types that check they compile), and they work when called — they just aren’t called yet.

When to remove: same trigger as the remote_pull field.

src/registries/processor.rsProcessorPlugin.processor_type field

pub struct ProcessorPlugin {
    pub name: &'static str,
    /// Processor type. Declared by every plugin but not yet queried by any
    /// runtime code path — kept as plugin metadata so future features
    /// (e.g. `processors list --type=checker`) can filter without touching
    /// every registration.
    #[allow(dead_code)]
    pub processor_type: ProcessorType,
    ...
}

Why kept: Every inventory::submit! for a processor declares a type (Checker, Generator, Creator, Explicit). The runtime currently reads processor_type() from the Processor trait, never from the plugin. But the static plugin metadata is the right place for filtering features like rsconstruct processors list --type=checker. Removing the field now would mean re-adding a processor_type: ... line to every registration when we eventually want the filter.

When to remove: the allow (not the field) comes off as soon as the first feature queries it. Until then, the allow is the cheap price of preserving optionality.

What the pass removed

Seven allows were removed during the most recent strictness sweep. Three of them masked genuine dead code, which was then deleted:

  • checksum::invalidate() — never called; deleted.
  • checksum::clear_cache() — never called; deleted.
  • ProcessorBase.name field + ProcessorBase::auto_detect() helper — never read, never called; deleted.

Four were stale — the code they guarded was actually used, and the allow no longer made the compiler quieter:

  • remote_cache::RemoteCache::download — used by operations.rs; allow removed.
  • exit_code::IoError — used in match arms and by the errors CLI command; allow removed.
  • ProcessorPlugin struct-level #[allow(dead_code)] — only the processor_type field needed it; scoped down.
  • builder/mod.rs#[allow(unused_imports)] on use crate::config::*; — the compiler wasn’t flagging the glob at all; allow removed.

What this pass did NOT change

The sweep was focused on #[allow] attributes. Broader strictness knobs were left as-is, by choice:

  • .unwrap() and .expect() counts. Many are on internal invariants where panicking is correct (contract violation, not user error). An audit could tighten some to ?, but this is a separate pass with its own judgment calls.
  • missing_docs, missing_debug_implementations, etc. Enabling these would require documenting every public item — a much larger change.
  • clippy::pedantic, clippy::nursery, clippy::cargo. These add ~200 more lints beyond clippy::all. Many are noisy or stylistic. Enabling them is worth considering but outside the scope of “remove unnecessary allows.”
  • The use crate::config::*; glob import in builder/mod.rs. Narrowing it would require enumerating ~15 symbols and risks churn. Left as-is.

Adding a new #[allow]

When you find yourself wanting to add an #[allow(...)], follow this checklist:

  1. Can the compiler complaint be fixed instead? Remove the unused import, inline the unused function, prove the variable is live. Most of the time the answer is yes.
  2. Is this the minimum scope? Put the allow on the single field, not the whole struct. On the single function, not the whole impl. On the single import, not the whole use block.
  3. Did you write a comment? One sentence answering “what feature / workflow justifies this?” is enough. “Reserved for future use” is NOT enough — say what future use, and what would trigger the deletion.
  4. Did you open a tracking concern? If the allow is for WIP scaffolding, the WIP should be tracked somewhere (a TODO comment with a // wip: tag, an issue, a feature flag) so future maintainers know it’s temporary.

A reviewer who sees a new #[allow] should read the comment, check the rationale, and ask “could we just fix this instead?” before approving.

Running the audit

A quick sweep to find all current allows:

grep -rn '#\[allow(' src/

For each hit, read the surrounding context and the comment. If the comment is missing or weak, or the code it guards has become truly used, the allow should come out.

See also

  • Coding Standards — the style rules beyond strictness.
  • Processor Contract — the invariants each processor must uphold.
  • src/main.rs — the crate-level #![deny(...)] directives.

Testing

RSConstruct uses two kinds of tests:

  1. Integration tests in tests/ — the primary test suite. These exercise the compiled rsconstruct binary as a black box, building fake projects in temp directories and asserting on CLI output and side effects.
  2. Unit tests in src/ (#[cfg(test)] mod tests) — used sparingly, only for self-contained modules whose internals cannot be exercised adequately through the CLI. Currently this is src/graph.rs (dedup and topological-sort logic).
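For flavor, this is the kind of logic those inline unit tests pin down — a hedged, self-contained Kahn-style topological sort over dependency edges, not the actual src/graph.rs code:

```rust
use std::collections::{HashMap, VecDeque};

/// Kahn's algorithm: returns a build order, or None if the graph has a
/// cycle (or an edge names an unknown node). An edge (a, b) means
/// "a must be built before b".
fn topo_sort(nodes: &[&str], edges: &[(&str, &str)]) -> Option<Vec<String>> {
    let mut indegree: HashMap<&str, usize> = nodes.iter().map(|n| (*n, 0)).collect();
    let mut adjacent: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        adjacent.entry(from).or_default().push(to);
        *indegree.get_mut(to)? += 1;
    }
    // Seed the queue with every node that nothing depends on,
    // in input order, so the result is deterministic.
    let mut queue: VecDeque<&str> = nodes.iter().copied().filter(|n| indegree[n] == 0).collect();
    let mut order = Vec::new();
    while let Some(n) = queue.pop_front() {
        order.push(n.to_string());
        if let Some(next) = adjacent.get(n) {
            for &m in next {
                let d = indegree.get_mut(m).unwrap();
                *d -= 1;
                if *d == 0 {
                    queue.push_back(m);
                }
            }
        }
    }
    // If a cycle exists, some nodes never reach indegree 0.
    if order.len() == nodes.len() { Some(order) } else { None }
}
```

Exercising the cycle branch end-to-end would require constructing a whole broken project; as a unit test it is two lines.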

Running tests

cargo test              # Run all tests
cargo test rsconstructignore    # Run tests matching a name
cargo test -- --nocapture  # Show stdout/stderr from tests

Why unit tests live in src/ (not tests/)

There is a recurring question: should unit tests move to tests/ to keep source files shorter and more readable? The short answer is no, for a structural reason specific to this crate.

This crate is a binary only — there is no src/lib.rs. Integration tests under tests/ can only link against a library crate; against a binary crate, all they can do is what the existing integration tests do: spawn the rsconstruct binary as a subprocess and assert on its output. So there are only three real options for testing internal logic like BuildGraph:

| Option | Cost |
|---|---|
| Unit tests inline in src/ (current) | Longer source files (mitigated by #[cfg(test)] stripping them from release builds, and by editor folding) |
| Move tests to tests/ as end-to-end tests | Far more code per test, much slower, indirect — can’t isolate a specific dedup branch without building a whole fake project |
| Add a src/lib.rs exposing modules | Architectural change — the crate becomes both a library and a binary. Forces decisions about what is public API |

The third option is the “clean” fix but it has ongoing costs (API surface to maintain, semver implications if we ever publish the library). The first option has only a readability cost, and it’s the idiomatic Rust approach for binary crates.

Rule: default to writing integration tests in tests/. Only add a #[cfg(test)] mod tests block in src/ when the thing under test is genuinely hard to exercise through the CLI (e.g. a specific branch of a dedup helper that requires setting up graph state that would take dozens of real products to reproduce end-to-end). When a source file grows large enough that its inline test module dominates the file, split the tests into a sibling file via #[cfg(test)] mod tests; + src/MODULE/tests.rs, rather than moving them out of src/ entirely.

Test directory layout

tests/
├── common/
│   └── mod.rs                  # Shared helpers (not a test binary)
├── build.rs                    # Build command tests
├── cache.rs                    # Cache operation tests
├── complete.rs                 # Shell completion tests
├── config.rs                   # Config show/show-default tests
├── dry_run.rs                  # Dry-run flag tests
├── graph.rs                    # Dependency graph tests
├── init.rs                     # Project initialization tests
├── processor_cmd.rs            # Processor list/auto/files tests
├── rsconstructignore.rs        # .rsconstructignore / .gitignore exclusion tests
├── status.rs                   # Status command tests
├── tools.rs                    # Tools list/check tests
├── watch.rs                    # File watcher tests
├── processors.rs               # Module root for processor tests
└── processors/
    ├── cc_single_file.rs       # C/C++ compilation tests
    ├── zspell.rs               # Zspell processor tests
    └── template.rs             # Template processor tests

Each top-level .rs file in tests/ is compiled as a separate test binary by Cargo. The processors.rs file acts as a module root that declares the processors/ subdirectory modules:

mod common;
mod processors {
    pub mod cc_single_file;
    pub mod zspell;
    pub mod template;
}

This is the standard Rust pattern for grouping related integration tests into subdirectories without creating a separate binary per file.

Shared helpers

tests/common/mod.rs provides utilities used across all test files:

| Helper | Purpose |
|---|---|
| setup_test_project() | Create an isolated project in a temp directory with rsconstruct.toml and basic directories |
| setup_cc_project(path) | Create a C project structure with the cc_single_file processor enabled |
| run_rsconstruct(dir, args) | Execute the rsconstruct binary in the given directory and return its output |
| run_rsconstruct_with_env(dir, args, env) | Same as run_rsconstruct but with extra environment variables (e.g., NO_COLOR=1) |

All helpers use env!("CARGO_BIN_EXE_rsconstruct") to locate the compiled binary, ensuring tests run against the freshly built version.

Every test creates a fresh TempDir for isolation. The directory is automatically cleaned up when the test ends.

Test categories

Command tests

Tests in build.rs, dry_run.rs, init.rs, status.rs, and watch.rs exercise CLI commands end-to-end:

#[test]
fn force_rebuild() {
    let temp_dir = setup_test_project();
    // ... set up files ...
    let output = run_rsconstruct_with_env(temp_dir.path(), &["build", "--force"], &[("NO_COLOR", "1")]);
    assert!(output.status.success());
    let stdout = String::from_utf8_lossy(&output.stdout);
    assert!(stdout.contains("[template] Processing:"));
}

These tests verify exit codes, stdout messages, and side effects (files created or removed).

Processor tests

Tests under processors/ verify individual processor behavior: file discovery, compilation, linting, incremental skip logic, and error handling. Each processor test module follows the same pattern:

  1. Set up a temp project with appropriate source files
  2. Run rsconstruct build
  3. Assert outputs exist and contain expected content
  4. Optionally modify a file and rebuild to test incrementality

Ignore tests

rsconstructignore.rs tests .rsconstructignore pattern matching: exact file patterns, glob patterns, leading / (anchored), trailing / (directory), comments, blank lines, and interaction with multiple processors.

Common assertion patterns

Exit code:

assert!(output.status.success());

Stdout content:

let stdout = String::from_utf8_lossy(&output.stdout);
assert!(stdout.contains("Processing:"));
assert!(!stdout.contains("error"));

File existence:

assert!(path.join("out/cc_single_file/main.elf").exists());

Incremental builds:

// First build
run_rsconstruct(path, &["build"]);

// Second build should skip
let output = run_rsconstruct_with_env(path, &["build"], &[("NO_COLOR", "1")]);
let stdout = String::from_utf8_lossy(&output.stdout);
assert!(stdout.contains("Skipping (unchanged):"));

Mtime-dependent rebuilds:

// Modify a file and wait for mtime to differ
std::thread::sleep(std::time::Duration::from_millis(100));
fs::write(path.join("src/header.h"), "// changed\n").unwrap();

let output = run_rsconstruct(path, &["build"]);
let stdout = String::from_utf8_lossy(&output.stdout);
assert!(stdout.contains("Processing:"));

Writing a new test

  1. Add a test function in the appropriate file (or create a new .rs file under tests/ for a new feature area)
  2. Use setup_test_project() or setup_cc_project() to create an isolated environment
  3. Write source files and configuration into the temp directory
  4. Run rsconstruct with run_rsconstruct() or run_rsconstruct_with_env()
  5. Assert on exit code, stdout/stderr content, and output file existence

If adding a new processor test module, declare it in tests/processors.rs:

mod processors {
    pub mod cc_single_file;
    pub mod zspell;
    pub mod template;
    pub mod my_new_processor;  // add here
}

Test coverage by area

| Area | File | Tests |
|---|---|---|
| Build command | build.rs | Force rebuild, incremental skip, clean, deterministic order, keep-going, timings, parallel -j flag, parallel keep-going, parallel all-products, parallel timings, parallel caching |
| Cache | cache.rs | Clear, size, trim, list operations |
| Complete | complete.rs | Bash/zsh/fish generation, config-driven completion |
| Config | config.rs | Show merged config, show defaults, annotation comments |
| Dry run | dry_run.rs | Preview output, force flag, short flag |
| Graph | graph.rs | DOT, mermaid, JSON, text formats, empty project |
| Init | init.rs | Project creation, duplicate detection, existing directory preservation |
| Processor command | processor_cmd.rs | List, all, auto-detect, files, unknown processor error |
| Status | status.rs | UP-TO-DATE / STALE / RESTORABLE reporting |
| Tools | tools.rs | List tools, list all, check availability |
| Watch | watch.rs | Initial build, rebuild on change |
| Ignore | rsconstructignore.rs | Exact match, globs, leading slash, trailing slash, comments, cross-processor |
| Template | processors/template.rs | Rendering, incremental, dep_inputs |
| CC | processors/cc_single_file.rs | Compilation, headers, per-file flags, mixed C/C++, config change detection |
| Zspell | processors/zspell.rs | Correct/misspelled words, code block filtering, custom words, incremental |

Parameter Naming Conventions

This document establishes the canonical names for configuration parameters across all processors, and the reasoning behind each name. Use this as the reference when adding new processors or renaming existing ones.

Taxonomy

Parameters fall into four categories:

| Category | Purpose |
|---|---|
| Source discovery | Which files are the primary targets to process |
| Dependency tracking | Which additional files affect the checksum / trigger rebuilds |
| Tool configuration | What command/tool to run and how |
| Execution control | Batching, parallelism, output location |

Source Discovery Parameters

These parameters determine which files are the primary inputs — the files that get processed, linted, or transformed.

| Parameter | Type | Description |
|---|---|---|
| src_dirs | string[] | Directories to scan recursively for source files. |
| src_extensions | string[] | File extensions to match during scanning (e.g. [".py", ".pyi"]). |
| src_exclude_dirs | string[] | Directory path segments to skip during scanning. |
| src_exclude_files | string[] | File names to skip during scanning. |
| src_exclude_paths | string[] | Exact relative paths to skip during scanning. |
| src_files | string[] | Explicit list of source files to process. When set, bypasses src_dirs, src_extensions, and all exclude filters entirely. |

src_files vs scanning

src_dirs + src_extensions is the default discovery mechanism — the processor walks directories and finds matching files automatically.

src_files is for when you know exactly which files you want processed and don’t want any scanning. Setting src_files disables all scan-based discovery for that processor instance.
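To make the two discovery modes concrete, here is a hypothetical rsconstruct.toml fragment. The key names follow the conventions above; the processor instances, directories, and file paths are illustrative, not taken from a real project:

```toml
# Scan-based discovery: walk src_dirs recursively, keep files whose
# extension matches, then apply the exclude filters.
[processor.ruff]
src_dirs = ["src", "tools"]
src_extensions = [".py", ".pyi"]
src_exclude_dirs = ["__pycache__"]

# Explicit list: no scanning at all for this instance.
# src_dirs / src_extensions / src_exclude_* are ignored when src_files is set.
[processor.script.lint_deploy]
src_files = ["scripts/deploy.py", "scripts/release.py"]
```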


Dependency Tracking Parameters

These parameters declare files that the processor depends on but does not process directly. A change to any of these files invalidates the cache and triggers a rebuild, but the files are not passed as arguments to the tool.

| Parameter | Type | Description |
|---|---|---|
| dep_inputs | string[] | Explicit dependency files (e.g. config files, schema files). Globs are supported. Fails if a listed file does not exist. |
| dep_auto | string[] | Like dep_inputs but silently ignored when the file does not exist. Used for optional config files (e.g. .pylintrc, pyproject.toml). |

Why two parameters?

dep_inputs is strict — it errors if a file is missing, which catches mistakes in configuration. dep_auto is lenient — it is for well-known config files that may or may not be present in a given project.
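A hypothetical fragment showing both parameters side by side (the processor instance and file names are examples only):

```toml
[processor.pylint]
# Strict: the build fails if this file is missing, catching config typos.
dep_inputs = ["lint-rules/project.schema.json"]
# Lenient: tracked only if present; a missing file is simply not a dependency.
dep_auto = [".pylintrc", "pyproject.toml"]
```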


Tool Configuration Parameters

command and args always appear together. Every processor that has command must also have args. They are treated as a unit: both participate in the config checksum (computed from each processor’s checksum_fields()), so changing either the command or any argument invalidates the cache and triggers a rebuild.

| Parameter | Type | Description |
|---|---|---|
| command | string | The executable to run. Required when the processor is active. If the value is a path to a local file, its content checksum is also tracked as a dependency. |
| args | string[] | Arguments passed to the command before file paths. Always present alongside command. Both command and args values are included in the config checksum. |

command dependency tracking

For the script and generator processors, if command points to a file that exists on disk (e.g. command = "scripts/my_linter.sh"), rsconstruct automatically adds it as an input dependency. This means that if the script itself changes, all affected products are rebuilt. System tools (e.g. bash, python3) are not files in the project and are not tracked.


Execution Control Parameters

| Parameter | Type | Description |
|---|---|---|
| batch | bool | When true, pass all files to the command in a single invocation. When false, invoke once per file. Default: true for most processors. |
| max_jobs | int | Maximum parallel jobs for this processor. Overrides the global --jobs flag. |
| output_dir | string | Directory where output files are written (generator processors). |
| output_extension | string | File extension for generated output files. |

Processor Contract

Rules that all processors must follow.

Fail hard, never degrade gracefully

When something fails, it must fail the entire build. Do not try-and-fallback, do not silently substitute defaults for missing resources, do not swallow errors. If a processor is configured to use a file and that file does not exist, that is an error. The user must fix their configuration or their project, not the code.

Optional features must be opt-in via explicit configuration (default off). When the user enables a feature, all resources it requires must exist.

No work without source files

An enabled processor must not fail the build if no source files match its file patterns. Zero matching files means zero products discovered; the processor simply does nothing. This is not an error — it is the normal state for a freshly initialized project.

Single responsibility

Each processor handles one type of transformation or check. A processor discovers its own products and knows how to execute, clean, and report on them.

Deterministic discovery

discover() receives an instance_name parameter identifying the processor instance (e.g., "ruff" or "script.lint_a" for multi-instance processors). Use this name when calling graph.add_product() — do not use hardcoded processor type constants.

discover() must return the same products given the same filesystem state. File discovery, processor iteration, and topological sort must all produce sorted, deterministic output so builds are reproducible.

Incremental correctness

Products must declare all their inputs. If any declared input changes, the product is rebuilt. If no inputs change, the cached result is reused. Processors must not rely on undeclared side inputs for correctness (support files read at execution time but excluded from the input list are acceptable only when changes to those files can never cause a previously-passing product to fail).

Execution isolation

A processor’s execute() must only write to the declared output paths (or, for creators, to the expected output directory). It must not modify source files, other products’ outputs, or global state.

Output directory caching (creators)

Creators that set output_dir on their products get automatic directory-level caching. After successful execution, the executor walks the output directory, stores every file as a content-addressed object, and records a manifest with paths, checksums, and Unix permissions. On restore, the entire directory is recreated from cache.

The cache_output_dir config option (default true) controls this. When disabled, creators fall back to stamp-file or empty-output caching (no directory restore on rsconstruct clean && rsconstruct build).

Creators that use output_dir caching must implement clean() to remove the output directory so it can be restored from cache.

Error reporting

On failure, execute() returns an Err with a clear message including the relevant file path and the nature of the problem. The executor decides whether to abort or continue based on --keep-going.

Batch execution and partial failure

Batch-capable processors implement supports_batch() and execute_batch(). The execute_batch() method receives multiple products and must return one Result per product, in the same order as the input.

External tool processors that invoke a single subprocess for the entire batch typically use execute_generator_batch(), which maps a single exit code to all-success or all-failure. If the tool fails, all products in the batch are marked failed — there is no way to determine which outputs are valid.
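The all-or-nothing mapping can be sketched as follows. This is an illustrative stand-in, not the real execute_generator_batch signature: one subprocess exit status becomes the Result for every product in the batch.

```rust
// Sketch: map a single external-tool exit status onto per-product results.
// When the tool fails there is no way to tell which outputs are valid,
// so every product in the batch is marked failed.
fn batch_results(exit_ok: bool, product_count: usize) -> Vec<Result<(), String>> {
    (0..product_count)
        .map(|_| {
            if exit_ok {
                Ok(())
            } else {
                Err("batch tool exited non-zero".to_string())
            }
        })
        .collect()
}
```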

Internal processors (e.g., imarkdown2html, isass, ipdfunite) that process files in-process should return per-file results so that partial failure is handled correctly — only the actually-failed products are rebuilt on the next run.

Chunk sizing: In fail-fast mode (default), the executor uses chunk_size=1 even for batch-capable processors, so each product is cached individually. This gives the best incremental recovery. Larger chunks are used only with --keep-going or explicit --batch-size.
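The chunk-size decision above can be sketched as a small function. This is a simplified model, not the executor's actual code; the parameter names mirror the --keep-going and --batch-size flags:

```rust
// Sketch of chunk-size selection for a batch-capable processor.
// Fail-fast default is chunk_size = 1 so every product is cached
// individually; larger chunks only with --keep-going or --batch-size.
fn chunk_size(supports_batch: bool, keep_going: bool, batch_size: Option<usize>) -> usize {
    if let Some(n) = batch_size {
        return n.max(1); // explicit --batch-size override wins
    }
    if supports_batch && keep_going {
        usize::MAX // batch everything; per-product results still required
    } else {
        1 // fail-fast default: best incremental recovery
    }
}
```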

Cache System

RSConstruct uses a content-addressed cache to enable fast incremental builds. This page describes the cache architecture, storage format, and rebuild logic.

Overview

The cache lives in .rsconstruct/ and consists of:

  • objects/ — content-addressed object store (all cache data)
  • deps.redb — source file dependency cache (see Dependency Caching)

There is no separate database. The object store is the cache.

Data model

The object store contains three kinds of objects, inspired by git:

Blobs

A blob is a file’s raw content, addressed by its SHA-256 content hash. Blobs are optionally zstd-compressed and made read-only to prevent corruption when restored via hardlinks.

Blobs are stored content-addressed — two products producing identical output share the same blob. This enables deduplication and hardlink-based restoration.

Why blobs don’t store output paths

A blob is pure content — it has no knowledge of where it will be restored. This is critical for two reasons:

  1. Rename survival. If you rename foo.md to bar.md without changing its content, the cache key (which is content-addressed) is the same. The blob is reused and restored to the new output path (bar.txt instead of foo.txt). If the blob stored its output path, this wouldn’t work.

  2. Deduplication across trees. Multiple tree entries can point to the same blob under different paths. For example, if two files in a creator’s output have identical content, they share the same blob object in the store. The tree records the path; the blob just holds the content.

Trees

A tree is a serialized list of (path, mode, blob_checksum) entries describing a set of output files. Trees are stored in the object store, addressed by the cache key (not by content hash). A tree maps relative file paths to content-addressed blobs. Multiple trees can point to the same blobs — deduplication happens at the blob level.

Markers

A marker is a zero-byte object indicating that a check passed. Markers are stored in the object store, addressed by the cache key.

Cache entries

A cache entry is a small descriptor stored in the object store at the path derived from the cache key. It contains:

{"type": "blob", "checksum": "abc123...", "mode": 493}

Note: the blob descriptor has no path — the product knows where its output goes.

Or:

{"type": "tree", "entries": [{"path": "dir/file.txt", "checksum": "def456...", "mode": 493}]}

or:

{"type": "marker"}

The actual file content lives in separate content-addressed blob objects. The cache entry is just a pointer (for generators) or a manifest (for creators).

Object store layout

.rsconstruct/objects/
  a1/b2c3d4...    # could be a blob (raw file content)
  ff/0011aa...    # could be a cache entry (JSON descriptor)
  cd/ef5678...    # could be another blob

Cache entries and blobs share the same object store. Cache entries are addressed by cache key hash; blobs are addressed by content hash.

Cache keys

The cache key identifies a product. It is computed as:

hash(processor_name, config_hash, input_content_hash)

Where:

  • processor_name — the processor type (e.g., pandoc, ruff)
  • config_hash — hash of the processor configuration (compiler flags, args, etc.)
  • input_content_hash — combined SHA-256 hash of all input file contents

The key is content-addressed: it depends on what the inputs contain, not what they’re named. Renaming a file without changing its content produces the same cache key.
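A dependency-free sketch of the key recipe (the real implementation uses SHA-256; std's DefaultHasher stands in here so the example runs on its own). The point is what goes into the hash: processor name, config hash, and input contents, never input paths:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch: cache key = hash(processor_name, config_hash, input_content_hash).
// Hashing contents rather than paths is what makes renames cache-transparent.
fn cache_key(processor_name: &str, config_hash: u64, input_contents: &[&str]) -> u64 {
    let mut h = DefaultHasher::new();
    processor_name.hash(&mut h);
    config_hash.hash(&mut h);
    for content in input_contents {
        content.hash(&mut h); // file content, not file name
    }
    h.finish()
}
```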

Multi-format processors

For processors that produce multiple output formats from the same input (e.g., pandoc producing PDF, HTML, and DOCX), each format is a separate product with a separate cache key. The output format is part of the config hash, so each format gets its own key naturally.

Output depends on input name

Most processors produce output that depends only on input content. However, some processors embed the input filename in the output (e.g., a // Generated from foo.c header). For these processors, the output_depends_on_input_name property is set to true, and the input file path is included in the cache key:

hash(processor_name, config_hash, input_content_hash, input_path)

Flows

Lookup

  1. Compute the cache key from processor name + config + input contents
  2. Look up the object at that key in the object store
  3. If not found: cache miss, product must be built
  4. If found: read the descriptor, act based on type

Cache (after successful build)

Checker:

  1. Store a {"type": "marker"} entry at the cache key

Generator (single output):

  1. Store the output file content as a content-addressed blob
  2. Store a {"type": "blob", "checksum": "..."} entry at the cache key

Creator (multiple outputs):

  1. Walk all output directories and files
  2. Store each file as a content-addressed blob
  3. Build the tree entries: [{"path": "...", "checksum": "...", "mode": ...}, ...]
  4. Store a {"type": "tree", "entries": [...]} entry at the cache key

Restore

Checker: Nothing to restore. Cache entry exists = check passed.

Generator:

  1. Read the cache entry, get the blob checksum
  2. Hardlink or copy the blob to the output path

Creator:

  1. Read the cache entry, get the tree entries
  2. For each (path, checksum, mode): restore the blob to the path, set permissions

Skip

If the cache entry exists AND all output files are present on disk, no work is needed.

Rebuild classification

| Classification | Condition | Action |
|---|---|---|
| Skip | Cache key found AND all outputs exist on disk | No work needed |
| Restore | Cache key found BUT some outputs are missing | Restore from object store |
| Build | No cache entry for this key | Execute the processor |

Because the cache key incorporates input content, a changed input produces a different key. There’s no “stale entry” — either the key exists or it doesn’t.
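The classification logic reduces to two booleans. A minimal sketch (not the executor's actual types):

```rust
#[derive(Debug, PartialEq)]
enum Classification {
    Skip,    // cache entry exists, outputs present
    Restore, // cache entry exists, outputs missing
    Build,   // no cache entry for this key
}

// Sketch of the skip/restore/build decision. Because the key is
// content-addressed, "entry exists for a stale input" cannot occur.
fn classify(cache_entry_exists: bool, all_outputs_on_disk: bool) -> Classification {
    match (cache_entry_exists, all_outputs_on_disk) {
        (true, true) => Classification::Skip,
        (true, false) => Classification::Restore,
        (false, _) => Classification::Build,
    }
}
```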

Config-aware caching

Processor configuration is hashed into cache keys. Changing a config value triggers rebuilds even if source files haven’t changed.

Cache restoration methods

| Method | Behavior | Best for |
|---|---|---|
| hardlink | Links output to cached blob (same inode, read-only) | Local development (fast, no extra disk space) |
| copy | Copies cached blob to output path (writable) | CI runners, cross-filesystem setups |
| auto (default) | Uses copy when CI=true, hardlink otherwise | Most setups |

Hardlinks work because blob objects contain raw file content (not wrapped in a descriptor). Only cache entries (which point to blobs) contain JSON metadata.
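The auto selection can be sketched like this (illustrative only: the function name is hypothetical, and reading the CI environment variable directly is an assumption about how the check is made):

```rust
// Sketch: resolve the configured restoration method to a concrete one.
// "auto" picks copy on CI runners (CI=true) and hardlink locally.
fn effective_method(configured: &str) -> &'static str {
    match configured {
        "hardlink" => "hardlink",
        "copy" => "copy",
        _ => {
            // "auto": CI gets a writable copy; local builds get fast hardlinks
            let on_ci = std::env::var("CI").map(|v| v == "true").unwrap_or(false);
            if on_ci { "copy" } else { "hardlink" }
        }
    }
}
```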

Cache commands

| Command | Description |
|---|---|
| rsconstruct cache size | Show cache size and object count |
| rsconstruct cache list | List all cache entries as JSON |
| rsconstruct cache stats | Show per-processor cache statistics |
| rsconstruct cache trim | Remove unreferenced objects |
| rsconstruct cache clear | Delete the entire cache |

Clean vs Clear

rsconstruct clean removes build outputs but preserves the cache:

  • Generators: Output files deleted. Next build restores via hardlink/copy.
  • Checkers: Nothing to delete. Next build skips.
  • Creators: Output directories deleted. Next build restores from tree.

rsconstruct cache clear wipes everything in the object store — descriptors and blobs. A cleared cache means “forget everything, rebuild from scratch.” If only blobs were cleared but descriptors survived, the cache would think outputs are available but fail to restore them. Clearing both together avoids this inconsistency.

Incremental rebuild after partial failure

Each product is cached independently after successful execution. If a build fails partway through, the next run only rebuilds products without valid cache entries.

Remote caching

See Remote Caching for sharing cache between machines and CI.

Checksum Cache

RSConstruct uses a centralized checksum system (src/checksum.rs) for all file hashing. It has two layers of caching to avoid redundant I/O and computation.

Architecture

All file checksum operations go through a single entry point: checksum::file_checksum(path). This function never computes the same hash twice.

Layer 1: In-memory cache (per build run)

A global HashMap<PathBuf, String> stores checksums computed during the current build. When a file is checksummed for the first time, the result is cached. Any subsequent request for the same file returns the cached value without reading the file again.

This handles the common case where the same file appears as an input to multiple products (e.g., a shared header file), or when the checksum is needed both for classification (skip/restore/build) and for cache storage.

The in-memory cache lives for the duration of the process and is not persisted.

Layer 2: Mtime database (across builds)

A persistent redb database at .rsconstruct/mtime.redb maps file paths to (mtime, checksum) pairs. Before reading a file to compute its checksum, the system checks:

  1. Has this file been checksummed in a previous build?
  2. Has the file’s modification time changed since then?

If the mtime matches, the cached checksum is returned without reading the file. This avoids I/O for files that haven’t been modified between builds — the common case in incremental builds where most files are unchanged.

When the mtime differs (file was modified), the file is read, the new checksum is computed, and both the in-memory cache and the mtime database are updated.

Dirty mtime entries are flushed to the database in a single batch transaction at the end of each checksum computation pass, minimizing database writes.
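A minimal sketch of the two-layer lookup, with HashMaps standing in for the global cache and the redb database, and a closure standing in for the read-and-hash step (the real code lives in src/checksum.rs; these shapes are assumptions for illustration):

```rust
use std::collections::HashMap;

// Layer 1: per-run memory cache. Layer 2: persisted (mtime, checksum) pairs.
struct ChecksumCache {
    memory: HashMap<String, String>,          // path -> checksum, this run only
    mtime_db: HashMap<String, (u64, String)>, // path -> (mtime, checksum), persisted
}

impl ChecksumCache {
    fn file_checksum(
        &mut self,
        path: &str,
        mtime: u64,
        read_and_hash: impl Fn() -> String,
    ) -> String {
        if let Some(c) = self.memory.get(path) {
            return c.clone(); // layer 1 hit: already hashed this run
        }
        if let Some((m, c)) = self.mtime_db.get(path) {
            if *m == mtime {
                let c = c.clone(); // layer 2 hit: unchanged since last build
                self.memory.insert(path.into(), c.clone());
                return c;
            }
        }
        let c = read_and_hash(); // miss: read the file and hash it
        self.memory.insert(path.into(), c.clone());
        self.mtime_db.insert(path.into(), (mtime, c.clone()));
        c
    }
}
```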

Why two layers

| Layer | Scope | Avoids | Cost |
|---|---|---|---|
| In-memory cache | Single build run | Re-reading + re-hashing the same file | HashMap lookup |
| Mtime database | Across builds | Reading unchanged files from disk | stat() + DB lookup |

For the first build, every file must be read and hashed. The mtime database is populated as a side effect. On subsequent builds, most files are unchanged — the mtime check skips reading them entirely, and the in-memory cache prevents redundant lookups within the run.

Configuration

The persistent mtime database can be disabled via rsconstruct.toml:

[cache]
mtime_check = false

Or via the command-line flag:

rsconstruct build --no-mtime-cache

When disabled, every file is read and hashed on every build. The in-memory cache still prevents redundant reads within a single run, but there is no cross-build benefit.

When to disable: In CI/CD environments with a fresh checkout, the mtime database has nothing cached from previous builds and just adds write overhead. The in-memory cache is sufficient. Use --no-mtime-cache (or mtime_check = false in config) to skip the database entirely.

The rsconstruct status command also disables mtime checking internally to ensure accurate classification.

Database location

The mtime database is stored at .rsconstruct/mtime.redb, separate from the build cache (objects/) and the config tracking database. This separation means:

  • rsconstruct cache clear removes the build cache but preserves the mtime database (the next build will still benefit from mtime-based skipping)
  • The mtime database can be deleted independently without affecting cached build outputs

Combined input checksum

The combined_input_checksum(inputs) function computes a single hash representing all input files for a product. It:

  1. Checksums each input file (using the two-layer cache)
  2. Joins all checksums with :
  3. Hashes the combined string to produce a fixed-length result

Missing files get a MISSING:<path> sentinel so that different sets of missing files produce different combined checksums.
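The three steps above can be sketched as follows. This is an illustrative model: per-file checksums are passed in pre-computed, and std's DefaultHasher stands in for the real fixed-length hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Sketch of combined_input_checksum: per-file checksums (None = missing file)
// joined with ':', missing files replaced by a MISSING:<path> sentinel,
// then hashed once into a fixed-length result.
fn combined_input_checksum(inputs: &[(&str, Option<&str>)]) -> u64 {
    let parts: Vec<String> = inputs
        .iter()
        .map(|(path, checksum)| match checksum {
            Some(c) => c.to_string(),
            // The path in the sentinel makes different missing sets distinct.
            None => format!("MISSING:{path}"),
        })
        .collect();
    let mut h = DefaultHasher::new();
    parts.join(":").hash(&mut h);
    h.finish()
}
```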

Dependency Caching

RSConstruct includes a dependency cache that stores source file dependencies (e.g., C/C++ header files) to avoid re-scanning files that haven’t changed. This significantly speeds up the graph-building phase for projects with many source files.

Overview

When processors like cc_single_file discover products, they need to scan source files to find dependencies (header files). This scanning can be slow for large projects. The dependency cache stores the results so subsequent builds can skip the scanning step.

The cache is stored in .rsconstruct/deps.redb using redb, an embedded key-value database.

Cache Structure

Each cache entry consists of:

  • Key: Source file path (e.g., src/main.c)
  • Value:
    • source_checksum — SHA-256 hash of the source file content
    • dependencies — list of dependency paths (header files)

Cache Lookup Algorithm

When looking up dependencies for a source file:

  1. Look up the entry by source file path
  2. If not found → cache miss, scan the file
  3. If found, compute the current SHA-256 checksum of the source file
  4. Compare with the stored checksum:
    • If different → cache miss (file changed), re-scan
    • If same → verify all cached dependencies still exist
  5. If any dependency file is missing → cache miss, re-scan
  6. Otherwise → cache hit, return cached dependencies
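The lookup algorithm above can be sketched with plain maps and closures standing in for redb and the filesystem (illustrative shapes, not the real DepsCache API):

```rust
use std::collections::HashMap;

// One cache entry: checksum of the source at scan time, plus its deps.
struct DepsEntry {
    source_checksum: String,
    dependencies: Vec<String>,
}

// Sketch of the lookup: None = cache miss (caller must re-scan the file).
fn lookup(
    cache: &HashMap<String, DepsEntry>,
    path: &str,
    current_checksum: &str,
    file_exists: impl Fn(&str) -> bool,
) -> Option<Vec<String>> {
    let entry = cache.get(path)?; // steps 1-2: not found -> miss
    if entry.source_checksum != current_checksum {
        return None; // steps 3-4: source content changed -> miss
    }
    if !entry.dependencies.iter().all(|d| file_exists(d)) {
        return None; // step 5: a cached dependency vanished -> miss
    }
    Some(entry.dependencies.clone()) // step 6: hit
}
```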

Why Path as Key (Not Checksum)?

An alternative design would use the source file’s checksum as the cache key instead of its path. This seems appealing because you could look up dependencies directly by content hash. However, this approach has significant drawbacks:

Problems with Checksum as Key

  1. Mandatory upfront computation: With checksum as key, you must compute the SHA-256 hash of every source file before you can even check the cache. This means reading every file on every build, even when nothing has changed.

    With path as key, you do a fast O(1) lookup first. Only if there’s a cache hit do you compute the checksum to validate freshness.

  2. Orphaned entries accumulate: When a file changes, its old checksum entry becomes orphaned garbage. You’d need periodic garbage collection to clean up stale entries.

    With path as key, the entry is naturally updated in place when the file changes.

  3. No actual benefit: The checksum is still needed for validation regardless of the key choice. Using it as the key just moves when you compute it, without reducing total work.

Current Design

The current design is optimal:

Path (key) → O(1) lookup → Checksum validation (only on hit)

This minimizes work in the common case where files haven’t changed.

Cache Statistics

During graph construction, RSConstruct displays cache statistics:

[cc_single_file] Dependency cache: 42 hits, 3 recalculated

This shows how many source files had their dependencies retrieved from cache (hits) versus re-scanned (recalculated).

Viewing Dependencies

Use the rsconstruct deps command to view the dependencies stored in the cache:

rsconstruct deps all                    # Show all cached dependencies
rsconstruct deps for src/main.c         # Show dependencies for a specific file
rsconstruct deps for src/a.c src/b.c    # Show dependencies for multiple files
rsconstruct deps clean                  # Clear the dependency cache

Example output:

src/main.c: (no dependencies)
src/test.c:
  src/utils.h
  src/config.h

The rsconstruct deps command reads directly from the dependency cache without building the graph. If the cache is empty (e.g., after rsconstruct deps clean or on a fresh checkout), run a build first to populate it.

This is useful for debugging rebuild behavior or understanding the include structure of your project.

Cache Invalidation

The cache automatically invalidates entries when:

  • The source file content changes (checksum mismatch)
  • Any cached dependency file no longer exists

You can manually clear the entire dependency cache by removing the .rsconstruct/deps.redb file, or by running rsconstruct clean all which removes the entire .rsconstruct/ directory.

Processors Using Dependency Caching

Currently, the following processors use the dependency cache:

  • cc_single_file — caches C/C++ header dependencies discovered by the include scanner

Implementation

The dependency cache is implemented in src/deps_cache.rs:

```rust
pub struct DepsCache {
    db: redb::Database,
    stats: DepsCacheStats,
}

impl DepsCache {
    pub fn open() -> Result<Self>;
    pub fn get(&mut self, source: &Path) -> Option<Vec<PathBuf>>;
    pub fn set(&self, source: &Path, dependencies: &[PathBuf]) -> Result<()>;
    pub fn flush(&self) -> Result<()>;
    pub fn stats(&self) -> &DepsCacheStats;
}
```

The cache is opened once per processor discovery phase, queried for each source file, and flushed to disk at the end.

Processor Versioning and Cache Invalidation

When a processor’s implementation changes in a way that produces different output for the same input, every cached entry it produced becomes potentially stale. This chapter documents the problem, the design alternatives we considered, and the chosen approach.

The problem

rsconstruct’s cache is content-addressed on a key derived from:

  • Primary input file checksums
  • dep_inputs / dep_auto file checksums
  • output_config_hash (the processor’s relevant config fields)
  • Tool version hash (optional — e.g. ruff --version output)

Crucially absent: the implementation of the processor itself.

Consider: a user upgrades rsconstruct to a version where the ruff wrapper now passes a new flag by default. Inputs haven’t changed. Config hasn’t changed. Ruff’s binary version hasn’t changed. But the output is different — the new flag changes behavior.

rsconstruct sees a cache hit on the old descriptor and restores the stale result. The user gets incorrect output from “fresh” caches.

Design alternatives considered

Option A: Hash the binary at startup

Compute a SHA of the rsconstruct binary itself at program start. Mix that hash into every product’s cache key.

How it works: Any change to any part of rsconstruct — processors, core executor, cache code, even comments — invalidates every cache entry.

Pros:

  • Trivially correct. If any code changed, caches are invalidated.
  • Zero developer action.
  • No risk of forgotten invalidation.

Cons:

  • Massively over-invalidates. Fixing a typo in a docstring or reformatting the clean command wipes every user’s cache across every processor.
  • Makes iterating on rsconstruct itself painful — developers constantly rebuild everything.
  • Version bumps of unrelated dependencies (regex bumps, anyhow bumps) change the binary and also invalidate.

Option B: Per-file source hash (automatic)

build.rs hashes each processor’s .rs file at compile time. The hash is embedded as a &'static str into that processor’s plugin entry. Cache key includes this hash.

How it works: Modify src/processors/checkers/ruff.rs, next build picks up a new hash, ruff’s caches invalidate. Other processors are unaffected.

Pros:

  • Zero developer action — hashes are automatic.
  • More precise than Option A — only the changed processor invalidates.
  • Never forget to bump.

Cons:

  • Too sensitive. Whitespace changes, comment fixes, rustfmt reformats, renames of private helpers — all invalidate the cache even though behavior is identical.
  • Doesn’t catch indirect changes. If a processor calls shared helpers in processors/mod.rs and those change, the processor’s file hash hasn’t changed but its behavior has. We need to hash transitive dependencies, and Rust doesn’t give us an easy way.
  • Non-deterministic sources of churn: different rustfmt versions produce different hashes for the same intent, CI vs. local editor differences cause spurious invalidation.
  • Signal dilution: users stop paying attention to “this rebuilt” because it happens even for cosmetic changes. The signal loses meaning.

Option C: Whole src/processors/ subtree hash

Hash the entire processors directory at compile time. Any change to anything under src/processors/ invalidates every processor’s cache.

How it works: Middle ground between A and B.

Pros:

  • Catches shared-helper changes automatically (since helpers are in the same subtree).
  • Less aggressive than A — core-executor tweaks don’t invalidate.

Cons:

  • Still over-invalidates — a fix to processor X wipes processor Y’s cache.
  • Still vulnerable to formatting/comment churn.

Option D: Explicit per-processor version (manual)

Each processor declares a version: u32 in its plugin entry. The developer bumps it when making a behavior-changing modification. Cache key includes the version.

How it works:

```rust
inventory::submit! { ProcessorPlugin {
    name: "ruff",
    version: 1,   // bump when behavior changes
    ...
}}
```

A behavior-changing commit (e.g. “processor ruff: change default flags”) carries the version: 1 → version: 2 bump as part of the same change.

Pros:

  • Precise. Only bumps when the developer decides behavior actually changed.
  • Stable. Reformats, comment edits, renames do not invalidate caches.
  • Auditable. Every version bump is visible in git history as a deliberate one-line change with its own rationale.
  • Cross-platform deterministic — a number, not a hash sensitive to file encoding.
  • Signal stays meaningful — users see a rebuild only when something actually changed.

Cons:

  • Relies on developer discipline. Forgetting to bump after a behavior change leaves stale caches surviving — a silent correctness bug, arguably worse than no invalidation (because it creates a false sense of safety).
  • Requires a documented bump rule so the convention is followed.
  • Can be mitigated by code review (diffs show version bumps) and optional CI checks (warn when a processor file changes without a version bump).

Option E: Hybrid — manual version OR automatic hash, whichever is larger

Both fields exist. The cache key includes max(manual version, auto hash). Belt-and-suspenders.

Pros: Catches both forgotten bumps and behavior changes.

Cons: Complexity. Two systems doing nearly the same thing. Users don’t know which one is “the” trigger. Debugging cache misses becomes harder. Loses the “explicit and predictable” property of Option D.

Decision: Option D (explicit per-processor version)

For a build system that cares about cache correctness, deliberate is better than automatic:

  1. Cache stability is a feature. Users expect their caches to survive a refactor, a cargo fmt, a whitespace cleanup. An automatic hash violates this expectation constantly.
  2. A version bump documents intent. git blame on the version: line shows why behavior changed. An auto hash leaves no such record.
  3. The discipline cost is low. Each behavior-changing commit already requires care — adding a one-line version bump to that care is trivial. Forgetting to bump is caught by code review, same as forgetting a changelog entry or a test.
  4. The discipline failure mode is recoverable. Worst case: a version bump is forgotten, users report stale caches, we bump the version retroactively in the next release. This is better than the Option B failure mode (constant spurious invalidation drives users to distrust the system).

The bump rule

Bump a processor’s version when ANY of:

  • The processor would produce different output files for the same inputs.
  • The processor would include different content in an output file for the same inputs.
  • The processor changes which inputs are discovered (e.g. a new glob pattern, a changed default).
  • The processor changes which paths are declared as outputs.
  • The processor’s interpretation of a config field changes (e.g. what a flag means, how a default is resolved).

Do NOT bump for:

  • Refactors with identical behavior.
  • Comment / docstring changes.
  • Reformatting.
  • Renaming of internal helpers.
  • Performance improvements that don’t change output.
  • Bug fixes in error messages (but DO bump if the fix changes which inputs succeed/fail).

When in doubt, bump. A bump is cheap (rebuild all products of one processor once); a missed bump is a correctness bug.

Implementation outline

  1. Add a required version: u32 field to ProcessorPlugin (no default — every processor must declare it).
  2. Include the version in the cache key via output_config_hash or descriptor_key.
  3. Initialize all existing processors to version: 1.
  4. Document the bump rule in a prominent comment near the field definition.
  5. (Optional, future) CI check: if a processor file’s git diff touches logic but not the version: line, post a warning comment on the PR.

Migration

On the first release after this change ships, every existing cache entry is invalidated (the cache key schema changed). This is a one-time cost, same as any cache-key schema evolution. Users will see a full rebuild once, then cache behavior resumes normally.

Cross-Processor Dependencies

This chapter discusses the problem of one processor’s output being consumed as input by another processor, and the design options for solving it.

The Problem

Consider a template that generates a Python file:

tera.templates/config.py.tera  →  (template processor)  →  config.py

Ideally, ruff should then lint the generated config.py. Or a template might generate a C++ source file that needs to be compiled by cc_single_file and linted by cppcheck. Chains can be arbitrarily deep:

template  →  generates foo.sh  →  shellcheck lints foo.sh
template  →  generates bar.c   →  cc_single_file compiles bar.c  →  cppcheck lints bar.c

Currently this does not work. Each processor discovers its inputs by querying the FileIndex, which is built once at startup by scanning the filesystem. Files that do not exist yet (because they will be produced by another processor) are invisible to downstream processors. No product is created for them, and no dependency edge is formed.

Why It Breaks

The build pipeline today is:

  1. Walk the filesystem once to build FileIndex
  2. Each processor runs discover() against that index
  3. resolve_dependencies() matches product inputs to product outputs by path
  4. Topological sort and execution

Step 3 already handles cross-processor edges correctly: if product A declares output foo.py and product B declares input foo.py, a dependency edge is created automatically so that A runs before B. The problem is that step 2 never creates product B in the first place, because foo.py is not in the FileIndex.

How Other Build Systems Handle This

Bazel

Bazel uses BUILD files where rules explicitly declare their inputs and outputs. Dependencies are specified by label references, not by filesystem scanning. However, Bazel does use glob() to discover source files during its loading phase. The key insight is that during the analysis phase, both source files (from globs) and generated files (from rule declarations) are visible in a unified view. A rule’s declared outputs are known before any action executes.

Buck2

Buck2 takes a similar approach with a single unified dependency graph (no separate phases). Rules call declare_output() to create artifact references and return them via providers. Downstream rules receive these references through their declared dependencies. For cases where the dependency structure is not known statically, Buck2 provides dynamic_output — a rule can read an artifact at build time to discover additional dependencies.

Common Pattern

In both systems, the core principle is the same: a rule’s declared outputs are visible to the dependency resolver before execution begins. The dependency graph is fully resolved at analysis time.

Proposed Solutions

A. Multi-Pass Discovery (Iterative Build-Scan Loop)

Run discovery, build what is ready, re-scan the filesystem, discover again. Repeat until nothing new is found.

  • Pro: Simple mental model, handles arbitrary chain depth
  • Con: Slow (re-scans filesystem each pass), hard to detect infinite loops, execution is interleaved with discovery

B. Virtual Files from Declared Outputs (Two-Pass)

After the first discovery pass, collect all declared outputs from the graph and inject them as “virtual files” visible to processors. Run discovery a second time so downstream processors can find the generated files.

  • Pro: No filesystem re-scan, single build execution phase, deterministic
  • Con: Limited to chains of depth 1 (producer → consumer). A three-step chain (template → compile → lint) would require three passes, making the fixed two-pass design insufficient.

C. Fixed-Point Discovery Loop

Generalization of Approach B. Run discovery in a loop: after each pass, collect newly declared outputs and feed them back as known files for the next pass. Stop when a full pass adds no new products. Add a maximum iteration limit to catch cycles.

known_files = FileIndex (real files on disk)
loop {
    run discover() for all processors, with known_files visible
    new_outputs = outputs declared in this pass that were not in known_files
    if new_outputs is empty → break
    known_files = known_files + new_outputs
}
resolve_dependencies()
execute()

A chain of depth N requires N iterations. Most projects would converge in 1-2 iterations.

  • Pro: Fully general, handles arbitrary chain depth, no filesystem re-scan, deterministic, path-based matching (no reliance on file extensions)
  • Con: Processors must be able to discover products for files that do not exist on disk yet (they only know the path). This works for stub-based processors and compilers but might be an issue for processors that inspect file contents during discovery.
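
A runnable toy model of the loop, with each processor reduced to a path-mapping function (the real discover() creates products, not just paths; the names here are illustrative):

```rust
use std::collections::BTreeSet;

// Fixed-point discovery over a set of known paths. Each "processor" is
// modeled as: given a known path, which output path would it declare?
fn fixed_point(
    mut known: BTreeSet<String>,
    processors: &[fn(&str) -> Option<String>],
) -> BTreeSet<String> {
    const MAX_PASSES: usize = 10; // iteration limit to catch cycles
    for _ in 0..MAX_PASSES {
        // Outputs declared in this pass that were not already known.
        let new_outputs: BTreeSet<String> = known
            .iter()
            .flat_map(|p| processors.iter().filter_map(move |proc| proc(p)))
            .filter(|o| !known.contains(o))
            .collect();
        if new_outputs.is_empty() {
            break; // a full pass added nothing: converged
        }
        known.extend(new_outputs);
    }
    known
}

fn main() {
    // template: bar.c.tera -> bar.c; cc_single_file: bar.c -> out/bar.o
    let template: fn(&str) -> Option<String> =
        |p| p.strip_suffix(".c.tera").map(|s| format!("{s}.c"));
    let cc: fn(&str) -> Option<String> =
        |p| p.strip_suffix(".c").map(|s| format!("out/{s}.o"));

    let known: BTreeSet<String> = ["bar.c.tera".to_string()].into();
    let all = fixed_point(known, &[template, cc]);
    assert!(all.contains("bar.c"));
    assert!(all.contains("out/bar.o"));
    println!("ok");
}
```

With this chain, pass 1 adds bar.c, pass 2 adds out/bar.o, and pass 3 adds nothing and terminates.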

D. Explicit Cross-Processor Wiring in Config

Let users declare chains in rsconstruct.toml:

[[pipeline]]
from = "template"
to = "ruff"

rsconstruct then knows that template outputs matching ruff’s scan configuration should become ruff inputs.

  • Pro: Explicit, no magic, user controls what gets chained
  • Con: More configuration burden, loses the “convention over configuration” philosophy

E. Make out/ Visible to FileIndex

The simplest mechanical fix: stop excluding out/ from the FileIndex. Since .gitignore contains /out/, the ignore crate skips it. This could be overridden in the WalkBuilder configuration.

  • Pro: Minimal code change, works on subsequent builds (files already exist from previous build)
  • Con: Does not work on the first clean build (files do not exist yet). Processors would also see stale outputs from deleted processors, and stub files from other processors (though extension filtering would exclude most of these).

F. Two-Phase Processor Trait (Declarative Forward Tracing)

Split the ProductDiscovery trait so that each processor can declare what output paths it would produce for a given input path, without performing full discovery:

trait ProductDiscovery {
    /// Given an input path, return the output paths this processor would
    /// produce. Called even for files that don't exist on disk yet.
    fn would_produce(&self, input_path: &Path) -> Vec<PathBuf>;

    /// Full discovery (as today)
    fn discover(&self, graph: &mut BuildGraph, file_index: &FileIndex) -> Result<()>;
    // ...
}

The build system first runs discover() on all processors to get the initial set of products and their outputs. Then, for each declared output, it calls would_produce() on every other processor to trace the chain forward. This repeats transitively until no new outputs are produced. Finally, discover() runs once more with the complete set of known paths (real + virtual).

Unlike Approach C, this does not require a loop over full discovery passes. The chain is traced declaratively by asking each processor “if this file existed, what would you produce from it?” — a lightweight query that does not modify the graph.

  • Pro: Single discovery pass plus lightweight forward tracing. No loop, no convergence check, no iteration limit. Each processor defines its output naming convention in one place. The full transitive closure of outputs is known before the main discovery runs.
  • Con: Adds a method to the ProductDiscovery trait that every processor must implement. Some processors have complex output path logic (e.g., cc_single_file changes the extension and directory), so would_produce() must replicate that logic — meaning the output path computation exists in two places (in would_produce() and in discover()). Keeping these in sync is a maintenance risk.
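
The duplication risk can be made concrete with a hypothetical would_produce() for a cc_single_file-style processor; the extension-and-directory mapping below is exactly what discover() would have to replicate:

```rust
use std::path::{Path, PathBuf};

// Hypothetical: the output-naming convention of a single-file compiler,
// expressed as a pure path query that never touches the filesystem.
fn would_produce(input: &Path) -> Vec<PathBuf> {
    match input.extension().and_then(|e| e.to_str()) {
        // Change the extension and prefix out/ -- discover() applies the
        // same mapping, so the logic now lives in two places.
        Some("c") | Some("cc") | Some("cpp") => {
            vec![Path::new("out").join(input.with_extension("o"))]
        }
        _ => Vec::new(), // not a source file this processor handles
    }
}

fn main() {
    assert_eq!(
        would_produce(Path::new("src/foo.c")),
        vec![PathBuf::from("out/src/foo.o")]
    );
    assert!(would_produce(Path::new("README.md")).is_empty());
    println!("ok");
}
```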

G. Hybrid: Visible out/ + Fixed-Point Discovery

Combine Approach E (make out/ visible) with Approach C (fixed-point loop) or Approach F (forward tracing). On subsequent builds, existing files in out/ are already in the index. On clean builds, the fixed-point loop discovers them from declared outputs.

  • Pro: Most robust — works for both clean and incremental builds
  • Con: Combines complexity of two approaches, risk of discovering stale outputs

Recommendation

Approach C (fixed-point discovery loop) is the most principled solution. It is fully general, handles arbitrary chain depth, requires no configuration, and matches the core insight from Bazel and Buck2: declared outputs should be visible during dependency resolution before execution begins.

The main implementation requirement is extending the FileIndex (or creating a wrapper) to accept “virtual” entries for paths that are declared as outputs but do not yet exist on disk. Processors already declare their outputs during discover(), so the information needed to populate these virtual entries is already available.

Current Status

Cross-processor dependencies are implemented using Approach C (fixed-point discovery loop). After each discovery pass, newly declared outputs are injected as virtual files into the FileIndex. Discovery re-runs with the expanded index until no new products are found (up to 10 iterations).

Key implementation details:

  • FileIndex::add_virtual_files() inserts declared output paths into the index so downstream processors can discover them via scan().
  • BuildGraph::add_product() handles re-declarations during multi-pass discovery (see below).
  • The loop runs in all three discovery sites: the main build graph builder, build_graph_filtered, and the deps builder.
  • --phases output shows per-pass statistics when multiple passes are needed.
  • Most projects converge in 1 pass (no cross-processor chains). Projects with generator → checker chains converge in 2 passes.

Deduplication during multi-pass discovery

When processors re-run on subsequent passes, they may try to add products that already exist. add_product() detects this via two separate dedup paths, depending on whether the product declares outputs:

Products with outputs (generators)

Dedup is keyed on output paths. When a product with the same outputs is re-declared by the same processor:

  1. Identical re-declaration — Same inputs. The product is silently skipped.

  2. Expanded inputs — The new inputs are a superset of the existing inputs. This happens when a processor like tags collects all matching files into a single product. On pass 2, virtual files from generator outputs are now in the FileIndex, so tags discovers the same product with additional inputs. The existing product’s inputs are updated to the expanded set, and the input_to_products index is updated accordingly.

Both cases account for instance name remapping: a product may have been remapped from cc_single_file to cc_single_file.clang after pass 1, but discover() still passes the type name cc_single_file on pass 2. The dedup check accepts processor names where one is a qualified instance of the other (e.g., cc_single_file matches cc_single_file.clang).
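
The acceptance rule could be implemented as in this sketch (names_match is a hypothetical helper; the dot separator follows the iname convention):

```rust
// Accept processor names where one is a qualified instance of the other,
// e.g. "cc_single_file" matches "cc_single_file.clang".
fn names_match(a: &str, b: &str) -> bool {
    a == b
        || a.strip_prefix(b).map_or(false, |rest| rest.starts_with('.'))
        || b.strip_prefix(a).map_or(false, |rest| rest.starts_with('.'))
}

fn main() {
    // Pass 2 passes the type name; the product was remapped after pass 1.
    assert!(names_match("cc_single_file", "cc_single_file.clang"));
    assert!(names_match("ruff", "ruff"));
    // A mere string prefix is not an instance relationship.
    assert!(!names_match("cc", "cc_single_file"));
    println!("ok");
}
```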

Genuinely conflicting products — different processors (or the same processor with different inputs that are not a superset) declaring the same output — still produce an Output conflict error.

Products without outputs (checkers, explicit processors with output_dirs)

Products with no declared output files (e.g., checkers, or explicit processors that only declare output_dirs) cannot be deduped by output path. Instead, they are deduped by the tuple (processor_name, primary_input, variant) via the checker_dedup index.

This path also supports expanded inputs. When a later pass re-declares the same product with a superset of inputs, the existing product’s inputs are updated. This is critical for processors like explicit that use input_globs: on pass 0, the globs may match nothing (the target files don’t exist yet); on pass 1, virtual files from upstream generators are available and the globs resolve to additional inputs. Without the input update, the product would be frozen with its pass-0 inputs, no dependency edges would be created to the upstream producers, and the product would execute too early (before its actual inputs exist).
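
A toy model of this dedup path, with the index reduced to a plain HashMap (all types here are illustrative, not rsconstruct's real ones):

```rust
use std::collections::hash_map::Entry;
use std::collections::{BTreeSet, HashMap};

// Checker dedup key: (processor_name, primary_input, variant).
type DedupKey = (String, String, String);

struct Product {
    inputs: BTreeSet<String>,
}

fn declare(
    index: &mut HashMap<DedupKey, Product>,
    key: DedupKey,
    inputs: BTreeSet<String>,
) -> Result<(), String> {
    match index.entry(key) {
        Entry::Vacant(slot) => {
            slot.insert(Product { inputs });
            Ok(())
        }
        Entry::Occupied(mut slot) => {
            let existing = slot.get_mut();
            if inputs.is_superset(&existing.inputs) {
                existing.inputs = inputs; // identical or expanded inputs
                Ok(())
            } else {
                Err("conflicting re-declaration".to_string())
            }
        }
    }
}

fn main() {
    let mut index = HashMap::new();
    let key = ("explicit".to_string(), "run.sh".to_string(), "default".to_string());
    // Pass 0: the globs matched only one file.
    declare(&mut index, key.clone(), BTreeSet::from(["a.txt".to_string()])).unwrap();
    // Pass 1: a virtual file from an upstream generator expands the inputs.
    declare(
        &mut index,
        key.clone(),
        BTreeSet::from(["a.txt".to_string(), "b.txt".to_string()]),
    )
    .unwrap();
    assert_eq!(index[&key].inputs.len(), 2);
    println!("ok");
}
```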

Shared Output Directory

Multiple processors can write into the same directory — a website _site/, a dist/, a build/ folder. This document explains how rsconstruct keeps each processor’s cache correct when they share an output directory, and the exact rules that make it work.

The scenario

A common case:

  • mkdocs (a Creator) builds a whole site. It produces many files under _site/ and declares the directory as its output_dir. It cannot enumerate individual outputs in advance.
  • pandoc (a Generator / Explicit) converts one specific markdown file into _site/about.html. It declares that file explicitly as its output_files.

Both contribute to the same directory. A website IS a single folder by design.

[processor.creator.mkdocs]
command   = "mkdocs build --site-dir _site"
output_dirs = ["_site"]

[processor.explicit.pandoc]
command      = "./pandoc-page.sh"
inputs       = ["about.md"]
output_files = ["_site/about.html"]

The problem

Naive implementations break in at least three places:

  1. Over-claiming at cache store time. If mkdocs’s cache entry walks _site/ and records every file, it will wrongly claim about.html as its own. On cache restore, pandoc’s file gets restored from mkdocs’s cache — with whatever content mkdocs last saw there — even if pandoc hasn’t run.
  2. Clobbering at build time. If mkdocs wipes _site/ before running (so stale outputs from a previous build don’t linger), it will also delete pandoc’s about.html whenever mkdocs runs after pandoc.
  3. Clobbering at restore time. If restoring mkdocs’s cache wipes _site/ before writing cached files, it will again destroy pandoc’s output.

Each problem leads to silent cache corruption: stale content appears to be fresh, or recently-built files vanish.

Ownership rule

Every declared output path has exactly one owner — the single product that lists it in outputs, output_files, or produces it as a named product output.

A directory declared as output_dir is not an ownership claim on the whole subtree. The Creator only owns the files it itself produces that no other product has declared.

This is enforced by a single graph query, BuildGraph::path_owner(path) -> Option<usize>, which returns the id of the unique product that declares path as one of its outputs (or None if nobody does).

Pseudocode:

path_owner(path):
    for each product P in graph:
        if path in P.outputs:
            return P.id
    return None

A declared output path has at most one owner by construction — if two products declare the same literal output, that is detected as an output conflict at graph-build time and the build aborts.
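
A sketch of how path_owner can be backed by an output-to-owner index; declare_output is a hypothetical helper, and maintaining the index up front also yields the output-conflict check described above:

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

struct BuildGraph {
    // Declared output path -> id of the unique product that owns it.
    owner_by_output: HashMap<PathBuf, usize>,
}

impl BuildGraph {
    fn declare_output(&mut self, product_id: usize, path: PathBuf) -> Result<(), String> {
        if let Some(&prev) = self.owner_by_output.get(&path) {
            if prev != product_id {
                // Two products claiming the same literal output: abort.
                return Err(format!(
                    "Output conflict on {path:?}: products {prev} and {product_id}"
                ));
            }
        }
        self.owner_by_output.insert(path, product_id);
        Ok(())
    }

    fn path_owner(&self, path: &Path) -> Option<usize> {
        self.owner_by_output.get(path).copied()
    }
}

fn main() {
    let mut g = BuildGraph { owner_by_output: HashMap::new() };
    g.declare_output(0, PathBuf::from("_site/about.html")).unwrap(); // pandoc
    assert_eq!(g.path_owner(Path::new("_site/about.html")), Some(0));
    assert_eq!(g.path_owner(Path::new("_site/index.html")), None);
    assert!(g.declare_output(1, PathBuf::from("_site/about.html")).is_err());
    println!("ok");
}
```

The is_foreign predicate used later in this chapter is then just path_owner(path) returning Some(owner) with owner != my_product_id.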

How each of the three hazards is handled

1. Over-claiming at cache store time

When a Creator’s tree descriptor is being built in ObjectStore::store_tree_descriptor, the walker visits every file under each output_dir. For each file, it asks the graph: “Is this path owned by a different product?”

is_foreign(path) = graph.path_owner(path) is Some(owner) and owner != my_product_id

If is_foreign(path) is true, the file is skipped — it does not appear as a tree entry. The Creator’s cache then contains only files the Creator actually created and that nobody else has laid claim to.

When pandoc writes _site/about.html and mkdocs later caches _site/, mkdocs’s tree will not contain about.html because path_owner("_site/about.html") == pandoc.id != mkdocs.id.

2. Clobbering at build time

Before a product’s command runs, remove_stale_outputs deletes any outputs left over from a previous run so the command can rewrite them fresh (important when a cache restore left read-only hardlinks in place).

The rule for Creators:

  • Do NOT wipe output_dir wholesale.
  • Read the previous tree descriptor from the object store.
  • Remove only the files recorded in that previous tree.
  • Re-create the output_dir (so the command can assume it exists).
  • Leave any file not in the previous tree alone — it belongs to somebody else.

Pseudocode:

remove_stale_outputs(product, input_checksum):
    if product has output_dirs:
        previous = object_store.previous_tree_paths(descriptor_key(product, input_checksum))
        for file in previous:
            if file exists: remove it
        for dir in product.output_dirs:
            create dir if missing
    for file in product.outputs:
        if file exists: remove it

Because the previous tree only ever contained paths the Creator owned, this removal cannot touch files owned by other processors.

3. Clobbering at restore time

Cache restore for a tree descriptor iterates entries and writes each one in place. It never calls remove_dir_all on the output_dir. If a file already exists with the correct checksum, the restore skips it (saving I/O).

When mkdocs restores its tree:

  • _site/index.html and _site/assets/style.css are written from the object store.
  • _site/about.html is NOT in mkdocs’s tree, so it is neither written nor removed.
  • If pandoc has also restored, pandoc’s blob descriptor wrote _site/about.html separately.

The two restores compose correctly regardless of order.

Invariants

The system relies on these invariants; each is enforced in code:

  1. Every declared output path has at most one owner.
     Enforced in: add_product / graph validation (output conflict check)
  2. A Creator’s tree descriptor contains only paths not owned by any other product.
     Enforced in: store_tree_descriptor with is_foreign predicate
  3. Pre-run cleanup removes only files the Creator previously owned.
     Enforced in: remove_stale_outputs reads previous_tree_paths
  4. Cache restore never deletes files it did not cache.
     Enforced in: restore_tree_descriptor writes in place; no remove_dir_all

When all four hold, processors can freely share an output directory.

Worked example

Starting from an empty project, both processors are declared as above and both get to run on a fresh build.

First build

  1. pandoc runs first.
    • remove_stale_outputs: pandoc has no output_dirs; removes _site/about.html if it exists (it doesn’t). No-op.
    • Runs ./pandoc-page.sh, which creates _site/about.html.
    • Caches a blob descriptor for _site/about.html.
  2. mkdocs runs next.
    • remove_stale_outputs: mkdocs has output_dirs; looks up its previous tree (none — first build). Creates _site/ to ensure it exists.
    • Runs mkdocs build, which writes _site/index.html, _site/assets/style.css, and may (harmlessly) touch _site/about.html.
    • Caches a tree descriptor. The walker skips _site/about.html because path_owner says pandoc owns it. Tree = [index.html, assets/style.css].

Final state on disk: index.html, assets/style.css, about.html. All three files exist with correct content.

Incremental build, no changes

  • pandoc: input checksum matches; descriptor already exists; skipped.
  • mkdocs: input checksum matches; descriptor already exists; skipped.

Clean outputs + rebuild

  1. rsconstruct clean outputs deletes _site/ entirely.
  2. Next build:
    • pandoc’s input checksum matches its cached descriptor → restore blob → writes _site/about.html.
    • mkdocs’s input checksum matches its cached descriptor → restore tree → writes only the files in the tree (index.html, assets/style.css), leaves about.html alone.

Final state is the same as after the first build, without either tool having actually run.

Building only the Creator (-p creator.mkdocs)

  1. pandoc is not in the run set; _site/about.html stays wherever it was (absent if cleaned, present otherwise).
  2. mkdocs runs or restores its tree.

If _site/ was clean, about.html remains absent — which is correct, because the Creator does not claim to produce it. The regression test creator_tree_does_not_include_foreign_outputs verifies exactly this.

Non-goals

  • Runtime conflict detection for paths the Creator actually wrote but didn’t declare. If a Creator happens to write a file that another Generator also declares, the declared owner wins; the Creator’s tree simply won’t include that file. We do not error on this.
  • Ordering constraints. rsconstruct does not enforce “Generators run before Creator” or vice versa. The snapshot/walk is done after each product finishes, and path_owner is a static graph query independent of run order.
  • Partial-directory caching like git trees with subtrees. The tree descriptor is a flat list of (path, checksum) entries, which is enough for this use case.

Quick reference for processor authors

If you are writing a new processor:

  • Generator / Explicit: declare every output file in output_files. rsconstruct keeps each of your files safe from Creators that share the directory.
  • Creator: declare the shared directory in output_dirs. Do NOT assume the directory is empty when your command runs — other processors may have already contributed files to it. Your command should overwrite only what it produces; it should not wipe the directory.
  • Conflict: never declare the same path as an output in two different products. That is a graph-build-time error regardless of directory sharing.

Processor Ordering

When two processors touch the same files or cooperate on a shared workspace, the question of “which runs first?” inevitably comes up. This chapter explains how rsconstruct answers that question today, how other build systems approach it, the dilemmas that show up in practice, and why rsconstruct has deliberately avoided adding explicit ordering knobs so far.

How rsconstruct orders today

rsconstruct has no explicit cross-processor ordering configuration. Ordering is derived entirely from the data-flow graph:

  • Each product (a unit of work from a processor) declares inputs and outputs.
  • If product A’s inputs contain a path that product B’s outputs also contain, A depends on B — B runs first.
  • Products with no such relationship are considered independent and may run in parallel (within the same topological level).

That’s the whole mechanism. The BuildGraph performs a topological sort on this implicit graph and the executor processes levels in order. See Cross-Processor Dependencies for the data-flow story.
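
The topological levels can be sketched with a Kahn-style sort over product ids (a toy model; deriving the edges from input/output path matching is assumed to have happened already):

```rust
use std::collections::HashMap;

// Group n products into execution levels. deps are (consumer, producer)
// pairs: the producer must finish before the consumer starts.
fn levels(n: usize, deps: &[(usize, usize)]) -> Vec<Vec<usize>> {
    let mut indegree = vec![0usize; n];
    let mut dependents: HashMap<usize, Vec<usize>> = HashMap::new();
    for &(consumer, producer) in deps {
        indegree[consumer] += 1;
        dependents.entry(producer).or_default().push(consumer);
    }
    let mut result = Vec::new();
    let mut ready: Vec<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    while !ready.is_empty() {
        ready.sort(); // deterministic order within a level
        let mut next = Vec::new();
        for &p in &ready {
            for &c in dependents.get(&p).map(Vec::as_slice).unwrap_or(&[]) {
                indegree[c] -= 1;
                if indegree[c] == 0 {
                    next.push(c);
                }
            }
        }
        result.push(ready);
        ready = next;
    }
    result
}

fn main() {
    // Products 0 and 1 are independent; product 2 consumes outputs of both.
    let lv = levels(3, &[(2, 0), (2, 1)]);
    assert_eq!(lv, vec![vec![0, 1], vec![2]]);
    println!("ok");
}
```

Everything inside one level may run in parallel; levels run in order.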

There is no depends_on, mustRunAfter, before, after, priority, or stage field anywhere in rsconstruct.toml. If two processors write into the same directory without any file dep between them, their order is undefined and may vary between runs.

How other tools handle it

Bazel, Buck2

No explicit ordering. Rules declare srcs, deps, and outs. The scheduler orders actions strictly by the DAG of declared inputs/outputs. Hermeticity is a first-class value — if you need something to run before something else, you model it as a data dependency. If a rule B needs rule A’s side effect but not its output, you fabricate a marker file: A outputs a.done, B takes a.done as an input.

Bazel’s design intent: if you need ordering without data flow, you’re modeling the problem wrong. The graph should tell the truth about what depends on what.

Make, Ninja

Data-flow ordering via rules (foo.o: bar.h). Ninja adds order-only dependencies — the || separator in build.ninja. An order-only dep means “run A before B” without “rebuild B when A changes”. This is useful for things like “create out/ before any rule tries to write into it”. It’s the minimum viable ordering primitive: pure ordering, no rebuild semantics.
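
For reference, the primitive looks like this in build.ninja syntax (out_dir here is a hypothetical target that creates the directory):

```ninja
# Order-only: out_dir runs before foo.o is built, but a change to
# out_dir does NOT trigger a rebuild of foo.o.
build out/foo.o: cc src/foo.c || out_dir
```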

Gradle

Has explicit ordering primitives, three of them:

  • dependsOn — real dependency: running B automatically runs A first (even if A would otherwise be skipped).
  • mustRunAfter — ordering constraint: if both A and B are in the scheduled set, A runs first; but running B does NOT pull A in.
  • shouldRunAfter — soft ordering hint: honored when possible, may be violated to enable parallelism.

Gradle’s ecosystem (Android, JVM tooling, packaging/signing pipelines) has more real-world “unrelated tasks that still need ordering” cases — e.g., signing must happen after packaging even though they don’t share a file output. The three-level hierarchy lets users pick the right strength.

CMake

add_dependencies(targetA targetB) enforces ordering at the target level, beyond file-level rules. Used mostly for custom targets that don’t produce tracked output files — the bridge when file-based ordering isn’t sufficient.

Cargo, SBT

No explicit cross-crate ordering. Everything flows from [dependencies] / library deps → data flow → topological sort. Same posture as Bazel.

Summary table

Tool           Explicit ordering knobs                     Philosophy
Make / Ninja   Order-only deps (||)                        Bridge when file deps aren’t enough
Bazel, Buck2   None                                        Hermeticity; all ordering comes from data flow
Cargo, SBT     None                                        Same as Bazel
Gradle         dependsOn, mustRunAfter, shouldRunAfter     Real-world tasks have non-data ordering needs
CMake          add_dependencies                            Bridge for “phantom” custom targets
rsconstruct    None (currently)                            Same as Bazel

The dilemmas

Adding explicit ordering feels useful but carries real risks. Here are the tradeoffs.

Dilemma 1: does ordering imply rebuild?

Say [processor.b] after = ["a"]. If A’s output changes, should B rebuild?

  • If yes, after is just dependsOn — which we already have through data flow. It’s redundant.
  • If no, after is pure ordering (mustRunAfter). But then it silently lies about the true dependency graph: a user might add after = ["a"] because they “know” B consumes A’s side effect, but rsconstruct won’t invalidate B’s cache when A changes. Stale caches follow.

Gradle copes because it has three flavors. Adding one flavor is usually wrong; adding three is complexity creep.

Dilemma 2: declared vs. inferred

rsconstruct already infers ordering from inputs/outputs. Adding another channel means:

  • Two sources of truth for the dependency graph.
  • Debugging “why did B run after A?” now requires checking both the data flow AND the explicit config.
  • Mistakes compound: a user adds after = ["a"] but forgets that they ALSO removed the data dep; now B runs after A but doesn’t actually consume anything from it.

Dilemma 3: encourages side-effects

If ordering knobs exist, they become the path of least resistance for modeling side effects:

“My script also writes to /tmp/cache_seed.json, just declare after and it’ll work.”

Side-effectful processors are an anti-pattern in any incremental build system — the cache can’t know when they changed, when to rerun them, or what invalidates them. Every ordering primitive that doesn’t touch the cache makes side effects easier to introduce.

Dilemma 4: the “fix-up pass” case

The one case where data flow struggles: a processor that runs after everything else has written to a shared directory and modifies the result. Examples:

  • Minification: take everything in dist/ and minify it after all generators have produced their outputs.
  • Post-processing: add cache-busting hashes to filenames, rewrite links, compress.

In Bazel, you model this as a rule with srcs = glob(["dist/**"]). But with lazy generators (outputs that didn’t exist when the scan ran), globs can miss things.

Reasonable fixes without adding ordering knobs:

  1. Have the fix-up processor declare its inputs explicitly as the output files of the generators. Works but requires enumeration.
  2. Re-scan globs after each dependency level so the fix-up step sees newly-generated files. Correct, but costlier.
  3. Make the fix-up a Creator with the whole dist/ as its output_dir. Our shared-output-directory logic handles this cleanly (see that chapter), but now the fix-up operates in-place on files owned by others, which touches the “files owned by other products” rule.

None of these is wonderful, but none requires a new ordering primitive.

Dilemma 5: parallelism is already constrained

If ordering becomes a first-class concept, users will sprinkle after = [...] for safety and the scheduler will serialize work that could have run in parallel. Bazel’s aggressive parallelism comes partly from refusing to accept unprincipled ordering constraints.

Why rsconstruct hasn’t added ordering

The posture we’ve picked (for now):

  1. Data flow is the truth. Every time ordering matters, there is a real data dependency. Expose it as an input/output rather than as a separate ordering rule.
  2. Shared output directories are handled without ordering. The Shared Output Directory design lets multiple processors contribute to one folder in any order; the cache stays correct per-processor.
  3. The cost of adding explicit ordering is high: it creates a second channel for dependencies, invites side-effect-oriented thinking, and rarely solves a problem that couldn’t be solved by modeling the data flow properly.

When we would add explicit ordering

If a real use case appears where:

  • Data flow genuinely cannot express the dependency (no file is consumed, only a side effect).
  • The alternative (adding a marker file or input_glob re-scan) is significantly worse than adding a knob.
  • The feature can be specified with clear rebuild semantics (pick one of: forces rerun / does not force rerun; do not leave it ambiguous).

Then the most likely shape is a single after = ["processor_name"] field with Gradle’s mustRunAfter semantics:

  • Affects ordering only when both processors are already scheduled.
  • Does NOT add a rebuild trigger.
  • Does NOT force the referenced processor to run.

This is the smallest, most honest knob. It doesn’t pretend to be a data dependency; it doesn’t change cache invalidation; it only constrains scheduling.
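
If that day comes, the configuration might look like this (hypothetical syntax, not implemented; the processor names are illustrative):

```toml
[processor.explicit.sign]
command = "./sign.sh"
# mustRunAfter semantics: constrains order only when both are scheduled.
# Does not trigger rebuilds and does not pull creator.package into the run.
after   = ["creator.package"]
```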

Until that case is concrete, the answer is: model ordering through data flow. The graph should tell the truth.

Alternative: Output Prediction

Another way to close the gap without adding ordering knobs: make opaque Creators (mkdocs, Sphinx, Jekyll) transparent by discovering their outputs in advance.

Instead of the Creator declaring output_dirs = ["_site"] (opaque — “something goes in here”), it would declare (or generate) the exact file list it will produce:

[processor.creator.mkdocs]
command         = "mkdocs build --site-dir _site"
predict_command = "./list-mkdocs-outputs.sh"   # prints one output path per line
output_dirs     = ["_site"]

rsconstruct would run predict_command at graph-build time, turn each printed path into a declared outputs entry, and promote the Creator to a per-file Mass Generator. After that, the entire “how do we order two processors that both write into _site/?” question dissolves — every file has exactly one declared owner, and the normal Generator/data-flow rules apply.

Why this is an alternative to ordering knobs:

  • Explicit ordering says “we can’t model this; let the user pin the order manually.”
  • Output prediction says “we can model this if we know the outputs; let’s discover them.”

Prediction is the more principled answer — the graph ends up telling the truth about what depends on what — but it is far more expensive to do well (predictor drift, plugin ecosystems, partial-build support, validation). Ordering knobs are cheap but lie about the dependency graph.

The full tradeoff is explored in the Output Prediction chapter. Short version: neither is obviously better; they solve different problems and could coexist.

Output Prediction & MassGenerator

A Creator (mkdocs, Sphinx, Jekyll, Hugo, etc.) declares output_dirs = ["_site"] — “I produce something in here, don’t ask me what until I’ve run.” This chapter specifies a new processor type, MassGenerator, that makes those tools transparent: the tool is asked in advance what it will produce, and each planned file is promoted to a declared product output.

Once outputs are known up front, per-file caching, precise incremental rebuilds, cross-processor dependencies on generated files, and safe output-conflict detection all come for free.

Status

Designed, not yet implemented. This document is the design spec that guides the implementation.

The core idea

Today we treat tools like mkdocs as a black box:

[processor.creator.mkdocs]
command     = "mkdocs build --site-dir _site"
output_dirs = ["_site"]   # opaque — we only know the directory

The new approach asks the tool to emit a manifest before running:

[processor.mass_generator.mkdocs]
command         = "mkdocs build --site-dir _site"
predict_command = "mkdocs-plan"                  # prints a JSON manifest on stdout
output_dirs     = ["_site"]

rsconstruct invokes predict_command at graph-build time, parses its JSON output, and creates one product per planned file. Each product has its own inputs (taken from the manifest’s sources field) and a single outputs entry (the planned path). From that point on, the product is a regular per-file Generator — caching, dependency tracking, and cross-processor wiring all work uniformly.

Manifest format

predict_command must print a single JSON document to stdout in this shape:

{
  "version": 1,
  "outputs": [
    {
      "path": "_site/index.html",
      "sources": ["docs/index.md", "templates/default.html", "mysite.toml"]
    },
    {
      "path": "_site/about/index.html",
      "sources": ["docs/about.md", "templates/default.html", "mysite.toml"]
    },
    {
      "path": "_site/assets/style.css",
      "sources": ["assets/style.scss", "assets/_vars.scss"]
    }
  ]
}
  • version — integer. Schema version (1 for now). Allows future evolution without breaking existing tools.
  • outputs — array, one entry per file the tool will produce.
  • outputs[].path — output file path relative to the project root. Must fall within one of the processor’s output_dirs (enforced).
  • outputs[].sources — array of input paths whose changes should trigger rebuilding this output. Used as the product’s inputs, which feed into cache-key computation.

Order within outputs must be deterministic (sorted by path). The sources array should be minimal — only the files whose content genuinely affects this specific output.
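The containment rule ("must fall within one of the processor's output_dirs") can be sketched with std::path alone. This is an illustrative stand-alone check, not rsconstruct's actual validation code; the function name is hypothetical:

```rust
use std::path::{Component, Path};

/// Check that a manifest output path is relative, contains no `..`,
/// and falls inside one of the declared output_dirs.
fn path_within_output_dirs(path: &str, output_dirs: &[&str]) -> bool {
    let p = Path::new(path);
    // Absolute paths and parent-dir escapes are rejected outright.
    if p.is_absolute() || p.components().any(|c| matches!(c, Component::ParentDir)) {
        return false;
    }
    output_dirs.iter().any(|dir| p.starts_with(dir))
}

fn main() {
    assert!(path_within_output_dirs("_site/index.html", &["_site"]));
    assert!(!path_within_output_dirs("elsewhere/x.html", &["_site"]));
    assert!(!path_within_output_dirs("_site/../etc/passwd", &["_site"]));
}
```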

Lifecycle

1. Plan phase (at graph-build time)

Once per MassGenerator instance declared in rsconstruct.toml:

  1. Run predict_command. Capture stdout and exit status.
  2. Exit status non-zero → fail the graph build with the tool’s stderr in the error message.
  3. Parse stdout as JSON. Malformed → fail the graph build.
  4. Reject manifest if any outputs[].path falls outside the declared output_dirs.
  5. For each manifest entry, add one product to the build graph:
    • inputs = entry’s sources
    • outputs = [entry’s path]
    • processor = this instance’s name
  6. Cache the manifest itself in the object store, keyed on a hash of (config + input_checksum_of(source_tree)). Re-planning is skipped when the hash matches.

The plan phase runs BEFORE the existing product-discovery phase, so predicted outputs are known to all downstream processors (linters, compressors, etc.) via the normal file-index/cross-processor-dependency mechanisms.
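Steps 1 and 2 of the plan phase (spawn, capture stdout, fail on non-zero with the tool's stderr) could look roughly like this sketch. run_predict is a hypothetical helper, and the naive whitespace split stands in for however rsconstruct actually tokenizes command strings:

```rust
use std::process::Command;

/// Run a predict_command; return its stdout on success, or the tool's
/// stderr as the error message on a non-zero exit status.
fn run_predict(cmdline: &str) -> Result<String, String> {
    let mut parts = cmdline.split_whitespace();
    let program = parts.next().ok_or("empty predict_command")?;
    let out = Command::new(program)
        .args(parts)
        .output()
        .map_err(|e| format!("failed to spawn {program}: {e}"))?;
    if !out.status.success() {
        return Err(String::from_utf8_lossy(&out.stderr).into_owned());
    }
    Ok(String::from_utf8_lossy(&out.stdout).into_owned())
}

fn main() {
    // `echo` stands in for a real predict_command here.
    let stdout = run_predict("echo {\"version\":1,\"outputs\":[]}").unwrap();
    assert!(stdout.contains("\"version\":1"));
}
```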

2. Build phase

When one or more MassGenerator products are dirty:

  1. rsconstruct groups all dirty products belonging to the same MassGenerator instance into a single execution batch.
  2. It invokes command exactly once per batch (not per product).
  3. The tool produces all its output files in that one invocation.
  4. Each product caches its own output file as a blob descriptor, independently.
  5. In strict mode (default): after the tool exits, rsconstruct verifies that every predicted file in the batch was produced and no unexpected files appeared in output_dirs. A mismatch fails the build.
  6. In loose mode (--loose-manifest CLI flag): divergence is a warning only.

The “one invocation, many products” idiom is this type’s defining execution shape — distinct from both Generator (one invocation per product) and Creator (one invocation, one product).

3. Restore phase

When all MassGenerator products for an instance are cache-clean:

  1. Each product is restored from its blob descriptor independently — no tool invocation at all.
  2. Partial restoration is natural: if 47 of 50 files are clean, only 3 products go through the build phase (which still triggers one tool invocation, but the 47 unchanged files are either untouched on disk or silently overwritten with identical content).

4. Verification (strict mode)

After build:

  • Every manifest entry → file exists with the right path.
  • Every file in output_dirs → appears in the manifest OR belongs to another processor (via the existing path_owner query).

Violations are hard errors; partial output is left on disk for debugging.
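The two verification rules reduce to a set difference in each direction. A sketch over in-memory path sets (the real check would walk output_dirs on disk and consult path_owner; verify_strict is a hypothetical name):

```rust
use std::collections::BTreeSet;

/// Strict-mode check: every predicted file must be produced, and every
/// produced file not owned by another processor must have been predicted.
fn verify_strict(predicted: &BTreeSet<&str>, produced: &BTreeSet<&str>) -> Result<(), String> {
    let missing: Vec<_> = predicted.difference(produced).collect();
    let unexpected: Vec<_> = produced.difference(predicted).collect();
    if missing.is_empty() && unexpected.is_empty() {
        Ok(())
    } else {
        Err(format!("missing: {missing:?}, unexpected: {unexpected:?}"))
    }
}

fn main() {
    let predicted: BTreeSet<_> = ["_site/index.html", "_site/about/index.html"].into();
    let produced: BTreeSet<_> = ["_site/index.html"].into();
    assert!(verify_strict(&predicted, &predicted).is_ok()); // exact match passes
    assert!(verify_strict(&predicted, &produced).is_err()); // a missing file fails
}
```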

Graph shape

With a MassGenerator producing N planned files, the graph looks like this:

  source files (markdown, templates, config)
         |
         | (as inputs to each planned file's product)
         v
  [product: _site/index.html]
  [product: _site/about/index.html]
  [product: _site/assets/style.css]
  ... (N products, all with processor = "mass_generator.mkdocs")

Each product is a first-class citizen in the graph. A downstream linter can depend on _site/index.html like any other generated file.

Execution: one tool invocation for many products

Today’s executor assumes “one product = one invocation of processor.execute(product).” MassGenerator violates that. The cleanest implementation (per the design discussion) uses a two-level graph:

  1. Phase product (internal, not user-visible): one synthetic product per MassGenerator instance whose execute is the actual tool invocation. It has no declared outputs; its job is to populate the output_dir.
  2. File products (the N planned files): each depends on the phase product, meaning the tool must have run before any file product can be cached/restored. Each file product’s execute is a no-op (tool already ran); it just caches its output.

The dependency system then naturally orders: phase product runs once (if any file product is dirty), then every dirty file product caches its output. Clean file products skip both phases.

This shape keeps the executor simple and reuses all existing caching, skipping, and restore logic without modification.
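The batching invariant above (the tool runs once if any file product is dirty, and not at all when every one is clean) is compact enough to state as code; this toy function is purely illustrative:

```rust
/// One tool invocation per MassGenerator batch iff at least one of its
/// file products is dirty; all-clean means restore-only, zero invocations.
fn invocations_for(dirty_file_products: &[bool]) -> usize {
    if dirty_file_products.iter().any(|&d| d) { 1 } else { 0 }
}

fn main() {
    assert_eq!(invocations_for(&[false, false, false]), 0); // all clean: restore from cache
    assert_eq!(invocations_for(&[false, true, false]), 1);  // any dirty: one batched run
}
```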

Config reference

[processor.mass_generator.<INSTANCE>]

# The tool's build command. Runs once per batch of dirty file products.
command = "mkdocs build --site-dir _site"

# The tool's plan command. Must print the JSON manifest to stdout.
# May be the same binary with a different flag or a separate script.
predict_command = "mkdocs-plan"

# Where the tool will produce its outputs. Every manifest entry's path
# must fall inside one of these directories. Used for verification.
output_dirs = ["_site"]

# Standard scan fields still apply — they bound which source changes
# trigger a replan.
src_dirs = ["docs", "templates"]
src_extensions = [".md", ".html", ".yaml"]

# Optional: skip strict output verification for this instance.
# Useful during development of the tool itself. Default: false.
loose_manifest = false

Interaction with the shared-output-directory design

This new processor type does not replace the Creator / shared-output-directory mechanism. Both coexist:

| User declares | Treated as | Caching | Cross-processor deps |
|---|---|---|---|
| output_dirs only | Creator (opaque) | One tree per build | Only via declared files |
| output_dirs + predict_command | MassGenerator | Per file | Full — all files known |

Choose Creator when the tool can’t enumerate its outputs. Choose MassGenerator when it can.

Design invariants (for tool authors)

For a tool to be consumed as a MassGenerator, predict_command must uphold:

  1. Pure function of config + source tree. Same inputs → same manifest, bit for bit.
  2. Cheap or cached. rsconstruct calls this on every graph build. Slow predict_command means slow rsconstruct invocations.
  3. Matches the build command’s actual outputs. Predicted paths = actual paths. Violations are hard errors in strict mode.
  4. Deterministic variable outputs. If the tool produces tag pages or archive pages or anything else content-derived, predict_command must compute them from the same source inspection pass.

The rssite README spells out a concrete contract that meets these invariants.

Advantages

1. Shared-directory ownership becomes trivial

Every generated file has a declared owner at graph-build time. The existing output-conflict check catches overlaps instantly:

Output conflict: _site/about.html is produced by both [mass_generator.mkdocs] and [explicit.pandoc]

The complex path_owner + tree filtering + previous-tree cleanup mechanism (see Shared Output Directory) is still there as a safety net, but for MassGenerators it’s mostly unnecessary.

2. True cross-processor dependencies

Downstream processors (linters, compressors, sitemap builders) can declare the MassGenerator’s outputs as inputs. The graph connects properly. Impossible with opaque Creators.

3. Per-file caching

Change docs/tutorial.md → rebuild only _site/tutorial.html. On a large site this is the difference between “rebuild in 50ms” and “rebuild in 30s.”

Note: the per-file caching on the rsconstruct side only saves the tool invocation when ALL file products are clean. If any one is dirty, the tool runs once and produces everything — then clean files are still cached individually (useful across different invocations). True per-file build speed requires the tool itself to support partial builds. rssite will; most existing tools won’t.

4. Parallel file caching

With per-file products, different files can be cached to the object store in parallel after the build. Minor win, but free.

5. Precise clean, precise restore, real dep graphs

Every downstream feature that relies on declared outputs — clean outputs <path>, graph visualization, dry-run, watch mode — works correctly for MassGenerator outputs without special cases.

Disadvantages

1. Predictor drift

If predict_command lies (or gets out of sync with the tool), the cache can be corrupted silently: predicted paths get restored, actual build produces different paths, orphan files accumulate. Strict-mode verification after each build is the guardrail — it catches drift at build time rather than at next-restore time.

2. Predict-time cost

Every graph build runs predict_command. For large sites this may mean parsing every source file to enumerate outputs. The manifest cache (keyed on source-tree hash) mitigates but doesn’t eliminate this.

3. Partial build support

The per-product caching model wants “rebuild just this one file” but most tools rebuild everything per invocation. With mkdocs, hugo, jekyll, you pay full build cost whenever anything is dirty, regardless of how many files changed. rssite is being designed to support partial builds from day one; existing tools would need patches.

4. Engineering cost

The MassGenerator type is a new processor class with new execution semantics (“one invocation for many products”). That’s real implementation work in the executor, plus a new config schema, plus manifest parsing, plus verification logic.

5. Variable outputs may require heavy parsing

Tag pages, archive indices, RSS feeds — all content-derived. predict_command has to do enough source parsing to enumerate them. For well-designed tools this is cheap (the same parsing feeds both plan and build). For retrofitted tools it’s often duplicate work.

Open questions

These should be resolved during implementation:

  1. Single-pass mode: should we support a --print-manifest flag on command itself, so one invocation does both plan and build? Faster for full rebuilds, slightly uglier config. Probably yes, optional.
  2. Manifest schema evolution: how do we handle version: 2? Support both for a transition period, or hard-require upgrade? Probably both-for-N-releases.
  3. Incremental invalidation: when the manifest changes between builds (e.g., a new page added), how is the old cache cleaned? The existing descriptor-based cache handles this automatically (unreferenced cache entries are eventually pruned), but the behavior deserves explicit documentation.
  4. Interaction with file_index: predicted outputs need to appear in the file index so downstream processors can discover them during their own scan phases. Must be registered before discover_products runs.
  5. Watch mode: when a source file changes, do we re-run predict_command or reuse the last manifest? The hash-based cache mostly handles this, but edge cases around plugin-rewritten outputs need thinking.

Recommendation

Build this once rssite (or any other cooperating tool) is far enough along to drive concrete requirements. Implementing it against a hypothetical tool wastes work — we’d guess at features. Implementing against rssite (where we control both sides) grounds the design in reality.

When implemented, do it in this order:

  1. New processor type mass_generator registered in the plugin registry.
  2. Config schema (predict_command, loose_manifest).
  3. Plan phase: invoke predict_command, parse JSON, create products.
  4. Execution phase: batching logic — one invocation per instance, per build.
  5. Strict verification after build.
  6. Manifest caching (skip re-plan when source tree unchanged).
  7. Documentation in docs/src/processors/mass_generator.md once it’s real.

See also

Per-Processor Statistics

rsconstruct shows several “per-processor” or “per-analyzer” statistics tables (cache stats, analyzers stats, graph stats, build summaries). These all look similar on the surface, but the data source differs, and that changes what we can cheaply show.

This document explains:

  1. The three data sources that feed per-X statistics.
  2. The per-processor grouping problem in cache stats.
  3. Options for fixing it, with tradeoffs.
  4. Secondary cleanup — graph-level helpers.

The three data sources

| Question | Lives where | Cost of grouping by X |
|---|---|---|
| “How many products does pylint have in this build config?” | graph (in-memory) | free |
| “How many products were built / skipped / restored this run?” | executor stats (in-memory) | free |
| “How many files did each analyzer find?” | .rsconstruct/deps.redb (on disk, keyed by analyzer) | fast — single DB scan, key is already the analyzer name |
| “How big is my on-disk cache, per processor?” | .rsconstruct/cache/descriptors/ (on disk) | see below — this is the problem |

Graph (in-memory, rebuilt each run)

Every Product carries its processor: String field. Grouping is a simple iteration over Vec<Product>, constructing a HashMap<String, T> on the spot. Every caller that wants per-processor stats does this inline — see builder/graph.rs:111, builder/build.rs:323,436,467, executor/execution.rs:180,479,524,540.

Analyzer dependency cache (deps.redb)

The redb schema stores each entry keyed by (source path → dependencies) and tagged with the analyzer that produced it. DepsCache::stats_by_analyzer() scans the DB once and returns HashMap<analyzer, (file_count, dep_count)>. Grouping is effectively free because the analyzer name is a first-class field.

Object-store descriptors (.rsconstruct/cache/descriptors/)

Each descriptor file is a small JSON blob describing one cached product — its outputs, their checksums, etc. The filename is a hash of the product’s cache key; the file’s location tells us nothing about which processor created it.

Today’s code in object_store/management.rs:169:

pub fn stats_by_processor(&self) -> BTreeMap<String, ProcessorCacheStats> {
    // walk every file in descriptors_dir
    //   read the file
    //   parse the JSON
    //   ...
    //   "We can't extract processor name from a hashed descriptor key.
    //    Use 'all' as a single bucket for now."
    let processor = "all".to_string();
    // ...
}

Two things are wrong with this:

  1. It’s a white lie. The function is named stats_by_processor, but it returns a single "all" bucket. There is no per-processor grouping.
  2. It’s slow. Even to produce that single bucket, it reads and parses every descriptor file. For 10,000 cached products that’s 10,000 syscalls and 10,000 JSON parses, just to count entries.

Why this matters: declared-but-empty processors

In analyzers stats, if a user declares [analyzer.cpp] in rsconstruct.toml but the analyzer never matches anything, the table shows a cpp 0 0 row (implemented 2026-04-12). This is a useful signal: “you configured it, but it is silently doing nothing.”

We’d like the same in cache stats: show every enabled processor, including those with zero cached entries, so that users notice mis-configurations.

We cannot implement this today. If we listed declared processors with zeros, real entries would still be lumped into "all", so the table would show:

all:    50 entries, 58 outputs, 3.2 MiB
ruff:    0 entries, 0 outputs, 0 bytes      ← misleading
pylint:  0 entries, 0 outputs, 0 bytes      ← misleading
Total:  50 entries, 58 outputs, 3.2 MiB

That’s worse than the current output — it tells the user “pylint produced nothing” when pylint may actually have plenty. Fixing the 0-rows UX requires first fixing the grouping itself.

Options to fix per-processor cache grouping

Option A — embed the processor name inside each descriptor

Add a processor: String field to CacheDescriptor. The cache-insert path populates it (already known at that point). stats_by_processor reads the field instead of hard-coding "all".

  • ✅ Small, localized change — ~100–150 lines including a backward-compat fallback for old descriptors.
  • ❌ Does not fix the slowness. We still read and parse every descriptor to learn the grouping.
  • ❌ Cache format change requires either a migration step, a “legacy entries show up as unknown” fallback, or a cache wipe on upgrade.
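A sketch of Option A's backward-compat fallback, with the processor field optional so legacy descriptors group under a single "unknown" bucket. The struct shape and the "unknown" label are assumptions, not the current descriptor format:

```rust
/// Option A sketch: the descriptor carries its processor name; descriptors
/// written before the field existed deserialize with `processor: None`.
struct CacheDescriptor {
    processor: Option<String>, // None for legacy entries
    // ... outputs, checksums, etc.
}

/// Grouping key for stats: legacy entries land in one "unknown" bucket
/// instead of being silently lumped into "all".
fn group_key(d: &CacheDescriptor) -> &str {
    d.processor.as_deref().unwrap_or("unknown")
}

fn main() {
    let fresh = CacheDescriptor { processor: Some("ruff".into()) };
    let legacy = CacheDescriptor { processor: None };
    assert_eq!(group_key(&fresh), "ruff");
    assert_eq!(group_key(&legacy), "unknown");
}
```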

Option B — encode the processor name in the descriptor’s path

Layout changes from:

.rsconstruct/cache/descriptors/
    ab/
        cd/
            abcd1234…json

to:

.rsconstruct/cache/descriptors/
    ruff/
        abcd.json
        ef01.json
    pylint/
        9876.json

stats_by_processor becomes:

for entry in fs::read_dir(descriptors_dir)? {
    let subdir = entry?.path();
    let name = subdir.file_name().unwrap().to_string_lossy().into_owned(); // free — already in the dir entry
    let count = fs::read_dir(&subdir)?.count();                            // one readdir per processor
    stats.insert(name, count);
}
  • ✅ Fixes grouping and speed simultaneously. 30 readdirs instead of 10,000 reads is two to three orders of magnitude faster.
  • ✅ Trivially answers “does this processor have any cached entries at all?” with exists(descriptors/NAME/).
  • ❌ Changes on-disk cache layout. Requires migration.

Since descriptors are a cache by definition (regenerable from a build), the simplest migration is: detect the old layout on startup and wipe it. Next build repopulates under the new layout. No data loss beyond a slower first build post-upgrade.

Option C — maintain a processor→count index in a redb sidecar

Keep a small redb database (e.g. .rsconstruct/cache/stats.redb) with a table mapping processor_name → (entry_count, output_count, output_bytes). The cache insert / evict paths update this index transactionally alongside the descriptor write.

stats_by_processor becomes:

let db = redb::Database::open("cache/stats.redb")?;
let table = db.begin_read()?.open_table(STATS_TABLE)?;
// One DB read per processor — counts are pre-aggregated.
  • ✅ Answers cache stats in O(P) where P = number of processors, independent of cache size. Even faster than Option B at scale.
  • ✅ No on-disk layout change to the descriptors themselves — the sidecar sits alongside the existing directory structure.
  • ✅ Bytes / output counts are maintained eagerly, so the “bytes” axis is also free (unlike Option B, which still needs to stat each blob for bytes).
  • ❌ Two sources of truth. If the sidecar and the descriptor directory ever disagree (crash mid-write, manual rm of a descriptor, remote-cache sync, a bug in an insert path), the UI lies. Requires either transactional atomicity across two stores (hard — redb transaction + filesystem write) or a periodic reconciliation pass.
  • ❌ Every cache-insert path needs to update the sidecar. Miss one, and the counts drift silently. Options B and A put the source-of-truth physically next to the cache entry, so there’s no drift to manage.
  • ❌ Cache invalidation logic gets more complex: evicting a descriptor now means “delete the file AND decrement the counter AND handle the decrement failing.” More moving parts, more places for bugs.
  • ❌ Doesn’t help with any future “list all entries for processor X” query — you’d still need Option B’s path layout for that, or fall back to a full walk.

Verdict: Option C is the fastest for this one specific query, but it pays for it with a consistency problem that didn’t exist before. Options A and B keep the cache self-describing — the descriptor itself (or its path) IS the fact — so they’re immune to drift.

Option comparison

| Aspect | A (field in descriptor) | B (processor in path) | C (redb sidecar) |
|---|---|---|---|
| Grouping correctness | yes | yes | yes (if kept in sync) |
| Scan cost | O(N) reads | O(P) readdirs | O(P) DB reads |
| Bytes count free | no | no (still stat blobs) | yes (pre-aggregated) |
| On-disk layout change | descriptor format | directory layout | new sidecar file |
| Source of truth | descriptor | descriptor path | two stores |
| Drift risk | none | none | real — needs reconciliation |
| Migration cost | wipe or dual-read | wipe | initial scan to populate |
| Code complexity | low | low | medium-high |
| Helps other queries | no | yes (list-by-processor) | no |

Recommendation

Option B. The extra invasiveness is one-time (migration). The speed and correctness wins are permanent; the path layout is self-describing, so no drift risk; and it also unlocks fast “list entries for processor X” queries that Options A and C don’t.

Option C is attractive if the only query we cared about was a single summary, but the sidecar’s consistency burden is real and tends to surface as bugs in edge cases (remote-cache sync, partial writes, manual cleanup).

On Option B’s “cost”

The only new artifact on disk is N extra directory entries at the top level of descriptors/, where N is the number of distinct processors that have ever cached anything. In practice that’s 10–30 directories. Filesystems handle that trivially — both ext4 and btrfs are fine with thousands of top-level entries, let alone tens.

In return we get:

  • stats_by_processor in O(N readdirs) instead of O(cache_size reads).
  • Honest “declared-but-empty” rows in cache stats (empty dir = 0 entries, and there is no drift to reconcile).
  • Fast “list cache entries for processor X” — a single readdir.
  • A self-describing cache: ls .rsconstruct/cache/descriptors/ tells you at a glance which processors have cached anything.

The cost is negligible; the payoff is across the board.

Implementation plan (Option B)

  1. Cache insert path. Change the descriptor write to descriptors/<processor>/<hash>.json (replacing the current descriptors/<hash-prefix>/<hash-suffix>/<hash>.json sharding). The processor name is already known at insert time — it’s on the product.

  2. Cache read path. Descriptor lookups happen by cache key. If the lookup caller already has the processor name, read directly. Otherwise scan the processor subdirs (rare path — most lookups come from a build graph where the processor is known).

  3. stats_by_processor rewrite. Iterate subdirs of descriptors/; each subdir name is a processor. Count files within. For the “bytes” axis, continue to stat the corresponding blob objects.

  4. Migration. On startup, if old-layout descriptor files exist (files directly under sharded ab/cd/ subdirs, or anywhere that isn’t a recognized processor name), wipe descriptors/. Cache is regenerable by definition; next build repopulates under the new layout. Users pay one slower build post-upgrade, no data loss.

  5. cache stats UX. Once grouping is real, enumerate declared processors from rsconstruct.toml and union them with processors present in descriptors/. Show a 0-row for anything declared-but-empty (mirrors the analyzers stats treatment already implemented in builder/analyzers.rs).
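Step 4's old-layout detection can hinge on the shard-directory naming: old shard dirs are two hex characters, which no realistic processor name matches (though a processor literally named e.g. "ab" would defeat the heuristic). A sketch, with a hypothetical helper name:

```rust
/// The old layout shards descriptors under two-hex-char directories
/// (ab/cd/…); the new layout uses processor names. A two-hex-char dir
/// name is therefore treated as an old-layout marker to trigger the wipe.
fn looks_like_old_shard(dir_name: &str) -> bool {
    dir_name.len() == 2 && dir_name.chars().all(|c| c.is_ascii_hexdigit())
}

fn main() {
    assert!(looks_like_old_shard("ab"));
    assert!(looks_like_old_shard("9f"));
    assert!(!looks_like_old_shard("ruff"));
    assert!(!looks_like_old_shard("creator.venv"));
    assert!(!looks_like_old_shard("zz")); // two chars, but not hex
}
```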

Scope

Most of the work lives in src/object_store/:

  • management.rs — stats_by_processor rewrite.
  • The insert/read paths (split across object_store.rs and neighbors) — path-construction change.
  • The cache-clean / trim paths — updated to walk the new layout.

Followed by a small change in src/main.rs (CacheAction::Stats) to consume the new grouped output and render a table with the declared-union treatment.

Estimated: a couple hundred lines, concentrated in a single module.

Secondary cleanup — graph-level helpers

Every caller that wants per-processor grouping over the current graph currently writes the same HashMap pattern inline:

let mut per_processor: HashMap<&str, _> = HashMap::new();
for product in graph.products() {
    *per_processor.entry(&product.processor).or_default() += ...;
}

We could add BuildGraph::products_by_processor() -> &HashMap<String, Vec<ProductId>> as a lazily-computed cached view (computed on first access, invalidated only when the graph is mutated).

  • Benefit: de-duplicates the pattern in ~5 call sites.
  • Cost: caching / invalidation logic.
  • Priority: low. The inline grouping is O(N) over RAM iteration and is not a performance bottleneck.

Don’t do this unless a sixth call site shows up.

Current state (2026-04-12)

  • analyzers stats: fixed. Shows declared-but-empty rows. Separator between data and Total.
  • cache stats: unchanged. Still uses single-bucket "all" grouping. Documented as a known limitation here; fix is pending Option B.
  • Graph helpers: not added. Inline pattern remains across call sites.

See also

Profiling

This chapter records concrete profiling runs on rsconstruct, with methodology and findings pinned to a specific version. Add new runs as new sections with date + version headers so historical data stays intact.

How to profile locally

Build a profile-friendly binary

The default release profile strips symbols, so stack traces come out as raw addresses. Cargo.toml defines a profiling profile that inherits release but keeps full debug info:

[profile.profiling]
inherits = "release"
strip = false
debug = true

Build with:

cargo build --profile profiling
# binary lands in target/profiling/rsconstruct

Prerequisite: relax perf_event_paranoid

Kernel sampling (perf, samply) requires kernel.perf_event_paranoid <= 1. On a personal dev machine, persist it:

echo 'kernel.perf_event_paranoid = 1' | sudo tee /etc/sysctl.d/60-perf.conf
sudo sysctl --system

Record with perf (text-pipeline-friendly)

On CPUs without LBR (most laptops), DWARF unwinding is very slow to post-process — don’t use --call-graph dwarf unless you’re patient. Without a call graph you still get reliable self-time attribution:

perf record -F 999 -o /tmp/rsc.perf.data -- \
    target/profiling/rsconstruct --quiet --color=never status

perf report -i /tmp/rsc.perf.data --stdio --no-children \
    --sort symbol --percent-limit 0.1

Alternative: samply (Firefox-Profiler UI)

cargo install samply
samply record -r 4000 -o /tmp/rsc.json.gz -- \
    target/profiling/rsconstruct --quiet --color=never status

Default behavior opens a local UI. Use --save-only to just write the file.

Hardware counters

perf stat -d -- target/profiling/rsconstruct --quiet --color=never status

Gives IPC, cache miss rates, branch miss rates — useful for “is this CPU-bound, memory-bound, or branch-mispredict-bound.”

Run: 2026-04-12 — rsconstruct 0.8.1 — status on teaching-slides

Target

  • Command: rsconstruct --quiet --color=never status
  • Project: ../teaching-slides (10,027 products across 10 processors).
  • Product breakdown: explicit (1), ipdfunite (55), markdownlint (824), marp (824), ruff (19), script.check_md (824), script.check_svg (3327), svglint (3327), tera (2), zspell (824).

Methodology

  • Binary: target/profiling/rsconstruct (release + debug info).
  • Sampler: perf record -F 999 (no call-graph — LBR unavailable, DWARF too slow to post-process on this host).
  • Counters: perf stat -d.

Wall-clock and counters

| Metric | Value |
|---|---|
| Wall time | 1.08 s |
| User time | 0.99 s |
| System time | 0.08 s |
| CPU utilization | 98.7 % of 1 core |
| RSS peak | 28 MB |
| Instructions | 21.10 B |
| Cycles | 5.30 B |
| IPC | 3.98 (very high) |
| Frontend stall | 12.8 % |
| Branches | 5.11 B |
| Branch miss rate | 0.60 % |
| L1-dcache loads | 7.03 B |
| L1-dcache miss rate | 4.13 % |

Interpretation: high IPC, low miss rates, low branch mispredictions. The CPU pipeline is fully utilized — slowness comes from doing too many instructions, not from cache thrash or branch mispredicts.

Hot spots (self-time)

| % of CPU | Function |
|---|---|
| 48.79 % | std::path::Components::parse_next_component_back |
| 12.90 % | <std::path::Components as DoubleEndedIterator>::next_back |
| 10.84 % | rsconstruct::graph::BuildGraph::add_product_with_variant |
| 8.43 % | <std::path::Components as PartialEq>::eq |
| 1.41 % | __memcmp_evex_movbe |
| 1.04 % | core::str::converts::from_utf8 |
| 0.89 % | _int_malloc |
| 0.78 % | std::fs::DirEntry::file_type |
| 0.61 % | <std::path::Path as Hash>::hash |
| 0.60 % | <std::path::Components as Iterator>::next |
| 0.38 % | std::sys::fs::metadata |
| 0.38 % | <sip::Hasher as Hasher>::write |
| 0.37 % | sha2::sha256::x86::digest_blocks |
| 0.34 % | <core::str::lossy::Utf8Chunks as Iterator>::next |
| 0.31 % | _int_realloc |
| 0.29 % | _int_free_chunk |
| 0.19 % | rsconstruct::graph::Product::cache_key |
| 0.19 % | std::path::compare_components |
| 0.19 % | serde_json::read::SliceRead::parse_str |
| 0.19 % | statx |
| 0.19 % | malloc |
| 0.19 % | cfree |
| 0.18 % | core::hash::BuildHasher::hash_one |
| rest | scattered < 0.15 % each |

Findings

~70 % of CPU is in PathBuf iteration / comparison. Specifically parse_next_component_back + next_back + Components::eq, all invoked from PathBuf equality and hashing. Filesystem I/O (readdir, stat, open) is under 2 %. Hashing (SHA-256 + SipHash) is under 1 %.

The callsite is BuildGraph::add_product_with_variant in src/graph.rs (lines 221–307). It contains three loops whose path-equality cost dominates the whole run:

  • Lines 232–242 — checker dedup loop. For every checker product (outputs empty), scans every existing product and compares existing.inputs[0] == inputs[0] (full PathBuf equality, which iterates components). With 7,000+ checker products in teaching-slides (script.check_md + script.check_svg + svglint + markdownlint + zspell), this is an O(P²) pass per processor over the course of discovery.

  • Lines 252–253 — superset check for generator re-declarations. Includes existing.inputs.iter().all(|i| inputs.contains(i)) — an O(M²) call, again per-insertion, again comparing PathBufs component-by-component.

  • Lines 246–285 — output conflict check. Fast path (HashMap lookup); not the bottleneck.

Graph mutation itself (add_product_with_variant self-time, 10.84 %) is modest. The quadratic scans inside it are where the time goes — they just happen to be attributed to the stdlib path-iteration functions.

Suggested fix (not yet implemented)

Index the checker-dedup and generator-superset lookups via a HashMap keyed on (processor, primary_input, variant) so the linear scans become O(1). For 10,027 products, the expected improvement is ~3×–5× on status wall time.

Scope: additions to BuildGraph (two new HashMap index fields, kept in sync with add_product_*), a small change to add_product_with_variant to do HashMap lookups instead of linear scans. No cache-layout or on-disk-format changes.
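The proposed index might look like the following sketch. ProductId, CheckerIndex, and the exact key shape are illustrative assumptions; the real fields would live on BuildGraph:

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

type ProductId = usize;

/// O(1) checker dedup: products keyed on (processor, primary input, variant)
/// instead of a linear scan over all existing products.
#[derive(Default)]
struct CheckerIndex {
    by_key: HashMap<(String, PathBuf, Option<String>), ProductId>,
}

impl CheckerIndex {
    /// Return the existing product id for this key, or record `next_id`.
    /// The bool reports whether a new entry was inserted.
    fn insert_or_get(
        &mut self,
        processor: &str,
        primary_input: &Path,
        variant: Option<&str>,
        next_id: ProductId,
    ) -> (ProductId, bool) {
        let key = (processor.to_string(), primary_input.to_path_buf(), variant.map(str::to_string));
        let mut fresh = false;
        let id = *self.by_key.entry(key).or_insert_with(|| { fresh = true; next_id });
        (id, fresh)
    }
}

fn main() {
    let mut idx = CheckerIndex::default();
    let input = PathBuf::from("docs/index.md");
    let (id1, fresh1) = idx.insert_or_get("zspell", &input, None, 0);
    let (id2, fresh2) = idx.insert_or_get("zspell", &input, None, 1);
    assert!(fresh1 && !fresh2);
    assert_eq!(id1, id2); // second declaration deduped to the first
}
```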

Raw data

  • /tmp/rsc.perf.data was recorded and analyzed to produce the tables above. Removed afterwards — regenerate via the methodology section if needed.

Run: 2026-04-12 (later) — HEAD after HashMap dedup fix

Wall-clock and counters

| Metric | Value | vs. 0.8.1 tag |
|---|---|---|
| Wall time | 0.265 s | 4.1× faster |
| Instructions | 2.05 B | -90 % |
| Cycles | 0.88 B | -83 % |
| IPC | 2.34 | was 3.98 |
| L1-dcache miss rate | 1.34 % | was 4.13 % |

The quadratic path-equality peak is gone. What remains is the normal cost of using PathBuf as HashMap keys.

Hot spots (self-time, user-space, 9,948 samples, 10 iterations)

| % | Function | Category |
|---|---|---|
| 5.42 | core::str::converts::from_utf8 | UTF-8 validation |
| 3.52 | sip::Hasher::write | HashMap hashing |
| 3.51 | <Path as Hash>::hash | HashMap hashing |
| 3.39 | sha2::sha256::digest_blocks | Checksumming |
| 2.09 | Components::next | Path iteration |
| 2.00 | _int_malloc | Allocator |
| 1.92 | parse_next_component_back | Path iteration |
| 1.60 | compare_components | Path comparison |
| 1.19 | combined_input_checksum | Checksumming |
| 1.12 | Product::cache_key | Cache keys |

Run: 2026-04-12 (later still) — HEAD after path interning

Context

BuildGraph’s three hot HashMaps (output_to_product, input_to_products, checker_dedup) switched from PathBuf keys to a private PathId(u32) backed by an in-memory PathInterner. See Path Interning for design.

Wall-clock and counters

| Metric | Value | vs. previous |
|---|---|---|
| Wall time | 0.245 s | -8 % |
| Instructions | 2.04 B | ~flat |
| Cycles | 0.91 B | ~flat |

Hot spots (self-time, user-space, 9,925 samples, 10 iterations)

| % | Function | Notes |
|---|---|---|
| 4.34 | core::str::converts::from_utf8 | unchanged |
| 3.21 | sha2::sha256::digest_blocks | unchanged |
| 2.60 | Components::next | unchanged |
| 2.39 | sip::Hasher::write | down from 3.52 % |
| 2.25 | <Path as Hash>::hash | down from 3.51 % |
| 2.06 | _int_malloc | unchanged |
| 1.52 | resolve_dependencies | new — attribution shift |
| 1.19 | compare_components | down from 1.60 % |
| 1.09 | combined_input_checksum | unchanged |

Interning paid off exactly where predicted — the hashing/compare columns dropped, and resolve_dependencies appears because its inner loop is now small enough to self-attribute rather than vanish inside the stdlib path functions. The total gain is modest (~8 %) because after the HashMap dedup fix, HashMap key cost was only ~7 % of total, and interning cuts that in half.

Candidate next targets (not yet implemented)

  1. UTF-8 validation (~6 %) — from display().to_string() in cache-key building. Cache the string form per product or build keys from raw bytes.
  2. Product::cache_key + hex encoding (~2 % combined) — precompute and memoize per product.
  3. SHA-256 (~3 %) — already hardware-accelerated; the only lever is fewer calls, via memoized input_checksum or better batch reuse.

See also

  • Path Interning — the optimization applied in the most recent run.
  • Per-Processor Statistics — the previous perf discussion; describes why cache stats is slow (O(N descriptor reads)). That’s independent of this graph-construction finding.
  • Architecture — overview of the graph and how products are added.

Path Interning

Interning is a data-structure optimization that replaces PathBuf HashMap keys with small integer IDs. It exists to cut the cost of hashing, comparing, and cloning paths during graph construction.

Motivation

The Profiling run on teaching-slides (10,027 products) pointed at three quadratic scans inside BuildGraph::add_product_with_variant. Replacing those scans with HashMap<PathBuf, _> indexes took status from 1.08 s to 0.26 s.

The remaining 0.26 s breaks down by category:

| Category | % of CPU |
|---|---|
| Path iteration (Components) | ~10 % |
| HashMap hashing (SipHash + Path) | ~7 % |
| Allocator churn (malloc/free) | ~6 % |
| UTF-8 validation/decoding | ~7 % |
| Checksumming (SHA-256 + keys) | ~6 % |

A lot of that is the cost of using PathBuf as a HashMap key. Every insert and lookup does:

  1. Hash the path — walks every component, hashes each byte. O(path length).
  2. On collision, compare paths — walks both paths component-by-component.
  3. Clone the path to store as key — PathBuf allocation + copy.

With ~10,000 products participating in multiple maps (output_to_product, input_to_products, checker_dedup), this work dominates what remains.

The idea

Assign each unique path a u32 ID once, then use the ID everywhere the path is used as a HashMap key or for comparison. Hashing a u32 is one instruction. Comparing two u32s is one instruction. No allocation.

#[derive(Copy, Clone, Eq, PartialEq, Hash)]
pub struct PathId(u32);

pub struct PathInterner {
    to_id: HashMap<PathBuf, u32>,   // used during insertion
    from_id: Vec<Arc<PathBuf>>,     // id -> path (for display / FS ops)
}

impl PathInterner {
    pub fn intern(&mut self, p: &Path) -> PathId { /* ... */ }
    pub fn get(&self, id: PathId) -> &Path { /* ... */ }
}

Every hot HashMap that currently keys on PathBuf switches to PathId.
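
One way the two methods could be filled in — a minimal std-only sketch, not rsconstruct's actual implementation. `get_id` is the read-only lookup variant mentioned under Risks; `Debug` is derived here only so the sketch is easy to test.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::Arc;

#[derive(Copy, Clone, Eq, PartialEq, Hash, Debug)]
pub struct PathId(u32);

#[derive(Default)]
pub struct PathInterner {
    to_id: HashMap<PathBuf, u32>,   // used during insertion
    from_id: Vec<Arc<PathBuf>>,     // id -> path (for display / FS ops)
}

impl PathInterner {
    /// Return the existing ID for `p`, or assign the next free one.
    /// This is the only place that still hashes a full path.
    pub fn intern(&mut self, p: &Path) -> PathId {
        if let Some(&id) = self.to_id.get(p) {
            return PathId(id);
        }
        let id = self.from_id.len() as u32;
        self.to_id.insert(p.to_path_buf(), id);
        self.from_id.push(Arc::new(p.to_path_buf()));
        PathId(id)
    }

    /// Read-only lookup: never creates an entry, so it is safe on
    /// query paths that may not be in the graph.
    pub fn get_id(&self, p: &Path) -> Option<PathId> {
        self.to_id.get(p).copied().map(PathId)
    }

    /// Resolve an ID back to a real path for display / FS ops.
    pub fn get(&self, id: PathId) -> &Path {
        self.from_id[id.0 as usize].as_path()
    }
}
```

Note the `intern`/`get_id` split: insertion paths use the mutating variant, lookup paths use the read-only one, which addresses the spurious-entry risk listed below.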

In-memory only

Interned IDs are per-process. They are assigned fresh at the start of every rsconstruct invocation and dropped when the process exits. They never touch disk.

| Data | Lives in | IDs used? |
|---|---|---|
| BuildGraph HashMaps | RAM, this process | Yes |
| On-disk cache (redb descriptors, etc.) | Disk, persistent | No |
| Config files, discovered files | Disk | No |

The path foo/bar.md might be PathId(42) today and PathId(17) tomorrow. That is fine because nothing persistent ever referred to 42.

The boundary rule: PathId must not leak into anything persistent. Specifically:

  • Cache keys on disk (Product::cache_key, descriptor_key) must keep using real paths or content checksums.
  • Logs and error messages must print real paths, not IDs.
  • Nothing serializes the interner state.

Why it helps here

  • Paths are reused heavily. One .md file feeds markdownlint, zspell, script.check_md, marp. Interning collapses four HashMap key clones into one.
  • The same path appears as a lookup key in every dedup map during graph construction. Each lookup becomes hash(u32) + compare(u32) instead of walking a path’s components.
  • Product inputs/outputs can still be stored as PathBuf publicly — the optimization targets the HashMap keys, not the product data itself. This keeps the refactor’s blast radius small.

Scope of the change

Narrow scope — only the three hot HashMaps in BuildGraph:

  • output_to_product: HashMap<PathBuf, usize> → HashMap<PathId, usize>
  • input_to_products: HashMap<PathBuf, Vec<usize>> → HashMap<PathId, Vec<usize>>
  • checker_dedup: HashMap<(String, PathBuf, Option<String>), usize> → HashMap<(String, PathId, Option<String>), usize>

The interner lives on BuildGraph. Callers still pass PathBuf/&Path to add_product* — the interner is a private implementation detail. Public access to Product.inputs/outputs/output_dirs remains unchanged.

Non-goals

  • No on-disk format change. Cache entries keep using real paths.
  • No API change to Product. Inputs and outputs stay as Vec<PathBuf>.
  • No plugin-facing change. Lua processors keep seeing paths.

Risks

  • The interner’s own to_id map still hashes a PathBuf once per unique path. Unavoidable — this is the cost of asking “have I seen this path before?”
  • Every call site that hashes a &Path into a BuildGraph map now calls interner.intern() or interner.get_id(). Must be careful not to call intern() (mutating) on read-only paths, or lookups may create spurious entries.

See also

  • Profiling — the measurement that motivated this.
  • Architecture — how BuildGraph fits into the overall design.

Unreferenced Files

Purpose

Find files on disk that are not referenced by any product in the build graph. This helps identify forgotten assets, stale files, or files accidentally excluded from the build configuration.

How It Works

When rsconstruct builds its graph, every product has an inputs list. This list contains all files the product depends on:

  • Primary inputs — the source files being processed (e.g. foo.svg that mermaid converts to a PNG)
  • Dependency inputs — files that affect the output but are not the primary source (e.g. a C header file utils.h that main.c includes, a config file like .ruff.toml, or a script passed via dep_inputs)

A file is unreferenced if it does not appear in the inputs list of any product in the graph — neither as a primary input nor as a dependency input.

Why both primary and dependency inputs?

Consider a C header file utils.h. It is not a primary input (the compiler does not produce output directly from it), but it appears in dep_inputs because changes to it must trigger a rebuild of any .c file that includes it. Such a file is clearly referenced and should not be reported as unreferenced.

Only files that appear in no product’s inputs list — not primary, not dependency — are reported.
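
Conceptually the check is a set difference between the candidate files on disk and the union of all input lists. A simplified sketch — the `Product` struct here is a stand-in with only the field this check needs, not rsconstruct's real type:

```rust
use std::collections::HashSet;
use std::path::PathBuf;

/// Stand-in for illustration: only the inputs list matters here.
struct Product {
    inputs: Vec<PathBuf>, // primary + dependency inputs, merged
}

/// Report every candidate file that no product lists as an input.
fn unreferenced(candidates: &[PathBuf], products: &[Product]) -> Vec<PathBuf> {
    // Collect every referenced path once; each lookup is then O(1).
    let referenced: HashSet<&PathBuf> =
        products.iter().flat_map(|p| p.inputs.iter()).collect();
    candidates
        .iter()
        .filter(|c| !referenced.contains(c))
        .cloned()
        .collect()
}
```

Because dep_inputs are merged into the same set, a header like utils.h is counted as referenced even though no product names it as a primary input.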

Usage

rsconstruct graph unreferenced --extensions .svg[,.png,...] [--rm]

Options

| Option | Description |
|---|---|
| --extensions | Comma-separated list of file extensions to check (required) |
| --rm | Delete the unreferenced files immediately (no confirmation) |

Examples

Find unreferenced SVG files:

rsconstruct graph unreferenced --extensions .svg

Find unreferenced images of any type:

rsconstruct graph unreferenced --extensions .svg,.png,.jpg

Delete unreferenced SVG files:

rsconstruct graph unreferenced --extensions .svg --rm

Output

Plain list of file paths, one per line, relative to the project root:

assets/old_diagram.svg
docs/unused_figure.svg
scratch/test.svg

Design Notes

  • Extensions are required — defaulting to all files would produce excessive noise (READMEs, Makefiles, config files, etc. are intentionally not in the graph).
  • Finding unreferenced files does not mean they are useless. The user decides what to do. Common reasons a file might be unreferenced:
    • It was part of a processor whose src_dirs or src_extensions excludes it
    • It was intentionally left out of the build
    • It is a leftover from a renamed or deleted processor instance
    • It is a scratch/draft file
  • --rm deletes without confirmation. Use with care.
  • The command requires a rsconstruct.toml (the graph must be buildable).

Distributed Execution

This document explores what distributed execution would mean for RSConstruct — the problems it solves, the problems it creates, how other build tools approach it, and what a design might look like.

What distributed execution means

Today RSConstruct runs all products on the local machine, optionally in parallel across multiple cores (-j). Distributed execution means offloading individual products to remote workers — other machines on a network — so that the build exploits more CPU than any single machine has.

This is distinct from remote caching (which RSConstruct already has). Remote caching avoids re-running a product whose result was already computed by someone else. Distributed execution runs products remotely even when no cached result exists. The two features compose: a distributed build that also has remote caching can share results across runs and across users.

The problems it solves

  • Slow builds on large codebases. When thousands of C files need checking or hundreds of PDFs need rendering, a single machine is the bottleneck even with -j. A cluster of workers can run all of them truly in parallel.
  • CI latency. CI machines are often single-core or have limited parallelism. Distributing work across a pool of CI agents cuts wall-clock time.
  • Memory pressure. Some tools (Chromium, LibreOffice, heavy linters) are memory-hungry. Spreading them across machines avoids OOM conditions.

The problems it creates

Input availability

Every product needs its inputs on the worker. For a checker that reads a single source file, this means uploading that file to the worker (or having it available via a shared filesystem). For a generator with many dep_inputs, it may mean uploading dozens of files. This is a non-trivial data transfer problem.

The content-addressed object store already solves this at the output side — outputs are stored by SHA-256. The same mechanism can serve inputs: if the worker has a local object store, the coordinator only needs to send checksums, and the worker fetches missing objects from the remote cache. Products whose inputs are already cached require zero transfer.

Output collection

After execution, the worker’s outputs must be pushed back to the coordinator (or directly to the remote cache) so local build phases and downstream products can use them. This is essentially the existing remote cache push path.

Hermeticity

Distributed workers only produce correct results if builds are hermetic — the product’s output depends only on its declared inputs, not on ambient machine state (installed tools, environment variables, filesystem layout). RSConstruct does not enforce hermeticity today. A worker with a different version of ruff or cppcheck than the local machine will produce different results.

This is the hardest problem. Options:

  • Ignore it — document that workers must have identical tool versions; use tool locking (rsconstruct tools lock) to detect divergence.
  • Containers — run each product in a container image that includes all required tools. Bazel and BuildBuddy do this. Heavy but correct.
  • Nix/flakes — pin tools via Nix derivations on all workers. Correct but requires Nix infrastructure.

Scheduling and load balancing

Which products go to which worker? A central coordinator must:

  1. Know the graph (dependency order).
  2. Dispatch products whose dependencies are already satisfied.
  3. Avoid overloading any single worker.
  4. Handle worker failure (retry on another worker).

This is a distributed systems problem. Even a simple greedy scheduler requires a reliable heartbeat, a work queue, and failure detection.

Latency overhead

For fast products (a Python lint check on a 50-line file takes ~50ms), the overhead of serializing inputs, sending them over the network, waiting for the worker, and receiving results can exceed the actual execution time. Distributed execution only pays off for products that take seconds or more, or when there are so many products that local parallelism is saturated.

How other tools do it

Bazel (Remote Execution API)

Bazel defines the Remote Execution API (REAPI), a gRPC protocol for distributed execution. Workers implement the Execution service; the coordinator submits Action objects (a command + input digest tree). Workers fetch inputs from a Content Addressable Storage (CAS) service, execute the action, and push outputs back to CAS.

Strengths: hermetic by design (actions are pure functions of their inputs), well-specified protocol, many implementations (BuildBuddy, EngFlow, NativeLink, self-hosted buildfarm).

Weaknesses: requires all actions to be declared with precise input sets; dynamic dependencies (header includes discovered at compile time) need special handling; heavy infrastructure to stand up.

RSConstruct’s object store is conceptually similar to CAS. The Product struct already declares all inputs explicitly. Implementing REAPI would make RSConstruct compatible with the existing Bazel remote execution ecosystem without building a proprietary scheduler.

distcc

distcc distributes C/C++ compilation by intercepting gcc/clang invocations and forwarding the preprocessed source to a pool of workers. It works at the invocation level, not the build graph level — the local machine still runs the build tool (make/ninja) and distcc is transparent to it.

Strengths: simple, no build tool integration required, widely deployed.

Weaknesses: only works for compilation (not linters, generators, etc.); requires preprocessing locally (partial hermeticity); no caching.

Incredibuild / Xtensa

Commercial tools that intercept process spawning at the OS level (Windows job objects, Linux LD_PRELOAD) to virtualize and distribute arbitrary commands. No build tool integration required; any tool that runs a subprocess can be distributed.

Strengths: transparent to the build tool; works with any compiler or tool.

Weaknesses: proprietary; expensive; the OS-level interception is fragile.

Pants / Buck2

Both use a daemon-based architecture with a local scheduler that knows the full build graph. Distributed execution is an extension of local execution — the scheduler dispatches actions to remote workers using REAPI or a proprietary protocol. Input digests and output digests flow through a central CAS.

Pants calls this “remote execution”; Buck2 calls it “remote actions”. Both require the build rules to declare all inputs precisely (no dynamic deps).

Ninja + a distributed wrapper

Some teams wrap Ninja with distributed backends (ninja-build + icecc, ninja + sccache, or ninja + a custom scheduler). The wrapper intercepts compiler invocations from the Ninja process. This is similar to the distcc approach but can handle caching (sccache) alongside distribution.

A possible design for RSConstruct

A minimal distributed execution design that fits RSConstruct’s architecture:

1. Worker protocol

Workers expose a simple HTTP API:

POST /execute
  body: { product_id, command, args, input_checksums: {path: sha256, ...} }
  response: { exit_code, stdout, stderr, output_checksums: {path: sha256, ...} }

Before executing, the worker fetches any inputs it doesn’t already have from the shared remote cache. After executing, it pushes outputs to the remote cache and returns their checksums.

2. Input availability via shared cache

The coordinator (local RSConstruct) ensures all inputs are in the remote cache before dispatching a product to a worker. For source files, this means uploading them once at build start. For intermediate outputs (products that are inputs to other products), they flow through the cache automatically — the producer pushes to remote, the consumer fetches from remote.

This avoids a separate “input upload” step for most products: source files are small and stable; once uploaded they stay cached across builds.

3. Coordinator changes

The executor’s product dispatch loop currently runs products locally. With distributed execution:

  1. Each dispatchable product is classified as local or remote based on a configurable predicate (e.g., processor type, estimated duration, worker availability).
  2. Remote products are submitted to a work queue.
  3. A pool of worker connections consumes the queue, tracking in-flight products.
  4. When a remote product completes, its outputs are pulled from cache and the downstream products are unblocked.

The dependency graph and topological sort are unchanged — distribution is purely an execution-layer concern.
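
The classification in step 1 could be a small pure predicate. A sketch under assumed names — `DispatchInfo`, its fields, and the 500 ms threshold are all illustrative, not part of any existing rsconstruct API:

```rust
use std::time::Duration;

/// Hypothetical per-product facts the scheduler would already know.
struct DispatchInfo {
    processor: &'static str, // pname, e.g. "ruff"
    estimated: Duration,     // expected run time
    cacheable: bool,         // false for cache = false products
}

enum Target {
    Local,
    Remote,
}

/// Keep fast or uncacheable products local; send the rest to workers.
/// The threshold reflects the latency-overhead argument above.
fn classify(p: &DispatchInfo, workers_available: bool) -> Target {
    if !workers_available || !p.cacheable {
        return Target::Local;
    }
    if p.estimated < Duration::from_millis(500) {
        // Round-trip overhead would dominate the actual execution time.
        return Target::Local;
    }
    Target::Remote
}
```

A real predicate would also consult the per-processor `distributed = false` config field proposed below.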

4. Hermeticity via tool locking

Without containers, workers must have the same tool versions as the local machine. rsconstruct tools lock already records tool version hashes. Distributed execution should verify that each worker’s tool hashes match the lock file before accepting products of that type. A worker with a mismatched ruff version refuses ruff products and logs a warning.
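
The verification itself is a map comparison. A sketch assuming a simplified lock representation (tool name → version hash) — the real lock file format is not shown here:

```rust
use std::collections::HashMap;

/// Which tools a worker may accept products for, given the local
/// lock (tool name -> version hash) and the worker's reported hashes.
/// The flat string-to-string shape is an assumption for illustration.
fn accepted_tools<'a>(
    lock: &'a HashMap<String, String>,
    worker: &HashMap<String, String>,
) -> Vec<&'a str> {
    lock.iter()
        // A tool is accepted only when the worker reports the exact
        // same hash; missing or mismatched tools are rejected.
        .filter(|&(tool, hash)| worker.get(tool.as_str()) == Some(hash))
        .map(|(tool, _)| tool.as_str())
        .collect()
}
```

The coordinator would call this once per worker handshake and thereafter only dispatch products whose processor's tool is in the accepted set.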

5. What stays local

Some products cannot or should not be distributed:

  • Products with cache = false (always-rebuild, e.g., timestamp generators).
  • Products that depend on the local filesystem state beyond declared inputs (e.g., git log style operations).
  • Creators that manage local directories (npm install, cargo build) — their outputs are directory trees, not files, and their side effects are local.
  • Products faster than the round-trip overhead (most lint checks on small files).

A distributed = false config field (analogous to enabled) would let users pin specific processors to local execution.

Current status

Not implemented. RSConstruct runs all products locally. Remote caching (push/pull of outputs) is the only cross-machine feature today.

The design above is a sketch for future consideration. The most natural first step would be implementing a minimal REAPI-compatible worker, since that would make RSConstruct interoperable with existing distributed build infrastructure (BuildBuddy, EngFlow, self-hosted buildfarm) without requiring RSConstruct-specific worker deployments.

Internal Processors

Processors that can be reimplemented in pure Rust, eliminating external tool dependencies. Internal processors are faster (no subprocess overhead), require no installation, and work on any platform with rsconstruct.

The naming convention is to prefix with i (for internal), e.g., ipdfunite replaces pdfunite. Both the original and internal variants coexist — users choose which to use.

Implemented

ipdfunite

Replaces: pdfunite (external pdfunite binary from poppler-utils)

Merges PDFs from subdirectories into course bundles using lopdf in-process. Same config as pdfunite minus the pdfunite_bin field. Batch-capable.

Crate: lopdf

Candidates

ijq / ijsonlint — JSON validation

Replaces: jq (checks JSON parses) and jsonlint (Python JSON linter)

Both tools ultimately just validate that files are well-formed JSON. serde_json is already a dependency — parse each file and report errors.

Crate: serde_json (already in deps)
Complexity: Low — parse file, report error with line/column

iyamllint — YAML validation

Replaces: yamllint (Python YAML linter)

Validate that YAML files parse correctly. yamllint also checks style rules (line length, indentation, etc.) which would need to be reimplemented if desired, but basic validity checking is trivial.

Crate: serde_yaml
Complexity: Low for validation only, medium if style rules are needed

itaplo — TOML validation

Replaces: taplo (TOML formatter/linter)

Validate that TOML files parse correctly. The toml crate is already a dependency. taplo also reformats — a pure validation-only internal processor covers the common case.

Crate: toml (already in deps)
Complexity: Low

ijson_schema — JSON Schema validation

Replaces: json_schema (Python jsonschema)

Validate JSON files against JSON Schema definitions. The jsonschema Rust crate supports JSON Schema draft 2020-12, draft 7, and draft 4.

Crate: jsonschema
Complexity: Medium — need to load schema files and validate against them

imarkdown2html — Markdown to HTML

Replaces: markdown2html (external markdown CLI)

Convert Markdown files to HTML. pulldown-cmark is a fast, CommonMark-compliant Markdown parser written in Rust.

Crate: pulldown-cmark
Complexity: Low — parse and render to HTML string, write to output file

iyamlschema — YAML Schema Validation

Validates YAML files against JSON schemas referenced by $schema URLs. Fetches and caches schemas via the webcache, validates data against the schema (including remote $ref resolution), and checks property ordering.

Crate: jsonschema, ureq, serde_yml
Complexity: Medium — HTTP fetching, schema compilation, recursive ordering checks

yaml2json — YAML to JSON Conversion

Convert YAML files to pretty-printed JSON.

Crate: serde_yml, serde_json
Complexity: Low — parse YAML, serialize as JSON

isass — Sass/SCSS to CSS

Replaces: sass (Dart Sass CLI)

Compile Sass/SCSS files to CSS. The grass crate is a pure-Rust Sass compiler with good compatibility.

Crate: grass
Complexity: Low — compile input file, write CSS output

Not Suitable for Internal Implementation

These processors wrap tools with complex, evolving behavior that would be impractical to reimplement:

  • ruff, pylint, mypy, pyrefly — Python linters/type checkers with deep language understanding
  • eslint, jshint, stylelint — JavaScript/CSS linters with plugin ecosystems
  • clippy, cargo — Rust toolchain components
  • marp — Presentation framework (spawns Chromium)
  • sphinx, mdbook, jekyll — Full documentation/site generators
  • shellcheck — Shell script analyzer with extensive rule set
  • aspell — Spell checker with language dictionaries
  • chromium, libreoffice, drawio — GUI applications used for rendering
  • protobuf — Protocol buffer compiler
  • pdflatex — LaTeX to PDF (entire TeX distribution)

Binary Plugin System

As of now, rsconstruct does not have a binary plugin system. This section documents the approach for future consideration.

Rust applications can dynamically load plugins written in Rust via dlopen/dlsym on shared libraries (.so on Linux, .dylib on macOS, .dll on Windows). The plugin compiles as a cdylib crate, exports extern "C" functions, and the host loads them at runtime using a crate like libloading.

The main constraint is that Rust has no stable ABI. You cannot use Rust traits, generics, or standard library types across the dynamic library boundary. The plugin interface must be C-compatible: extern "C" functions returning opaque pointers, with a vtable or function-pointer struct defining the plugin API.
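
To make the constraint concrete, here is an illustrative C-compatible vtable — every name, field, and the single entry-point function are hypothetical, not a proposed rsconstruct interface:

```rust
use std::os::raw::c_int;

/// Function-pointer table the host would obtain from one exported
/// `extern "C"` entry point in the plugin. `#[repr(C)]` fixes the
/// layout so host and plugin agree across the dylib boundary.
#[repr(C)]
pub struct PluginVTable {
    pub abi_version: u32,
    pub name: extern "C" fn() -> *const u8, // NUL-terminated name
    pub run: extern "C" fn(argc: c_int) -> c_int,
}

// What a plugin's side might look like (compiled as a cdylib):
extern "C" fn plugin_name() -> *const u8 {
    b"demo\0".as_ptr()
}

extern "C" fn plugin_run(argc: c_int) -> c_int {
    argc + 1
}

// In a real cdylib this would carry #[no_mangle] so the host can
// find it by symbol name via libloading.
pub extern "C" fn rsconstruct_plugin_vtable() -> PluginVTable {
    PluginVTable {
        abi_version: 1,
        name: plugin_name,
        run: plugin_run,
    }
}
```

The host would check `abi_version` before calling anything else, since nothing but this handshake protects against a plugin built for a different interface revision.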

Crates like abi_stable attempt to provide a stable ABI layer for Rust-to-Rust dynamic loading, but they add significant complexity.

The current Lua plugin system avoids this problem entirely — Lua has a stable, simple FFI. A binary plugin system would offer better performance but at the cost of a much more complex plugin interface and build process (plugins would need to be compiled separately and matched to the host’s ABI).

Missing Processors

Tools found in Makefiles across ../*/ sibling projects that rsconstruct does not yet have processors for. Organized by category, with priority based on breadth of usage.

High Priority — Linters and Validators

eslint

  • What it does: JavaScript/TypeScript linter (industry standard).
  • Projects: demos-lang-js
  • Invocation: eslint $(ALL_JS) or node_modules/.bin/eslint $<
  • Processor type: Checker

jshint

  • What it does: JavaScript linter — detects errors and potential problems.
  • Projects: demos-lang-js, gcp-gemini-cli, gcp-machines, gcp-miflaga, gcp-nikuda, gcp-randomizer, schemas, veltzer.github.io
  • Invocation: node_modules/.bin/jshint $<
  • Processor type: Checker

tidy (HTML Tidy)

  • What it does: HTML/XHTML validator and formatter.
  • Projects: demos-lang-js, gcp-gemini-cli, gcp-machines, gcp-miflaga, gcp-nikuda, gcp-randomizer, openbook, riddles-book
  • Invocation: tidy -errors -quiet -config .tidy.config $<
  • Processor type: Checker

check-jsonschema

  • What it does: Validates YAML/JSON files against JSON Schema (distinct from rsconstruct’s json_schema which validates JSON against schemas found via $schema key).
  • Projects: data, schemas, veltzer.github.io
  • Invocation: check-jsonschema --schemafile $(yq -r '.["$schema"]' $<) $<
  • Processor type: Checker

cpplint

  • What it does: C++ linter enforcing Google C++ style guide.
  • Projects: demos-os-linux
  • Invocation: cpplint $<
  • Processor type: Checker

checkpatch.pl

  • What it does: Linux kernel coding style checker.
  • Projects: kcpp
  • Invocation: $(KDIR)/scripts/checkpatch.pl --file $(C_SOURCES) --no-tree
  • Processor type: Checker

standard (StandardJS)

  • What it does: JavaScript style guide, linter, and formatter — zero config.
  • Projects: demos-lang-js
  • Invocation: node_modules/.bin/standard $<
  • Processor type: Checker

jslint

  • What it does: JavaScript code quality linter (Douglas Crockford).
  • Projects: demos-lang-js
  • Invocation: node_modules/.bin/jslint $<
  • Processor type: Checker

jsl (JavaScript Lint)

  • What it does: JavaScript lint tool.
  • Projects: keynote, myworld-php
  • Invocation: jsl --conf=support/jsl.conf --quiet --nologo --nosummary --nofilelisting $(SOURCES_JS)
  • Processor type: Checker

gjslint (Google Closure Linter)

  • What it does: JavaScript style checker following Google JS style guide.
  • Projects: keynote, myworld-php
  • Invocation: $(TOOL_GJSLINT) --flagfile support/gjslint.cfg $(JS_SRC)
  • Processor type: Checker

checkstyle

  • What it does: Java source code style checker.
  • Projects: demos-lang-java, keynote
  • Invocation: java -cp $(scripts/cp.py) $(MAINCLASS_CHECKSTYLE) -c support/checkstyle_config.xml $(find . -name "*.java")
  • Processor type: Checker

pyre

  • What it does: Python type checker from Facebook/Meta.
  • Projects: archive.apiiro.TrainingDataLaboratory, archive.work-amdocs-py
  • Invocation: pyre check
  • Processor type: Checker

High Priority — Formatters

black

  • What it does: Opinionated Python code formatter.
  • Projects: archive.apiiro.TrainingDataLaboratory, archive.work-amdocs-py
  • Invocation: black --target-version py36 $(ALL_PACKAGES)
  • Processor type: Checker (using --check mode) or Formatter

uncrustify

  • What it does: C/C++/Java source code formatter.
  • Projects: demos-os-linux, xmeltdown
  • Invocation: uncrustify -c support/uncrustify.cfg --no-backup -l C $(ALL_US_C)
  • Processor type: Formatter

astyle (Artistic Style)

  • What it does: C/C++/Java source code indenter and formatter.
  • Projects: demos-os-linux
  • Invocation: astyle --verbose --suffix=none --formatted --preserve-date --options=support/astyle.cfg $(ALL_US)
  • Processor type: Formatter

indent (GNU Indent)

  • What it does: C source code formatter (GNU style).
  • Projects: demos-os-linux
  • Invocation: indent $(ALL_US)
  • Processor type: Formatter

High Priority — Testing

pytest

  • What it does: Python test framework.
  • Projects: 50+ py* projects (pyanyzip, pyapikey, pyapt, pyawskit, pyblueprint, pybookmarks, pyclassifiers, pycmdtools, pyconch, pycontacts, pycookie, pydatacheck, pydbmtools, pydmt, pydockerutils, pyeventroute, pyeventsummary, pyfakeuse, pyflexebs, pyfoldercheck, pygcal, pygitpub, pygooglecloud, pygooglehelper, pygpeople, pylogconf, pymakehelper, pymount, pymultienv, pymultigit, pymyenv, pynetflix, pyocutil, pypathutil, pypipegzip, pypitools, pypluggy, pypowerline, pypptkit, pyrelist, pyscrapers, pysigfd, pyslider, pysvgview, pytagimg, pytags, pytconf, pytimer, pytsv, pytubekit, pyunique, pyvardump, pyweblight, and archive.*)
  • Invocation: pytest tests or python -m pytest tests
  • Processor type: Checker (mass, per-directory)

High Priority — YAML/JSON Processing

yq

  • What it does: YAML/JSON processor (like jq but for YAML).
  • Projects: data, demos-lang-yaml, schemas, veltzer.github.io
  • Invocation: yq < $< > $@ (format/validate) or yq -r '.key' $< (extract)
  • Processor type: Checker or Generator

Medium Priority — Compilers

javac

  • What it does: Java compiler.
  • Projects: demos-lang-java, jenable, keynote
  • Invocation: javac -Werror -Xlint:all $(JAVA_SOURCES) -d out/classes
  • Processor type: Generator

go build

  • What it does: Go language compiler.
  • Projects: demos-lang-go
  • Invocation: go build -o $@ $<
  • Processor type: Generator (single-file, like cc_single_file)

kotlinc

  • What it does: Kotlin compiler.
  • Projects: demos-lang-kotlin
  • Invocation: kotlinc $< -include-runtime -d $@
  • Processor type: Generator (single-file)

ghc

  • What it does: Glasgow Haskell Compiler.
  • Projects: demos-lang-haskell
  • Invocation: ghc -v0 -o $@ $<
  • Processor type: Generator (single-file)

ldc2

  • What it does: D language compiler (LLVM-based).
  • Projects: demos-lang-d
  • Invocation: ldc2 $(FLAGS) $< -of=$@
  • Processor type: Generator (single-file)

nasm

  • What it does: Netwide Assembler (x86/x64).
  • Projects: demos-lang-nasm
  • Invocation: nasm -f $(ARCH) -o $@ $<
  • Processor type: Generator (single-file)

rustc

  • What it does: Rust compiler for single-file programs (as opposed to cargo for projects).
  • Projects: demos-lang-rust
  • Invocation: rustc $(FLAGS_DBG) $< -o $@
  • Processor type: Generator (single-file)

dotnet

  • What it does: .NET SDK CLI — builds C#/F# projects.
  • Projects: demos-lang-cs
  • Invocation: dotnet build --nologo --verbosity quiet
  • Processor type: MassGenerator

dtc (Device Tree Compiler)

  • What it does: Compiles device tree source (.dts) to device tree blob (.dtb) for embedded Linux.
  • Projects: clients-heqa (8 subdirectories)
  • Invocation: dtc -I dts -O dtb -o $@ $<
  • Processor type: Generator (single-file)

Medium Priority — Build Systems

cmake

  • What it does: Cross-platform build system generator.
  • Projects: demos-build-cmake
  • Invocation: cmake -B $@ && cmake --build $@
  • Processor type: MassGenerator

mvn (Apache Maven)

  • What it does: Java project build and dependency management.
  • Projects: demos-lang-java/maven
  • Invocation: mvn compile
  • Processor type: MassGenerator

ant (Apache Ant)

  • What it does: Java build tool (XML-based).
  • Projects: demos-lang-java, keynote
  • Invocation: ant checkstyle
  • Processor type: MassGenerator

Medium Priority — Converters and Generators

pygmentize

  • What it does: Syntax highlighter — converts source code to HTML, SVG, PNG.
  • Projects: demos-misc-highlight
  • Invocation: pygmentize -f html -O full -o $@ $<
  • Processor type: Generator (single-file)

slidev

  • What it does: Markdown-based presentation tool — exports to PDF.
  • Projects: demos-lang-slidev
  • Invocation: node_modules/.bin/slidev export $< --with-clicks --output $@
  • Processor type: Generator (single-file)

jekyll

  • What it does: Static site generator (Ruby-based, used by GitHub Pages).
  • Projects: site-personal-jekyll
  • Invocation: jekyll build --source $(SOURCE_FOLDER) --destination $(DESTINATION_FOLDER)
  • Processor type: MassGenerator

lilypond

  • What it does: Music engraving program — compiles .ly files to PDF sheet music.
  • Projects: demos-lang-lilypond, openbook
  • Invocation: scripts/wrapper_lilypond.py ... $<
  • Processor type: Generator (single-file)

wkhtmltoimage

  • What it does: Renders HTML to image using WebKit engine.
  • Projects: demos-misc-highlight
  • Invocation: wkhtmltoimage $(WK_OPTIONS) $< $@
  • Processor type: Generator (single-file)

Medium Priority — Documentation

jsdoc

  • What it does: API documentation generator for JavaScript.
  • Projects: jschess, keynote
  • Invocation: node_modules/.bin/jsdoc -d $(JSDOC_FOLDER) -c support/jsdoc.json out/src
  • Processor type: MassGenerator

Low Priority — Minifiers

jsmin

  • What it does: JavaScript minifier (removes whitespace and comments).
  • Projects: jschess
  • Invocation: node_modules/.bin/jsmin < $< > $(JSMIN_JSMIN)
  • Processor type: Generator (single-file)

yuicompressor

  • What it does: JavaScript/CSS minifier and compressor (Yahoo).
  • Projects: jschess
  • Invocation: node_modules/.bin/yuicompressor $< -o $(JSMIN_YUI)
  • Processor type: Generator (single-file)

closure compiler

  • What it does: JavaScript optimizer and minifier (Google Closure).
  • Projects: keynote
  • Invocation: tools/closure.jar $< --js_output_file $@
  • Processor type: Generator (single-file)

Low Priority — Preprocessors

gpp (Generic Preprocessor)

  • What it does: General-purpose text preprocessor with macro expansion.
  • Projects: demos/gpp
  • Invocation: gpp -o $@ $<
  • Processor type: Generator (single-file)

m4

  • What it does: Traditional Unix macro processor.
  • Projects: demos/m4
  • Invocation: m4 $< > $@
  • Processor type: Generator (single-file)

Low Priority — Binary Analysis

objdump

  • What it does: Disassembles object files (displays assembly code).
  • Projects: demos-os-linux
  • Invocation: objdump --disassemble --source $< > $@
  • Processor type: Generator (single-file, post-compile)

Low Priority — Packaging

dpkg-deb

  • What it does: Builds Debian .deb packages.
  • Projects: archive.myrepo
  • Invocation: dpkg-deb --build deb/mypackage ~/packages
  • Processor type: Generator

reprepro

  • What it does: Manages Debian APT package repositories.
  • Projects: archive.myrepo
  • Invocation: reprepro --basedir $(config.apt.service_dir) export $(config.apt.codename)
  • Processor type: Generator

Low Priority — Profiling

pyinstrument

  • What it does: Python profiler with HTML output.
  • Projects: archive.apiiro.TrainingDataLaboratory, archive.work-amdocs-py
  • Invocation: pyinstrument --renderer=html -m $(MAIN_MODULE)
  • Processor type: Generator

Low Priority — Code Metrics

sloccount

  • What it does: Counts source lines of code and estimates development cost.
  • Projects: demos-lang-java, demos-lang-r, demos-os-linux, jschess
  • Invocation: sloccount .
  • Processor type: Checker (whole-project)

Low Priority — Dependency Generation

makedepend

  • What it does: Generates C/C++ header dependency rules for Makefiles.
  • Projects: xmeltdown
  • Invocation: makedepend -I... -- $(CFLAGS) -- $(SRC)
  • Notes: rsconstruct’s built-in C/C++ dependency analyzer already handles this.

Low Priority — Embedded

fdtoverlay

  • What it does: Applies device tree overlays to a base device tree blob.
  • Projects: clients-heqa/come_overlay
  • Invocation: fdtoverlay -i $@ -o $@.tmp $$overlay && mv $@.tmp $@
  • Processor type: Generator

fdtput

  • What it does: Modifies properties in a device tree blob.
  • Projects: clients-heqa/come_overlay
  • Invocation: fdtput -r $@ $$node
  • Processor type: Generator

Requirements Generator — Design

A processor that scans Python source files and produces a requirements.txt listing the third-party distributions the project imports. It fills the gap between the Python analyzer (which discovers local dependency edges) and the pip processor (which consumes requirements.txt).

Problem

Users have Python projects with import statements. They want the set of PyPI distributions their code needs, written out to requirements.txt. Today they maintain this file by hand, which drifts from the actual imports.

Shape

A whole-project Generator processor named requirements:

  • Inputs: every .py file in the project (same scan as the Python analyzer — file_index.scan(&self.config.standard, true)).
  • Output: a single requirements.txt (path configurable).
  • Discovery: one Product with all .py files as inputs, one output path. Structurally identical to the tags processor.

The classification problem

Every import X lands in one of three buckets:

  1. Local — a module that resolves to a file in the project. Skip.
  2. Stdlib — a module shipped with Python (os, sys, json, …). Skip.
  3. Third-party — a PyPI distribution. Emit to requirements.txt.

The Python analyzer already resolves bucket 1 via PythonDepAnalyzer::resolve_module. The new processor needs buckets 2 and 3.
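The three-way split reduces to a pure classification function. A sketch, with illustrative names (`ImportKind` and the set-based lookups are assumptions, not the actual rsconstruct types):

```rust
use std::collections::HashSet;

/// Illustrative classification of a top-level import name.
#[derive(Debug, PartialEq)]
enum ImportKind {
    Local,      // resolves to a file in the project -- skip
    Stdlib,     // ships with Python -- skip
    ThirdParty, // emit to requirements.txt
}

fn classify(module: &str, local: &HashSet<&str>, stdlib: &HashSet<&str>) -> ImportKind {
    if local.contains(module) {
        ImportKind::Local
    } else if stdlib.contains(module) {
        ImportKind::Stdlib
    } else {
        ImportKind::ThirdParty
    }
}

fn main() {
    let local: HashSet<&str> = ["mypkg"].into();
    let stdlib: HashSet<&str> = ["os", "sys", "json"].into();
    assert_eq!(classify("mypkg", &local, &stdlib), ImportKind::Local);
    assert_eq!(classify("json", &local, &stdlib), ImportKind::Stdlib);
    assert_eq!(classify("requests", &local, &stdlib), ImportKind::ThirdParty);
    println!("ok");
}
```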

Stdlib detection

Python 3.10+ ships sys.stdlib_module_names — a frozenset of every stdlib top-level module name. We bake this list into a static table (src/processors/generators/python_stdlib.rs) rather than probing python3 at build time. Reasons:

  • The list is stable across 3.10+ with a handful of additions per minor release.
  • No tool dependency at build time — keeps the processor offline and hermetic.
  • The list is ~300 names, a few KB of source.

A refresh script regenerates the table from python3 -c 'import sys; print(sorted(sys.stdlib_module_names))' when we bump Python support. The list lives alongside the processor, not in a user-facing config.
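The shape of the generated table, abbreviated to a handful of entries; the real file would carry the full ~300-name list, and the helper name is an assumption about the design:

```rust
// Abbreviated stand-in for src/processors/generators/python_stdlib.rs.
// The real table would be regenerated from:
//   python3 -c 'import sys; print(sorted(sys.stdlib_module_names))'
static PYTHON_STDLIB: &[&str] = &[
    "abc", "argparse", "asyncio", "collections", "dataclasses",
    "functools", "itertools", "json", "logging", "os",
    "pathlib", "re", "subprocess", "sys", "typing",
];

/// True if `module` is a Python standard-library top-level module.
/// The table is kept sorted, so binary search keeps lookups O(log n).
fn is_stdlib(module: &str) -> bool {
    PYTHON_STDLIB.binary_search(&module).is_ok()
}

fn main() {
    assert!(is_stdlib("os"));
    assert!(is_stdlib("json"));
    assert!(!is_stdlib("requests"));
    println!("ok");
}
```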

Import → distribution mapping

The import name is not always the PyPI distribution name:

  Import    Distribution
  cv2       opencv-python
  yaml      PyYAML
  PIL       Pillow
  sklearn   scikit-learn
  bs4       beautifulsoup4

We bake a curated table of the common ~40 mismatches into the processor and default everything else to identity (import X → distribution X). Users override via config:

[processor.requirements.mapping]
cv2 = "opencv-python"
custom_internal = "our-private-dist"

User entries win over the built-in table. This is lossy by design — we accept that unusual packages need a config entry — in exchange for:

  • No dependency on an installed Python environment.
  • requirements.txt generation works on a clean checkout (no chicken-and-egg with pip install).
  • Deterministic output regardless of the caller’s environment.

The alternative — probing importlib.metadata.packages_distributions() — is more accurate but requires packages to already be installed. Rejected for now; can be added later as an opt-in resolve = "probe" mode if users hit the mapping ceiling.
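The resolution order described above (user override, then the curated table, then identity) can be sketched as follows; the table contents are illustrative and the function name is an assumption:

```rust
use std::collections::HashMap;

// Abbreviated stand-in for the curated built-in table (~40 entries in the design).
static BUILTIN_MAP: &[(&str, &str)] = &[
    ("bs4", "beautifulsoup4"),
    ("cv2", "opencv-python"),
    ("PIL", "Pillow"),
    ("sklearn", "scikit-learn"),
    ("yaml", "PyYAML"),
];

/// User entries win over the built-in table; everything else maps to itself.
fn resolve_distribution<'a>(import: &'a str, user: &'a HashMap<String, String>) -> &'a str {
    if let Some(dist) = user.get(import) {
        return dist.as_str();
    }
    if let Some(&(_, dist)) = BUILTIN_MAP.iter().find(|&&(i, _)| i == import) {
        return dist;
    }
    import // identity fallback
}

fn main() {
    let mut user = HashMap::new();
    user.insert("cv2".to_string(), "opencv-python-headless".to_string());
    assert_eq!(resolve_distribution("cv2", &user), "opencv-python-headless"); // user override wins
    assert_eq!(resolve_distribution("yaml", &user), "PyYAML");                // built-in table
    assert_eq!(resolve_distribution("requests", &user), "requests");          // identity
    println!("ok");
}
```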

Configuration

[processor.requirements]
output = "requirements.txt"           # Output file path
exclude = []                          # Import names to never emit (e.g. internal vendored modules)
sorted = true                         # Sort output alphabetically (vs. discovery order)
header = true                         # Emit a "# Generated by rsconstruct" header line

[processor.requirements.mapping]
cv2 = "opencv-python"                 # User-provided import → distribution overrides

  Key       Type       Default              Description
  output    string     "requirements.txt"   Output file path
  exclude   string[]   []                   Import names to never emit
  sorted    bool       true                 Sort entries alphabetically
  header    bool       true                 Include a comment header line
  mapping   map        {}                   Per-project import → distribution overrides

Pinning (pkg==1.2.3) is deferred. The first iteration emits bare names. Adding pinning later means probing pip show or parsing a lockfile — separate concern.
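The config keys above would map onto a struct like this (a plain-Rust sketch with the defaults from the table; the real RequirementsConfig would additionally derive serde's Deserialize):

```rust
use std::collections::HashMap;

/// Sketch of RequirementsConfig; fields and defaults follow the table above.
struct RequirementsConfig {
    output: String,
    exclude: Vec<String>,
    sorted: bool,
    header: bool,
    mapping: HashMap<String, String>,
}

impl Default for RequirementsConfig {
    fn default() -> Self {
        RequirementsConfig {
            output: "requirements.txt".to_string(),
            exclude: Vec::new(),
            sorted: true,
            header: true,
            mapping: HashMap::new(),
        }
    }
}

fn main() {
    let cfg = RequirementsConfig::default();
    assert_eq!(cfg.output, "requirements.txt");
    assert!(cfg.sorted && cfg.header);
    assert!(cfg.exclude.is_empty() && cfg.mapping.is_empty());
    println!("ok");
}
```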

Code organization

Shared import scanner

Factor the regex scanning out of src/analyzers/python.rs into a module function shared by the analyzer and the generator:

// src/analyzers/python.rs
pub(crate) fn scan_python_imports(path: &Path) -> Result<Vec<String>> { ... }

Returns the raw top-level module names found in import X and from X import ... lines. The analyzer then runs this through resolve_module to keep local ones; the generator runs it through the stdlib table and mapping to produce the final list.

This fixes architecture-observations #6 (analyzers can’t hand data to processors) at the scope of this one feature: instead of building a cross-processor channel, we share a pure function.
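A dependency-free sketch of the scanning logic, operating on source text rather than a path and using line-prefix matching instead of the analyzer's regex; enough to show the shared-function shape, not the actual implementation:

```rust
/// First dotted component of an import token, e.g. "a.b.c" -> "a".
fn top_level(token: &str) -> Option<String> {
    let top = token.split('.').next()?;
    if top.is_empty() { None } else { Some(top.to_string()) }
}

/// Extract top-level module names from `import X` / `from X import ...` lines.
fn scan_python_imports(source: &str) -> Vec<String> {
    let mut modules = Vec::new();
    for line in source.lines() {
        let line = line.trim_start();
        if let Some(rest) = line.strip_prefix("from ") {
            // `from a.b import c` -> "a"; relative `from . import x` yields nothing.
            if let Some(name) = rest.split_whitespace().next().and_then(top_level) {
                modules.push(name);
            }
        } else if let Some(rest) = line.strip_prefix("import ") {
            // `import a.b as x, d` -> "a" and "d".
            for part in rest.split(',') {
                if let Some(name) = part.split_whitespace().next().and_then(top_level) {
                    modules.push(name);
                }
            }
        }
    }
    modules
}

fn main() {
    let src = "import os\nimport numpy as np, requests\nfrom collections.abc import Mapping\nfrom . import sibling\n";
    assert_eq!(scan_python_imports(src), vec!["os", "numpy", "requests", "collections"]);
    println!("ok");
}
```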

Files

  • src/processors/generators/requirements.rs — the processor, ~150 lines.
  • src/processors/generators/python_stdlib.rs — the stdlib names table (static &[&str]) and a is_stdlib(module: &str) -> bool helper.
  • src/processors/generators/distribution_map.rs — the curated import→distribution mapping, a resolve_distribution(import: &str) -> &str helper that falls through to identity.
  • src/config/processor_configs.rs — add RequirementsConfig.
  • src/processors/mod.rs — add pub const REQUIREMENTS: &str = "requirements" to the names module.
  • docs/src/processors/requirements.md — user-facing processor doc.

Processor structure

Mirrors tags (whole-project generator with one output):

pub struct RequirementsProcessor {
    base: ProcessorBase,
    config: RequirementsConfig,
}

impl Processor for RequirementsProcessor {
    fn discover(&self, graph, file_index, instance_name) -> Result<()> {
        // Scan for .py files; if none, no product.
        // Add one product: inputs=all .py files, outputs=[output_path].
    }

    fn supports_batch(&self) -> bool { false }

    fn execute(&self, _ctx, product) -> Result<()> {
        // 1. Scan each input .py for imports.
        // 2. For each top-level module name:
        //    - Skip if local (resolves to a project file).
        //    - Skip if stdlib.
        //    - Skip if in user's `exclude`.
        //    - Map import → distribution name.
        // 3. Dedupe, sort if configured, write to output.
    }
}

Cache behavior

Falls naturally out of the descriptor-based cache:

  • Inputs: every .py file + config hash.
  • Output: requirements.txt.
  • Adding/removing an import changes file contents, triggers rebuild.
  • Changing config (new mapping entry, new exclude) changes config hash, triggers rebuild.
  • Code changes inside a function that don’t affect imports still trigger a rebuild, since we can’t cheaply know which lines matter. Acceptable — the regeneration is fast.
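The config-hash half of this can be sketched with std's hasher; the real descriptor cache uses SHA-256 (and a stable hash, since DefaultHasher's output is not guaranteed stable across Rust versions), so this is shape only:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Illustrative: hash the serialized processor config so that any config
/// change (new mapping entry, new exclude) invalidates the cached product.
fn config_hash(serialized_config: &str) -> u64 {
    let mut h = DefaultHasher::new();
    serialized_config.hash(&mut h);
    h.finish()
}

fn main() {
    let before = r#"output = "requirements.txt""#;
    let after = r#"output = "reqs.txt""#;
    assert_eq!(config_hash(before), config_hash(before)); // deterministic within a run
    assert_ne!(config_hash(before), config_hash(after));  // config change => rebuild
    println!("ok");
}
```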

Auto-detection

auto_detect returns true when the file index contains any .py files. Same criterion as the Python analyzer.

Out of scope (first cut)

  • Version pinning.
  • Multiple output files (requirements-dev.txt, requirements-test.txt).
  • Optional dependencies / extras (pkg[extra]).
  • Reading existing requirements.txt to preserve comments or pins.
  • pyproject.toml or setup.py output — requirements.txt only.

Each is a clean follow-up if users ask.

Crates.io Publishing

Notes on publishing rsconstruct to crates.io.

Version Limits

There is no limit on how many versions can be published to crates.io. You can publish as many releases as needed without worrying about quota or cleanup.

Pruning Old Releases

Crates.io does not support deleting published versions. Once a version is uploaded, it exists permanently.

The only removal mechanism is yanking (cargo yank --version 0.1.0), which:

  • Prevents new projects from adding a dependency on the yanked version
  • Does not break existing projects that already depend on it (they continue to download it via their lockfile)
  • Does not delete the crate data from the registry

Yanking should only be used for versions with security vulnerabilities or serious bugs, not for general housekeeping.

Publishing a New Version

  1. Update the version in Cargo.toml
  2. Run cargo publish --dry-run to verify
  3. Run cargo publish to upload

Feature: Per-Processor max_jobs

Problem

When running rsconstruct build -j 20, all processors run with the same parallelism. Processors like marp spawn heavyweight subprocesses (headless Chromium via Puppeteer), and 20 concurrent Chromium instances cause non-deterministic TargetCloseError crashes due to resource exhaustion.

Desired Behavior

Allow each processor to declare a max_jobs limit in rsconstruct.toml:

[processor.marp]
formats = ["pdf"]
max_jobs = 4

With -j 20, marp would run at most 4 concurrent jobs while other processors use the full 20.

max_jobs unset or 0 means “use the global -j value” (current behavior).

Implementation Plan

1. Add max_jobs field to processor configs

File: src/config/processor_configs.rs

Add to the generator_config! macro (all variants) and checker config structs:

#[serde(default)]
pub max_jobs: Option<usize>,

Add to Default impl (max_jobs: None) and KnownFields list.

2. Expose max_jobs() on the ProductDiscovery trait

File: src/processors/mod.rs

fn max_jobs(&self) -> Option<usize> { None }

Each processor implementation returns self.config.max_jobs.

3. Build a per-processor semaphore map in the executor

File: src/executor/mod.rs

Add to ExecutorOptions or build during executor construction:

pub processor_max_jobs: HashMap<String, usize>,

Constructed from the processor map by calling max_jobs() on each processor.

4. Use semaphores in the dispatch loop

File: src/executor/execution.rs (lines 177-203)

Create an Arc<Semaphore> per processor that has a max_jobs limit. In the execution loop:

  • Batch groups: If the processor has max_jobs, the batch thread acquires a permit before executing each chunk, limiting concurrent Chromium (or similar) processes.
  • Non-batch items: Instead of dividing all non-batch items into parallel chunks regardless of processor, group by processor first. Items from limited processors get their own chunking (min of max_jobs and parallel), others use global parallel.
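Rust's std has no semaphore type, so a sketch with Mutex + Condvar shows the acquire/release shape the dispatch loop needs (the real implementation might instead use a crate-provided semaphore; all names here are illustrative):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore: caps concurrent jobs for one processor.
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    fn new(max_jobs: usize) -> Self {
        Semaphore { permits: Mutex::new(max_jobs), cv: Condvar::new() }
    }
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap(); // loop guards against spurious wakeups
        }
        *p -= 1;
    }
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

fn main() {
    // marp-style limit: 20 worker threads, at most 4 concurrent "Chromium" jobs.
    let sem = Arc::new(Semaphore::new(4));
    let peak = Arc::new(Mutex::new((0usize, 0usize))); // (current, max observed)
    let handles: Vec<_> = (0..20)
        .map(|_| {
            let (sem, peak) = (Arc::clone(&sem), Arc::clone(&peak));
            thread::spawn(move || {
                sem.acquire();
                {
                    let mut p = peak.lock().unwrap();
                    p.0 += 1;
                    p.1 = p.1.max(p.0);
                }
                thread::sleep(Duration::from_millis(5)); // simulate heavyweight work
                peak.lock().unwrap().0 -= 1;
                sem.release();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let max_seen = peak.lock().unwrap().1;
    assert!(max_seen <= 4); // the cap held despite 20 threads
    println!("peak concurrency: {max_seen}");
}
```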

5. Config display

Ensure rsconstruct processors config marp and rsconstruct config show display the max_jobs field.

Files to Modify

  1. src/config/processor_configs.rs - add max_jobs field to macros and manual configs
  2. src/processors/mod.rs - add max_jobs() to ProductDiscovery trait
  3. src/processors/*.rs - implement max_jobs() for each processor
  4. src/executor/mod.rs - add semaphore map to ExecutorOptions
  5. src/executor/execution.rs - semaphore-based dispatch in the level loop
  6. src/builder/build.rs - build the processor limits map and pass to executor

Alternatives Considered

  • batch_size workaround: Setting batch_size limits items per batch invocation, but batch mode runs sequentially within one process, making it slow.
  • Global lower -j: Works but penalizes lightweight processors unnecessarily.

Plugin Registry: Ecosystem Survey

rsconstruct uses a hand-built plugin registry where processors self-register at link time via inventory::submit!, declare their config schema, and are instantiated from TOML config at runtime. This page documents the search for existing Rust crates that could replace this machinery.

What rsconstruct needs

The plugin system combines four responsibilities:

  1. Link-time self-registration — each processor file submits a plugin entry. No central list to maintain. Adding a processor = adding one file.
  2. Per-plugin TOML config — each plugin declares known fields, required fields, defaults, and a create(toml::Value) -> Box<dyn Processor> factory. The framework deserializes the matching [processor.NAME] section and passes it to the factory.
  3. Defaults and validation — processor defaults, scan defaults, and output-dir defaults are applied in layers before deserialization. Unknown fields are rejected. Required fields are enforced.
  4. Name-to-factory mapping — the registry maps processor names to their plugin entries for creation, introspection (processors list), and config display.
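Responsibility (4) in isolation reduces to a map of names to factory functions. A crate-free sketch with invented stand-in types; with inventory, the entries would be collected at link time instead of inserted by hand:

```rust
use std::collections::HashMap;

// Illustrative stand-ins for the real Processor trait and plugin entries.
trait Processor {
    fn pname(&self) -> &'static str;
}

struct Ruff;
impl Processor for Ruff {
    fn pname(&self) -> &'static str { "ruff" }
}

type Factory = fn() -> Box<dyn Processor>;

/// Name-to-factory registry: creation, introspection, and config display
/// all go through this one lookup.
fn build_registry() -> HashMap<&'static str, Factory> {
    let mut reg: HashMap<&'static str, Factory> = HashMap::new();
    reg.insert("ruff", || Box::new(Ruff));
    reg
}

fn main() {
    let registry = build_registry();
    let processor = registry["ruff"]();
    assert_eq!(processor.pname(), "ruff");
    println!("ok");
}
```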

Crates evaluated

inventory / linkme

The foundation rsconstruct already uses. inventory provides link-time collection of typed values into a global iterator. linkme does the same via distributed slices. Neither has any config awareness — they solve (1) only.

  • Verdict: already in use; does its job well.

spring-rs

The closest match conceptually. A Spring Boot-style Rust framework that combines inventory-based plugin registration with TOML config via #[derive(Configurable)] and #[config_prefix = "..."] attributes. Each plugin declares its config struct with the derive macro, and the framework auto-deserializes the matching TOML section.

However, spring-rs is a full application framework for web services (integrates axum, sqlx, OpenTelemetry, etc.). Pulling it in for a build tool would add a massive, opinionated dependency tree for ~50 lines of glue code savings.

  • Verdict: right pattern, wrong scope. Not suitable.

config (crate)

Handles layered config loading from multiple sources (TOML, YAML, JSON, env vars) with type-safe deserialization. No plugin registration awareness at all — it’s a config library, not a plugin framework.

  • Verdict: solves config layering, not plugin registration.

extism

A WebAssembly plugin runtime. Plugins are compiled to WASM and loaded at runtime with sandboxing. Completely different problem — runtime-loaded external plugins vs. compile-time self-registering internal plugins.

  • Verdict: wrong problem domain.

plugin-interfaces

Designed for chat-client applications with FFI and inter-plugin messaging. Not relevant to build tools.

  • Verdict: not applicable.

toml-cfg

Provides compile-time config macros (#[toml_cfg::toml_config]) that embed config values from a TOML file at build time. No runtime registry, no plugin awareness.

  • Verdict: compile-time only; not what we need.

Conclusion

No existing crate provides the combination of link-time registration + per-plugin TOML config deserialization + defaults/validation + name-to-factory mapping. This is a genuine gap in the Rust ecosystem.

rsconstruct’s manual approach (~50 lines of glue in src/registries/processor.rs using inventory::submit! + serde + the ProcessorPlugin struct) is the standard Rust pattern for this. It is well-understood, has no external framework dependency, and is unlikely to be improved upon by a third-party crate without bringing in unrelated complexity.

Decision: keep the current hand-built registry. Revisit if a focused plugin-config crate emerges in the ecosystem.

Survey conducted: April 2026.

Rejected Audit Findings

Issues flagged during code audits (rounds 7-12) that were assessed and deliberately rejected. Documented here to prevent re-flagging in future audits.

Duration u128-to-u64 overflow in JSON output

File: src/json_output.rs (lines 130, 151) Flagged in: rounds 9, 10, 11, 12

Duration::as_millis() returns u128, cast to u64 without bounds checking. Overflows after ~584 million years. No real build will ever hit this. Not fixing.

Pre-1970 mtime cache collision

File: src/object_store/checksums.rs (lines 25-27) Flagged in: rounds 9, 10, 11, 12

Files with mtime before Unix epoch (1970) get unwrap_or_default() mapping to (0, 0). Two such files could share a cached mtime entry. Pre-1970 timestamps don’t occur on real build inputs. The mtime cache is only an optimization — the actual input checksum comparison catches real changes. Not fixing.

Dependency unchanged logic — no-dep products

File: src/executor/execution.rs (line 587) Flagged in: round 9

Agent claimed !deps.is_empty() && deps.iter().all(...) should be deps.is_empty() || deps.iter().all(...). Wrong — products with no dependencies should NOT reuse cached checksums. The optimization is specifically for products whose upstream deps produced identical output, meaning transitive inputs are unchanged. No-dep products have no such guarantee.

Batch handle_success return value ignored

File: src/executor/execution.rs (line 339) Flagged in: round 10

handle_success() return value is not checked in batch processing. This is correct — handle_success already calls record_failure internally when caching fails, properly marking the product as failed. In non-batch, the return value triggers a break from the retry loop, but batch has no retry loop. Stats are correct either way.

record_failure ignores mark_processor_failed in keep-going mode

File: src/executor/handlers.rs (lines 20-39) Flagged in: rounds 11, 12

In keep-going mode, mark_processor_failed parameter is ignored. This is by design — failed_processors is only checked in non-keep-going mode to skip subsequent products from the same processor. In keep-going mode, all products run regardless, so tracking failed processors is unnecessary.

Arc reference leak — failed_processors not unwrapped

File: src/executor/execution.rs (collect_build_stats) Flagged in: round 11

Agent claimed not unwrapping failed_processors Arc prevents other Arc::try_unwrap calls from succeeding. Wrong — each Arc has its own independent reference count. Not unwrapping one has zero effect on others.

Tera output paths lose directory structure

File: src/processors/generators/tera.rs (lines 100-106) Flagged in: round 10

Templates in subdirectories produce output at project root (e.g., tera.templates/sub/README.md.tera → README.md). This is intentional — the comment on line 105 explicitly says “Output is at project root with the .tera extension stripped.” By design.

Lua stub_path uses suffix as directory name

File: src/processors/lua_processor.rs (line 126) Flagged in: round 10

rsconstruct.stub_path(source, suffix) uses suffix to construct the output directory (out/{suffix}). This is the designed Lua API — plugins control their own output directory naming via the suffix parameter.

Lua clean count masking with saturating_sub

File: src/processors/lua_processor.rs (lines 450-468) Flagged in: rounds 10, 11

Custom Lua clean functions report removal count via existed_before.saturating_sub(exist_after). If the Lua function doesn’t remove files, that’s the plugin’s responsibility. The count accurately reflects what was actually removed. Not a bug.

file_index src_exclude_dirs substring matching

File: src/file_index.rs (lines 76-80) Flagged in: rounds 9, 10

src_exclude_dirs uses path_str.contains(dir) for filtering. The documented convention uses slash-delimited patterns like "/kernel/", which prevents false positives on path substrings. This is the configured behavior.

Object store trim path reconstruction

File: src/object_store/management.rs (lines 86-103) Flagged in: round 9

Reconstructing checksums from filesystem paths (prefix + rest) during cache trim. The path structure is fixed (objects/[2-char]/[rest]), set by store_object(). Unexpected files in the objects directory are silently ignored during trim, which is the correct behavior.

Partial output caching (before the fix)

File: src/object_store/operations.rs (lines 144-147) Flagged in: round 9

Originally flagged as a design choice. User overruled — missing outputs are now an error (anyhow::ensure!). This was accepted and fixed in a later commit, not rejected.

Zspell read-modify-write race

File: src/processors/checkers/zspell.rs (lines 192-229) Flagged in: round 11

Agent claimed file read-modify-write isn’t protected. Wrong — self.words_to_add.lock() on line 193 acquires the mutex, which is held for the entire function (not dropped until return). The lock prevents concurrent threads from interleaving. Cross-process races are not a concern for RSConstruct.

Duplicate dependency edges in resolve_dependencies

File: src/graph.rs (lines 227-230) Flagged in: round 12

Agent claimed duplicate edges cause incorrect topological sort. The scenario requires a product to list the same input file twice, which doesn’t happen — FileIndex.scan() returns unique paths. Even if it did, duplicate edges would increment and decrement in_degree the same number of times, netting out correctly.

Python string injection in load_python_config

File: src/processors/generators/tera.rs (lines 205-208) Flagged in: round 12

Agent claimed newlines in file paths could inject Python code. File paths come from FileIndex (filesystem scan) or Tera templates written by the project author — both are trusted input. Linux file paths from filesystem scans don’t contain newlines.

Batch assert_eq should be error return

File: src/executor/execution.rs (lines 323-325) Flagged in: round 12

Agent suggested replacing assert_eq! with anyhow::bail! for batch result count validation. The assert is deliberate — a processor returning the wrong number of results is a contract violation (programming error), not a recoverable runtime condition. Assertions are appropriate for invariant violations.

Platform portability (Windows, macOS)

Flagged in: rounds 9, 10, 11, 12

Multiple agents flagged std::os::unix usage without #[cfg(unix)] guards, and missing #[cfg(windows)]/#[cfg(target_os = "macos")] blocks. RSConstruct is Linux-only. No platform compatibility code will be added.

DB recovery — file might not exist

File: src/db.rs Flagged in: round 12

Agent re-flagged db.rs recovery, claiming fs::remove_file could fail if the file doesn’t exist. This was already fixed in round 8 — let _ = fs::remove_file() was changed to fs::remove_file()? which properly propagates errors.

Suggestions

Ideas for future improvements, organized by category. Completed items have been moved to suggestions-done.md.

Grades:

  • Urgency: high (users need this), medium (nice to have), low (speculative/future)
  • Complexity: low (hours), medium (days), high (weeks+)

Build Execution

Distributed builds

  • Run builds across multiple machines, similar to distcc or icecream for C/C++.
  • A coordinator node distributes work to worker nodes, each running rsconstruct in worker mode.
  • Workers execute products and return outputs to the coordinator, which caches them locally.
  • Challenges: network overhead for small products, identical tool versions across workers, local filesystem access.
  • Urgency: low | Complexity: high

Sandboxed execution

  • Run each processor in an isolated environment where it can only access its declared inputs.
  • Prevents accidental undeclared dependencies.
  • On Linux, namespaces can provide lightweight sandboxing.
  • Urgency: low | Complexity: high

Content-addressable outputs (unchanged output pruning)

  • Hash outputs too to skip downstream rebuilds when an input changes but produces identical output.
  • Bazel calls this “unchanged output pruning.”
  • Urgency: medium | Complexity: medium

Persistent daemon mode

  • Keep rsconstruct running as a background daemon to avoid startup overhead.
  • Benefits: instant file index via inotify, warm Lua VMs, connection pooling, faster incremental builds.
  • Daemon listens on Unix socket (.rsconstruct/daemon.sock).
  • rsconstruct watch becomes a client that triggers rebuilds on file events.
  • Urgency: low | Complexity: high

Persistent workers

  • Keep long-running tool processes alive to avoid startup overhead.
  • Instead of spawning ruff or pylint per invocation, keep one process alive and feed it files.
  • Bazel gets 2-4x speedup for Java this way. Could benefit pylint/mypy which have heavy startup.
  • Multiplex variant: multiple requests to a single worker process via threads.
  • Urgency: medium | Complexity: high

Dynamic execution (race local vs remote)

  • Start both local and remote execution of the same product; use whichever finishes first and cancel the other.
  • Useful when remote cache is slow or flaky.
  • Configurable per-processor via execution strategy.
  • Urgency: low | Complexity: high

Execution strategies per processor

  • Map each processor to an execution strategy: local, remote, sandboxed, or dynamic.
  • Different processors may benefit from different execution models.
  • Config: [processor.ruff] execution = "remote", [processor.cc_single_file] execution = "sandboxed".
  • Urgency: low | Complexity: medium

Build profiles

  • Named configuration sets for different build scenarios (ci, dev, release).
  • Profiles inherit from base configuration and override specific values.
  • Usage: rsconstruct build --profile=ci
  • Urgency: medium | Complexity: medium

Conditional processors

  • Enable or disable processors based on conditions (environment variables, file existence, git branch, custom commands).
  • Multiple conditions can be combined with all/any logic.
  • Urgency: low | Complexity: medium

Target aliases

  • Define named groups of processors for easy invocation.
  • Usage: rsconstruct build @lint, rsconstruct build @test
  • Special aliases: @all, @changed, @failed
  • File-based targeting: rsconstruct build src/main.c
  • Urgency: medium | Complexity: medium

Graph & Query

Build graph query language

  • Support queries like rsconstruct query deps out/foo, rsconstruct query rdeps src/main.c, rsconstruct query processor:ruff.
  • Useful for debugging builds and CI systems that want to build only affected targets.
  • Urgency: low | Complexity: medium

Affected analysis

  • Given changed files (from git diff), determine which products are affected and only build those.
  • Useful for large projects where a full build is expensive.
  • Urgency: medium | Complexity: medium

Critical path analysis

  • Identify the longest sequential chain of actions in a build.
  • Helps users optimize their slowest builds by showing what’s actually on the critical path.
  • Display with rsconstruct build --critical-path or include in --timings output.
  • Urgency: medium | Complexity: medium

Extensibility

Plugin registry

  • A central repository of community-contributed Lua plugins.
  • Install with rsconstruct plugin install eslint.
  • Registry could be a GitHub repository with a JSON index.
  • Version pinning in rsconstruct.toml.
  • Urgency: low | Complexity: high

Project templates

  • Initialize new projects with pre-configured processors and directory structure.
  • rsconstruct init --template=python, rsconstruct init --template=cpp, etc.
  • Custom templates from local directories or URLs.
  • Urgency: low | Complexity: medium

Rule composition / aspects

  • Attach cross-cutting behavior to all targets of a certain type (e.g., “add coverage analysis to every C++ compile”).
  • Urgency: low | Complexity: high

Output groups / subtargets

  • Named subsets of a target’s outputs that can be requested selectively.
  • E.g., rsconstruct build --output-group=debug or per-product subtarget selection.
  • Useful for targets that produce multiple output types (headers, binaries, docs).
  • Urgency: low | Complexity: medium

Visibility / access control

  • Restrict which processors can consume which files or directories.
  • Prevents accidental cross-boundary dependencies in large repos.
  • Config: per-processor visibility rules or directory-level .rsconstruct-visibility files.
  • Urgency: low | Complexity: medium

Developer Experience

Build Event Protocol / structured event stream

  • rsconstruct already has --json on stdout with JSON Lines events (BuildEvent, ProductStart, ProductComplete, BuildSummary) and --trace for Chrome trace format.
  • A proper Build Event Protocol (file or gRPC stream) would enable external dashboards, CI integrations, and build analytics services beyond what JSON Lines provides.
  • Write events to a file (--build-event-log=events.pb) or stream to a remote service.
  • Richer event types: action graph, configuration, progress, test results.
  • Urgency: medium | Complexity: medium

Build notifications

  • Desktop notifications when builds complete, especially for long builds.
  • Platform-specific: notify-send (Linux), osascript (macOS).
  • Config: notify = true, notify_on_success = false.
  • Urgency: low | Complexity: low

Parallel dependency analysis

  • The cpp analyzer scans files sequentially, which can be slow for large codebases.
  • Parallelize header scanning using rayon or tokio.
  • Urgency: low | Complexity: medium

IDE / LSP integration

  • Language Server Protocol server for IDE integration.
  • Features: diagnostics, code actions, hover info, file decorations.
  • Plugins for VS Code, Neovim, Emacs.
  • Urgency: low | Complexity: high

Build log capture

  • Save stdout/stderr from each product execution to a log file.
  • Config: log_dir = ".rsconstruct/logs", log_retention = 10.
  • rsconstruct log ruff:main.py to view logs.
  • Urgency: low | Complexity: medium

Build timing history

  • Store timing data to .rsconstruct/timings.json after each build.
  • rsconstruct timings shows slowest products, trends, time per processor.
  • Urgency: low | Complexity: medium

Remote cache authentication

  • S3 and HTTP/HTTPS remote caches are already supported.
  • Still needed: explicit bearer token support, GCS backend, and environment variable substitution for secrets in config.
  • Urgency: medium | Complexity: medium

rsconstruct lint — Run only checkers

  • Convenience command to run only checker processors.
  • Equivalent to rsconstruct build -p ruff,pylint,... but shorter.
  • Urgency: low | Complexity: low

Watch mode keyboard commands

  • During rsconstruct watch, support r (rebuild), c (clean), q (quit), Enter (rebuild now), s (status).
  • Only activate when stdin is a TTY.
  • Urgency: low | Complexity: medium

Layered config files

  • Support config file layering: system (/etc/rsconstruct/config.toml), user (~/.config/rsconstruct/config.toml), project (rsconstruct.toml).
  • Lower layers provide defaults, higher layers override.
  • Per-command overrides via [build], [watch] sections.
  • Similar to Bazel’s .bazelrc layering.
  • Urgency: low | Complexity: low
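A minimal sketch of the override semantics, assuming a flat key/value view of each layer; real configs are nested TOML tables and would need a recursive merge, and `layer_configs` is a hypothetical name, not existing API:

```rust
use std::collections::HashMap;

// Hypothetical sketch of config layering: later layers override earlier
// ones key by key (system -> user -> project). A full implementation
// would merge nested TOML tables recursively.
fn layer_configs(layers: &[HashMap<String, String>]) -> HashMap<String, String> {
    let mut merged = HashMap::new();
    for layer in layers {
        for (k, v) in layer {
            merged.insert(k.clone(), v.clone()); // higher layer wins
        }
    }
    merged
}

fn main() {
    let system = HashMap::from([("jobs".to_string(), "4".to_string())]);
    let project = HashMap::from([("jobs".to_string(), "16".to_string())]);
    let merged = layer_configs(&[system, project]);
    println!("{}", merged["jobs"]); // project layer wins
}
```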

Test sharding

  • Split large test targets across multiple parallel shards.
  • Set TEST_TOTAL_SHARDS and TEST_SHARD_INDEX environment variables for test runners.
  • Config: shard_count = 4 per processor or product.
  • Useful for pytest/doctest processors when added.
  • Urgency: low | Complexity: medium
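The shard assignment can be sketched as a deterministic round-robin over the discovered tests; `tests_for_shard` is a hypothetical helper showing what a runner would do with the proposed TEST_TOTAL_SHARDS / TEST_SHARD_INDEX values:

```rust
// Deterministic round-robin sharding: test i belongs to shard
// (i % total_shards). Every shard gets a disjoint subset and the
// union of all shards covers every test exactly once.
fn tests_for_shard<'a>(tests: &'a [&'a str], shard_index: usize, total_shards: usize) -> Vec<&'a str> {
    tests
        .iter()
        .enumerate()
        .filter(|(i, _)| i % total_shards == shard_index)
        .map(|(_, t)| *t)
        .collect()
}

fn main() {
    let all = ["test_a", "test_b", "test_c", "test_d"];
    // Shard 1 of 2 runs every second test.
    println!("{:?}", tests_for_shard(&all, 1, 2)); // ["test_b", "test_d"]
}
```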

Runfiles / runtime dependency trees

  • Track runtime dependencies (shared libs, config files, data files) separately from build dependencies.
  • Generate a runfiles directory per executable with symlinks to all transitive runtime deps.
  • Useful for deployment, packaging, and containerization.
  • Urgency: low | Complexity: high

On-demand processors (build_by_default = false)

  • Today every declared processor runs on every rsconstruct build. The only per-invocation escape hatches are -x name (remember every time) or enabled = false in the config (remember to flip back). Neither fits the “this processor exists, don’t run it unless I ask” use case — common for slow lifecycle processors like python_package, docker_build, publish, release_tarball.
  • Add a per-processor boolean field defaulting to true: build_by_default = false on a processor means it’s discovered and classified like any other, but its products are filtered out of the default run.
  • Prior art: meson’s build_by_default: false, Bazel’s tags = ["manual"], buck2’s tags = ["manual"]. All use the same shape — declarative opt-out on the rule, per-invocation opt-in via target naming.
  • CLI semantics map cleanly onto existing -p/-x machinery:
    • rsconstruct build → excludes build_by_default = false processors (new behaviour).
    • rsconstruct build -p python_package → includes only python_package; the -p explicit inclusion overrides the default-off flag.
    • rsconstruct build -p ruff,python_package → includes both, including the opt-in one.
    • rsconstruct build --all (new flag) → includes everything including on-demand processors. Useful for CI that wants to verify the opt-in path doesn’t bitrot.
  • Example config:
    [processor.python_package]
    build_by_default = false
    src_dirs = ["."]
    
  • Design considerations:
    • @all meta-shortcut: the existing @checkers / @generators aliases should continue to mean “all of that type, subject to the default-off filter.” Users who want “all checkers including on-demand ones” would say rsconstruct build --all -p @checkers — rare enough that the composition is fine.
    • Error on contradiction: -p X -x X already errors; -p X where X has build_by_default = false should just work (explicit opt-in wins over declarative opt-out).
    • Watch mode: rsconstruct watch should honour the same default — don’t rebuild the package processor on every file save. Users who want watch-mode packaging can add -p python_package to the watch invocation.
    • Discovery cost: on-demand processors still run discovery every build, because we need to know what their products would be (for output-conflict detection, graph completeness, and --all support). This is negligible — discovery is O(files matched), not O(cost of running).
  • Follow-up idea: named goals (meson-style aggregated targets or npm-style scripts) for the “I want a lint goal / deploy goal / ci goal” pattern. That’s Pattern B, layered above per-processor config — not needed to solve the basic on-demand case.
  • Urgency: medium | Complexity: low
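The selection rule above can be sketched as follows; all names here are hypothetical, not actual RSConstruct types:

```rust
// Sketch of the proposed selection rule: explicit -p opt-in wins over the
// declarative opt-out; otherwise a default build filters out processors
// with build_by_default = false unless --all is given.
struct Proc<'a> {
    iname: &'a str,
    build_by_default: bool,
}

fn should_run(p: &Proc, explicit_includes: &[&str], all: bool) -> bool {
    if explicit_includes.contains(&p.iname) {
        return true; // explicit -p opt-in overrides build_by_default = false
    }
    if !explicit_includes.is_empty() {
        return false; // -p given, but this processor was not listed
    }
    all || p.build_by_default
}

fn main() {
    let pkg = Proc { iname: "python_package", build_by_default: false };
    assert!(!should_run(&pkg, &[], false)); // plain `build`: excluded
    assert!(should_run(&pkg, &["python_package"], false)); // -p opt-in wins
    assert!(should_run(&pkg, &[], true)); // --all includes it
    println!("ok");
}
```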

Decomposed cache key for richer --explain

  • Today every product has a single descriptor key that mixes input checksum + config hash + tool-version hash + variant. A miss tells us “the key changed” but not which component. --explain can only say BUILD (no cache entry) / BUILD (output missing) — not “your cflags changed” or “an input file changed”.
  • Store the three sub-hashes (input, config, tool) in a new redb table keyed by stable product identity — (processor_iname, primary_path) where primary_path is the first output for generators or the first input for checkers.
  • Schema: product_components: (processor, primary_path) -> { input_hash, config_hash, tool_hash, timestamp }. ~100 bytes per product, so ~500KB extra disk for a 5000-product project.
  • Reads only on --explain. classify_products already routes through explain_descriptor; extend that to look up the prior components row, recompute current components, diff the three, and return a richer reason like BUILD (config changed: cflags, include_paths).
  • Writes only when explicitly tracking. Two reasonable gates:
    • Option A (single flag): --explain enables both write and read. CI runs without --explain → zero overhead. Trade-off: the first explain run after enabling has no prior row → reports “no prior state” generically. Subsequent runs work fully.
    • Option B (separate --track-changes / [build] track_changes = true): decouples capture from query. CI omits the flag → zero overhead. Devs opt in permanently via config.
    • Leaning toward Option A: it needs fewer flags, the existing --explain carries both ends of the lifecycle, and CI/CD pays nothing by default since neither flag is set.
  • This is Tier 1 only: it says “input bucket changed” but not which file. For a .cc file with 100 headers, the user still doesn’t know which header. A future Tier 2 (per-input-file checksums) would resolve that at ~5-10x storage cost; defer until users ask.
  • Caveats: adds a third source of truth (alongside descriptors and the in-memory graph) to keep in sync. Stale entries (products dropped from config) accumulate harmlessly until cache clear.
  • Urgency: medium | Complexity: medium
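The three-way diff could look roughly like this; field and function names are illustrative, not the actual schema:

```rust
// Sketch of the richer --explain reason: compare the stored sub-hashes
// against freshly computed ones and report which component moved.
struct Components {
    input_hash: u64,
    config_hash: u64,
    tool_hash: u64,
}

fn explain(prior: &Components, current: &Components) -> String {
    let mut changed = Vec::new();
    if prior.input_hash != current.input_hash { changed.push("inputs"); }
    if prior.config_hash != current.config_hash { changed.push("config"); }
    if prior.tool_hash != current.tool_hash { changed.push("tool version"); }
    if changed.is_empty() {
        "SKIP (up to date)".to_string()
    } else {
        format!("BUILD ({} changed)", changed.join(", "))
    }
}

fn main() {
    let prior = Components { input_hash: 1, config_hash: 2, tool_hash: 3 };
    let now = Components { input_hash: 1, config_hash: 9, tool_hash: 3 };
    println!("{}", explain(&prior, &now)); // BUILD (config changed)
}
```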

Caching & Performance

Deferred materialization

  • Don’t write cached outputs to disk until they’re actually needed by a downstream product.
  • Urgency: low | Complexity: high

Garbage collection policy

  • Time-based or size-based cache policies: “keep cache under 1GB” or “evict entries older than 30 days.”
  • Config: max_size = "1GB", max_age = "30d", gc_policy = "lru".
  • rsconstruct cache gc for manual garbage collection.
  • Urgency: low | Complexity: medium
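An LRU size-cap eviction pass might be sketched like this; types and names are illustrative, and real entries would carry object-store paths and on-disk sizes:

```rust
// Evict least-recently-used entries until the cache fits under max_size.
struct Entry {
    key: &'static str,
    size: u64,
    last_used: u64, // e.g. seconds since epoch
}

fn evict_lru(mut entries: Vec<Entry>, max_size: u64) -> Vec<&'static str> {
    entries.sort_by_key(|e| e.last_used); // oldest first
    let mut total: u64 = entries.iter().map(|e| e.size).sum();
    let mut evicted = Vec::new();
    for e in &entries {
        if total <= max_size {
            break; // already under the cap
        }
        total -= e.size;
        evicted.push(e.key);
    }
    evicted
}

fn main() {
    let entries = vec![
        Entry { key: "old", size: 600, last_used: 1 },
        Entry { key: "new", size: 600, last_used: 2 },
    ];
    println!("{:?}", evict_lru(entries, 1000)); // ["old"]
}
```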

Shared cache across branches

  • Surface in rsconstruct status when products are restorable from another branch.
  • Already works implicitly via input hash matching.
  • Urgency: low | Complexity: low

Merkle tree input hashing

  • Hash inputs as a Merkle tree rather than flat concatenation.
  • More efficient for large input sets — changing one file only rehashes its branch, not all inputs.
  • Also enables efficient transfer of input trees to remote execution workers.
  • Urgency: low | Complexity: medium
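A toy illustration of the idea, using std's DefaultHasher in place of SHA-256 purely to keep the sketch dependency-free:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Combine two child hashes into a parent hash (stand-in for SHA-256).
fn combine(a: u64, b: u64) -> u64 {
    let mut h = DefaultHasher::new();
    a.hash(&mut h);
    b.hash(&mut h);
    h.finish()
}

// Fold leaf hashes pairwise up to a single root. Assumes a non-empty
// input. Changing one leaf only invalidates hashes on its path to the
// root, not every sibling subtree.
fn merkle_root(mut level: Vec<u64>) -> u64 {
    while level.len() > 1 {
        level = level
            .chunks(2)
            .map(|p| if p.len() == 2 { combine(p[0], p[1]) } else { p[0] })
            .collect();
    }
    level[0]
}

fn main() {
    let root = merkle_root(vec![1, 2, 3, 4]);
    assert_eq!(root, merkle_root(vec![1, 2, 3, 4])); // deterministic
    assert_ne!(root, merkle_root(vec![1, 2, 3, 5])); // one leaf moves the root
    println!("{root}");
}
```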

Reproducibility

Hermetic builds

  • Control all inputs beyond tool versions: isolate env vars, control timestamps, sandbox network, pin system libraries.
  • Config: hermetic = true, allowed_env = ["HOME", "PATH"].
  • Verification: rsconstruct build --verify builds twice and compares outputs.
  • Urgency: low | Complexity: high

Determinism verification

  • rsconstruct build --verify mode that builds each product twice and compares outputs.
  • Urgency: low | Complexity: medium

CI & Reporting

CI config generator

  • rsconstruct ci generate outputs a GitHub Actions or GitLab CI config that runs the build.
  • Detects enabled processors and required tools, generates install steps and build commands.
  • Supports --format=github|gitlab|circleci.
  • Urgency: medium | Complexity: medium

HTML build report

  • Generate a visual HTML dashboard of build times, cache hit rates, and processor statistics.
  • rsconstruct build --report=build.html or rsconstruct report.
  • Include charts for timing trends, per-processor breakdown, cache efficiency.
  • Urgency: low | Complexity: medium

PR comment bot

  • Post build results (pass/fail, timing, warnings) as a GitHub PR comment.
  • rsconstruct ci comment reads build output and posts via GitHub API.
  • Urgency: low | Complexity: medium

Content & Documentation

rsconstruct init --detect

  • rsconstruct smart auto already scans and enables processors, but a dedicated init --detect could go further.
  • Generate a complete rsconstruct.toml with processor-specific config (src_dirs, extensions, tool paths).
  • Urgency: medium | Complexity: medium

rsconstruct fmt — Auto-format rsconstruct.toml

  • Sort [processor.*] sections alphabetically, align values, remove redundant defaults.
  • Urgency: low | Complexity: low

Cross-project term sync

  • Automatically keep terms directories in sync across multiple repos.
  • Could run as a daemon or a periodic CI job.
  • rsconstruct terms sync --repos=repo1,repo2 or config-driven.
  • Urgency: low | Complexity: medium

Glossary generator

  • rsconstruct terms glossary generates a markdown glossary from the terms directory.
  • Optionally pulls definitions from context in the markdown files where terms are used.
  • Urgency: low | Complexity: medium

Link checker

  • Validate that URLs in markdown files are not broken (HTTP HEAD requests).
  • Configurable timeout, retry, and allow/blocklist patterns.
  • Cache results to avoid re-checking unchanged URLs.
  • Urgency: medium | Complexity: medium

Image optimizer processor

  • Compress and resize images referenced in markdown files.
  • Uses tools like optipng, jpegoptim, svgo.
  • Config: quality levels, max dimensions, output format.
  • Urgency: low | Complexity: medium

HTML+JS compression and packaging

  • Minify and bundle HTML, CSS, and JavaScript files for deployment.
  • Could use tools like terser (JS), csso (CSS), html-minifier (HTML).
  • Bundle multiple JS/CSS files into single outputs, generate source maps.
  • Integrate with existing eslint/stylelint processors for a full web frontend pipeline.
  • Urgency: medium | Complexity: medium

Processor Ecosystem

WASM processor plugins

  • Beyond Lua, allow processors written in any language compiled to WebAssembly.
  • Provides sandboxing, portability, and language flexibility.
  • WASI for filesystem access within the sandbox.
  • Urgency: low | Complexity: high

Processor marketplace / registry

  • A central repository of community-contributed processor configs and Lua plugins.
  • Install with rsconstruct plugin install prettier.
  • Registry as a GitHub repository with a JSON index. Version pinning in rsconstruct.toml.
  • Urgency: low | Complexity: high

Cleaning & Cache

Time-based cache purge

  • rsconstruct cache purge --older-than=7d to remove cache entries older than a given duration.
  • Currently only cache clear exists, which removes everything.
  • Walk the object store, check file mtimes, remove old entries.
  • Urgency: medium | Complexity: low

Enhanced cache statistics

  • rsconstruct cache stats currently shows minimal info.
  • Add: hit rate percentage, bytes saved vs rebuild time, per-processor breakdown, slowest processors.
  • Helps users identify optimization opportunities.
  • Urgency: medium | Complexity: medium

CLI & UX

Configuration

Environment variable expansion in config

  • Allow ${env:HOME} or ${env:CI} in rsconstruct.toml to reference environment variables.
  • The variable substitution system already exists for [vars]; extending it to env vars is natural.
  • Useful for CI/CD systems that pass secrets or paths via environment.
  • Urgency: medium | Complexity: low
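A possible substitution pass for the proposed ${env:NAME} syntax; this is a sketch only, and the real implementation would hook into the existing [vars] expander:

```rust
// Expand every ${env:NAME} occurrence via the supplied lookup function.
// Unset variables expand to the empty string; an unterminated ${env: is
// emitted literally.
fn expand_env(input: &str, lookup: impl Fn(&str) -> Option<String>) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${env:") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 6..]; // skip "${env:"
        match after.find('}') {
            Some(end) => {
                out.push_str(&lookup(&after[..end]).unwrap_or_default());
                rest = &after[end + 1..];
            }
            None => {
                out.push_str(&rest[start..]); // no closing brace: keep as-is
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let expanded = expand_env("cache_dir = \"${env:HOME}/.cache\"", |name| {
        if name == "HOME" { Some("/home/user".to_string()) } else { None }
    });
    println!("{expanded}"); // cache_dir = "/home/user/.cache"
}
```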

Per-processor batch size

  • Each processor config has a batch boolean, but batch size is global ([build] batch_size).
  • Different tools have different startup costs — fast tools benefit from large batches, slow tools from small ones.
  • Add batch_size field to individual processor configs, overriding the global default.
  • Urgency: medium | Complexity: low
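A possible config shape, assuming the proposed per-processor batch_size field (illustrative values, following the note above that fast-startup tools suit larger batches):

```toml
[build]
batch_size = 50    # global default

[processor.ruff]
batch_size = 200   # fast startup: large batches are cheap

[processor.pylint]
batch_size = 10    # slow startup: smaller batches
```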

Processor Ecosystem

Flake8 (Python linter)

  • Many projects still use flake8 over ruff; it remains widely adopted.
  • Checker processor using flake8. Batch-capable.
  • Urgency: medium | Complexity: low

Security

Shell command execution from source file comments

  • EXTRA_*_SHELL directives execute arbitrary shell commands parsed from source file comments.
  • Document the security implications clearly.
  • Urgency: medium | Complexity: low

Internal Cleanups

These are code-quality items surfaced by an architecture audit. Each is localized; none block features. See architecture-observations.md for larger structural items.

Consolidate processor discovery helpers

  • src/processors/mod.rs exposes discover_checker_products, discover_directory_products, checker_discover, checker_auto_detect, checker_auto_detect_with_scan_root, scan_or_skip — all similar, with subtle differences (some auto-apply dep_auto, some don’t; some validate scan roots, some don’t).
  • Choosing the wrong helper is a silent correctness issue: a processor that picks discover_checker_products when it needed checker_discover loses dep_auto merging and never finds out.
  • Collapse to one or two helpers with explicit flags for the variations. Document the contract each helper commits to.
  • Urgency: medium | Complexity: low

Remove / complete remote_pull scaffold in ObjectStore

  • src/object_store/mod.rs has a remote_pull field and try_fetch_* helpers in operations.rs that nothing calls.
  • Either finish the feature (wire the fetch helpers into the classify path) or delete the scaffold. Unused public-ish surface rots.
  • Urgency: low | Complexity: medium (complete) / low (delete)

Drop or use processor_type on ProcessorPlugin

  • src/registries/processor.rs has processor_type marked #[allow(dead_code)] with a comment about a future processors list --type=checker filter.
  • Either ship the filter or drop the field until it’s needed. Dead fields with comments accumulate.
  • Urgency: low | Complexity: low

TOOLS registry is monolithic and unsorted

  • src/processors/mod.rs has ~170 entries in a static array mixing Python, Node, Ruby, Rust, Perl, System categories with no alphabetic ordering within groups.
  • Hard to find a tool when adding one; hard to audit for gaps (a tool with no install command makes doctor silently unhelpful).
  • Split per-runtime into separate files or sort alphabetically within a section. Add a unit test that every processor’s required_tools() entries have a matching TOOLS row (this test exists — keep it; make the table easier to satisfy).
  • Urgency: low | Complexity: low

Centralize alias expansion

  • expand_aliases in src/builder/build.rs handles @checkers / @generators / @toolname / bare-name syntaxes. It’s called once for -p and once for -x. Any new alias shortcut has to be added there.
  • No duplication today, but the function is in build.rs despite being useful elsewhere (completion, processors list, analyzers used). Move to a dedicated module and make it the canonical expander.
  • Urgency: low | Complexity: low

Inconsistent error-handling idioms in processors

  • Some processors use anyhow::bail!, some anyhow::Context::with_context(), some construct custom messages. The coding-standards doc already calls for with_context on every I/O operation, but processor-level error shape varies.
  • Pick one idiom per category (tool-failure vs. config-error vs. internal-error) and retrofit. Makes --json error events more uniform too.
  • Urgency: low | Complexity: low

Config validation timing

  • Unknown-field and must-field validation runs inside Config::load, which is correct. However, some cross-field validations (e.g. “cc_single_file needs include_paths if compiling C++”) happen later during processor creation or build.
  • Either pull all semantic validation into Config::load (so toml check catches everything) or accept that semantic errors surface later and document which is which.
  • Urgency: low | Complexity: medium

products list CLI

  • Users can run rsconstruct graph show (full graph) or rsconstruct status (per-processor summary), but there’s no flat list of every product that would execute, with its primary input and output.
  • Add rsconstruct products list (parallel to processors list and analyzers used). Respects -p/-x/--target filters.
  • Urgency: low | Complexity: low

ProductTiming.start_offset not populated for batch execution

  • src/processors/mod.rs defines start_offset on ProductTiming; it’s populated for non-batch execution but may be None for batch paths.
  • Trace visualizations (--trace) look jagged or incomplete when batches are involved.
  • Urgency: low | Complexity: low

Completed Suggestions

Items from suggestions.md that have been implemented.

Completed Features

  • Remote caching — See Remote Caching. Share build artifacts across machines via S3, HTTP, or filesystem.
  • Lua plugin system — See Lua Plugins. Define custom processors in Lua without forking rsconstruct.
  • Tool version locking — rsconstruct tools lock locks and verifies external tool versions. Tool versions are included in cache keys.
  • JSON output mode — --json flag for machine-readable JSON Lines output (build_start, product_start, product_complete, build_summary events).
  • Native C/C++ include scanner — Default include_scanner = "native" uses regex-based scanning. Falls back to include_scanner = "compiler" (gcc -MM).
  • --processors flag — rsconstruct build -p tera,ruff and rsconstruct watch -p tera filter which processors run.
  • Colored diff on config changes — When processor config changes trigger rebuilds, rsconstruct shows what changed with colored diff output.
  • Batch processing — ruff, pylint, shellcheck, zspell, mypy, and rumdl all support batch execution via execute_batch().
  • Progress bar — Uses indicatif crate. Progress bar sized to actual work (excludes instant skips), hidden in verbose/JSON mode.
  • Emit ProductStart JSON events — Emitted before each product starts executing, pairs with ProductComplete for per-product timing.
  • mypy processor — Python type checking with mypy. Batch-capable. Auto-detects mypy.ini as extra input.
  • Explain commands — --explain flag shows skip/restore/rebuild reasons for each product during build.

Completed Code Consolidation

  • Collapsed checker_config! macro variants — Merged @basic, @with_auto_inputs, and @with_linter into two internal variants (@no_linter and @with_linter).
  • Added batch field to all manually-defined processor configs — All processor configs now support batch = false to disable batching per-project.
  • Replaced trivial checker files with simple_checker! macro — 25 trivial checkers reduced from ~35 lines each to 3-5 lines (~800 lines eliminated).
  • Unified lint_files/check_files naming — All checkers now use check_files consistently.
  • Moved should_process guard into macro — Added guard: scan_root built-in to impl_checker!, removed boilerplate should_process() from 7 processors.
  • Simplified KnownFields — Scan config fields auto-appended by validation layer via SCAN_CONFIG_FIELDS constant; KnownFields impls only list their own fields.
  • Extracted WordManager for spellcheck/aspell — Shared word-file management (loading, collecting, flushing, execute/batch patterns) in word_manager.rs.

Completed New Processors

  • mypy — Python type checking using mypy. Batch-capable. Config: checker, args, dep_inputs, scan.
  • yamllint — Lint YAML files using yamllint. src/processors/checkers/yamllint.rs.
  • jsonlint — Validate JSON files for syntax errors. src/processors/checkers/jsonlint.rs.
  • taplo (toml-lint) — Validate TOML files using taplo. src/processors/checkers/taplo.rs.
  • markdownlint — Lint Markdown files for structural issues. Uses mdl or markdownlint-cli.
  • pandoc — Convert Markdown to other formats (PDF, HTML, EPUB). Generator processor.
  • jinja2 — Render Jinja2 templates (.j2) via Python jinja2 library. src/processors/generators/jinja2.rs.
  • black — Python formatting verification using black --check. src/processors/checkers/black.rs.
  • rust_single_file — Compile single-file Rust programs to executables. src/processors/generators/rust_single_file.rs.
  • sass — Compile SCSS/SASS files to CSS. src/processors/generators/sass.rs.
  • protobuf — Compile .proto files to generated code using protoc. src/processors/generators/protobuf.rs.
  • pytest — Run Python test files with pytest. src/processors/checkers/pytest.rs.
  • doctest — Run Python doctests via python3 -m doctest. src/processors/checkers/doctest.rs.

Completed Test Coverage

  • Ruff/pylint processor tests — tests/processors/ruff.rs and tests/processors/pylint.rs with integration tests.
  • Make processor tests — tests/processors/make.rs with Makefile discovery and execution tests.
  • All generator processor tests — Integration tests for all 14 previously untested generators: a2x, drawio, gem, libreoffice, markdown, marp, mermaid, npm, pandoc, pdflatex, pdfunite, pip, sphinx.
  • All checker processor tests — Integration tests for all 5 previously untested checkers: ascii, aspell, markdownlint, mdbook, mdl.

Completed Caching & Performance

  • Lazy file hashing (mtime-based) — mtime_check config (default true), fast_checksum() with MTIME_TABLE. Stores (path, mtime, checksum) tuples. Disable with --no-mtime.
  • Compressed cache objects — Optional zstd compression for .rsconstruct/objects/. Config: compression = true in [cache]. Incompatible with hardlink restore (must use restore_method = "copy"). Checksums computed on original content for stable cache keys.
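The mtime fast path can be sketched as follows; names are illustrative, and the real implementation lives in checksum.rs with its own DB:

```rust
use std::collections::HashMap;

// If a file's mtime matches the cached (mtime, checksum) tuple, reuse the
// checksum without reading the file. `hash_contents` stands in for the
// real SHA-256 pass over file contents.
fn fast_checksum(
    path: &str,
    mtime: u64,
    table: &mut HashMap<String, (u64, u64)>,
    hash_contents: impl Fn(&str) -> u64,
) -> u64 {
    if let Some(&(cached_mtime, cached_sum)) = table.get(path) {
        if cached_mtime == mtime {
            return cached_sum; // fast path: no file read, no hashing
        }
    }
    let sum = hash_contents(path); // slow path: hash and remember
    table.insert(path.to_string(), (mtime, sum));
    sum
}

fn main() {
    let mut table = HashMap::new();
    let first = fast_checksum("main.c", 100, &mut table, |_| 42);
    // Same mtime: the (deliberately different) hasher is never called.
    let second = fast_checksum("main.c", 100, &mut table, |_| 99);
    assert_eq!(first, second);
    println!("{second}");
}
```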

Completed Developer Experience

  • --quiet flag — -q/--quiet suppresses all output except errors. Useful for CI scripts that only care about exit code.
  • Flaky product detection / retry — --retry=N retries failed products up to N times. Reports FLAKY (passed on retry) vs FAILED status in build summary.
  • Actionable error messages — rsconstruct tools check shows install hints for missing tools (e.g., “install with: pip install ruff”).
  • Build profiling / tracing — --trace=file.json generates Chrome trace format output viewable in chrome://tracing or Perfetto UI.
  • rsconstruct build <target> — Build specific targets by name or pattern via --target glob patterns and -d/--dir flags.
  • rsconstruct why <file> / Explain rebuilds — --explain flag shows why each product is skipped, restored, or rebuilt.
  • rsconstruct doctor — Diagnose build environment: checks config, tools, and versions. Full implementation in src/builder/doctor.rs.
  • rsconstruct sloc — Source lines of code statistics with COCOMO effort/cost estimation. src/builder/sloc.rs.

Completed Quick Wins

  • Batch processing for more processors — All checker processors that support multiple file arguments now use batching.
  • Progress bar for long builds — Implemented with indicatif, shows [elapsed] [bar] pos/len message.
  • --processors flag for build and watch — Filter processors with -p flag.
  • Emit ProductStart JSON events — Wired up and emitted before execution.
  • Colored diff on config changes — Shows colored JSON diff when processor config changes.

Completed Features (v0.3.7)

  • RSCONSTRUCT_THREADS env var — Set parallelism via environment variable instead of -j. Priority: CLI -j > RSCONSTRUCT_THREADS > config parallel.
  • Global output_dir in [build] — Global output directory prefix (default: "out"). Processor defaults like out/marp are remapped when the global is changed (e.g., output_dir = "build" makes marp output to build/marp). Individual processors can still override their output_dir explicitly.
  • Named processor instance output directories — When multiple instances of the same processor are declared (e.g., [processor.marp.slides] and [processor.marp.docs]), each instance defaults to out/{instance_name} (e.g., out/marp.slides, out/marp.docs) instead of sharing the same output directory.
  • Named processor instance names in error reporting — When multiple instances of the same processor exist, error messages, build progress, and statistics use the full instance name (e.g., [pylint.core], [pylint.tests]). Single instances continue to use just the processor type name.
  • processors config without config file — rsconstruct processors config <name> now works without an rsconstruct.toml, showing the default configuration (same as defconfig).
  • tags collect command — rsconstruct tags collect scans the tags database for tags that are not in the tag collection (tags_dir) and adds them to the appropriate .txt files. Key:value tags go to {key}.txt, bare tags go to tags.txt.
  • rsconstruct status shows 0-file processors — Processors declared in the config that match no files are now shown in status output and the --breakdown summary, making it easy to spot misconfigured or unnecessary processors.
  • smart remove-no-file-processors — New command rsconstruct smart remove-no-file-processors removes [processor.*] sections from rsconstruct.toml for processors that don’t match any files. Handles both single and named instances.
  • cc_single_file output_dir from config — The cc_single_file processor now reads its output directory from the config output_dir field instead of hardcoding out/cc_single_file. This fixes named instances (e.g., cc_single_file.gcc and cc_single_file.clang) which previously collided on the same output directory.
  • clean unknown respects .gitignore — rsconstruct clean unknown now skips gitignored files. Previously it disabled .gitignore handling, causing intentionally ignored files (IDE configs, virtualenvs, *.pyc, etc.) to be flagged as unknown. RSConstruct outputs are still correctly identified via the build graph, so nothing is missed. Use --no-gitignore to include gitignored files.
  • Cross-processor dependencies (fixed-point discovery) — Generator outputs are now visible to downstream processors on the first build. Discovery runs in a fixed-point loop: after each pass, declared outputs are injected as virtual files into the FileIndex, and discovery re-runs until no new products are found. This means a generator that creates .md files can feed pandoc/tags/spell-checkers in a single build, without needing a second build.
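The fixed-point loop described above can be sketched like this, with a toy discover function standing in for the real per-processor discovery pass:

```rust
use std::collections::BTreeSet;

// Toy discovery rule: every .tera file yields a .md product, and every
// .md file yields an .html product (stands in for a tera -> pandoc chain).
fn discover(files: &BTreeSet<String>) -> Vec<(String, String)> {
    let mut products = Vec::new();
    for f in files {
        if let Some(stem) = f.strip_suffix(".tera") {
            products.push((f.clone(), format!("{stem}.md")));
        } else if let Some(stem) = f.strip_suffix(".md") {
            products.push((f.clone(), format!("{stem}.html")));
        }
    }
    products
}

// Re-run discovery, injecting each pass's declared outputs as virtual
// files, until no new files (and hence no new products) appear.
fn discover_fixed_point(mut files: BTreeSet<String>) -> Vec<(String, String)> {
    loop {
        let products = discover(&files);
        let before = files.len();
        files.extend(products.iter().map(|(_, out)| out.clone()));
        if files.len() == before {
            return products; // fixed point: nothing new was discovered
        }
    }
}

fn main() {
    let files = BTreeSet::from(["doc.tera".to_string()]);
    // The second pass sees the virtual doc.md, so the chained doc.html
    // product appears in a single build.
    println!("{:?}", discover_fixed_point(files));
}
```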

Completed Architecture Refactors

  • Config provenance tracking — Every config field now carries FieldProvenance (UserToml with line number, ProcessorDefault, ScanDefault, OutputDirDefault, SerdeDefault). rsconstruct config show annotates every field with its source. Uses toml_edit::Document for span capture.
  • BuildContext replacing process globals — All mutable process globals moved into BuildContext: the three processor globals (INTERRUPTED, RUNTIME, INTERRUPT_SENDER) and the three checksum globals (CACHE, MTIME_DB, MTIME_ENABLED). Threaded through the Processor trait, executor, analyzers, remote cache, checksum functions, and deps cache. Signal handler uses Arc<BuildContext>.
  • BuildPolicy trait — Extracted from the executor. classify_products delegates per-product skip/restore/rebuild decisions to a &dyn BuildPolicy. IncrementalPolicy implements the current logic. Future policies (dry-run, always-rebuild, time-windowed) are a single trait impl.
  • ObjectStore decomposition — mod.rs split from 664 → 223 lines into focused submodules: blobs.rs (content-addressed storage), descriptors.rs (cache descriptor CRUD), restore.rs (restore/needs_rebuild/can_restore/explain).

Completed Features (latest)

  • rsconstruct status --json — JSON output with per-processor counts (up_to_date, restorable, stale, new, total, native) and totals. Activated by --json flag.
  • Selective processor cleaning — rsconstruct clean outputs -p ruff,pylint cleans only those processors’ outputs. Without -p, cleans everything.
  • Prettier processor — Checker using prettier --check. Batch-capable. Scans .js/.jsx/.ts/.tsx/.mjs/.cjs/.css/.scss/.less/.html/.json/.md/.yaml/.yml. src/processors/checkers/prettier.rs.
  • Bare clean requires subcommand — rsconstruct clean now errors with usage hint instead of silently defaulting to clean outputs.
  • Nondeterministic test race fix — Fixed TOCTOU race in store_descriptor where parallel writers could get Permission denied. Now retries after forcing writable on first failure.
  • Suppress status line for non-build commands — The Exited with SUCCESS/ERROR footer only shows for build, watch, and clean.
  • Configurable graph validation — Four checks run after resolve_dependencies(): (1) reject empty inputs (default on), (2) validate dep references (default on), (3) detect duplicate inputs within same processor (default off), (4) early cycle detection (default off). Config: [graph] section fields validate_empty_inputs, validate_dep_references, validate_duplicate_inputs, validate_early_cycles.
  • Checksum globals moved to BuildContext — CACHE, MTIME_DB, MTIME_ENABLED moved from src/checksum.rs statics into BuildContext. combined_input_checksum, checksum_fast, file_checksum all take &BuildContext. Completes the isolated-build-context story.
  • rsconstruct fix command — Runs fixers (auto-format, auto-fix) on source files. Checkers declare fix capability via fix_subcommand/fix_prepend_args on SimpleCheckerParams. processors list shows a Fix column. Supports -p filtering, batch execution, and --json. Fix-capable processors: ruff, black, prettier, eslint, stylelint, standard, taplo, rumdl, markdownlint.
  • processors search — rsconstruct processors search <query> searches by name, description, and keywords. All 91 processors have keywords covering language, tool category, file extensions, and ecosystem terms. Supports --json output.

TODO

StandardConfig refactoring (DONE)

All config structs now embed StandardConfig via #[serde(flatten)].

Cache cleanup

  • Remove old DB cache code: CacheEntry, OutputEntry, get_entry, has_cache_entry, get_cached_input_checksum, CACHE_TABLE. These are legacy from the pre-descriptor system. has_cache_entry (used in status display to distinguish “stale” vs “new”) should use the descriptor system instead. ~80 lines of dead code.

  • Remove cache_key() method from Product. Only used by has_cache_entry and remove_stale. Once has_cache_entry is migrated to descriptors, it may become fully unused.

  • Split db.redb: the configs table (CONFIGS_TABLE) is still in the same DB as the now-unused cache table. Give configs its own file (configs.redb), then delete db.redb entirely.

Cache correctness

  • Implement output_depends_on_input_name flag. Documented in docs/src/cache.md but not implemented. Needed for processors that embed the input filename in their output (e.g., a // Generated from foo.c header). Without it, renaming such a file would produce a cache hit with stale content.

  • Write a test for identical content processed by different processors. Verify two different processors processing the same file get separate cache entries (the processor name is in the descriptor key).

Code consolidation

  • Inline single-use names constants. 20+ constants in processors::names are used in exactly one place each (their processor’s new() call). Inline them as string literals.

  • Clean processor_configs.rs. Still 2,100+ lines. Check for:

    • ClangTidyConfig is nearly identical to StandardConfig — could it become a type alias?
    • Unused default_* helper functions left over from cppcheck removal.
    • Other config structs that are structurally identical to StandardConfig.

Documentation

  • Add docs/src/processors/creator.md — per-processor documentation for the Creator processor, like the other processor docs.

Housekeeping

  • Remove the tar lockfile entries. The crate was added and removed, but Cargo.lock may still reference it.

  • Commit everything. There is a large amount of uncommitted work spanning:

    • HasScanConfig trait elimination
    • SimpleGenerator (14 generators collapsed to data-driven)
    • Creator processor (new processor type with multi-dir caching)
    • Cache redesign (descriptor-based, content-addressed keys, no DB for cache data)
    • Checksum cache centralization (moved mtime logic to checksum.rs with own DB)
    • MassGenerator → Creator type rename
    • ProcessorType enum with strum iteration
    • processors types CLI command
    • --no-mtime-cache CLI flag
    • Mandatory supports_batch on all processors
    • Checker consolidation (5 checkers → SimpleChecker entries)
    • Removed unused dirs crate
    • New documentation: cache.md, checksum-cache.md, processor-types.md