RSConstruct - Rust Build Tool
A fast, incremental build tool written in Rust with C/C++ compilation, template support, Python linting, and parallel execution.
Features
- Incremental builds using SHA-256 checksums to detect changes
- C/C++ compilation with automatic header dependency tracking
- Parallel execution of independent build products with the -j flag
- Template processing via the Tera templating engine
- Python linting with ruff and pylint
- Documentation spell checking using hunspell dictionaries
- Make integration — run make in directories containing Makefiles
- .gitignore support — respects .gitignore and .rsconstructignore patterns
- Deterministic builds — same input always produces the same build order
- Graceful interrupt — Ctrl+C saves progress, next build resumes where it left off
- Config-aware caching — changing compiler flags or linter config triggers rebuilds
- Convention over configuration — simple naming conventions, minimal config needed
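The checksum-based incremental build idea can be sketched minimally: hash each input's content and rebuild only when the stored checksum differs. This is a simplified illustration, not RSConstruct's actual implementation:

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Content hash used to detect changes."""
    return hashlib.sha256(data).hexdigest()

def needs_rebuild(inputs: dict[str, bytes], cache: dict[str, str]) -> bool:
    """Rebuild only if any input's checksum differs from the cached one."""
    return any(cache.get(path) != sha256_of(data) for path, data in inputs.items())

inputs = {"src/hello.py": b"print('hi')\n"}
assert needs_rebuild(inputs, {})                          # first build: nothing cached
cache = {p: sha256_of(d) for p, d in inputs.items()}
assert not needs_rebuild(inputs, cache)                   # unchanged: skip
inputs["src/hello.py"] = b"print('bye')\n"
assert needs_rebuild(inputs, cache)                       # content changed: rebuild
```

Because the check is content-based rather than mtime-based, touching a file without changing it does not trigger a rebuild.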
Philosophy
Convention over configuration — simple naming conventions, explicit config loading, incremental builds by default.
Nomenclature
This page defines the terminology used throughout RSConstruct’s code, configuration, CLI, and documentation.
Core concepts
| Term | Definition |
|---|---|
| pname | Processor name. The type name of a processor as registered by its plugin (e.g., ruff, pip, tera, creator). Unique across all plugins. Used in [processor.PNAME] config sections and in processors defconfig PNAME. |
| iname | Instance name. The name of a specific processor instance as declared in rsconstruct.toml. For single-instance processors, the iname equals the pname (e.g., [processor.ruff] → iname is ruff). For multi-instance processors, the iname is the sub-key (e.g., [processor.creator.venv] → iname is creator.venv). Used in processors config INAME. |
| processor | A configured instance that discovers products and executes builds. Created from a plugin + TOML config. Immutable after creation. |
| plugin | A factory registered at compile time via inventory::submit!. Knows how to create processors from TOML config. Has a pname, a processor type, and config metadata. |
| product | A single build unit with inputs, outputs, and a processor. The atomic unit of incremental building. |
| processor type | One of four categories: checker, generator, creator, explicit. Determines how inputs are discovered, how outputs are declared, and how results are cached. See Processor Types. |
| analyzer | A dependency scanner that runs after product discovery to add extra input edges to existing products (e.g., the cpp analyzer adds every #included header as an extra input of a C/C++ product). Analyzers never create products of their own. Declared with [analyzer.NAME] sections in rsconstruct.toml. Unlike processors, only analyzers explicitly declared in config run — there is no “auto-enable” default. See Dependency Analyzers. |
| analyzer plugin | A factory registered at compile time via inventory::submit! in the analyzer registry. Knows how to construct an analyzer from its [analyzer.NAME] TOML section. Each plugin declares its name, description, and whether it is native (pure Rust) or external (may invoke subprocesses). |
| native analyzer | An analyzer whose default configuration runs entirely in-process (no subprocesses). Example: icpp uses a pure-Rust regex scanner for #include directives. Some native analyzers become external in non-default configurations (e.g., icpp with pkg_config set shells out to pkg-config for include paths). |
| external analyzer | An analyzer that shells out to another program to do its work. Example: cpp always runs gcc -MM for exact compiler-accurate header scanning. |
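Tying the terms together, a minimal rsconstruct.toml sketch might look like the following (the creator.venv instance is a hypothetical multi-instance example; section names for your project may differ):

```toml
# Single-instance processor: iname "ruff" equals pname "ruff"
[processor.ruff]

# Multi-instance processor: iname is "creator.venv" for pname "creator"
[processor.creator.venv]

# Analyzers run only when explicitly declared — there is no auto-enable
[analyzer.cpp]
```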
Configuration
| Term | Definition |
|---|---|
| output_files | List of individual output files declared in creator/explicit config. Cached as blobs. |
| output_dirs | List of output directories declared in creator/explicit config. All files inside are walked and cached as a tree. |
| src_dirs | Directories to scan for input files. |
| src_extensions | File extensions to match during scanning. |
| dep_inputs | Extra files that trigger a rebuild when their content changes. |
| dep_auto | Config files silently added as dep_inputs when they exist on disk (e.g., .eslintrc). |
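As a hedged sketch of how these keys might appear together in a creator-style section (the values and the creator.venv instance are hypothetical; consult each processor's docs for the keys it actually accepts):

```toml
[processor.creator.venv]
src_dirs = ["src"]                   # directories scanned for input files
src_extensions = ["py"]              # extensions matched during scanning
dep_inputs = ["requirements.txt"]    # extra files that trigger a rebuild on change
output_files = ["out/venv/done.txt"] # individual outputs, cached as blobs
output_dirs = ["out/venv"]           # directories walked and cached as a tree
```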
Cache
| Term | Definition |
|---|---|
| blob | A file’s raw content stored in the object store, addressed by SHA-256 hash. Blobs have no path — the consumer knows where to restore them. |
| tree | A serialized list of (path, mode, blob_checksum) entries describing a set of output files. Stored in the descriptor store. |
| marker | A zero-byte descriptor indicating a checker passed. Its presence is the cached result. |
| descriptor | A cache entry (blob reference, tree, or marker) stored in .rsconstruct/descriptors/, keyed by the descriptor key. |
| descriptor key | A content-addressed hash of (pname, config_hash, variant, input_checksum). Changes when processor config or input content changes. Does NOT include file paths — renaming a file with identical content produces the same key. |
| input checksum | Combined SHA-256 hash of all input file contents for a product. |
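The path-independence of the descriptor key can be illustrated with a small sketch (simplified; the real key derivation in RSConstruct may differ in detail):

```python
import hashlib

def input_checksum(contents: list[bytes]) -> str:
    """Combined SHA-256 over all input file contents (paths excluded)."""
    h = hashlib.sha256()
    for data in contents:
        h.update(hashlib.sha256(data).digest())
    return h.hexdigest()

def descriptor_key(pname: str, config_hash: str, variant: str,
                   contents: list[bytes]) -> str:
    """Content-addressed key over (pname, config_hash, variant, input_checksum)."""
    material = "\0".join([pname, config_hash, variant, input_checksum(contents)])
    return hashlib.sha256(material.encode()).hexdigest()

# Renaming a file with identical content yields the same key,
# because file paths are not part of the key material.
k1 = descriptor_key("ruff", "cfg1", "default", [b"print('hi')\n"])
k2 = descriptor_key("ruff", "cfg1", "default", [b"print('hi')\n"])
assert k1 == k2
```

Changing the processor config (and hence config_hash) or any input byte produces a different key, which is what triggers a rebuild.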
Build pipeline
| Term | Definition |
|---|---|
| discover | Phase where processors scan the file index and register products in the build graph. |
| classify | Phase where each product is classified as skip, restore, or build based on its cache state. |
| execute | Phase where products are built in dependency order. |
| anchor file | A file whose presence triggers a creator processor to run (e.g., Cargo.toml for cargo, requirements.txt for pip). |
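The classify phase above can be sketched as a three-way decision (a simplified model of the logic, not RSConstruct's actual code):

```python
def classify(outputs_present: bool, descriptor_cached: bool) -> str:
    """skip: cache hit and outputs already on disk;
    restore: cache hit but outputs missing (restore from cache);
    build: no cached descriptor for the current inputs/config."""
    if descriptor_cached:
        return "skip" if outputs_present else "restore"
    return "build"

assert classify(True, True) == "skip"
assert classify(False, True) == "restore"
assert classify(False, False) == "build"
```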
CLI conventions
| Command | Name parameter | Meaning |
|---|---|---|
| processors defconfig PNAME | pname | Processor type name — shows factory defaults |
| processors config [INAME] | iname | Instance name from config — shows resolved config |
| processors files [INAME] | iname | Instance name from config — shows discovered files |
| analyzers defconfig [NAME] | analyzer name | Analyzer name from the analyzer registry — shows factory defaults |
| analyzers config [NAME] | analyzer name | Analyzer name as declared in [analyzer.NAME] — shows resolved config |
Installation
Download pre-built binary (Linux)
Pre-built binaries are available for x86_64 and aarch64 (arm64).
Using the GitHub CLI:
# x86_64
gh release download latest --repo veltzer/rsconstruct --pattern 'rsconstruct-x86_64-unknown-linux-gnu' --output rsconstruct --clobber
# aarch64 / arm64
gh release download latest --repo veltzer/rsconstruct --pattern 'rsconstruct-aarch64-unknown-linux-gnu' --output rsconstruct --clobber
chmod +x rsconstruct
sudo mv rsconstruct /usr/local/bin/
Or with curl:
# x86_64
curl -Lo rsconstruct https://github.com/veltzer/rsconstruct/releases/download/latest/rsconstruct-x86_64-unknown-linux-gnu
# aarch64 / arm64
curl -Lo rsconstruct https://github.com/veltzer/rsconstruct/releases/download/latest/rsconstruct-aarch64-unknown-linux-gnu
chmod +x rsconstruct
sudo mv rsconstruct /usr/local/bin/
Install from crates.io
cargo install rsconstruct
This downloads, compiles, and installs the latest published version into ~/.cargo/bin/.
Build from source
cargo build --release
The binary will be at target/release/rsconstruct.
Release profile
The release build is configured in Cargo.toml for maximum performance with a small binary:
[profile.release]
strip = true # Remove debug symbols
lto = true # Link-time optimization across all crates
codegen-units = 1 # Single codegen unit for better optimization
For an even smaller binary at the cost of some runtime speed, add opt-level = "z" (optimize for size) or opt-level = "s" (balance size and speed).
Getting Started
This guide walks through setting up an rsconstruct project for the two primary supported languages: Python and C++.
Python
Prerequisites
- rsconstruct installed (Installation)
- ruff on PATH
Setup
Create a project directory and configuration:
mkdir myproject && cd myproject
# rsconstruct.toml
[processor.ruff]
Create a Python source file:
mkdir -p src
# src/hello.py
def greet(name: str) -> str:
    return f"Hello, {name}!"

if __name__ == "__main__":
    print(greet("world"))
Run the build:
rsconstruct build
Expected output:
Processing ruff (1 product)
hello.py
Run again — nothing has changed, so rsconstruct skips the check:
Processing ruff (1 product)
Up to date
Adding pylint
Install pylint and add a section for it:
# rsconstruct.toml
[processor.ruff]
[processor.pylint]
Pass extra arguments via processor config:
[processor.pylint]
args = ["--disable=C0114,C0115,C0116"]
Adding zspell for docs
If your project has markdown documentation, add a section for the zspell processor:
[processor.ruff]
[processor.pylint]
[processor.zspell]
Create a .zspell-words file in the project root with any custom words (one per line) that zspell should accept.
C++
Prerequisites
- rsconstruct installed (Installation)
- gcc/g++ on PATH
Setup
Create a project directory and configuration:
mkdir myproject && cd myproject
# rsconstruct.toml
[processor.cc_single_file]
Create a source file under src/:
mkdir -p src
// src/hello.c
#include <stdio.h>
int main() {
    printf("Hello, world!\n");
    return 0;
}
Run the build:
rsconstruct build
Expected output:
Processing cc_single_file (1 product)
hello.elf
The compiled executable is at out/cc_single_file/hello.elf.
Run again — the source hasn’t changed, so rsconstruct restores from cache:
Processing cc_single_file (1 product)
Up to date
Customizing compiler flags
Pass flags via processor config:
[processor.cc_single_file]
cflags = ["-Wall", "-Wextra", "-O2"]
cxxflags = ["-Wall", "-Wextra", "-O2"]
include_paths = ["include"]
See the CC Single File processor docs for the full configuration reference.
Adding static analysis
Install cppcheck and add a section for it:
[processor.cc_single_file]
[processor.cppcheck]
Both processors run on the same source files — rsconstruct handles them independently.
Next Steps
- Commands — full list of rsconstruct commands
- Configuration — all configuration options
- Processors — detailed docs for each processor
Binary Releases
RSConstruct publishes pre-built binaries as GitHub releases when a version tag
(v*) is pushed.
Supported Platforms
| Platform | Binary name |
|---|---|
| Linux x86_64 | rsconstruct-linux-x86_64 |
| Linux aarch64 (arm64) | rsconstruct-linux-aarch64 |
| macOS x86_64 | rsconstruct-macos-x86_64 |
| macOS aarch64 (Apple Silicon) | rsconstruct-macos-aarch64 |
| Windows x86_64 | rsconstruct-windows-x86_64.exe |
How It Works
The release workflow (.github/workflows/release.yml) has two jobs:
- build — a matrix job that builds the release binary for each platform and uploads it as a GitHub Actions artifact.
- release — waits for all builds to finish, downloads the artifacts, and creates a GitHub release with auto-generated release notes and all binaries attached.
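A hedged sketch of that two-job shape (illustrative only — action versions, target list, and step details are assumptions, and the real .github/workflows/release.yml will differ):

```yaml
on:
  push:
    tags: ["v*"]

jobs:
  build:
    strategy:
      matrix:
        target: [x86_64-unknown-linux-gnu, aarch64-unknown-linux-gnu]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: cargo build --release --target ${{ matrix.target }}
      - uses: actions/upload-artifact@v4
        with:
          name: rsconstruct-${{ matrix.target }}
          path: target/${{ matrix.target }}/release/rsconstruct

  release:
    needs: build        # waits for every matrix build to finish
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4
      - uses: softprops/action-gh-release@v2
        with:
          generate_release_notes: true
          files: "**/rsconstruct*"
```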
Creating a Release
- Update version in Cargo.toml
- Commit and push
- Tag and push: git tag v0.2.2 && git push origin v0.2.2
- The workflow creates the GitHub release automatically
Release Profile
The binary is optimized for size and performance:
[profile.release]
strip = true # Remove debug symbols
lto = true # Link-time optimization across all crates
codegen-units = 1 # Single codegen unit for better optimization
Command Reference
Global Flags
These flags can be used with any command:
| Flag | Description |
|---|---|
--verbose, -v | Show skip/restore/cache messages during build |
--output-display, -O | What to show for output files (none, basename, path; default: none) |
--input-display, -I | What to show for input files (none, source, all; default: source) |
--path-format, -P | Path format for displayed files (basename, path; default: path) |
--show-child-processes | Print each child process command before execution |
--show-output | Show tool output even on success (default: only show on failure) |
--json | Output in JSON Lines format (machine-readable) |
--quiet, -q | Suppress all output except errors (useful for CI) |
--phases | Show build phase messages (discover, add_dependencies, etc.) |
Example:
rsconstruct --phases build # Show phase messages during build
rsconstruct --show-child-processes build # Show each command being executed
rsconstruct --show-output build # Show compiler/linter output even on success
rsconstruct --phases --show-child-processes build # Show both phases and commands
rsconstruct -O path build # Show output file paths in build messages
rsconstruct -I all build # Show all input files (including headers)
rsconstruct build
Requires config. (no subcommands)
Incremental build — only rebuilds products whose inputs have changed.
rsconstruct build # Incremental build
rsconstruct build --force # Force full rebuild
rsconstruct build -j4 # Build with 4 parallel jobs
rsconstruct build --dry-run # Show what would be built without executing
rsconstruct build --keep-going # Continue after errors
rsconstruct build --timings # Show per-product and total timing info
rsconstruct build --stop-after discover # Stop after product discovery
rsconstruct build --stop-after add-dependencies # Stop after dependency scanning
rsconstruct build --stop-after resolve # Stop after graph resolution
rsconstruct build --stop-after classify # Stop after classifying products
rsconstruct build --show-output # Show compiler/linter output even on success
rsconstruct build --auto-add-words # Add misspelled words to .zspell-words instead of failing
rsconstruct build --auto-add-words -p zspell # Run only zspell and auto-add words
rsconstruct build -p ruff,pylint # Run only specific processors
rsconstruct build --explain # Show why each product is skipped/restored/rebuilt
rsconstruct build --retry 3 # Retry failed products up to 3 times
rsconstruct build --no-mtime # Disable mtime pre-check, always compute checksums
rsconstruct build --no-summary # Suppress the build summary
rsconstruct build --batch-size 10 # Limit batch size for batch-capable processors
rsconstruct build --verify-tool-versions # Verify tool versions against .tools.versions
rsconstruct build -t "src/*.c" # Only build products matching this glob pattern
rsconstruct build -d src # Only build products under this directory
rsconstruct build --show-all-config-changes # Show all config changes, not just output-affecting
By default, tool output (compiler messages, linter output) is only shown when a command fails. Use --show-output to see all output.
Incremental recovery and batch behavior
By default (fail-fast mode), rsconstruct executes each product independently, even for batch-capable processors. Successfully completed products are cached immediately, so if a build fails or is interrupted, the next run only rebuilds what wasn’t completed.
With --keep-going, batch-capable processors group all their products into a single tool invocation. If the tool fails, all products in the batch are marked failed and must be rebuilt. Use --batch-size N to limit batch chunks and improve recovery granularity.
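The effect of --batch-size on recovery granularity can be sketched as simple chunking (an illustration of the idea, not RSConstruct's scheduler):

```python
def chunk(products: list[str], batch_size: int) -> list[list[str]]:
    """batch_size <= 0 means one unlimited batch; otherwise chunks of at most N.
    Smaller batches mean a tool failure invalidates fewer products."""
    if batch_size <= 0:
        return [products] if products else []
    return [products[i:i + batch_size] for i in range(0, len(products), batch_size)]

files = [f"src/f{i}.py" for i in range(5)]
assert chunk(files, 0) == [files]                        # one big batch
assert [len(b) for b in chunk(files, 2)] == [2, 2, 1]    # --batch-size 2
```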
Processor Shortcuts (@ aliases)
The -p flag supports @-prefixed shortcuts that expand to groups of processors:
By type:
- @checkers — all checker processors (ruff, pylint, shellcheck, etc.)
- @generators — all generator processors (tera, cc_single_file, etc.)
- @creators — all creator processors (pip, npm, cargo, etc.)
By tool:
- @python3 — all processors that require python3
- @node — all processors that require node
- Any tool name works (matched against each processor's required_tools())
By processor name:
- @ruff — equivalent to ruff (strips the @ prefix)
Examples:
rsconstruct build -p @checkers # Run only checker processors
rsconstruct build -p @generators # Run only generator processors
rsconstruct build -p @python3 # Run all Python-based processors
rsconstruct build -p @checkers,tera # Mix shortcuts with processor names
The --stop-after flag allows stopping the build at a specific phase:
- discover — stop after discovering products (before dependency scanning)
- add-dependencies — stop after adding dependencies (before resolving the graph)
- resolve — stop after resolving the dependency graph (before execution)
- classify — stop after classifying products (show skip/restore/build counts)
- build — run the full build (default)
rsconstruct clean
Clean build artifacts. When run without a subcommand, removes build output files (same as rsconstruct clean outputs).
| Subcommand | Config required? |
|---|---|
outputs | Yes |
all | Yes |
git | Yes |
unknown | Yes |
rsconstruct clean # Remove build output files (preserves cache) [default]
rsconstruct clean outputs # Remove build output files (preserves cache)
rsconstruct clean all # Remove out/ and .rsconstruct/ directories
rsconstruct clean git # Hard clean using git clean -qffxd (requires git repository)
rsconstruct clean unknown # Remove files not tracked by git and not known as build outputs
rsconstruct clean unknown --dry-run # Show what would be removed without deleting
rsconstruct clean unknown --no-gitignore # Include gitignored files as unknown
rsconstruct status
Requires config. (no subcommands)
Show product status — whether each product is up-to-date, stale, or restorable from cache.
rsconstruct status # Show per-processor and total summary
rsconstruct status -v # Show per-product status
rsconstruct status --breakdown # Show source file counts by processor and extension
rsconstruct smart auto
Auto-detect relevant processors and add them to rsconstruct.toml. Scans the project for files matching each processor’s conventions and checks that the required tools are installed. Only adds new sections — existing processor sections are preserved. Requires config.
rsconstruct smart auto
Example output:
Added 3 processor(s): pylint, ruff, shellcheck
rsconstruct init
No config needed. (no subcommands)
Initialize a new rsconstruct project in the current directory.
rsconstruct init
rsconstruct watch
Requires config. (no subcommands)
Watch source files and auto-rebuild on changes.
rsconstruct watch # Watch and rebuild on changes
rsconstruct watch --auto-add-words # Watch with zspell auto-add words
rsconstruct watch -j4 # Watch with 4 parallel jobs
rsconstruct watch -p ruff # Watch and only run the ruff processor
The watch command accepts the same build flags as rsconstruct build (e.g., --jobs, --keep-going, --timings, --processors, --batch-size, --explain, --retry, --no-mtime, --no-summary).
rsconstruct graph
Display the build dependency graph.
| Subcommand | Config required? |
|---|---|
show | Yes |
view | Yes |
stats | Yes |
rsconstruct graph show # Default SVG format
rsconstruct graph show --format dot # Graphviz DOT format
rsconstruct graph show --format mermaid # Mermaid format
rsconstruct graph show --format json # JSON format
rsconstruct graph show --format text # Plain text hierarchical view
rsconstruct graph show --format svg # SVG format (requires Graphviz dot)
rsconstruct graph view # Open as SVG (default viewer)
rsconstruct graph view --viewer mermaid # Open as HTML with Mermaid in browser
rsconstruct graph view --viewer svg # Generate and open SVG using Graphviz dot
rsconstruct graph stats # Show graph statistics (products, processors, dependencies)
rsconstruct cache
Manage the build cache.
| Subcommand | Config required? |
|---|---|
clear | No |
size | Yes |
trim | Yes |
list | Yes |
stale | Yes |
stats | Yes |
remove-stale | Yes |
rsconstruct cache clear # Clear the entire cache
rsconstruct cache size # Show cache size
rsconstruct cache trim # Remove unreferenced objects
rsconstruct cache list # List all cache entries and their status
rsconstruct cache stale # Show which cache entries are stale vs current
rsconstruct cache stats # Show per-processor cache statistics
rsconstruct cache remove-stale # Remove stale index entries not matching any current product
rsconstruct webcache
Manage the web request cache. Schemas fetched by iyamlschema (and any future processors that fetch URLs) are cached in .rsconstruct/webcache.redb.
| Subcommand | Config required? |
|---|---|
clear | No |
stats | No |
list | No |
rsconstruct webcache clear # Clear all cached web responses
rsconstruct webcache stats # Show cache size and entry count
rsconstruct webcache list # List all cached URLs and their sizes
rsconstruct deps
Show or manage source file dependencies from the dependency cache. The cache is populated during builds when dependency analyzers scan source files (e.g., C/C++ headers, Python imports).
| Subcommand | Config required? |
|---|---|
list | No |
used | Yes |
build | Yes |
config | Yes |
show | Yes |
stats | Yes |
clean | Yes |
rsconstruct deps list # List all available dependency analyzers
rsconstruct deps build # Run dependency analysis without building
rsconstruct deps show all # Show all cached dependencies
rsconstruct deps show files src/main.c # Show dependencies for a specific file
rsconstruct deps show files src/a.c src/b.c # Show dependencies for multiple files
rsconstruct deps show analyzers cpp # Show dependencies from the C/C++ analyzer
rsconstruct deps show analyzers cpp python # Show dependencies from multiple analyzers
rsconstruct deps stats # Show statistics by analyzer
rsconstruct deps clean # Clear the entire dependency cache
rsconstruct deps clean --analyzer cpp # Clear only C/C++ dependencies
rsconstruct deps clean --analyzer python # Clear only Python dependencies
Example output for rsconstruct deps show all:
src/main.c: [cpp] (no dependencies)
src/test.c: [cpp]
src/utils.h
src/config.h
config/settings.py: [python]
config/base.py
Example output for rsconstruct deps stats:
cpp: 15 files, 42 dependencies
python: 8 files, 12 dependencies
Total: 23 files, 54 dependencies
Note: This command reads directly from the dependency cache (.rsconstruct/deps.redb). If the cache is empty, run a build first to populate it.
This command is useful for:
- Debugging why a file is being rebuilt
- Understanding the include/import structure of your project
- Verifying that dependency analyzers are finding the right files
- Viewing statistics about cached dependencies by analyzer
- Clearing dependencies for a specific analyzer without affecting others
rsconstruct smart
Smart config manipulation commands for managing processor sections in rsconstruct.toml.
| Subcommand | Config required? |
|---|---|
disable-all | No |
enable-all | No |
enable | No |
disable | No |
only | No |
reset | No |
enable-detected | Yes |
enable-if-available | Yes |
minimal | Yes |
auto | Yes |
remove-no-file-processors | Yes |
rsconstruct smart enable pylint # Add [processor.pylint] section
rsconstruct smart disable pylint # Remove [processor.pylint] section
rsconstruct smart enable-all # Add sections for all builtin processors
rsconstruct smart disable-all # Remove all processor sections
rsconstruct smart enable-detected # Add sections for auto-detected processors
rsconstruct smart enable-if-available # Add sections for detected processors with tools installed
rsconstruct smart minimal # Remove all, then add only detected processors
rsconstruct smart only ruff pylint # Remove all, then add only listed processors
rsconstruct smart reset # Remove all processor sections
rsconstruct smart remove-no-file-processors # Remove processors that don't match any files
rsconstruct processors
| Subcommand | Config required? |
|---|---|
list --all | No |
list | Yes (without --all) |
defconfig | No |
config | Uses config if available |
used | Yes |
files | Yes |
allowlist | Yes |
graph | Yes |
rsconstruct processors list # List declared processors and descriptions
rsconstruct processors list -a # Show all built-in processors
rsconstruct processors files # Show source and target files for each declared processor
rsconstruct processors files ruff # Show files for a specific processor
rsconstruct processors files # Show files for enabled processors
rsconstruct processors config ruff # Show resolved configuration for a processor
rsconstruct processors config --diff # Show only fields that differ from defaults
rsconstruct processors defconfig ruff # Show default configuration for a processor
rsconstruct processors add ruff # Append [processor.ruff] to rsconstruct.toml (fields pre-populated + comments)
rsconstruct processors add ruff --dry-run # Preview the snippet without writing
rsconstruct processors allowlist # Show the current processor allowlist
rsconstruct processors graph # Show inter-processor dependencies
rsconstruct processors graph --format dot # Graphviz DOT format
rsconstruct processors graph --format mermaid # Mermaid format
rsconstruct processors files --headers # Show files with processor headers
rsconstruct tools
List or check external tools required by declared processors. All subcommands use config if available; without config, they operate on all built-in processors.
| Subcommand | Config required? |
|---|---|
list | Uses config if available |
check | Uses config if available |
lock | Uses config if available |
install | Uses config if available |
install-deps | Uses config if available |
stats | Uses config if available |
graph | Uses config if available |
rsconstruct tools list # List required tools and which processor needs them
rsconstruct tools list -a # Include tools from disabled processors
rsconstruct tools check # Verify tool versions against .tools.versions lock file
rsconstruct tools lock # Lock tool versions to .tools.versions
rsconstruct tools install # Install all missing external tools
rsconstruct tools install ruff # Install a specific tool by name
rsconstruct tools install -y # Skip confirmation prompt
rsconstruct tools install-deps # Install declared [dependencies] (pip, npm, gem)
rsconstruct tools install-deps -y # Skip confirmation prompt
rsconstruct tools stats # Show tool availability and language runtime breakdown
rsconstruct tools stats --json # Show tool stats in JSON format
rsconstruct tools graph # Show tool-to-processor dependency graph (DOT format)
rsconstruct tools graph --format mermaid # Mermaid format
rsconstruct tools graph --view # Open tool graph in browser
rsconstruct tags
Search and query frontmatter tags from markdown files.
| Subcommand | Config required? |
|---|---|
list | Yes |
count | Yes |
tree | Yes |
stats | Yes |
files | Yes |
grep | Yes |
for-file | Yes |
frontmatter | Yes |
unused | Yes |
validate | Yes |
matrix | Yes |
coverage | Yes |
orphans | Yes |
check | Yes |
suggest | Yes |
merge | Yes |
collect | Yes |
rsconstruct tags list # List all unique tags
rsconstruct tags count # Show each tag with file count, sorted by frequency
rsconstruct tags tree # Show tags grouped by prefix/category
rsconstruct tags stats # Show statistics about the tags database
rsconstruct tags files docker # List files matching a tag (AND semantics)
rsconstruct tags files docker --or k8s # List files matching any tag (OR semantics)
rsconstruct tags files level:advanced # Match key:value tags
rsconstruct tags grep deploy # Search for tags containing a substring
rsconstruct tags grep deploy -i # Case-insensitive tag search
rsconstruct tags for-file src/main.md # List all tags for a specific file
rsconstruct tags frontmatter src/main.md # Show raw frontmatter for a file
rsconstruct tags validate # Validate tags against tags_dir allowlist
rsconstruct tags unused # List tags in tags_dir not used by any file
rsconstruct tags unused --strict # Exit with error if unused tags found (CI)
rsconstruct tags check # Run all tag validations without building
rsconstruct tags suggest src/main.md # Suggest tags for a file based on similarity
rsconstruct tags coverage # Show percentage of files with each tag category
rsconstruct tags matrix # Show coverage matrix of tag categories per file
rsconstruct tags orphans # Find markdown files with no tags
rsconstruct tags merge ../other/tags # Merge tags from another project
rsconstruct tags collect # Add missing tags from source files to tag collection
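As an illustration, the frontmatter tags queried above typically live in a YAML block at the top of each markdown file; a minimal extraction sketch (not RSConstruct's actual parser):

```python
def extract_tags(markdown: str) -> list[str]:
    """Pull `tags:` list items from a leading `---` frontmatter block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return []                      # no frontmatter block
    tags, in_tags = [], False
    for line in lines[1:]:
        if line.strip() == "---":
            break                      # end of frontmatter
        if line.strip() == "tags:":
            in_tags = True
        elif in_tags and line.strip().startswith("- "):
            tags.append(line.strip()[2:])
        elif in_tags and not line.startswith(" "):
            in_tags = False            # left the tags: list
    return tags

doc = "---\ntags:\n  - docker\n  - level:advanced\n---\n# Title\n"
assert extract_tags(doc) == ["docker", "level:advanced"]
```

Both plain tags (docker) and key:value tags (level:advanced) come out as strings, matching the forms accepted by rsconstruct tags files.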
rsconstruct complete
Generate shell completions. No config needed when shell is specified as argument; uses config to read default shells if no argument given.
rsconstruct complete bash # Generate bash completions
rsconstruct complete zsh # Generate zsh completions
rsconstruct complete fish # Generate fish completions
rsconstruct terms
Manage term checking and fixing in markdown files.
| Subcommand | Config required? |
|---|---|
fix | Yes |
merge | Yes |
stats | Yes |
rsconstruct terms fix
Add backticks around terms from the terms directory that appear unquoted in markdown files.
rsconstruct terms fix
rsconstruct terms fix --remove-non-terms # also remove backticks from non-terms
rsconstruct terms merge
Merge terms from another project’s terms directory. Unions matching files and copies missing files in both directions.
rsconstruct terms merge ../other-project/terms
rsconstruct doctor
Requires config. (no subcommands)
Diagnose build environment — checks config, tools, and versions.
rsconstruct doctor
rsconstruct info
Show project information.
| Subcommand | Config required? |
|---|---|
source | Yes |
rsconstruct info source # Show source file counts by extension
rsconstruct sloc
No config needed. (no subcommands)
Count source lines of code (SLOC) by language, with optional COCOMO effort/cost estimation.
rsconstruct sloc # Show SLOC by language
rsconstruct sloc --cocomo # Include COCOMO effort/cost estimation
rsconstruct sloc --cocomo --salary 80000 # Custom annual salary for COCOMO
rsconstruct version
No config needed. (no subcommands)
Print version information.
rsconstruct version
Shell Completions
RSConstruct generates shell completion scripts that provide tab-completion for commands, subcommands, flags, and argument values.
Generating Completions
# Generate for the default shell (configured in rsconstruct.toml)
rsconstruct complete
# Generate for a specific shell
rsconstruct complete bash
rsconstruct complete zsh
rsconstruct complete fish
To install, source the output in your shell profile:
# Bash (~/.bashrc)
eval "$(rsconstruct complete bash)"
# Zsh (~/.zshrc)
eval "$(rsconstruct complete zsh)"
# Fish (~/.config/fish/config.fish)
rsconstruct complete fish | source
Configuration
The default shell(s) for rsconstruct complete (with no argument) are configured in rsconstruct.toml:
[completions]
shells = ["bash"]
What Gets Completed
Commands and subcommands
All top-level commands (build, processors, analyzers, config, etc.) and their subcommands complete automatically via clap.
Processor type names (pnames)
These commands complete with processor type names from the plugin registry (e.g., ruff, pylint, cc_single_file):
- rsconstruct processors defconfig <TAB>
- rsconstruct build --processors <TAB> / rsconstruct build -p <TAB>
- rsconstruct watch --processors <TAB> / rsconstruct watch -p <TAB>
The list is drawn from the plugin registry at compile time.
Processor instance names (inames)
These commands complete with instance names declared in the current project’s rsconstruct.toml (e.g., pylint, pylint.tests, cc_single_file):
- rsconstruct processors config <TAB>
- rsconstruct processors files <TAB>
Instance names are extracted from [processor.NAME] and [processor.NAME.SUBNAME] headings in rsconstruct.toml at tab-completion time. Requires a project config in the current directory. Bash only.
Analyzer names
These commands complete analyzer names (cpp, markdown, python, tera):
- rsconstruct analyzers config <TAB>
- rsconstruct analyzers clean --analyzer <TAB>
Analyzer names are specified via clap’s value_parser attribute, so they work in all shells without post-processing.
Flags and options
All --flags and -f short flags complete in all shells via clap’s built-in generation.
Implementation
Completions are generated by clap_complete in src/cli.rs. Two mechanisms provide argument-value completions:
1. clap value_parser (preferred)
For arguments with a small, fixed set of values, use #[arg(value_parser = [...])] on the field. This works in all shells automatically because clap embeds the values in the generated script.
Example from AnalyzersAction::Config:
#[arg(value_parser = ["cpp", "markdown", "python", "tera"])]
name: Option<String>,
2. Bash post-processing (processor names)
Processor names are not known to clap at derive time because they come from the inventory plugin registry. The function inject_bash_processor_completions() post-processes the generated bash script to inject processor names into the opts variable for specific command sections.
This only works for bash. Other shells get the base clap completions without processor name injection.
The targets for injection are identified by their case labels in the generated bash script:
- `rsconstruct__processors__config)`
- `rsconstruct__processors__defconfig)`
- `rsconstruct__processors__files)`
The function also patches --processors / -p flag completions in build and watch commands to suggest processor names instead of file paths.
Adding Completions for New Arguments
- Fixed set of values (analyzer names, enum variants): use `#[arg(value_parser = [...])]`. Works in all shells.
- Dynamic set from registry (processor names): add the case label to the `inject_bash_processor_completions()` targets. Only works in bash.
- Enum types: use `#[arg(value_enum)]` on a clap-derived enum. Works in all shells.
Configuration
RSConstruct is configured via an rsconstruct.toml file in the project root.
Full reference
[build]
parallel = 1 # Number of parallel jobs (1 = sequential, 0 = auto-detect CPU cores)
# Also settable via RSCONSTRUCT_THREADS env var (CLI -j takes precedence)
batch_size = 0 # Max files per batch for batch-capable processors (0 = no limit, omit to disable)
output_dir = "out" # Global output directory prefix for generator processors
# Declare processors by adding [processor.NAME] sections.
# Only declared processors run — no processors are enabled by default.
# Use `rsconstruct smart auto` to auto-detect and add relevant processors.
[processor.ruff]
# args = []
[processor.pylint]
# args = ["--disable=C0114"]
[processor.cc_single_file]
# cc = "gcc"
# cflags = ["-Wall", "-O2"]
[vars]
my_excludes = ["/vendor/", "/third_party/"] # Define variables for reuse with ${var_name}
[cache]
restore_method = "auto" # auto (default: copy in CI, hardlink otherwise), hardlink, or copy
compression = false # Compress cached objects with zstd (requires restore_method = "copy")
remote = "s3://my-bucket/rsconstruct-cache" # Optional: remote cache URL
remote_push = true # Push local builds to remote (default: true)
remote_pull = true # Pull from remote cache on cache miss (default: true)
mtime_check = true # Use mtime pre-check to skip unchanged file checksums (default: true)
[analyzer]
auto_detect = true
enabled = ["cpp", "python"]
[graph]
viewer = "google-chrome" # Command to open graph files (default: platform-specific)
[plugins]
dir = "plugins" # Directory containing .lua processor plugins
[completions]
shells = ["bash"]
[dependencies]
pip = ["pyyaml", "jinja2"] # Python packages
npm = ["eslint", "prettier"] # Node.js packages
gem = ["mdl"] # Ruby gems
system = ["pandoc", "graphviz"] # System packages (checked but not auto-installed)
Per-processor configuration is documented on each processor’s page under Processors. Lua plugin configuration is documented under Lua Plugins.
Processor instances
Processors are declared by adding a [processor.NAME] section to rsconstruct.toml. An empty section enables the processor with default settings:
[processor.pylint]
Customize with config fields:
[processor.pylint]
args = ["--disable=C0114,C0116"]
src_dirs = ["src"]
Remove the section to disable the processor.
Multiple instances
Run the same processor multiple times with different configurations by adding named sub-sections:
[processor.pylint.core]
src_dirs = ["src/core"]
args = ["--disable=C0114"]
[processor.pylint.tests]
src_dirs = ["tests"]
args = ["--disable=C0114,C0116"]
Each instance runs independently with its own config and cache.
You cannot mix single-instance and multi-instance formats for the same processor type — use either [processor.pylint] or [processor.pylint.NAME], not both.
Instance naming
A single instance declared as [processor.pylint] has the instance name pylint. Named instances declared as [processor.pylint.core] and [processor.pylint.tests] have instance names pylint.core and pylint.tests.
The instance name is used everywhere a processor is identified:
- Build output and progress: `[pylint.core] src/core/main.py`
- Error messages: `Error: [pylint.tests] tests/test_foo.py: ...`
- Build statistics: each instance reports its own file counts and durations
- Cache keys: instances have separate caches, so changing one config does not invalidate the other
- Output directories: generator processors default to `out/{instance_name}` (e.g., `out/marp.slides` and `out/marp.docs` for two marp instances), ensuring outputs do not collide
- The `--processors` filter: use the full instance name, e.g., `rsconstruct build -p pylint.core`
For single instances, the instance name equals the processor type name (e.g., pylint), so there is no visible difference from previous behavior.
Auto-detection
Run rsconstruct smart auto to scan the project and automatically add [processor.NAME] sections for all processors whose files are detected and whose tools are installed. It does not remove existing sections.
Variable substitution
Define variables in a [vars] section and reference them using ${var_name} syntax:
[vars]
kernel_excludes = ["/kernel/", "/kernel_standalone/", "/examples_standalone/"]
[processor.cppcheck]
src_exclude_dirs = "${kernel_excludes}"
[processor.cc_single_file]
src_exclude_dirs = "${kernel_excludes}"
Variables are substituted before TOML parsing. The "${var_name}" (including quotes) is replaced with the TOML-serialized value, preserving types (arrays stay arrays, strings stay strings). Undefined variable references produce an error.
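The pre-parse substitution described above can be sketched as follows. This is a simplified illustration, not RSConstruct's actual code: the map already holds TOML-serialized values, and the error messages and single-pass loop are assumptions.

```rust
use std::collections::HashMap;

/// Replace every `"${name}"` (quotes included) with the pre-serialized
/// TOML value, so arrays stay arrays and strings stay strings.
/// Undefined variables are an error, mirroring the documented behavior.
fn substitute_vars(raw: &str, vars: &HashMap<&str, String>) -> Result<String, String> {
    let mut out = raw.to_string();
    while let Some(start) = out.find("\"${") {
        let end = out[start..]
            .find("}\"")
            .map(|i| start + i + 2)
            .ok_or("unterminated variable reference")?;
        let name = &out[start + 3..end - 2];
        let value = vars
            .get(name)
            .ok_or_else(|| format!("undefined variable: {name}"))?;
        out.replace_range(start..end, value);
    }
    Ok(out)
}

fn main() {
    let mut vars = HashMap::new();
    // Stored already TOML-serialized, so the replacement preserves the type.
    vars.insert("kernel_excludes", r#"["/kernel/", "/vendor/"]"#.to_string());
    let raw = r#"src_exclude_dirs = "${kernel_excludes}""#;
    let resolved = substitute_vars(raw, &vars).unwrap();
    println!("{resolved}"); // src_exclude_dirs = ["/kernel/", "/vendor/"]
}
```

Because the replacement happens on raw text before TOML parsing, the same variable can be spliced into any value position without the parser ever seeing the `${...}` syntax.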
Section details
[build]
| Key | Type | Default | Description |
|---|---|---|---|
parallel | integer | 1 | Number of parallel jobs. 1 = sequential, 0 = auto-detect CPU cores. Can also be set via the RSCONSTRUCT_THREADS environment variable (CLI -j takes precedence). |
batch_size | integer | 0 | Maximum files per batch for batch-capable processors. 0 = no limit (all files in one batch). Omit to disable batching entirely. |
output_dir | string | "out" | Global output directory prefix. Processor output_dir defaults that start with out/ are remapped to use this prefix (e.g., setting "build" changes out/marp to build/marp). Individual processors can still override their output_dir explicitly. |
[processor.NAME]
Each [processor.NAME] section declares a processor instance. The section name must match a builtin processor type (e.g., ruff, pylint, cc_single_file) or a Lua plugin name.
Common fields available to all processors:
| Key | Type | Default | Description |
|---|---|---|---|
args | array of strings | [] | Extra command-line arguments passed to the tool. |
dep_inputs | array of strings | [] | Additional input files that trigger rebuild when changed. |
dep_auto | array of strings | varies | Config files auto-detected as inputs (e.g., .pylintrc). |
batch | boolean | true | Whether to batch multiple files into a single tool invocation. Note: in fail-fast mode (default), chunk size is 1 regardless of this setting — batch mode only groups files with --keep-going or --batch-size. For external tools, a batch failure marks all products in the chunk as failed. Internal processors (i-prefixed) return per-file results, so partial failure is handled correctly. |
max_jobs | integer | none | Maximum concurrent jobs for this processor. When set, limits how many instances of this processor run in parallel, regardless of the global -j setting. Useful for heavyweight processors (e.g., marp spawns Chromium). Omit to use the global parallelism. |
src_dirs | array of strings | varies | Directories to scan for source files. Required for most processors (defaults to []). Processors with a specific default (e.g., tera defaults to "tera.templates", cc_single_file defaults to "src") do not require this. Not required when src_files is set. Use rsconstruct processors defconfig <name> to see a processor’s defaults. |
src_extensions | array of strings | varies | File extensions to match. |
src_exclude_dirs | array of strings | varies | Directory path segments to exclude from scanning. |
src_exclude_files | array of strings | [] | File names to exclude. |
src_exclude_paths | array of strings | [] | Paths (relative to project root) to exclude. |
src_files | array of strings | [] | When non-empty, only these exact paths are matched — src_dirs, src_extensions, and exclude filters are bypassed. Useful for processors that operate on specific files rather than scanning directories. |
Processor-specific fields are documented on each processor’s page under Processors.
[cache]
| Key | Type | Default | Description |
|---|---|---|---|
restore_method | string | "auto" | How to restore cached outputs. "auto" (default) uses "copy" in CI environments (CI=true) and "hardlink" otherwise. "hardlink" is faster but requires same filesystem; "copy" works everywhere. |
compression | boolean | false | Compress cached objects with zstd. Incompatible with restore_method = "hardlink" — requires "copy". |
remote | string | none | Remote cache URL. See Remote Caching. |
remote_push | boolean | true | Push locally built artifacts to remote cache. |
remote_pull | boolean | true | Pull from remote cache on local cache miss. |
mtime_check | boolean | true | Persist file checksums across builds using an mtime database. Set to false in CI/CD environments where the cache won’t survive the build and the write overhead isn’t worth it. Can also be disabled via --no-mtime-cache flag. See Checksum Cache. |
[analyzer]
| Key | Type | Default | Description |
|---|---|---|---|
auto_detect | boolean | true | When true, only run enabled analyzers that auto-detect relevant files. |
enabled | array of strings | ["cpp", "python"] | List of dependency analyzers to enable. |
[graph]
| Key | Type | Default | Description |
|---|---|---|---|
viewer | string | platform-specific | Command to open graph files |
[plugins]
| Key | Type | Default | Description |
|---|---|---|---|
dir | string | "plugins" | Directory containing .lua processor plugins |
[completions]
| Key | Type | Default | Description |
|---|---|---|---|
shells | array | ["bash"] | Shells to generate completions for |
[dependencies]
Declare project dependencies by package manager. Used by rsconstruct doctor to verify availability and rsconstruct tools install-deps to install missing packages.
| Key | Type | Default | Description |
|---|---|---|---|
pip | array of strings | [] | Python packages to install via pip install. Supports version specifiers (e.g., "ruff>=0.4"). |
npm | array of strings | [] | Node.js packages to install via npm install. |
gem | array of strings | [] | Ruby gems to install via gem install. |
system | array of strings | [] | System packages installed via the detected package manager (apt-get, dnf, pacman, or brew). |
Remote Caching
RSConstruct supports sharing build artifacts across machines via remote caching. When enabled, build outputs are pushed to a remote store and can be pulled by other machines, avoiding redundant rebuilds.
Configuration
Add a remote URL to your [cache] section in rsconstruct.toml:
[cache]
remote = "s3://my-bucket/rsconstruct-cache"
Supported Backends
Amazon S3
[cache]
remote = "s3://bucket-name/optional/prefix"
Requires:
- AWS CLI installed (`aws` command)
- AWS credentials configured (`~/.aws/credentials` or environment variables)
The S3 backend uses aws s3 cp and aws s3 ls commands.
HTTP/HTTPS
[cache]
remote = "http://cache-server.example.com:8080/rsconstruct"
# or
remote = "https://cache-server.example.com/rsconstruct"
Requires:
- `curl` command
- Server that supports GET and PUT requests
The HTTP backend expects:
- `GET /path` to return the object or 404
- `PUT /path` to store the object
- `HEAD /path` to check existence (returns 200 or 404)
Local Filesystem
[cache]
remote = "file:///shared/cache/rsconstruct"
Useful for:
- Network-mounted filesystems (NFS, CIFS)
- Testing remote cache behavior locally
Control Options
You can control push and pull separately:
[cache]
remote = "s3://my-bucket/rsconstruct-cache"
remote_push = true # Push local builds to remote (default: true)
remote_pull = true # Pull from remote on cache miss (default: true)
Pull-only mode
To share a read-only cache (e.g., from CI):
[cache]
remote = "s3://ci-cache/rsconstruct"
remote_push = false
remote_pull = true
Push-only mode
To populate a cache without using it (e.g., in CI):
[cache]
remote = "s3://ci-cache/rsconstruct"
remote_push = true
remote_pull = false
How It Works
Cache Structure
The remote cache stores two types of objects:
- Index entries at `index/{cache_key}`
  - JSON mapping input checksums to output checksums
  - One entry per product (source file + processor + config)
- Objects at `objects/{xx}/{rest_of_checksum}`
  - Content-addressed storage (like git)
  - Actual file contents identified by SHA-256
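The git-style object layout splits the checksum into a two-character directory prefix plus the remainder, which keeps any single directory from accumulating millions of entries. A minimal sketch (the digest shown is a made-up example, not a real SHA-256):

```rust
use std::path::PathBuf;

/// Map a hex checksum to objects/{first two chars}/{rest}, git-style.
fn object_path(checksum: &str) -> PathBuf {
    let (prefix, rest) = checksum.split_at(2);
    PathBuf::from("objects").join(prefix).join(rest)
}

fn main() {
    let digest = "ab12cd34ef56"; // hypothetical truncated digest
    println!("{}", object_path(digest).display()); // objects/ab/12cd34ef56
}
```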
On Build
- RSConstruct computes the cache key and input checksum
- Checks the local cache first
- If local miss and `remote_pull = true`:
  - Fetches the index entry from remote
  - Fetches required objects from remote
  - Restores outputs locally
- If a rebuild is required:
  - Executes the processor
  - Stores outputs in the local cache
  - If `remote_push = true`, pushes to remote
Cache Hit Flow
Local cache hit → Restore from local → Done
↓ miss
Remote cache hit → Download index + objects → Restore → Done
↓ miss
Execute processor → Cache locally → Push to remote → Done
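The flow above can be condensed into a small decision function. This is a deliberate simplification: real lookups return cache entries rather than booleans, and the names here are illustrative.

```rust
#[derive(Debug, PartialEq)]
enum Action {
    RestoreLocal,
    RestoreRemote,
    Rebuild,
}

/// Pick the cheapest available source for a product's outputs.
fn decide(local_hit: bool, remote_hit: bool, remote_pull: bool) -> Action {
    if local_hit {
        Action::RestoreLocal
    } else if remote_pull && remote_hit {
        Action::RestoreRemote
    } else {
        Action::Rebuild
    }
}

fn main() {
    assert_eq!(decide(true, false, true), Action::RestoreLocal);
    assert_eq!(decide(false, true, true), Action::RestoreRemote);
    // With remote_pull disabled, a remote hit is never consulted.
    assert_eq!(decide(false, true, false), Action::Rebuild);
    println!("ok");
}
```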
Best Practices
CI/CD Integration
In your CI pipeline:
# .github/workflows/build.yml
env:
AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
steps:
- run: rsconstruct build
Separate CI and Developer Caches
To avoid conflicts, use different prefixes, or point developer machines at the CI prefix in read-only (pull-only) mode as shown here:
# CI: rsconstruct.toml.ci
[cache]
remote = "s3://cache/rsconstruct/ci"
remote_push = true
remote_pull = true
# Developers: rsconstruct.toml
[cache]
remote = "s3://cache/rsconstruct/ci"
remote_push = false # Read from CI cache only
remote_pull = true
Cache Invalidation
Cache entries are keyed by:
- Processor name
- Source file path
- Processor configuration hash
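Combining those three components into one key can be sketched as below. RSConstruct hashes with SHA-256; std's `DefaultHasher` stands in here only so the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Derive a cache key from the three documented components.
fn cache_key(processor: &str, source_path: &str, config_hash: u64) -> u64 {
    let mut h = DefaultHasher::new();
    processor.hash(&mut h);
    source_path.hash(&mut h);
    config_hash.hash(&mut h);
    h.finish()
}

fn main() {
    let base = cache_key("pylint", "src/main.py", 42);
    // Changing any component — here the config hash — changes the key,
    // which is why editing a processor's config triggers rebuilds.
    assert_ne!(base, cache_key("pylint", "src/main.py", 43));
    // Identical inputs always yield the same key (deterministic builds).
    assert_eq!(base, cache_key("pylint", "src/main.py", 42));
    println!("ok");
}
```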
To force a full rebuild ignoring caches:
rsconstruct build --force
To clear only the local cache:
rsconstruct cache clear
Troubleshooting
S3 Access Denied
Check your AWS credentials:
aws s3 ls s3://your-bucket/
HTTP Upload Failures
Ensure your server accepts PUT requests. Many static file servers are read-only.
Slow Remote Cache
Consider:
- Using a closer region for S3
- Enabling S3 Transfer Acceleration
- Using a caching proxy
Debug Mode
Use verbose output to see cache operations:
rsconstruct build -v
This shows which products are restored from local cache, remote cache, or rebuilt.
Project Structure
RSConstruct follows a convention-over-configuration approach. The directory layout determines how files are processed.
Directory layout
project/
├── rsconstruct.toml # Configuration file
├── .rsconstructignore # Glob patterns for files to exclude
├── config/ # Python config files (loaded by templates)
├── tera.templates/ # .tera template files
├── templates.mako/ # .mako template files
├── src/ # C/C++ source files
├── plugins/ # Lua processor plugins (.lua files)
├── out/
│ ├── cc_single_file/ # Compiled executables
│ ├── ruff/ # Ruff lint stub files
│ ├── pylint/ # Pylint lint stub files
│ ├── cppcheck/ # C/C++ lint stub files
│ ├── zspell/ # Zspell stub files
│ └── make/ # Make stub files
└── .rsconstruct/ # Cache directory
├── index.json # Cache index
├── objects/ # Cached build artifacts
└── deps/ # Dependency files
Conventions
Templates
Files in tera.templates/ with configured extensions (default .tera) are rendered to the project root:
- `tera.templates/Makefile.tera` produces `Makefile`
- `tera.templates/config.toml.tera` produces `config.toml`
Similarly, files in templates.mako/ with .mako extensions are rendered via the Mako processor:
- `templates.mako/Makefile.mako` produces `Makefile`
- `templates.mako/config.toml.mako` produces `config.toml`
C/C++ sources
Files in the source directory (default src/) are compiled to executables under out/cc_single_file/, preserving the directory structure:
- `src/main.c` produces `out/cc_single_file/main.elf`
- `src/utils/helper.cc` produces `out/cc_single_file/utils/helper.elf`
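The mapping from source path to executable path can be sketched with plain `std::path` operations — strip the source-directory prefix, re-root under the output directory, and swap the extension. The function name and the fallback behavior are illustrative, not RSConstruct's actual code.

```rust
use std::path::{Path, PathBuf};

/// Map a C/C++ source path to its executable path, preserving the
/// directory structure below src_dir.
fn output_path(src: &Path, src_dir: &str, out_dir: &str) -> PathBuf {
    let rel = src.strip_prefix(src_dir).unwrap_or(src);
    Path::new(out_dir).join(rel).with_extension("elf")
}

fn main() {
    let p = output_path(Path::new("src/utils/helper.cc"), "src", "out/cc_single_file");
    println!("{}", p.display()); // out/cc_single_file/utils/helper.elf
}
```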
Python files
Python files are linted and stub outputs are written to out/ruff/ (ruff processor) or out/pylint/ (pylint processor).
Build artifacts
All build outputs go into out/. The cache lives in .rsconstruct/. Use rsconstruct clean to remove out/ (preserving cache) or rsconstruct clean all to remove both.
Dependency Analyzers
RSConstruct uses dependency analyzers to scan source files and discover dependencies between files. Analyzers run after processors discover products and add dependency information to the build graph.
How analyzers work
- Product discovery: Processors discover products (source → output mappings).
- Dependency analysis: Analyzers scan source files to find dependencies.
- Graph resolution: Dependencies are added to products for correct build ordering.
Analyzers are decoupled from processors — they operate on any product with matching source files, regardless of which processor created it.
Built-in analyzers
Per-analyzer reference pages:
- cpp — C/C++ `#include` scanning (invokes `gcc`/`pkg-config`)
- icpp — C/C++ `#include` scanning, pure Rust (no subprocess)
- python — Python `import`/`from ... import` resolution
- markdown — Markdown image and link references
- tera — Tera `{% include %}`, `{% import %}`, `{% extends %}` references
Configuration
Analyzers are configured in rsconstruct.toml:
[analyzer]
auto_detect = true # default: true
enabled = ["cpp", "markdown", "python", "tera"] # instances to run
[analyzer.cpp]
include_paths = ["include", "src"]
Only analyzers listed under [analyzer.X] (or enabled) are instantiated — there is no global “all analyzers always run” mode.
Auto-detection
An analyzer runs if:
- It is declared (listed in `enabled` or configured via `[analyzer.X]`), AND
- either `auto_detect = false`, OR the analyzer detects relevant files in the project.
This mirrors how processors work.
Caching
Analyzer results are cached in the dependency cache (.rsconstruct/deps.redb). On subsequent builds:
- If a source file hasn’t changed, its cached dependencies are used.
- If a source file has changed, dependencies are re-scanned.
- The cache is shared across all analyzers.
Use the analyzers and deps commands to inspect the cache:
rsconstruct analyzers list # list available analyzers
rsconstruct analyzers defconfig cpp # show default config for an analyzer
rsconstruct analyzers add cpp # append [analyzer.cpp] to rsconstruct.toml with comments
rsconstruct analyzers add cpp --dry-run # preview without writing
rsconstruct deps all # show all cached dependencies
rsconstruct deps for src/main.c # show dependencies for specific files
rsconstruct deps clean # clear the dependency cache
Build phases
With --phases, you can see when analyzers run:
rsconstruct --phases build
Output:
Phase: Building dependency graph...
Phase: discover
Phase: add_dependencies # Analyzers run here
Phase: apply_tool_version_hashes
Phase: resolve_dependencies
Use --stop-after add-dependencies to stop after dependency analysis:
rsconstruct build --stop-after add-dependencies
Adding a custom analyzer
Analyzers implement the DepAnalyzer trait:
#![allow(unused)]
fn main() {
pub trait DepAnalyzer: Sync + Send {
fn description(&self) -> &str;
fn auto_detect(&self, file_index: &FileIndex) -> bool;
fn analyze(
&self,
graph: &mut BuildGraph,
deps_cache: &mut DepsCache,
file_index: &FileIndex,
verbose: bool,
) -> Result<()>;
}
}
The analyze method should:
- Find products with relevant source files.
- Scan each source file for dependencies (using the cache when available).
- Add discovered dependencies to the product’s inputs.
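Step 2 — scanning a source file for dependencies — can be illustrated with a self-contained sketch for quoted C includes. A real analyzer would also consult the deps cache and write results into the `BuildGraph`; this function only shows the scanning part, and its name is hypothetical.

```rust
/// Collect the targets of `#include "file"` directives in a source text.
/// Angle-bracket includes are ignored here (handled separately).
fn scan_quoted_includes(source: &str) -> Vec<String> {
    source
        .lines()
        .filter_map(|line| {
            let rest = line.trim_start().strip_prefix("#include")?.trim_start();
            let rest = rest.strip_prefix('"')?;
            rest.split('"').next().map(str::to_string)
        })
        .collect()
}

fn main() {
    let src = "#include \"util.h\"\n#include <stdio.h>\nint main() { return 0; }\n";
    assert_eq!(scan_quoted_includes(src), vec!["util.h"]);
    println!("ok");
}
```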
cpp
Scans C/C++ source files for #include directives and adds header file dependencies to the build graph.
Native: No (may invoke gcc, pkg-config).
Auto-detects: Projects with .c, .cc, .cpp, .cxx, .h, .hh, .hpp, or .hxx files.
Features
- Recursive header scanning (follows includes in header files)
- Queries compiler for system include paths (only tracks project-local headers)
- Handles both `#include "file"` (relative to source) and `#include <file>` (searches include paths)
- Supports native regex scanning and compiler-based scanning (`gcc -MM`)
- Uses the dependency cache for incremental builds
System header detection
The cpp analyzer queries the compiler for its include search paths using gcc -E -Wp,-v -xc /dev/null. This allows it to properly identify which headers are system headers vs project-local headers. Only headers within the project directory are tracked as dependencies.
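Parsing that compiler report amounts to collecting the indented lines between the "search starts here:" marker and "End of search list.". The sample text below mimics the stderr of `gcc -E -Wp,-v -xc /dev/null`; real output varies by toolchain, so treat this as a sketch.

```rust
/// Extract include search paths from the compiler's -Wp,-v report.
fn parse_search_paths(stderr: &str) -> Vec<String> {
    let mut paths = Vec::new();
    let mut in_list = false;
    for line in stderr.lines() {
        if line.ends_with("search starts here:") {
            in_list = true;
        } else if line.starts_with("End of search list") {
            in_list = false;
        } else if in_list {
            paths.push(line.trim().to_string());
        }
    }
    paths
}

fn main() {
    let sample = "#include <...> search starts here:\n /usr/lib/gcc/include\n /usr/include\nEnd of search list.\n";
    assert_eq!(parse_search_paths(sample), vec!["/usr/lib/gcc/include", "/usr/include"]);
    println!("ok");
}
```

Any header that resolves into one of these system paths is classified as a system header and excluded from the dependency graph.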
Configuration
[analyzer.cpp]
include_scanner = "native" # or "compiler" for gcc -MM
include_paths = ["include", "src"]
pkg_config = ["gtk+-3.0", "libcurl"]
include_path_commands = ["gcc -print-file-name=plugin"]
src_exclude_dirs = ["/kernel/", "/vendor/"]
cc = "gcc"
cxx = "g++"
cflags = ["-I/usr/local/include"]
cxxflags = ["-std=c++17"]
include_path_commands
Shell commands whose stdout (trimmed) is added to the include search paths. Useful for compiler-specific include directories:
[analyzer.cpp]
include_path_commands = [
"gcc -print-file-name=plugin", # GCC plugin development headers
"llvm-config --includedir", # LLVM headers
]
pkg_config integration
Runs pkg-config --cflags-only-I for each package and adds the resulting include paths to the search path. Useful when your code includes headers from system libraries:
[analyzer.cpp]
pkg_config = ["gtk+-3.0", "glib-2.0"]
This automatically finds headers like <gtk/gtk.h> and <glib.h> without manually specifying their include paths.
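Turning the `pkg-config --cflags-only-I` output into search paths is a matter of splitting the flag string and stripping the `-I` prefix. A sketch, with illustrative flags rather than a real gtk+-3.0 result:

```rust
/// Extract include directories from a pkg-config cflags string.
fn include_paths_from_cflags(cflags: &str) -> Vec<String> {
    cflags
        .split_whitespace()
        .filter_map(|flag| flag.strip_prefix("-I").map(str::to_string))
        .collect()
}

fn main() {
    let flags = "-I/usr/include/gtk-3.0 -I/usr/include/glib-2.0";
    assert_eq!(
        include_paths_from_cflags(flags),
        vec!["/usr/include/gtk-3.0", "/usr/include/glib-2.0"]
    );
    println!("ok");
}
```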
See also
- icpp — native (no-subprocess) C/C++ dependency analyzer
icpp
Native (no-subprocess) C/C++ dependency analyzer. Scans #include directives by parsing source files directly in Rust, without invoking gcc or pkg-config.
Native: Yes.
Auto-detects: Projects with .c, .cc, .cpp, .cxx, .h, .hh, .hpp, or .hxx files.
When to use
- You want faster analysis without the overhead of launching `gcc` per file.
- You don’t need compiler-driven include path discovery.
- You’re happy to enumerate include paths explicitly in `rsconstruct.toml`.
Prefer cpp if you need compiler-discovered system include paths or pkg-config integration.
Configuration
[analyzer.icpp]
include_paths = ["include", "src"]
src_exclude_dirs = ["/kernel/", "/vendor/"]
follow_angle_brackets = false
skip_not_found = false
follow_angle_brackets (default: false)
Controls whether #include <foo.h> directives are followed.
- `false` (default) — angle-bracket includes are skipped entirely. System headers never enter the dependency graph, even when they resolve through configured include paths.
- `true` — angle-bracket includes are resolved and followed the same way as quoted includes. Unresolved angles are still tolerated (not an error), so missing system headers don’t break analysis.

Quoted includes (`#include "foo.h"`) are always resolved and must be found — this setting does not affect them (see `skip_not_found` below).
skip_not_found (default: false)
Controls what happens when an include cannot be resolved.
- `false` (default) — a quoted include (`#include "foo.h"`) that cannot be resolved is a hard error. Unresolved angle-bracket includes are silently ignored (when `follow_angle_brackets = true`).
- `true` — unresolved includes of any kind are silently skipped.
Use true for partial / work-in-progress codebases where some headers aren’t generated yet.
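How the two options interact can be captured in a small decision function. This sketch resolves against an in-memory set of known headers instead of the filesystem; the enum and function names are illustrative.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq)]
enum Resolution {
    Dep(String), // tracked as a dependency
    Skipped,     // silently ignored
    Error,       // hard error (unresolved quoted include)
}

fn resolve(
    name: &str,
    angled: bool,
    known: &HashSet<&str>,
    follow_angle_brackets: bool,
    skip_not_found: bool,
) -> Resolution {
    if angled && !follow_angle_brackets {
        return Resolution::Skipped; // angle includes ignored by default
    }
    if known.contains(name) {
        Resolution::Dep(name.to_string())
    } else if angled || skip_not_found {
        Resolution::Skipped // unresolved angles are always tolerated
    } else {
        Resolution::Error // unresolved quoted include is a hard error
    }
}

fn main() {
    let known: HashSet<&str> = ["util.h"].into_iter().collect();
    assert_eq!(resolve("util.h", false, &known, false, false), Resolution::Dep("util.h".into()));
    assert_eq!(resolve("stdio.h", true, &known, false, false), Resolution::Skipped);
    assert_eq!(resolve("gen.h", false, &known, false, false), Resolution::Error);
    assert_eq!(resolve("gen.h", false, &known, false, true), Resolution::Skipped);
    println!("ok");
}
```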
See also
- cpp — compiler-aware (external) C/C++ dependency analyzer
python
Scans Python source files for import and from ... import statements and adds dependencies on local Python modules.
Native: Yes.
Auto-detects: Projects with .py files.
Features
- Resolves imports to local files (ignores stdlib / external packages)
- Supports both `import foo` and `from foo import bar` syntax
- Searches relative to the source file and the project root
Configuration
[analyzer.python]
# currently no tunables
markdown
Scans Markdown source files for image and link references (`![alt](path)`, `[text](path)`) and adds referenced local files as dependencies.
Native: Yes.
Auto-detects: Projects with .md files.
Features
- Extracts `![alt](path)` image references and `[text](path)` link references
- Resolves paths relative to the source file’s directory
- Skips URLs (`http://`, `https://`, `ftp://`), data URIs, and anchor-only links
- Strips title text and anchor fragments from paths
This ensures that when an image or linked file changes, any Markdown product that references it is rebuilt.
Configuration
[analyzer.markdown]
# currently no tunables
tera
Scans Tera template files for {% include %}, {% import %}, and {% extends %} directives and adds referenced template files as dependencies.
Native: Yes.
Auto-detects: Projects with .tera files.
Features
- Extracts paths from `{% include "path" %}`, `{% import "path" %}`, and `{% extends "path" %}`
- Handles both double- and single-quoted paths
- Resolves paths relative to the source file’s directory and the project root
This ensures that when an included template changes, any template that includes it is rebuilt.
Configuration
[analyzer.tera]
# currently no tunables
Processors
RSConstruct uses processors to discover and build products. Each processor scans for source files matching its conventions and produces output files.
Processor Types
There are four processor types: checker, generator, creator, and explicit. They differ in how inputs are discovered, how outputs are declared, and how results are cached.
See Processor Types for full descriptions, examples, and a comparison table.
Configuration
Declare processors by adding [processor.NAME] sections to rsconstruct.toml:
[processor.ruff]
[processor.pylint]
args = ["--disable=C0114"]
[processor.cc_single_file]
Only declared processors run — no processors are enabled by default. Use rsconstruct smart auto to auto-detect and add relevant processors.
Use rsconstruct processors list to see declared processors and descriptions.
Use rsconstruct processors list --all to show all built-in processors, not just those enabled in the project.
Use rsconstruct processors files to see which files each processor discovers.
Available Processors
- Tera — renders Tera templates into output files
- Ruff — lints Python files with ruff
- Pylint — lints Python files with pylint
- Mypy — type-checks Python files with mypy
- Pyrefly — type-checks Python files with pyrefly
- CC — builds full C/C++ projects from cc.yaml manifests
- CC Single File — compiles C/C++ source files into executables (single-file)
- Linux Module — builds Linux kernel modules from linux-module.yaml manifests
- Cppcheck — runs static analysis on C/C++ source files
- Clang-Tidy — runs clang-tidy static analysis on C/C++ source files
- Shellcheck — lints shell scripts using shellcheck
- Zspell — checks documentation files for spelling errors
- Rumdl — lints Markdown files with rumdl
- Make — runs make in directories containing Makefiles
- Cargo — builds Rust projects using Cargo
- Yamllint — lints YAML files with yamllint
- Jq — validates JSON files with jq
- Jsonlint — lints JSON files with jsonlint
- Taplo — checks TOML files with taplo
- Terms — checks that technical terms are backtick-quoted in Markdown files
- Json Schema — validates JSON schema propertyOrdering
- Iyamlschema — validates YAML files against JSON schemas (native)
- Yaml2json — converts YAML files to JSON (native)
- Markdown2html — converts Markdown to HTML using markdown CLI
- Imarkdown2html — converts Markdown to HTML (native)
Output Directory Caching
Creator processors (cargo, sphinx, mdbook, pip, npm, gem, and user-defined creators) produce output in directories rather than individual files. RSConstruct caches these entire directories so that after rsconstruct clean && rsconstruct build, the output is restored from cache instead of being regenerated.
After a successful build, RSConstruct walks the output directories, stores every file as a content-addressed blob, and records a tree (manifest of paths, checksums, and Unix permissions). On restore, the entire directory tree is recreated from cached blobs with permissions preserved. See Cache System for details.
For user-defined creators, output directories are declared via output_dirs:
[processor.creator.venv]
command = "pip"
args = ["install", "-r", "requirements.txt"]
src_extensions = ["requirements.txt"]
output_dirs = [".venv"]
For built-in creators, this is controlled by the cache_output_dir config option (default true):
[processor.cargo]
cache_output_dir = false # Disable for large target/ directories
Custom Processors
You can define custom processors in Lua. See Lua Plugins for details.
Processor Types
Every processor in RSConstruct has a type that determines how it discovers inputs, produces outputs, and interacts with the cache. There are four types.
Run rsconstruct processors types to list them.
Checker
A checker validates input files without producing any output. If the check passes, the result is cached — if the inputs haven’t changed on the next build, the check is skipped entirely.
How it works
- Scans for files matching `src_extensions` in `src_dirs`
- Creates one product per input file
- Runs the tool on each file (or batch of files)
- If the tool exits successfully, records a marker in the cache
- On the next build, if inputs are unchanged, the check is skipped
What gets cached
A marker entry — no files, no blobs. The marker’s presence means “this check passed with these inputs.”
Examples
Lint Python files with ruff:
[processor.ruff]
Scans for .py files, runs ruff check on each. No output files produced.
src/main.py → (checker)
src/utils.py → (checker)
Lint shell scripts:
[processor.shellcheck]
Scans for .sh and .bash files, runs shellcheck on each.
Validate YAML files:
[processor.yamllint]
Scans for .yml and .yaml files, runs yamllint on each.
Validate JSON files with jq:
[processor.jq]
Scans for .json files, validates each with jq.
Spell check Markdown files:
[processor.zspell]
Scans for .md files, checks spelling with the built-in zspell engine.
Built-in checkers
ruff, pylint, mypy, pyrefly, black, pytest, doctest, shellcheck, luacheck, yamllint, jq, jsonlint, taplo, cppcheck, clang_tidy, cpplint, checkpatch, mdl, markdownlint, rumdl, aspell, zspell, ascii, encoding, duplicate_files, terms, eslint, jshint, standard, htmlhint, htmllint, tidy, stylelint, jslint, svglint, svgo, perlcritic, xmllint, checkstyle, php_lint, yq, hadolint, slidev, json_schema, iyamlschema, ijq, ijsonlint, iyamllint, itaplo, marp_images, license_header
Generator
A generator transforms each input file into one or more output files. It creates one product per input file (or one per input x format pair for multi-format generators like pandoc).
How it works
- Scans for files matching `src_extensions` in `src_dirs`
- Creates one product per input x format pair
- Runs the tool to produce the output file
- Stores the output as a content-addressed blob in the cache
What gets cached
One blob per output file. The blob is the raw file content, stored by its SHA-256 hash. On restore, the blob is hardlinked (or copied) to the output path.
Examples
Render Tera templates:

```toml
[processor.tera]
```

Scans tera.templates/ for .tera files and renders each template. The output path is the template path with the .tera extension stripped:

```
tera.templates/config.py.tera → config.py
tera.templates/README.md.tera → README.md
```

Convert Marp slides to PDF:

```toml
[processor.marp]
```

Scans marp/ for .md files, converting each to PDF (and optionally other formats):

```
marp/slides.md → out/marp/slides.pdf
marp/intro.md → out/marp/intro.pdf
```

Convert documents with pandoc (multi-format):

```toml
[processor.pandoc]
```

Scans pandoc/ for .md files, converting each to PDF, HTML, and DOCX. Each format is a separate product with its own cache entry:

```
pandoc/syllabus.md → out/pandoc/syllabus.pdf
pandoc/syllabus.md → out/pandoc/syllabus.html
pandoc/syllabus.md → out/pandoc/syllabus.docx
```

Compile single-file C programs:

```toml
[processor.cc_single_file]
```

Scans src/ for .c and .cc files, compiling each into an executable:

```
src/main.c → out/cc_single_file/src/main.elf
src/test.c → out/cc_single_file/src/test.elf
```

Convert Mermaid diagrams:

```toml
[processor.mermaid]
```

Scans for .mmd files, converting each to PNG (formats are configurable):

```
diagrams/flow.mmd → out/mermaid/diagrams/flow.png
```

Compile SCSS to CSS:

```toml
[processor.sass]
```

Scans sass/ for .scss and .sass files, compiling each to CSS:

```
sass/styles.scss → out/sass/styles.css
```
Built-in generators
tera, mako, jinja2, cc_single_file, pandoc, marp, mermaid, drawio, chromium, libreoffice, protobuf, sass, markdown2html, pdflatex, a2x, objdump, rust_single_file, tags, pdfunite, ipdfunite, imarkdown2html, isass, yaml2json, generator, script
Creator
A creator runs a command and caches declared output files and directories. It scans for anchor files — files whose presence means “run this tool here.” One product is created per anchor file found, and the command runs in the anchor file’s directory.
Unlike generators (where outputs are derived from input paths), creator outputs are declared explicitly in the config via output_dirs and output_files.
How it works
- Scans for anchor files matching `src_extensions` in `src_dirs`
- Creates one product per anchor file
- Runs the command in the anchor file’s directory
- Walks all declared `output_dirs` and collects `output_files`
- Stores each file as a content-addressed blob
- Records a tree in the cache — a manifest listing every output file with its path, blob checksum, and Unix permissions
What gets cached
A tree entry listing all output files. On restore, the directory tree is recreated from cached blobs with permissions preserved. Individual files within the tree that already exist with the correct checksum are skipped.
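A minimal Python model of the tree cache (a sketch under simplifying assumptions: blob storage is a plain dict, and only the Unix permission bits are tracked):

```python
import hashlib
from pathlib import Path

def file_sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def record_tree(root: Path):
    """Manifest: (relative path, content hash, Unix mode) for every file."""
    entries = []
    for p in sorted(root.rglob("*")):
        if p.is_file():
            entries.append((str(p.relative_to(root)), file_sha256(p),
                            p.stat().st_mode & 0o777))
    return entries

def restore_tree(root: Path, manifest, blobs) -> int:
    """Recreate the tree from cached blobs; return the number of files written."""
    written = 0
    for rel, digest, mode in manifest:
        dest = root / rel
        if dest.exists() and file_sha256(dest) == digest:
            continue  # already up to date: skip
        dest.parent.mkdir(parents=True, exist_ok=True)
        dest.write_bytes(blobs[digest])
        dest.chmod(mode)  # permissions preserved
        written += 1
    return written
```

The skip check is what makes a second restore of an unchanged tree a no-op.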
Examples
Install Python dependencies with pip:
```toml
[processor.creator.venv]
command = "pip"
args = ["install", "-r", "requirements.txt"]
src_extensions = ["requirements.txt"]
output_dirs = [".venv"]
```
Scans for requirements.txt files. For each one, runs pip install and caches the entire .venv/ directory. After rsconstruct clean, the venv is restored from cache instead of reinstalling.
Build a Node.js project:
```toml
[processor.creator.npm_build]
command = "npm"
args = ["run", "build"]
src_extensions = ["package.json"]
output_dirs = ["dist"]
```
Scans for package.json files, runs npm run build, caches the dist/ directory.
Build documentation with Sphinx:
```toml
[processor.sphinx]
```

Scans for conf.py files, runs sphinx-build, and caches the output directory:

```
docs/conf.py → (creator)
```

Build a Rust project with Cargo:

```toml
[processor.cargo]
```

Scans for Cargo.toml files, runs cargo build, and optionally caches the target/ directory:

```
Cargo.toml → (creator)
```
Run a custom build script:
```toml
[processor.creator.assets]
command = "./build_assets.sh"
src_extensions = [".manifest"]
src_dirs = ["."]
output_dirs = ["assets/compiled", "assets/sprites"]
output_files = ["assets/manifest.json"]
```
Scans for .manifest files, runs the build script, caches two output directories and one output file.
Built-in creators
cargo, pip, npm, gem, sphinx, mdbook, jekyll, cc (full C/C++ projects)
User-defined creators use the creator processor type directly via [processor.creator.NAME].
Explicit
An explicit processor aggregates many inputs into (possibly) many output files and/or directories. Unlike other types which create one product per discovered file, explicit creates a single product with all declared inputs and outputs.
How it works
- Inputs are listed explicitly via `inputs` and `input_globs` in the config
- Creates a single product with all inputs and all outputs
- Runs the command, passing `--inputs` and `--outputs` on the command line
- Stores each output file as a content-addressed blob
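Conceptually, the command line is assembled like this sketch (the exact argv layout is an implementation detail of RSConstruct and is assumed here purely for illustration):

```python
def explicit_argv(command, args, inputs, outputs):
    """Assemble: command, user args, then the aggregated inputs and outputs."""
    return [command, *args, "--inputs", *inputs, "--outputs", *outputs]
```

The tool being invoked is expected to parse `--inputs`/`--outputs` itself; from RSConstruct's side, all declared files travel in one invocation because there is only one product.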
What gets cached
One blob per output file (like generator).
Examples
Build a static site from generated HTML:
```toml
[processor.explicit.site]
command = "python3"
args = ["build_site.py"]
input_globs = ["out/pandoc/*.html", "templates/*.html"]
inputs = ["site.yaml"]
outputs = ["out/site/index.html", "out/site/style.css"]
```
Waits for pandoc to produce HTML files, then combines them with templates into a site. All inputs are aggregated into one product:
```
out/pandoc/page1.html, out/pandoc/page2.html, templates/base.html, site.yaml
  → out/site/index.html, out/site/style.css
```
Merge PDFs into a course bundle:
```toml
[processor.explicit.course]
command = "pdfunite"
input_globs = ["out/pdflatex/*.pdf"]
outputs = ["out/course/full-course.pdf"]
```
Aggregates all PDF outputs from pdflatex into a single merged PDF.
Built-in explicit processors
explicit, pdfunite, ipdfunite
Comparison
| | Checker | Generator | Creator | Explicit |
|---|---|---|---|---|
| Purpose | Validate | Transform | Build/install | Aggregate |
| Inputs | Scanned | Scanned | Scanned (anchor files) | Declared in config |
| Products | One per input | One per input (x format) | One per anchor | One total |
| Outputs | None | Derived from input path | Declared dirs + files | Declared files |
| Cache type | Marker | Blob | Tree | Blob |
| Runs in | Project root | Project root | Anchor file’s directory | Project root |
| Command args | Input files | Input + output | User-defined args | --inputs + --outputs |
A2x Processor
Purpose
Converts AsciiDoc files to PDF (or other formats) using a2x.
How It Works
Discovers .txt (AsciiDoc) files in the project and runs a2x on each file,
producing output in the configured format.
Source Files
- Input: `**/*.txt`
- Output: `out/a2x/{relative_path}.pdf`
Configuration
```toml
[processor.a2x]
a2x = "a2x"            # The a2x command to run
format = "pdf"         # Output format (pdf, xhtml, dvi, ps, epub, mobi)
args = []              # Additional arguments to pass to a2x
output_dir = "out/a2x" # Output directory
dep_inputs = []        # Additional files that trigger rebuilds when changed
```
| Key | Type | Default | Description |
|---|---|---|---|
| a2x | string | "a2x" | The a2x executable to run |
| format | string | "pdf" | Output format |
| args | string[] | [] | Extra arguments passed to a2x |
| output_dir | string | "out/a2x" | Output directory |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Ascii Check Processor
Purpose
Validates that files contain only ASCII characters.
How It Works
Discovers .md files in the project and checks each for non-ASCII characters.
Files containing non-ASCII bytes fail the check. This is a built-in processor
that does not require any external tools.
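The check itself amounts to scanning for bytes above 0x7F; a rough Python equivalent (`check_ascii` is a hypothetical name, not the processor's internal API):

```python
from pathlib import Path

def check_ascii(path: Path):
    """Return (line number, byte value) for every non-ASCII byte found."""
    failures = []
    for lineno, line in enumerate(path.read_bytes().splitlines(), start=1):
        for byte in line:
            if byte > 0x7F:  # anything outside 7-bit ASCII fails the check
                failures.append((lineno, byte))
    return failures
```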
This processor supports batch mode, allowing multiple files to be checked in a single invocation.
Source Files
- Input: `**/*.md`
- Output: none (checker)
Configuration
```toml
[processor.ascii]
args = []       # Additional arguments (unused, for consistency)
dep_inputs = [] # Additional files that trigger rebuilds when changed
```
| Key | Type | Default | Description |
|---|---|---|---|
| args | string[] | [] | Extra arguments (reserved for future use) |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Aspell Processor
Purpose
Checks spelling in Markdown files using aspell.
How It Works
Discovers .md files in the project and runs aspell on each file using the
configured aspell configuration file. A non-zero exit code fails the product.
Source Files
- Input: `**/*.md`
- Output: none (checker)
Configuration
```toml
[processor.aspell]
command = "aspell"    # The aspell command to run
conf = ".aspell.conf" # Aspell configuration file
args = []             # Additional arguments to pass to aspell
dep_inputs = []       # Additional files that trigger rebuilds when changed
```
| Key | Type | Default | Description |
|---|---|---|---|
| command | string | "aspell" | The aspell executable to run |
| conf | string | ".aspell.conf" | Aspell configuration file |
| args | string[] | [] | Extra arguments passed to aspell |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Black Processor
Purpose
Checks Python file formatting using Black, the uncompromising code formatter. Runs `black --check`, which verifies files are already formatted without modifying them.
How It Works
Python files matching configured extensions are checked via black --check. The command exits with a non-zero status if any file would be reformatted, causing the build to fail.
Source Files
- Input: `**/*{src_extensions}` (default: `*.py`)
Configuration
```toml
[processor.black]
src_extensions = [".py"] # File extensions to check (default: [".py"])
dep_inputs = []          # Additional files that trigger rechecks when changed
args = []                # Extra arguments passed to black
```
| Key | Type | Default | Description |
|---|---|---|---|
| src_extensions | string[] | [".py"] | File extensions to discover |
| dep_inputs | string[] | [] | Extra files whose changes trigger rechecks |
| dep_auto | string[] | ["pyproject.toml"] | Config files that auto-trigger rechecks |
| args | string[] | [] | Additional arguments passed to black |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Cargo Processor
Purpose
Builds Rust projects using Cargo. Each Cargo.toml produces a cached success
marker, allowing RSConstruct to skip rebuilds when source files haven’t changed.
How It Works
Discovers files named Cargo.toml in the project. For each Cargo.toml found,
the processor runs cargo build (or a configured command) in that directory.
Input Tracking
The cargo processor tracks all .rs and .toml files in the Cargo.toml’s
directory tree as inputs. This includes:
- `Cargo.toml` and `Cargo.lock`
- All Rust source files (`src/**/*.rs`)
- Test files, examples, benches
- Workspace member Cargo.toml files
When any tracked file changes, rsconstruct will re-run cargo.
Workspaces
For Cargo workspaces, each Cargo.toml (root and members) is discovered as a
separate product. To build only the workspace root, use src_exclude_paths to skip
member directories, or configure src_dirs to limit discovery.
Source Files
- Input: `Cargo.toml` plus all `.rs` and `.toml` files in the project tree
- Output: none (creator — produces output in the `target` directory)
Configuration
```toml
[processor.cargo]
cargo = "cargo"               # Cargo binary to use
command = "build"             # Cargo command (build, check, test, clippy, etc.)
args = []                     # Extra arguments passed to cargo
profiles = ["dev", "release"] # Cargo profiles to build
src_dirs = [""]               # Directory to scan ("" = project root)
src_extensions = ["Cargo.toml"]
dep_inputs = []               # Additional files that trigger rebuilds
cache_output_dir = true       # Cache the target/ directory for fast restore after clean
```
| Key | Type | Default | Description |
|---|---|---|---|
| cargo | string | "cargo" | Path or name of the cargo binary |
| command | string | "build" | Cargo subcommand to run |
| args | string[] | [] | Extra arguments passed to cargo |
| profiles | string[] | ["dev", "release"] | Cargo profiles to build (creates one product per profile) |
| src_dirs | string[] | [""] | Directory to scan for Cargo.toml files |
| src_extensions | string[] | ["Cargo.toml"] | File names to match |
| src_exclude_dirs | string[] | ["/.git/", "/target/", ...] | Directory patterns to exclude |
| src_exclude_paths | string[] | [] | Paths (relative to project root) to exclude |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
| cache_output_dir | boolean | true | Cache the target/ directory so rsconstruct clean && rsconstruct build restores from cache. Consider disabling for large projects. |
Batch support
Runs as a single whole-project operation (e.g., cargo build, npm install).
Examples
Basic Usage

```toml
[processor.cargo]
```

Release Only

```toml
[processor.cargo]
profiles = ["release"]
```

Dev Only

```toml
[processor.cargo]
profiles = ["dev"]
```

Use cargo check Instead of build

```toml
[processor.cargo]
command = "check"
```

Run clippy

```toml
[processor.cargo]
command = "clippy"
args = ["--", "-D", "warnings"]
```

Workspace Root Only

```toml
[processor.cargo]
src_exclude_paths = ["crates/"]
```
Notes
- Cargo has its own incremental compilation, so rsconstruct’s caching mainly avoids invoking cargo at all when nothing changed
- The `target/` directory is automatically excluded from input scanning
- For monorepos with multiple Rust projects, each Cargo.toml is built separately
CC Project Processor
Purpose
Builds full C/C++ projects with multiple targets (libraries and executables)
defined in a cc.yaml manifest file. Unlike the CC Single File
processor which compiles each source file into a standalone executable, this
processor supports multi-file targets with dependency linking.
How It Works
The processor scans for cc.yaml files. Each manifest defines libraries
and programs to build. All paths in the manifest (sources, include directories)
are relative to the cc.yaml file’s location and are automatically resolved
to project-root-relative paths before compilation. All commands run from the
project root.
Output goes under out/cc/<path-to-cc.yaml-dir>/, so a manifest at
src/exercises/foo/cc.yaml produces output in out/cc/src/exercises/foo/.
A manifest at the project root produces output in out/cc/.
Source files are compiled to object files, then linked into the final targets:
```
src/exercises/foo/cc.yaml defines:
  library "mymath" (static) from math.c, utils.c
  program "main" from main.c, links mymath

Build produces:
  out/cc/src/exercises/foo/obj/mymath/math.o
  out/cc/src/exercises/foo/obj/mymath/utils.o
  out/cc/src/exercises/foo/lib/libmymath.a
  out/cc/src/exercises/foo/obj/main/main.o
  out/cc/src/exercises/foo/bin/main
```
cc.yaml Format
All paths in the manifest are relative to the cc.yaml file’s location.
```yaml
# Global settings (all optional)
cc: gcc                  # C compiler (default: gcc)
cxx: g++                 # C++ compiler (default: g++)
cflags: [-Wall]          # Global C flags
cxxflags: [-Wall]        # Global C++ flags
ldflags: []              # Global linker flags
include_dirs: [include]  # Global -I paths (relative to cc.yaml location)

# Library definitions
libraries:
  - name: mymath
    lib_type: shared           # shared (.so) | static (.a) | both
    sources: [src/math.c, src/utils.c]
    include_dirs: [include]    # Additional -I for this library
    cflags: []                 # Additional C flags
    cxxflags: []               # Additional C++ flags
    ldflags: [-lm]             # Linker flags for shared lib
  - name: myhelper
    lib_type: static
    sources: [src/helper.c]

# Program definitions
programs:
  - name: main
    sources: [src/main.c]
    link: [mymath, myhelper]   # Libraries defined above to link against
    ldflags: [-lpthread]       # Additional linker flags
  - name: tool
    sources: [src/tool.cc]     # .cc -> uses C++ compiler
    link: [mymath]
```
link: [mymath]
Library Types
| Type | Output | Description |
|---|---|---|
| shared | lib/lib<name>.so | Shared library (default). Sources compiled with -fPIC. |
| static | lib/lib<name>.a | Static library via ar rcs. |
| both | Both .so and .a | Builds both shared and static variants. |
Language Detection
The compiler is chosen per source file based on extension:
| Extensions | Compiler |
|---|---|
| .c | C compiler (cc field) |
| .cc, .cpp, .cxx, .C | C++ compiler (cxx field) |
Global cflags are used for C files and cxxflags for C++ files.
Output Layout
Output is placed under out/cc/<cc.yaml-relative-dir>/:
```
out/cc/<cc.yaml-dir>/
  obj/<target_name>/    # Object files per target
    file.o
  lib/                  # Libraries
    lib<name>.a
    lib<name>.so
  bin/                  # Executables
    <program_name>
```
Build Modes
Compile + Link (default)
Each source is compiled to a .o file, then targets are linked from objects.
This provides incremental rebuilds — only changed sources are recompiled.
Single Invocation
When single_invocation = true in rsconstruct.toml, programs are built by passing
all sources directly to the compiler in one command. Libraries still use
compile+link since ar requires object files.
Configuration
```toml
[processor.cc]
enabled = true            # Enable/disable (default: true)
cc = "gcc"                # Default C compiler (default: "gcc")
cxx = "g++"               # Default C++ compiler (default: "g++")
cflags = []               # Additional global C flags
cxxflags = []             # Additional global C++ flags
ldflags = []              # Additional global linker flags
include_dirs = []         # Additional global -I paths
single_invocation = false # Use single-invocation mode (default: false)
dep_inputs = []           # Extra files that trigger rebuilds
cache_output_dir = true   # Cache entire output directory (default: true)
```
Note: The cc.yaml manifest settings override the rsconstruct.toml defaults for
compiler and flags.
Configuration Reference
| Key | Type | Default | Description |
|---|---|---|---|
| enabled | bool | true | Enable/disable the processor |
| cc | string | "gcc" | Default C compiler |
| cxx | string | "g++" | Default C++ compiler |
| cflags | string[] | [] | Global C compiler flags |
| cxxflags | string[] | [] | Global C++ compiler flags |
| ldflags | string[] | [] | Global linker flags |
| include_dirs | string[] | [] | Global include directories |
| single_invocation | bool | false | Build programs in single compiler invocation |
| dep_inputs | string[] | [] | Extra files that trigger rebuilds when changed |
| cache_output_dir | bool | true | Cache the entire output directory |
| src_dirs | string[] | [""] | Directory to scan for cc.yaml files |
| src_extensions | string[] | ["cc.yaml"] | File patterns to scan for |
Batch support
Runs as a single whole-project operation (e.g., cargo build, npm install).
Example
Given this project layout:
```
myproject/
  rsconstruct.toml
  exercises/
    math/
      cc.yaml
      include/
        math.h
      math.c
      main.c
```
With exercises/math/cc.yaml:
```yaml
include_dirs: [include]

libraries:
  - name: math
    lib_type: static
    sources: [math.c]

programs:
  - name: main
    sources: [main.c]
    link: [math]
```
Running rsconstruct build produces:
```
out/cc/exercises/math/obj/math/math.o
out/cc/exercises/math/lib/libmath.a
out/cc/exercises/math/obj/main/main.o
out/cc/exercises/math/bin/main
```
CC Single File Processor
Purpose
Compiles C (.c) and C++ (.cc) source files into executables, one source file per executable.
How It Works
Source files under the configured source directory are compiled into executables
under out/cc_single_file/, mirroring the directory structure:
```
src/main.c → out/cc_single_file/main.elf
src/a/b.c  → out/cc_single_file/a/b.elf
src/app.cc → out/cc_single_file/app.elf
```
Header dependencies are automatically tracked via compiler-generated .d files
(-MMD -MF). When a header changes, all source files that include it are rebuilt.
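A `.d` file is a one-rule Makefile fragment (`target: prerequisites...` with backslash line continuations), so extracting the tracked headers is mostly string handling. A simplified Python sketch (paths containing escaped spaces are not handled here):

```python
def parse_depfile(text: str):
    """Extract prerequisite paths from a compiler-generated .d file."""
    # Join backslash-newline continuations into one logical line.
    joined = text.replace("\\\n", " ")
    # Drop everything up to and including the "target:" prefix.
    _, _, deps = joined.partition(":")
    return deps.split()
```

Every path returned becomes an input of the product, so a change to any listed header invalidates the cached executable.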
Source Files
- Input: `{source_dir}/**/*.c`, `{source_dir}/**/*.cc`
- Output: `out/cc_single_file/{relative_path}{output_suffix}`
Per-File Flags
Per-file compile and link flags can be set via special comments in source files. This allows individual files to require specific libraries or compiler options without affecting the entire project.
Flag directives
```c
// EXTRA_COMPILE_FLAGS_BEFORE=-pthread
// EXTRA_COMPILE_FLAGS_AFTER=-O2 -DNDEBUG
// EXTRA_LINK_FLAGS_BEFORE=-L/usr/local/lib
// EXTRA_LINK_FLAGS_AFTER=-lX11
```
Command directives
Execute a command and use its stdout as flags (no shell):
```c
// EXTRA_COMPILE_CMD=pkg-config --cflags gtk+-3.0
// EXTRA_LINK_CMD=pkg-config --libs gtk+-3.0
```
Shell directives
Execute via sh -c (full shell syntax):
```c
// EXTRA_COMPILE_SHELL=echo -DLEVEL2_CACHE_LINESIZE=$(getconf LEVEL2_CACHE_LINESIZE)
// EXTRA_LINK_SHELL=echo -L$(brew --prefix openssl)/lib
```
Backtick substitution
Flag directives also support backtick substitution for inline command execution:
```c
// EXTRA_COMPILE_FLAGS_AFTER=`pkg-config --cflags gtk+-3.0`
// EXTRA_LINK_FLAGS_AFTER=`pkg-config --libs gtk+-3.0`
```
Command caching
All command and shell directives (EXTRA_*_CMD, EXTRA_*_SHELL, and backtick substitutions) are cached in memory during a build. If multiple source files use the same command (e.g., pkg-config --cflags gtk+-3.0), it is executed only once. This improves build performance when many files share common dependencies.
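In Python terms, the memoization behaves like this sketch (the real implementation is in Rust; `run_flag_command` is a hypothetical name, and the cache lives only for the duration of one build):

```python
import shlex
import subprocess
from functools import lru_cache

@lru_cache(maxsize=None)
def run_flag_command(command: str):
    """Run a directive command once per build; return its stdout as flags."""
    out = subprocess.run(shlex.split(command), capture_output=True,
                         text=True, check=True).stdout
    return tuple(out.split())  # tuple: immutable and hashable for the cache
```

Keying the cache on the full command string means two files with byte-identical directives share one subprocess invocation.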
Compiler profile-specific flags
When using multiple compiler profiles, you can specify flags that only apply to a specific compiler by adding [profile_name] after the directive name:
```c
// EXTRA_COMPILE_FLAGS_BEFORE=-g
// EXTRA_COMPILE_FLAGS_BEFORE[gcc]=-femit-struct-debug-baseonly
// EXTRA_COMPILE_FLAGS_BEFORE[clang]=-gline-tables-only
```
In this example:
- `-g` is applied to all compilers
- `-femit-struct-debug-baseonly` is applied only when compiling with the “gcc” profile
- `-gline-tables-only` is applied only when compiling with the “clang” profile
The profile name matches the name field in your [[processor.cc_single_file.compilers]] configuration:
```toml
[[processor.cc_single_file.compilers]]
name = "gcc"    # Matches [gcc] suffix
cc = "gcc"

[[processor.cc_single_file.compilers]]
name = "clang"  # Matches [clang] suffix
cc = "clang"
```
This works with all directive types:
- `EXTRA_COMPILE_FLAGS_BEFORE[profile]`
- `EXTRA_COMPILE_FLAGS_AFTER[profile]`
- `EXTRA_LINK_FLAGS_BEFORE[profile]`
- `EXTRA_LINK_FLAGS_AFTER[profile]`
- `EXTRA_COMPILE_CMD[profile]`
- `EXTRA_LINK_CMD[profile]`
- `EXTRA_COMPILE_SHELL[profile]`
- `EXTRA_LINK_SHELL[profile]`
Excluding files from specific profiles
To exclude a source file from being compiled with specific compiler profiles, use EXCLUDE_PROFILE:
```c
// EXCLUDE_PROFILE=clang
```
This is useful when a file uses compiler-specific features that aren’t available in other compilers. For example, a file using GCC-only builtins like __builtin_va_arg_pack_len():
```c
// EXCLUDE_PROFILE=clang
// This file uses GCC-specific builtins
#include <stdarg.h>

void example(int first, ...) {
    int count = __builtin_va_arg_pack_len();  // GCC-only
    // ...
}
```
You can exclude multiple profiles by listing them space-separated:
```c
// EXCLUDE_PROFILE=clang icc
```
Directive summary
| Directive | Execution | Use case |
|---|---|---|
| EXTRA_COMPILE_FLAGS_BEFORE | Literal flags | Flags before default cflags |
| EXTRA_COMPILE_FLAGS_AFTER | Literal flags | Flags after default cflags |
| EXTRA_LINK_FLAGS_BEFORE | Literal flags | Flags before default ldflags |
| EXTRA_LINK_FLAGS_AFTER | Literal flags | Flags after default ldflags |
| EXTRA_COMPILE_CMD | Subprocess (no shell) | Dynamic compile flags via command |
| EXTRA_LINK_CMD | Subprocess (no shell) | Dynamic link flags via command |
| EXTRA_COMPILE_SHELL | sh -c (full shell) | Dynamic compile flags needing shell features |
| EXTRA_LINK_SHELL | sh -c (full shell) | Dynamic link flags needing shell features |
Supported comment styles
Directives can appear in any of these comment styles:
C++ style:
```c
// EXTRA_LINK_FLAGS_AFTER=-lX11
```
C block comment (single line):
```c
/* EXTRA_LINK_FLAGS_AFTER=-lX11 */
```
C block comment (multi-line, star-prefixed):
```c
/*
 * EXTRA_LINK_FLAGS_AFTER=-lX11
 */
```
Command Line Ordering
The compiler command is constructed in this order:
```
compiler -MMD -MF deps -I... [compile_before] [cflags/cxxflags] [compile_after] -o output source [link_before] [ldflags] [link_after]
```
Link flags come after the source file so the linker can resolve symbols correctly.
| Position | Source |
|---|---|
| compile_before | EXTRA_COMPILE_FLAGS_BEFORE + EXTRA_COMPILE_CMD + EXTRA_COMPILE_SHELL |
| cflags/cxxflags | [processor.cc_single_file] config cflags or cxxflags |
| compile_after | EXTRA_COMPILE_FLAGS_AFTER |
| link_before | EXTRA_LINK_FLAGS_BEFORE + EXTRA_LINK_CMD + EXTRA_LINK_SHELL |
| ldflags | [processor.cc_single_file] config ldflags |
| link_after | EXTRA_LINK_FLAGS_AFTER |
Verbosity Levels (--processor-verbose N)
| Level | Output |
|---|---|
| 0 (default) | Target basename: main.elf |
| 1 | Target path + compiler commands: out/cc_single_file/main.elf |
| 2 | Adds source path: out/cc_single_file/main.elf <- src/main.c |
| 3 | Adds all inputs: out/cc_single_file/main.elf <- src/main.c, src/utils.h |
Configuration
Single Compiler (Legacy)
```toml
[processor.cc_single_file]
cc = "gcc"                 # C compiler (default: "gcc")
cxx = "g++"                # C++ compiler (default: "g++")
cflags = []                # C compiler flags
cxxflags = []              # C++ compiler flags
ldflags = []               # Linker flags
include_paths = []         # Additional -I paths (relative to project root)
src_dirs = ["src"]         # Source directory (default: "src")
output_suffix = ".elf"     # Suffix for output executables (default: ".elf")
dep_inputs = []            # Additional files that trigger rebuilds when changed
include_scanner = "native" # Method for scanning header dependencies (default: "native")
```
Multiple Compilers
To compile with multiple compilers (e.g., both GCC and Clang), use the compilers array:
```toml
[processor.cc_single_file]
src_dirs = ["src"]
include_paths = ["include"] # Shared across all compilers

[[processor.cc_single_file.compilers]]
name = "gcc"
cc = "gcc"
cxx = "g++"
cflags = ["-Wall", "-Wextra"]
cxxflags = ["-Wall", "-Wextra"]
ldflags = []
output_suffix = ".elf"

[[processor.cc_single_file.compilers]]
name = "clang"
cc = "clang"
cxx = "clang++"
cflags = ["-Wall", "-Wextra", "-Weverything"]
cxxflags = ["-Wall", "-Wextra"]
ldflags = []
output_suffix = ".elf"
```
When using multiple compilers, outputs are organized by compiler name:
```
src/main.c → out/cc_single_file/gcc/main.elf
           → out/cc_single_file/clang/main.elf
```
Each source file is compiled once per compiler profile, allowing you to:
- Test code with multiple compilers to catch different warnings
- Compare output between compilers
- Build for different targets (cross-compilation)
Configuration Reference
| Key | Type | Default | Description |
|---|---|---|---|
| cc | string | "gcc" | C compiler command |
| cxx | string | "g++" | C++ compiler command |
| cflags | string[] | [] | Flags passed to the C compiler |
| cxxflags | string[] | [] | Flags passed to the C++ compiler |
| ldflags | string[] | [] | Flags passed to the linker |
| include_paths | string[] | [] | Additional -I include paths (shared) |
| src_dirs | string[] | ["src"] | Directory to scan for source files |
| output_suffix | string | ".elf" | Suffix appended to output executables |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
| include_scanner | string | "native" | Method for scanning header dependencies |
| compilers | array | [] | Multiple compiler profiles (overrides single-compiler fields) |
Compiler Profile Fields
Each entry in the compilers array can have:
| Key | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Profile name (used in output path) |
| cc | string | No | C compiler (default: “gcc”) |
| cxx | string | No | C++ compiler (default: “g++”) |
| cflags | string[] | No | C compiler flags |
| cxxflags | string[] | No | C++ compiler flags |
| ldflags | string[] | No | Linker flags |
| output_suffix | string | No | Output suffix (default: “.elf”) |
Batch support
Each input file is processed individually, producing its own output file.
Include Scanner
The include_scanner option controls how header dependencies are discovered:
| Value | Description |
|---|---|
| native | Fast regex-based scanner (default). Parses #include directives directly without spawning external processes. Handles #include "file" and #include <file> forms. |
| compiler | Uses gcc -MM / g++ -MM to scan dependencies. More accurate for complex cases (computed includes, conditional compilation) but slower as it spawns a compiler process per source file. |
Native scanner behavior
The native scanner:
- Recursively follows `#include` directives
- Searches include paths in order: source file directory, configured `include_paths`, project root
- Skips system headers (`/usr/...`, `/lib/...`)
- Only tracks project-local headers (relative paths)
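The native scan is roughly equivalent to this Python sketch (simplified: here unresolved names are skipped instead of matched against the system-header prefixes listed above, and `scan_includes` is a hypothetical name):

```python
import re
from pathlib import Path

INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)

def scan_includes(source, include_paths, root, seen=None):
    """Recursively collect project-local headers reachable from `source`."""
    seen = set() if seen is None else seen
    for name in INCLUDE_RE.findall(Path(source).read_text(errors="replace")):
        # Search order: source file's directory, configured paths, project root.
        for base in [Path(source).parent, *include_paths, root]:
            candidate = Path(base) / name
            if candidate.is_file() and candidate not in seen:
                seen.add(candidate)
                scan_includes(candidate, include_paths, root, seen)
                break  # first hit wins; unresolved names (e.g. <stdio.h>) are skipped
    return seen
```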
When to use compiler scanner
Use include_scanner = "compiler" if you have:
- Computed includes:
#include MACRO_THAT_EXPANDS_TO_FILENAME - Complex conditional compilation affecting which headers are included
- Headers outside the standard search paths that the native scanner misses
The native scanner may occasionally report extra dependencies (false positives), which is safe—it just means some files might rebuild unnecessarily. It will not miss dependencies (false negatives) for standard #include patterns.
Checkpatch Processor
Purpose
Checks C source files using the Linux kernel’s checkpatch.pl script.
How It Works
Discovers .c and .h files under src/ (excluding common C/C++ build
directories), runs checkpatch.pl on each file, and records success in the
cache. A non-zero exit code from checkpatch fails the product.
This processor supports batch mode.
Source Files
- Input: `src/**/*.c`, `src/**/*.h`
- Output: none (checker)
Configuration
```toml
[processor.checkpatch]
args = []
dep_inputs = []
```
| Key | Type | Default | Description |
|---|---|---|---|
| args | string[] | [] | Extra arguments passed to checkpatch.pl |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool processes one file at a time. Each file is checked in a separate invocation.
Checkstyle Processor
Purpose
Checks Java code style using Checkstyle.
How It Works
Discovers .java files in the project (excluding common build tool directories),
runs checkstyle on each file, and records success in the cache. A non-zero exit
code from checkstyle fails the product.
This processor supports batch mode.
If a checkstyle.xml file exists, it is automatically added as an extra input so
that configuration changes trigger rebuilds.
Source Files
- Input: `**/*.java`
- Output: none (checker)
Configuration
```toml
[processor.checkstyle]
args = []
dep_inputs = []
```
| Key | Type | Default | Description |
|---|---|---|---|
| args | string[] | [] | Extra arguments passed to checkstyle |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Chromium Processor
Purpose
Converts HTML files to PDF using headless Chromium (Google Chrome).
How It Works
Discovers .html files in the configured scan directory (default: out/marp) and runs
headless Chromium with --print-to-pdf on each file, producing a PDF output.
This is typically used as a post-processing step after another processor (e.g., Marp) generates HTML files.
Source Files
- Input: `out/marp/**/*.html` (default scan directory)
- Output: `out/chromium/{relative_path}.pdf`
Configuration
```toml
[processor.chromium]
chromium_bin = "google-chrome" # The Chromium/Chrome executable to run
args = []                      # Additional arguments to pass to Chromium
output_dir = "out/chromium"    # Output directory for PDFs
dep_inputs = []                # Additional files that trigger rebuilds when changed
```
| Key | Type | Default | Description |
|---|---|---|---|
| chromium_bin | string | "google-chrome" | The Chromium or Google Chrome executable |
| args | string[] | [] | Extra arguments passed to Chromium |
| output_dir | string | "out/chromium" | Base output directory for PDF files |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Clang-Tidy Processor
Purpose
Runs clang-tidy static analysis on C/C++ source files.
How It Works
Discovers .c and .cc files under the configured source directory, runs
clang-tidy on each file individually, and creates a stub file on success. A
non-zero exit code from clang-tidy fails the product.
Note: This processor does not support batch mode. Each file is checked separately to avoid cross-file analysis issues with unrelated files.
Source Files
- Input: `{source_dir}/**/*.c`, `{source_dir}/**/*.cc`
- Output: `out/clang_tidy/{flat_name}.clang_tidy`
Configuration
```toml
[processor.clang_tidy]
args = ["-checks=*"]           # Arguments passed to clang-tidy
compiler_args = ["-std=c++17"] # Arguments passed after -- to the compiler
dep_inputs = [".clang-tidy"]   # Additional files that trigger rebuilds when changed
```
| Key | Type | Default | Description |
|---|---|---|---|
| args | string[] | [] | Arguments passed to clang-tidy |
| compiler_args | string[] | [] | Compiler arguments passed after -- separator |
| dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool processes one file at a time. Each file is checked in a separate invocation.
Compiler Arguments
Clang-tidy requires knowing compiler flags to properly parse the source files.
Use compiler_args to specify include paths, defines, and language standards:
```toml
[processor.clang_tidy]
compiler_args = ["-std=c++17", "-I/usr/include/mylib", "-DDEBUG"]
```
Using .clang-tidy File
Clang-tidy automatically reads configuration from a .clang-tidy file in the
project root. Add it to dep_inputs so changes trigger rebuilds:
[processor.clang_tidy]
dep_inputs = [".clang-tidy"]
Clippy Processor
Purpose
Lints Rust projects using Cargo Clippy. Each Cargo.toml
produces a cached success marker, allowing RSConstruct to skip re-linting when source files haven’t changed.
How It Works
Discovers files named Cargo.toml in the project. For each Cargo.toml found,
the processor runs cargo clippy in that directory. A non-zero exit code fails the product.
Input Tracking
The clippy processor tracks all .rs and .toml files in the Cargo.toml’s
directory tree as inputs. This includes:
- `Cargo.toml` and `Cargo.lock`
- All Rust source files (`src/**/*.rs`)
- Test files, examples, benches
- Workspace member `Cargo.toml` files
When any tracked file changes, rsconstruct will re-run clippy.
Source Files
- Input: `Cargo.toml` plus all `.rs` and `.toml` files in the project tree
- Output: none (checker-style caching)
Configuration
[processor.clippy]
cargo = "cargo" # Cargo binary to use
command = "clippy" # Cargo command (usually "clippy")
args = [] # Extra arguments passed to cargo clippy
src_dirs = [""] # Directory to scan ("" = project root)
src_extensions = ["Cargo.toml"]
dep_inputs = [] # Additional files that trigger rebuilds
| Key | Type | Default | Description |
|---|---|---|---|
cargo | string | "cargo" | Path or name of the cargo binary |
command | string | "clippy" | Cargo subcommand to run |
args | string[] | [] | Extra arguments passed to cargo clippy |
src_dirs | string[] | [""] | Directory to scan for Cargo.toml files |
src_extensions | string[] | ["Cargo.toml"] | File names to match |
src_exclude_dirs | string[] | ["/.git/", "/target/", ...] | Directory patterns to exclude |
src_exclude_paths | string[] | [] | Paths (relative to project root) to exclude |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The processor runs one `cargo clippy` invocation per discovered `Cargo.toml`.
Examples
Basic Usage
[processor.clippy]
Deny All Warnings
[processor.clippy]
args = ["--", "-D", "warnings"]
Use Both Cargo Build and Clippy
[processor.cargo]
[processor.clippy]
Notes
- Clippy uses the `cargo` binary, which is shared with the cargo processor
- The `target/` directory is automatically excluded from input scanning
- For monorepos with multiple Rust projects, each `Cargo.toml` is linted separately
CMake Processor
Purpose
Lints CMake files using cmake --lint.
How It Works
Discovers CMakeLists.txt files in the project (excluding common build tool
directories), runs cmake --lint on each file, and records success in the cache.
A non-zero exit code from cmake fails the product.
This processor supports batch mode.
Source Files
- Input: `**/CMakeLists.txt`
- Output: none (checker)
Configuration
[processor.cmake]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to cmake |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Cppcheck Processor
Purpose
Runs cppcheck static analysis on C/C++ source files.
How It Works
Discovers .c and .cc files under the configured source directory, runs
cppcheck on each file individually, and creates a stub file on success. A
non-zero exit code from cppcheck fails the product.
Note: This processor does not support batch mode. Each file is checked
separately because cppcheck performs cross-file analysis (CTU - Cross Translation
Unit) which produces false positives when unrelated files are checked together.
For example, standalone example programs that define classes with the same name
will trigger ctuOneDefinitionRuleViolation errors even though the files are
never linked together. Cppcheck has no flag to disable this cross-file analysis
(--max-ctu-depth=0 does not help), so files must be checked individually.
Source Files
- Input: `{source_dir}/**/*.c`, `{source_dir}/**/*.cc`
- Output: `out/cppcheck/{flat_name}.cppcheck`
Configuration
[processor.cppcheck]
args = ["--error-exitcode=1", "--enable=warning,style,performance,portability"]
dep_inputs = [".cppcheck-suppressions"] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | ["--error-exitcode=1", "--enable=warning,style,performance,portability"] | Arguments passed to cppcheck |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
To use a suppressions file, add "--suppressions-list=.cppcheck-suppressions" to args.
Batch support
The tool processes one file at a time. Each file is checked in a separate invocation.
Cpplint Processor
Purpose
Lints C/C++ files using cpplint (Google C++ style checker).
How It Works
Discovers .c, .cc, .h, and .hh files under src/ (excluding common
C/C++ build directories), runs cpplint on each file, and records success in
the cache. A non-zero exit code from cpplint fails the product.
This processor supports batch mode.
Source Files
- Input: `src/**/*.c`, `src/**/*.cc`, `src/**/*.h`, `src/**/*.hh`
- Output: none (checker)
Configuration
[processor.cpplint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to cpplint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Doctest Processor
Purpose
Runs Python doctests embedded in .py files using python3 -m doctest.
How It Works
Python files (.py) are checked for embedded doctests. Each file is run through
python3 -m doctest — failing doctests cause the build to fail.
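For example, a file like the following hypothetical helper passes the check, because the actual output of each `>>>` example matches the expected output written beneath it:

```python
# Hypothetical math_utils.py — the embedded doctest is exactly what
# `python3 -m doctest math_utils.py` executes and verifies.
def clamp(value, lo, hi):
    """Clamp value into the inclusive range [lo, hi].

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(lo, min(hi, value))
```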
Source Files
- Input: `**/*.py`
- Output: none (checker — pass/fail only)
Configuration
[processor.doctest]
src_extensions = [".py"] # File extensions to process (default: [".py"])
dep_inputs = [] # Additional files that trigger rebuilds
| Key | Type | Default | Description |
|---|---|---|---|
src_extensions | string[] | [".py"] | File extensions to discover |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Drawio Processor
Purpose
Converts Draw.io diagram files to PNG, SVG, or PDF.
How It Works
Discovers .drawio files in the project and runs drawio in export mode on
each file, generating output in the configured formats.
Source Files
- Input: `**/*.drawio`
- Output: `out/drawio/{format}/{relative_path}.{format}`
Configuration
[processor.drawio]
drawio_bin = "drawio" # The drawio command to run
formats = ["png"] # Output formats (png, svg, pdf)
args = [] # Additional arguments to pass to drawio
output_dir = "out/drawio" # Output directory
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
drawio_bin | string | "drawio" | The drawio executable to run |
formats | string[] | ["png"] | Output formats to generate (png, svg, pdf) |
args | string[] | [] | Extra arguments passed to drawio |
output_dir | string | "out/drawio" | Base output directory |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
ESLint Processor
Purpose
Lints JavaScript and TypeScript files using ESLint.
How It Works
Discovers .js, .jsx, .ts, .tsx, .mjs, and .cjs files in the project
(excluding common build tool directories), runs eslint on each file, and records
success in the cache. A non-zero exit code from eslint fails the product.
This processor supports batch mode, allowing multiple files to be checked in a single eslint invocation for better performance.
If an ESLint config file exists (.eslintrc* or eslint.config.*), it is
automatically added as an extra input so that configuration changes trigger rebuilds.
Source Files
- Input: `**/*.js`, `**/*.jsx`, `**/*.ts`, `**/*.tsx`, `**/*.mjs`, `**/*.cjs`
- Output: none (checker)
Configuration
[processor.eslint]
command = "eslint"
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "eslint" | The eslint executable to run |
args | string[] | [] | Extra arguments passed to eslint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Explicit Processor
Why “explicit”?
Other processor types discover their inputs by scanning directories for files matching certain extensions. The explicit processor is different: the user declares exactly which files are inputs and which are outputs. Nothing is discovered or inferred.
Names considered:
- explicit — chosen. Directly communicates the key difference: everything is declared rather than discovered.
- custom — too generic. Doesn’t say what makes it different from the existing `generator` processor (which is also “custom”).
- rule — precise (Bazel/Make terminology for a build rule with explicit inputs/outputs), but carries baggage from other build systems and doesn’t fit the rsconstruct naming convention (processors, not rules).
- aggregate — describes the many-inputs-to-few-outputs pattern, but not all uses are aggregations.
- task — too generic. Could mean anything.
Purpose
Runs a user-configured script or command with explicitly declared inputs and outputs. Unlike scan-based processors (which discover one product per source file), the explicit processor creates a single product with all declared inputs feeding into all declared outputs.
This is ideal for build steps that aggregate many files into one or a few outputs, such as:
- Generating an index page from all HTML files in a directory
- Building a bundle from multiple source files
- Creating a report from multiple data files
How It Works
The processor resolves all inputs (literal paths) and input_globs (glob
patterns) into a flat file list. It creates a single product with these files
as inputs and the outputs list as outputs.
Rsconstruct uses this information for:
- Rebuild detection: if any input changes, the product is rebuilt
- Dependency ordering: if an input is an output of another processor, that processor runs first (automatic via `resolve_dependencies()`)
- Caching: outputs are cached and restored on cache hit
Invocation
The command is invoked as:
command [args...] --inputs <input1> <input2> ... --outputs <output1> <output2> ...
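A script honoring this contract can parse the two flag groups with `argparse`. This is only a sketch of a hypothetical `scripts/build_site.py`; the `--verbose` flag and the one-line-per-input aggregation are illustrative, not part of rsconstruct:

```python
# Hypothetical scripts/build_site.py — consumes the invocation contract:
#   command [args...] --inputs <in...> --outputs <out...>
import argparse
import sys


def parse_cli(argv):
    parser = argparse.ArgumentParser()
    parser.add_argument("--verbose", action="store_true")  # example extra arg
    parser.add_argument("--inputs", nargs="+", required=True)
    parser.add_argument("--outputs", nargs="+", required=True)
    return parser.parse_args(argv)


def build_index(inputs):
    # Trivial aggregation: one index line per input file.
    return "\n".join(inputs) + "\n"


if __name__ == "__main__":
    args = parse_cli(sys.argv[1:])
    for out in args.outputs:
        with open(out, "w") as f:
            f.write(build_index(args.inputs))
```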
Input ordering
Inputs are passed in a deterministic order:
- `inputs` entries first, in config file order
- `input_globs` results second, one glob at a time in config file order, files within each glob sorted alphabetically
This ordering is stable across builds (assuming the same set of files exists).
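The ordering rule can be sketched as a pure function over a known file list. This uses `fnmatch` as a stand-in for real glob semantics (unlike true globs, `fnmatch`'s `*` also crosses `/` boundaries), so treat it as an approximation:

```python
from fnmatch import fnmatch


def order_inputs(inputs, input_globs, all_files):
    # Literal `inputs` first, in config order; then each glob's matches in
    # config order, sorted alphabetically within a glob. Deterministic as
    # long as the same set of files exists.
    ordered = list(inputs)
    for pattern in input_globs:
        ordered.extend(sorted(f for f in all_files if fnmatch(f, pattern)))
    return ordered
```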
Configuration
[processor.explicit.site]
command = "scripts/build_site.py"
args = ["--verbose"]
inputs = [
"resources/index.html",
"resources/index.css",
"resources/index.js",
"tags/level.txt",
"tags/category.txt",
"tags/audiences.txt",
]
input_globs = [
"docs/courses/**/*.html",
"docs/tracks/*.html",
]
outputs = [
"docs/index.html",
]
Fields
| Key | Type | Required | Description |
|---|---|---|---|
command | string | yes | Script or binary to execute |
args | array of strings | no | Extra arguments passed before --inputs |
inputs | array of strings | no | Literal input file paths |
input_globs | array of strings | no | Glob patterns resolved to input files |
outputs | array of strings | yes | Output file paths produced by the command |
At least one of inputs or input_globs must be specified.
Glob patterns
input_globs supports standard glob syntax:
- `*` matches any sequence of characters within a path component
- `**` matches any number of path components (recursive)
- `?` matches a single character
- `[abc]` matches one of the listed characters
Glob results that match no files are silently ignored (the set of matching files may grow as upstream generators produce outputs via the fixed-point discovery loop).
Cross-Processor Dependencies
The explicit processor works naturally with the fixed-point discovery loop.
If input_globs matches files that are outputs of other processors (e.g.,
pandoc-generated HTML files), rsconstruct automatically:
- Injects those declared outputs as virtual files during discovery
- Resolves dependency edges so upstream processors run first
- Rebuilds the explicit processor when upstream outputs change
This means you do not need to manually order processors or wait for a second build — everything is handled in a single build invocation.
Comparison with Other Processor Types
| Checker | Generator | Explicit | |
|---|---|---|---|
| Products | one per input file | one per input file | one total |
| Outputs | none (pass/fail) | one per input | explicitly listed |
| Discovery | src_dirs + src_extensions | src_dirs + src_extensions | declared inputs/globs |
| Use case | lint/validate files | transform files 1:1 | aggregate many → few |
Gem Processor
Purpose
Installs Ruby dependencies from Gemfile files using Bundler.
How It Works
Discovers Gemfile files in the project, runs bundle install in each
directory, and creates a stamp file on success. Sibling .rb and .gemspec
files are tracked as inputs.
Source Files
- Input: `**/Gemfile` (plus sibling `.rb`, `.gemspec` files)
- Output: `out/gem/{flat_name}.stamp`
Configuration
[processor.gem]
command = "bundle" # The bundler command to run
args = [] # Additional arguments to pass to bundler install
dep_inputs = [] # Additional files that trigger rebuilds when changed
cache_output_dir = true # Cache the vendor/bundle directory for fast restore after clean
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "bundle" | The bundler executable to run |
args | string[] | [] | Extra arguments passed to bundler install |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
cache_output_dir | boolean | true | Cache the vendor/bundle/ directory so rsconstruct clean && rsconstruct build restores from cache |
Batch support
Runs as a single whole-project operation (e.g., cargo build, npm install).
Generator Processor
Purpose
Runs a user-configured script or command as a generator, producing output files from input files. The script receives input/output path pairs on the command line.
How It Works
Discovers files matching the configured extensions, computes output paths under
output_dir with the configured output_extension, and invokes the command with
path pairs.
In single mode: command [args...] <input> <output>
In batch mode: command [args...] <input1> <output1> <input2> <output2> ...
The processor is auto-detected when the configured scan directories contain matching files.
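A hypothetical `scripts/convert.py` that handles both invocation shapes can split its trailing arguments into (input, output) pairs; the uppercase transform here is just a placeholder:

```python
# Hypothetical scripts/convert.py — handles both invocation shapes:
#   single:  convert.py [args...] <input> <output>
#   batch:   convert.py [args...] <in1> <out1> <in2> <out2> ...
import sys


def pairs(argv):
    # Trailing arguments are (input, output) pairs; must be even in number.
    if len(argv) % 2 != 0:
        raise SystemExit("expected an even number of input/output arguments")
    return list(zip(argv[0::2], argv[1::2]))


def convert(text):
    return text.upper()  # stand-in transformation


if __name__ == "__main__":
    for src, dst in pairs(sys.argv[1:]):
        with open(src) as f, open(dst, "w") as g:
            g.write(convert(f.read()))
```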
Source Files
- Input: files matching `src_extensions` in `src_dirs`
- Output: `{output_dir}/{relative_path}.{output_extension}`
Configuration
[processor.generator]
command = "scripts/convert.py"
output_dir = "out/converted"
output_extension = "html"
src_dirs = ["syllabi"]
src_extensions = [".md"]
batch = true
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "true" | Script or command to run |
output_dir | string | "out/generator" | Directory for output files |
output_extension | string | "out" | Extension for output files |
batch | bool | true | Pass all pairs in one invocation |
args | string[] | [] | Extra arguments prepended before file pairs |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Configurable via batch = true (default). In batch mode, the script receives all input/output pairs in a single invocation. Set batch = false to invoke the script once per file.
Hadolint Processor
Purpose
Lints Dockerfiles using Hadolint.
How It Works
Discovers Dockerfile files in the project (excluding common build tool
directories), runs hadolint on each file, and records success in the cache.
A non-zero exit code from hadolint fails the product.
This processor supports batch mode.
Source Files
- Input: `**/Dockerfile`
- Output: none (checker)
Configuration
[processor.hadolint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to hadolint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
HTMLHint Processor
Purpose
Lints HTML files using HTMLHint.
How It Works
Discovers .html and .htm files in the project (excluding common build tool
directories), runs htmlhint on each file, and records success in the cache.
A non-zero exit code from htmlhint fails the product.
This processor supports batch mode.
If a .htmlhintrc file exists, it is automatically added as an extra input so
that configuration changes trigger rebuilds.
Source Files
- Input: `**/*.html`, `**/*.htm`
- Output: none (checker)
Configuration
[processor.htmlhint]
command = "htmlhint"
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "htmlhint" | The htmlhint executable to run |
args | string[] | [] | Extra arguments passed to htmlhint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
HTMLLint Processor
Purpose
Lints HTML files using htmllint.
How It Works
Discovers .html and .htm files in the project (excluding common build tool
directories), runs htmllint on each file, and records success in the cache.
A non-zero exit code from htmllint fails the product.
This processor supports batch mode.
Source Files
- Input: `**/*.html`, `**/*.htm`
- Output: none (checker)
Configuration
[processor.htmllint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to htmllint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Imarkdown2html Processor
Purpose
Converts Markdown files to HTML using the pulldown-cmark Rust crate. Native (in-process, no external tools required).
This is the native equivalent of markdown2html, which uses the external markdown Perl script.
Source Files
- Input: `**/*.md`
- Output: `out/imarkdown2html/{relative_path}.html`
Configuration
[processor.imarkdown2html]
src_dirs = ["docs"]
output_dir = "out/imarkdown2html" # Output directory (default)
| Key | Type | Default | Description |
|---|---|---|---|
output_dir | string | "out/imarkdown2html" | Output directory for HTML files |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch Support
Each input file is processed individually, producing its own output file.
Iyamlschema Processor
Purpose
Validates YAML files against JSON schemas referenced by a $schema URL field in each file. Checks both schema conformance and property ordering. Native (in-process, no external tools required).
How It Works
For each YAML file:
- Parses the YAML content
- Reads the `$schema` field to get the schema URL
- Fetches the schema (cached in `.rsconstruct/webcache.redb`)
- Validates the data against the schema (including resolving remote `$ref` references)
- Checks that object keys appear in the order specified by `propertyOrdering` fields in the schema
Fails if any file is missing $schema, fails schema validation, or has keys in the wrong order.
Configuration
[processor.iyamlschema]
src_dirs = ["yaml"]
check_ordering = true # Check propertyOrdering (default: true)
| Key | Type | Default | Description |
|---|---|---|---|
check_ordering | boolean | true | Whether to check property ordering against propertyOrdering in the schema |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Schema Requirements
Each YAML file must contain a $schema field with a URL pointing to a JSON schema:
$schema: "https://example.com/schemas/mydata.json"
name: Alice
age: 30
The schema is fetched via HTTP and cached locally. Subsequent builds use the cached version. Use rsconstruct webcache clear to force re-fetching.
Property Ordering
If the schema contains propertyOrdering arrays, the processor checks that data keys appear in the specified order:
{
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer" }
},
"propertyOrdering": ["name", "age"]
}
Set check_ordering = false to disable this check.
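A rough sketch of the ordering check, assuming the YAML has been parsed into plain dicts (Python dicts preserve insertion order, mirroring YAML key order; the real implementation is Rust):

```python
def check_ordering(data, schema, path="$"):
    # Verify that dict keys appear in the relative order given by the
    # schema's propertyOrdering list; returns a list of violation messages.
    errors = []
    if isinstance(data, dict):
        ordering = schema.get("propertyOrdering", [])
        rank = {key: i for i, key in enumerate(ordering)}
        ranked = [rank[key] for key in data if key in rank]
        if ranked != sorted(ranked):
            errors.append(f"{path}: keys out of propertyOrdering order")
        for key, value in data.items():
            sub_schema = schema.get("properties", {}).get(key, {})
            errors.extend(check_ordering(value, sub_schema, f"{path}.{key}"))
    return errors
```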
Batch Support
Files are validated individually within a batch. Partial failure is handled correctly.
Jekyll Processor
Purpose
Builds Jekyll static sites by running jekyll build in directories containing
a _config.yml file.
How It Works
Discovers _config.yml files in the project (excluding common build tool
directories). For each one, runs jekyll build in that directory.
Source Files
- Input: `**/_config.yml`
- Output: none (creator)
Configuration
[processor.jekyll]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to jekyll build |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Runs as a single whole-project operation (e.g., cargo build, npm install).
Jinja2 Processor
Purpose
Renders Jinja2 template files into output files using the Python Jinja2 template library.
How It Works
Files matching configured extensions in templates.jinja2/ are rendered via python3 using
the jinja2 Python library. Output is written with the extension stripped and the
templates.jinja2/ prefix removed:
templates.jinja2/app.config.j2 → app.config
templates.jinja2/sub/readme.txt.j2 → sub/readme.txt
Templates use the Jinja2 templating engine. A
FileSystemLoader is configured with the project root as the search directory, so
templates can include or extend other templates using relative paths. Environment
variables are passed to the template context.
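For illustration, a hypothetical `templates.jinja2/app.config.j2` could interpolate an environment variable from the template context:

```
# templates.jinja2/app.config.j2 (hypothetical) — rendered to app.config
name = "myapp"
user_home = "{{ HOME }}"
```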
Source Files
- Input: `templates.jinja2/**/*{src_extensions}`
- Output: project root, mirroring the template path (minus the `templates.jinja2/` prefix) with the extension removed
Configuration
[processor.jinja2]
src_extensions = [".j2"] # File extensions to process (default: [".j2"])
dep_inputs = ["config/settings.py"] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
src_extensions | string[] | [".j2"] | File extensions to discover |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Jq Processor
Purpose
Validates JSON files using jq.
How It Works
Discovers .json files in the project (excluding common build tool
directories), runs jq empty on each file, and records success in the cache.
The empty filter validates JSON syntax without producing output — a non-zero
exit code from jq fails the product.
This processor supports batch mode — multiple files are checked in a single jq invocation.
Source Files
- Input: `**/*.json`
- Output: none (linter)
Configuration
[processor.jq]
command = "jq" # The jq command to run
args = [] # Additional arguments to pass to jq (after "empty")
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "jq" | The jq executable to run |
args | string[] | [] | Extra arguments passed to jq (after the empty filter) |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
JSHint Processor
Purpose
Lints JavaScript files using JSHint.
How It Works
Discovers .js, .jsx, .mjs, and .cjs files in the project (excluding
common build tool directories), runs jshint on each file, and records success
in the cache. A non-zero exit code from jshint fails the product.
This processor supports batch mode.
If a .jshintrc file exists, it is automatically added as an extra input so
that configuration changes trigger rebuilds.
Source Files
- Input: `**/*.js`, `**/*.jsx`, `**/*.mjs`, `**/*.cjs`
- Output: none (checker)
Configuration
[processor.jshint]
command = "jshint"
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "jshint" | The jshint executable to run |
args | string[] | [] | Extra arguments passed to jshint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
JSLint Processor
Purpose
Lints JavaScript files using JSLint.
How It Works
Discovers .js files in the project (excluding common build tool directories),
runs jslint on each file, and records success in the cache. A non-zero exit
code from jslint fails the product.
This processor supports batch mode.
Source Files
- Input: `**/*.js`
- Output: none (checker)
Configuration
[processor.jslint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to jslint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Json Schema Processor
Purpose
Validates JSON schema files by checking that every object’s propertyOrdering
array exactly matches its properties keys.
How It Works
Discovers .json files in the project (excluding common build tool
directories), parses each as JSON, and recursively walks the structure. At every
object node with "type": "object", if both properties and
propertyOrdering exist, it verifies that the two key sets match exactly.
Mismatches (keys missing from propertyOrdering or extra keys in
propertyOrdering) are reported with their JSON path. Files that contain no
propertyOrdering at all pass silently.
This is a pure-Rust checker — no external tool is required.
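The exact-match check can be sketched like this (a Python rendition of the recursive walk, for illustration only):

```python
def check_property_ordering(node, path="$"):
    # Recursively verify that propertyOrdering exactly matches the key set
    # of properties at every `"type": "object"` node; returns violations.
    errors = []
    if isinstance(node, dict):
        if (node.get("type") == "object"
                and "properties" in node and "propertyOrdering" in node):
            props = set(node["properties"])
            order = set(node["propertyOrdering"])
            if missing := props - order:
                errors.append(f"{path}: missing from propertyOrdering: {sorted(missing)}")
            if extra := order - props:
                errors.append(f"{path}: extra in propertyOrdering: {sorted(extra)}")
        for key, value in node.items():
            errors.extend(check_property_ordering(value, f"{path}.{key}"))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            errors.extend(check_property_ordering(item, f"{path}[{i}]"))
    return errors
```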
Source Files
- Input: `**/*.json`
- Output: none (checker)
Configuration
[processor.json_schema]
args = [] # Reserved for future use
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Reserved for future use |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each file is checked individually (no external tool is invoked).
Jsonlint Processor
Purpose
Lints JSON files using jsonlint.
How It Works
Discovers .json files in the project (excluding common build tool
directories), runs jsonlint on each file, and records success in the cache.
A non-zero exit code from jsonlint fails the product.
This processor does not support batch mode — each file is checked individually.
Source Files
- Input: `**/*.json`
- Output: none (checker)
Configuration
[processor.jsonlint]
command = "jsonlint" # The jsonlint command to run
args = [] # Additional arguments to pass to jsonlint
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "jsonlint" | The jsonlint executable to run |
args | string[] | [] | Extra arguments passed to jsonlint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool processes one file at a time. Each file is checked in a separate invocation.
Libreoffice Processor
Purpose
Converts LibreOffice documents (e.g., .odp presentations) to PDF or other formats.
How It Works
Discovers .odp files in the project and runs libreoffice in headless mode
to convert each file to the configured output formats. Uses flock to serialize
invocations since LibreOffice only supports a single running instance.
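The serialization can be sketched with an advisory lock (POSIX-only; `fcntl` has no Windows equivalent — this is an illustrative stand-in, not rsconstruct's implementation):

```python
import fcntl
import subprocess


def run_serialized(cmd, lockfile):
    # Hold an exclusive advisory lock for the whole conversion so only one
    # instance runs at a time (LibreOffice refuses concurrent instances).
    # The lock is released automatically when the file handle closes.
    with open(lockfile, "w") as lf:
        fcntl.flock(lf, fcntl.LOCK_EX)
        try:
            return subprocess.run(cmd).returncode
        finally:
            fcntl.flock(lf, fcntl.LOCK_UN)
```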
Source Files
- Input: `**/*.odp`
- Output: `out/libreoffice/{format}/{relative_path}.{format}`
Configuration
[processor.libreoffice]
libreoffice_bin = "libreoffice" # The libreoffice command to run
formats = ["pdf"] # Output formats (pdf, pptx)
args = [] # Additional arguments to pass to libreoffice
output_dir = "out/libreoffice" # Output directory
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
libreoffice_bin | string | "libreoffice" | The libreoffice executable to run |
formats | string[] | ["pdf"] | Output formats to generate (pdf, pptx) |
args | string[] | [] | Extra arguments passed to libreoffice |
output_dir | string | "out/libreoffice" | Base output directory |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Linux Module Processor
Purpose
Builds Linux kernel modules (.ko files) from source, driven by a
linux-module.yaml manifest. The processor generates a temporary Kbuild
file, invokes the kernel build system (make -C <kdir> M=<src> modules),
copies the resulting .ko to the output directory, and cleans up build
artifacts from the source tree.
How It Works
The processor scans for linux-module.yaml files. Each manifest lists one
or more kernel modules to build. For each module the processor:
- Generates a `Kbuild` file in the source directory (next to the yaml).
- Runs `make -C <kdir> M=<absolute-source-dir> modules` to compile.
- Copies the `.ko` file to `out/linux-module/<yaml-relative-dir>/`.
- Runs `make ... clean` and removes the generated `Kbuild` so the source directory stays clean.
Because the kernel build system requires M= to point at an absolute path
containing the sources and Kbuild, the make command runs in the yaml
file’s directory — not the project root.
The processor is a generator: it knows exactly which .ko files it
produces. Outputs are tracked in the build graph, cached in the object
store, and can be restored from cache after rsconstruct clean without
recompiling.
linux-module.yaml Format
All source paths are relative to the yaml file’s directory.
# Global settings (all optional)
make: make # Make binary (default: "make")
kdir: /lib/modules/6.8.0-generic/build # Kernel build dir (default: running kernel)
arch: x86_64 # ARCH= value (optional, omitted if unset)
cross_compile: x86_64-linux-gnu- # CROSS_COMPILE= value (optional)
v: 0 # Verbosity V= (default: 0)
w: 1 # Warning level W= (default: 1)
# Module definitions
modules:
- name: hello # Module name -> produces hello.ko
sources: [main.c] # Source files (relative to yaml dir)
extra_cflags: [-DDEBUG] # Extra CFLAGS (optional, becomes ccflags-y)
- name: mydriver
sources: [mydriver.c, utils.c]
Minimal Example
A single module with one source file:
modules:
- name: hello
sources: [main.c]
Output Layout
Output is placed under out/linux-module/<yaml-relative-dir>/:
out/linux-module/<yaml-dir>/
<module_name>.ko
For example, a manifest at src/kernel/hello/linux-module.yaml defining
module hello produces:
out/linux-module/src/kernel/hello/hello.ko
KDIR Detection
If kdir is not set in the manifest, the processor runs uname -r to
detect the running kernel and uses /lib/modules/<release>/build. This
requires the linux-headers-* package to be installed (e.g.,
linux-headers-generic on Ubuntu).
Generated Kbuild
The processor writes a Kbuild file with the standard kernel module
variables:
obj-m := hello.o
hello-objs := main.o
ccflags-y := -DDEBUG # only if extra_cflags is non-empty
This file is removed after building (whether the build succeeds or fails).
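How one `modules:` entry maps to that `Kbuild` text can be sketched as (illustrative helper, not rsconstruct's implementation):

```python
def render_kbuild(name, sources, extra_cflags=()):
    """Render the Kbuild shown above from one module entry."""
    objs = " ".join(src.rsplit(".", 1)[0] + ".o" for src in sources)
    lines = [f"obj-m := {name}.o", f"{name}-objs := {objs}"]
    if extra_cflags:  # ccflags-y line only when extra_cflags is non-empty
        lines.append("ccflags-y := " + " ".join(extra_cflags))
    return "\n".join(lines) + "\n"
```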
Configuration
[processor.linux_module]
enabled = true # Enable/disable (default: true)
dep_inputs = [] # Extra files that trigger rebuilds
Configuration Reference
| Key | Type | Default | Description |
|---|---|---|---|
enabled | bool | true | Enable/disable the processor |
dep_inputs | string[] | [] | Extra files that trigger rebuilds when changed |
src_dirs | string[] | [""] | Directory to scan for linux-module.yaml files |
src_extensions | string[] | ["linux-module.yaml"] | File patterns to scan for |
src_exclude_dirs | string[] | common excludes | Directories to skip during scanning |
Batch support
Each input file is processed individually, producing its own output file.
Caching
The .ko outputs are cached in the rsconstruct object store. After rsconstruct clean,
a subsequent rsconstruct build restores .ko files from cache (via hardlink or
copy) without invoking the kernel build system. A rebuild is triggered when
any source file or the yaml manifest changes.
Prerequisites
- `make` must be installed
- Kernel headers must be installed for the target kernel version (`apt install linux-headers-generic` on Ubuntu)
- For cross-compilation, the appropriate cross-compiler toolchain must be available and specified via `cross_compile` and `arch` in the manifest
Example
Given this project layout:
myproject/
rsconstruct.toml
drivers/
hello/
linux-module.yaml
main.c
With drivers/hello/linux-module.yaml:
modules:
- name: hello
sources: [main.c]
And drivers/hello/main.c:
#include <linux/module.h>
#include <linux/init.h>
MODULE_LICENSE("GPL");
static int __init hello_init(void) {
pr_info("hello: loaded\n");
return 0;
}
static void __exit hello_exit(void) {
pr_info("hello: unloaded\n");
}
module_init(hello_init);
module_exit(hello_exit);
Running rsconstruct build produces:
out/linux-module/drivers/hello/hello.ko
The module can then be loaded with sudo insmod out/linux-module/drivers/hello/hello.ko.
Luacheck Processor
Purpose
Lints Lua scripts using luacheck.
How It Works
Discovers .lua files in the project (excluding common build tool
directories), runs luacheck on each file, and records success in the cache.
A non-zero exit code from luacheck fails the product.
This processor supports batch mode, allowing multiple files to be checked in a single luacheck invocation for better performance.
Source Files
- Input: `**/*.lua`
- Output: none (linter)
Configuration
[processor.luacheck]
command = "luacheck" # The luacheck command to run
args = [] # Additional arguments to pass to luacheck
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "luacheck" | The luacheck executable to run |
args | string[] | [] | Extra arguments passed to luacheck |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
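The difference between the two modes can be sketched as follows (illustrative, not rsconstruct internals):

```python
def luacheck_invocations(files, batch=True, command="luacheck"):
    """One argv list per process spawn: a single batched invocation when
    batching is enabled, otherwise one invocation per file."""
    if batch:
        return [[command, *files]]
    return [[command, f] for f in files]
```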
Make Processor
Purpose
Runs make in directories containing Makefiles. Each Makefile produces a stub
file on success, allowing RSConstruct to track incremental rebuilds.
How It Works
Discovers files named Makefile in the project. For each Makefile found, the
processor runs make (or a configured alternative) in the Makefile’s directory.
A stub file is created on success.
Directory-Level Inputs
The make processor treats all files in the Makefile’s directory (and subdirectories) as inputs. This means that if any file alongside the Makefile changes — source files, headers, scripts, included makefiles — rsconstruct will re-run make.
This is slightly conservative: a change to a file that the Makefile does not actually depend on will trigger a rebuild. In practice this is the right trade-off because Makefiles can depend on arbitrary files and there is no reliable way to know which ones without running make itself.
Source Files
- Input: `**/Makefile` plus all files in the Makefile’s directory tree
- Output: `out/make/{relative_path}.done`
Dependency Tracking Approaches
RSConstruct uses the directory-scan approach described above. Here is why, and what the alternatives are.
1. Directory scan (current)
Track every file under the Makefile’s directory as an input. Any change triggers a rebuild.
Pros: simple, correct, zero configuration. Cons: over-conservative — a change to an unrelated file in the same directory triggers a needless rebuild.
2. User-declared extra inputs
The user lists specific files or globs in dep_inputs. Only those files
(plus the Makefile itself) are tracked.
Pros: precise, no unnecessary rebuilds. Cons: requires the user to manually maintain the list. Easy to forget a file and get stale builds.
This is available today via the dep_inputs config key, but on its own
it would miss source files that the Makefile compiles.
3. Parse make --dry-run --print-data-base
Ask make to dump its dependency database and extract the real inputs.
Pros: exact dependency information, no over-building. Cons: fragile — output format varies across make implementations (GNU Make, BSD Make, nmake). Some Makefiles behave differently in dry-run mode. Complex to implement and maintain.
4. Hash the directory tree
Instead of listing individual files, compute a single hash over every file in the directory. Functionally equivalent to option 1 but with a different internal representation.
Pros: compact cache key. Cons: same over-conservatism as option 1, and no ability to report which file changed.
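Option 4 could look like this (hypothetical sketch; rsconstruct itself uses option 1 with per-file SHA-256 checksums):

```python
import hashlib
from pathlib import Path

def hash_tree(root: Path) -> str:
    """One SHA-256 digest over every file under root. Sorted traversal keeps
    the digest deterministic; hashing the relative path as well means renames
    are detected, not just content changes."""
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        h.update(str(path.relative_to(root)).encode())
        h.update(path.read_bytes())
    return h.hexdigest()
```

As noted above, this is exactly as conservative as the directory scan but can only report "something changed", never which file.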
Configuration
[processor.make]
command = "make" # Make binary to use
args = [] # Extra arguments passed to make
target = "" # Make target (empty = default target)
src_dirs = [""] # Directory to scan ("" = project root)
src_extensions = ["Makefile"]
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "make" | Path or name of the make binary |
args | string[] | [] | Extra arguments passed to every make invocation |
target | string | "" | Make target to build (empty = default target) |
src_dirs | string[] | [""] | Directory to scan for Makefiles |
src_extensions | string[] | ["Makefile"] | File names to match |
src_exclude_paths | string[] | [] | Paths (relative to project root) to exclude |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds (in addition to directory contents) |
Batch support
The tool processes one file at a time. Each file is checked in a separate invocation.
Mako Processor
Purpose
Renders Mako template files into output files using the Python Mako template library.
How It Works
Files matching configured extensions in templates.mako/ are rendered via python3 using
the mako Python library. Output is written with the extension stripped and the
templates.mako/ prefix removed:
templates.mako/app.config.mako → app.config
templates.mako/sub/readme.txt.mako → sub/readme.txt
Templates use the Mako templating engine. A
TemplateLookup is configured with the project root as the lookup directory, so
templates can include or inherit from other templates using relative paths.
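The path mapping above can be expressed as a small helper (illustrative only):

```python
from pathlib import PurePosixPath

def mako_output_path(template, prefix="templates.mako", ext=".mako"):
    """Strip the templates.mako/ prefix and the template extension."""
    rel = str(PurePosixPath(template).relative_to(prefix))
    return rel[: -len(ext)] if rel.endswith(ext) else rel
```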
Source Files
- Input: `templates.mako/**/*{src_extensions}`
- Output: project root, mirroring the template path (minus the `templates.mako/` prefix) with the extension removed
Configuration
[processor.mako]
src_extensions = [".mako"] # File extensions to process (default: [".mako"])
dep_inputs = ["config/settings.py"] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
src_extensions | string[] | [".mako"] | File extensions to discover |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Markdown2html Processor
Purpose
Converts Markdown files to HTML using the markdown Perl script.
How It Works
Discovers .md files in the project and runs markdown on each file,
producing an HTML output file.
Source Files
- Input: `**/*.md`
- Output: `out/markdown2html/{relative_path}.html`
Configuration
[processor.markdown2html]
markdown_bin = "markdown" # The markdown command to run
args = [] # Additional arguments to pass to markdown
output_dir = "out/markdown2html" # Output directory
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
markdown_bin | string | "markdown" | The markdown executable to run |
args | string[] | [] | Extra arguments passed to markdown |
output_dir | string | "out/markdown2html" | Output directory for HTML files |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
MassGenerator Processor
Status
Designed, not yet implemented. This document describes the intended user-facing contract for the MassGenerator processor type. The full design rationale is in Output Prediction.
Why “mass generator”?
Existing processor types cover a matrix of “how many outputs” and “are they known in advance”:
| Type | Outputs known? | Example |
|---|---|---|
| Generator | Yes, 1 per input | tera: template → file |
| Explicit | Yes, user-declared | custom build step |
| Checker | None (pass/fail) | ruff |
| Creator | No, opaque (output_dirs) | mkdocs → _site/ |
| MassGenerator | Yes — tool enumerates them | rssite → _site/* |
MassGenerator is the “transparent Creator”: it produces many output files (like a Creator), but the tool itself answers the question “what will you produce?” before running. Each predicted file becomes a declared product with its own inputs, cache entry, and dependency edges.
Names considered:
- mass_generator — chosen. Says what it does: “generator” (per-file outputs like the Generator type), “mass” (many products from one tool invocation).
- transparent_creator — accurate but awkward.
- predicting_creator — describes the mechanism, not the result.
- site_generator — too narrow; the type is useful beyond static sites.
Purpose
Wraps a tool that:
- Produces many output files from a set of source files (e.g., a static site generator).
- Can enumerate its outputs in advance via a separate “plan” command.
- Normally builds all its outputs in a single invocation.
Once wired as a MassGenerator, the tool gets per-file cache entries, plays cleanly with other processors sharing its output directory, and allows downstream processors to depend on its outputs.
How it works
1. The tool provides two modes
The wrapped tool must expose:
- Build mode: runs the actual generation. Produces all output files in one invocation.
- Plan mode: prints a JSON manifest to stdout listing every output it will produce, with per-output source dependencies. Does not produce any output files.
Both modes must be driven by the same internal function that enumerates outputs — otherwise the plan and the build diverge, and the cache is corrupted. This is a discipline the tool author upholds.
2. Plan phase (at graph-build time)
rsconstruct runs predict_command and parses its output. For each entry in the manifest, a product is added to the build graph with:
- `inputs` = the entry’s `sources` (files whose changes should trigger this output’s rebuild)
- `outputs` = `[entry.path]`
- `processor` = the MassGenerator instance name
3. Build phase
rsconstruct groups all dirty products for a MassGenerator instance into a single batch. The tool’s command is invoked once per batch; it produces all predicted files. Each product caches its own file as a blob, independently of the others.
In strict mode (default), after the tool exits rsconstruct verifies that every predicted file was produced and no unexpected files appeared in output_dirs. Mismatches are build-breaking errors.
4. Restore phase
When all products for a MassGenerator instance are clean, each is restored from its blob cache — the tool is not invoked at all. Partial cleanliness (some products clean, some dirty) triggers a single tool invocation, and clean products are cached/re-cached afterward.
Manifest format
{
"version": 1,
"outputs": [
{
"path": "_site/index.html",
"sources": ["docs/index.md", "templates/default.html", "mysite.toml"]
},
{
"path": "_site/about/index.html",
"sources": ["docs/about.md", "templates/default.html", "mysite.toml"]
}
]
}
- `version` — integer. Schema version. Current: `1`.
- `outputs[].path` — relative path. Must fall within one of the processor’s `output_dirs`.
- `outputs[].sources` — minimal set of input files whose changes invalidate this output.
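A sketch of the consumer side, assuming the schema above (hypothetical helper — the processor itself is not yet implemented):

```python
import json
from pathlib import PurePosixPath

def parse_manifest(text, output_dirs):
    """Parse a plan manifest and enforce the documented invariants."""
    m = json.loads(text)
    if m.get("version") != 1:
        raise ValueError(f"unsupported manifest version: {m.get('version')}")
    for entry in m["outputs"]:
        # every predicted path must land inside a declared output dir
        if not any(PurePosixPath(entry["path"]).is_relative_to(d)
                   for d in output_dirs):
            raise ValueError(f"{entry['path']} falls outside output_dirs")
    return m["outputs"]
```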
Configuration
[processor.mass_generator.site]
command = "rssite build"
predict_command = "rssite plan"
output_dirs = ["_site"]
src_dirs = ["docs", "templates"]
src_extensions = [".md", ".html", ".yaml"]
# loose_manifest = false # optional; set to true to downgrade verification mismatches to warnings
Fields
| Key | Type | Required | Description |
|---|---|---|---|
command | string | yes | Tool’s build command. Invoked once per batch of dirty products. |
predict_command | string | yes | Tool’s plan command. Must print JSON manifest to stdout. |
output_dirs | array of strings | yes | Directories the tool produces files in. Used for verification. |
loose_manifest | bool | no | Default false. If true, plan/actual mismatches are warnings only. |
src_dirs | array of strings | no | Limits which source changes trigger a replan. |
src_extensions | array of strings | no | As above. |
src_exclude_* | array of strings | no | Standard scan exclusions apply. |
dep_inputs | array of strings | no | Extra files that invalidate the whole instance when changed. |
Cross-processor dependencies
Because every output file is a declared product, downstream processors wire up naturally:
[processor.mass_generator.site]
command = "rssite build"
predict_command = "rssite plan"
output_dirs = ["_site"]
[processor.markdownlint]
# Depends on rssite's outputs automatically via file-scan:
# any _site/*.html file is a discovered virtual file in the graph.
src_dirs = ["_site"]
src_extensions = [".html"]
No ordering hacks needed. The graph’s topological sort handles it.
Tool author contract
For a tool to be compatible with MassGenerator, its plan command must uphold these invariants:
- Pure function of config + source tree. Same inputs → same manifest, bit for bit. No network, no timestamps, no env-var peeking (unless declared as a source).
- Cheap or cached. rsconstruct invokes it on every graph build. Slow plan → slow rsconstruct.
- Exact match with build output. Predicted paths must equal actual paths produced by `command`. Violations are errors in strict mode.
- Deterministic variable outputs. Content-derived outputs (tag pages, archive indices, RSS) must be enumerable from the same parsing pass that plan does.
See rssite for a reference tool being built to this contract.
Comparison with other processor types
| | Creator (opaque) | MassGenerator (transparent) | Generator (1:1) |
|---|---|---|---|
| Outputs known in advance? | No | Yes | Yes |
| Tool invocations per build | 1 if dirty | 1 if any product is dirty | N (one per dirty input) |
| Cache unit | Whole tree | Per file | Per file |
| Downstream deps | Only on declared files | On every predicted file | On every produced file |
| Shared-folder safety | Via path_owner filter | Via declared outputs (normal) | Via declared outputs |
| Use case | mkdocs, Sphinx | rssite, cooperative tools | tera, mako, compilers |
Migration story
If a tool exists first as a Creator (output_dirs only) and later adds plan support, the migration is config-only:
# Before
[processor.creator.mysite]
command = "mysite build"
output_dirs = ["_site"]
# After
[processor.mass_generator.mysite]
command = "mysite build"
predict_command = "mysite plan"
output_dirs = ["_site"]
No code changes; existing downstream processors start getting precise dependencies automatically.
See also
- Output Prediction — full design rationale, invariants, execution shape
- Shared Output Directory — the fallback mechanism for opaque Creators
- Processor Ordering — sibling discussion about explicit ordering knobs
- rssite — a static site generator being built to implement this contract
Markdownlint Processor
Purpose
Lints Markdown files using markdownlint (Node.js).
How It Works
Discovers .md files in the project and runs markdownlint on each file. A
non-zero exit code fails the product.
Depends on the npm processor — uses the markdownlint binary installed by npm.
Source Files
- Input: `**/*.md`
- Output: none (checker)
Configuration
[processor.markdownlint]
command = "node_modules/.bin/markdownlint" # Path to the markdownlint binary
args = [] # Additional arguments to pass to markdownlint
npm_stamp = "out/npm/root.stamp" # Stamp file from npm processor (dependency)
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "node_modules/.bin/markdownlint" | Path to the markdownlint executable |
args | string[] | [] | Extra arguments passed to markdownlint |
npm_stamp | string | "out/npm/root.stamp" | Stamp file from npm processor (ensures npm packages are installed first) |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool processes one file at a time. Each file is checked in a separate invocation.
Marp Processor
Purpose
Converts Markdown slides to PDF, PPTX, or HTML using Marp.
How It Works
Discovers .md files in the project and runs marp on each file, generating
output in the configured formats. Each format produces a separate output file.
Each marp invocation spawns a headless Chromium browser instance via Puppeteer to render the slides. This makes marp significantly more resource-intensive than typical processors — see Concurrency limiting below.
Source Files
- Input: `**/*.md`
- Output: `out/marp/{format}/{relative_path}.{format}`
Configuration
[processor.marp]
marp_bin = "marp" # The marp command to run
formats = ["pdf"] # Output formats (pdf, pptx, html)
args = ["--html", "--allow-local-files"] # Additional arguments to pass to marp
output_dir = "out/marp" # Output directory
dep_inputs = [] # Additional files that trigger rebuilds when changed
max_jobs = 2 # Limit concurrent marp instances (each spawns Chromium)
| Key | Type | Default | Description |
|---|---|---|---|
marp_bin | string | "marp" | The marp executable to run |
formats | string[] | ["pdf"] | Output formats to generate (pdf, pptx, html) |
args | string[] | ["--html", "--allow-local-files"] | Extra arguments passed to marp |
output_dir | string | "out/marp" | Base output directory |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
max_jobs | integer | none | Max concurrent marp processes. See Concurrency limiting. |
Concurrency Limiting
Each marp invocation launches a full headless Chromium browser process, which
consumes hundreds of megabytes of RAM. When running parallel builds with -j N,
too many simultaneous Chromium instances cause resource exhaustion and
non-deterministic crashes:
TargetCloseError: Protocol error (Target.setDiscoverTargets): Target closed
Use max_jobs to limit how many marp processes run concurrently, independent of
the global -j setting. For example, with -j 20 and max_jobs = 2, at most
2 Chromium instances will be alive at once while other processors still use the
full 20 threads:
[processor.marp]
formats = ["pdf"]
max_jobs = 2
Recommended value: 2. A value of 4 may work on machines with plenty of RAM
but has been observed to produce occasional failures on large projects (700+ slides).
Without max_jobs, the global -j value applies, which typically causes crashes
at higher parallelism levels.
Batch Support
Each input file is processed individually, producing its own output file.
Temporary Files
Marp creates temporary Chromium profile directories (marp-cli-*) in /tmp for
each invocation. RSConstruct automatically cleans these up after each marp process
completes, since marp itself does not delete them.
Mdbook Processor
Purpose
Builds mdbook documentation projects.
How It Works
Discovers book.toml files indicating mdbook projects, collects sibling .md
and .toml files as inputs, and runs mdbook build. A non-zero exit code
fails the product.
Source Files
- Input: `**/book.toml` (plus sibling `.md`, `.toml` files)
- Output: none (creator — produces output in the `book` directory)
Configuration
[processor.mdbook]
command = "mdbook" # The mdbook command to run
output_dir = "book" # Output directory for generated docs
args = [] # Additional arguments to pass to mdbook
dep_inputs = [] # Additional files that trigger rebuilds when changed
cache_output_dir = true # Cache the output directory for fast restore after clean
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "mdbook" | The mdbook executable to run |
output_dir | string | "book" | Output directory for generated documentation |
args | string[] | [] | Extra arguments passed to mdbook |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
cache_output_dir | boolean | true | Cache the book/ directory so rsconstruct clean && rsconstruct build restores from cache |
Batch support
Runs as a single whole-project operation (e.g., cargo build, npm install).
Mdl Processor
Purpose
Lints Markdown files using mdl (Ruby markdownlint).
How It Works
Discovers .md files in the project and runs mdl on each file. A non-zero
exit code fails the product.
Depends on the gem processor — uses the mdl binary installed by Bundler.
Source Files
- Input: `**/*.md`
- Output: none (checker)
Configuration
[processor.mdl]
gem_home = "gems" # GEM_HOME directory
command = "gems/bin/mdl" # Path to the mdl binary
args = [] # Additional arguments to pass to mdl
gem_stamp = "out/gem/root.stamp" # Stamp file from gem processor (dependency)
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
gem_home | string | "gems" | GEM_HOME directory for Ruby gems |
command | string | "gems/bin/mdl" | Path to the mdl executable |
args | string[] | [] | Extra arguments passed to mdl |
gem_stamp | string | "out/gem/root.stamp" | Stamp file from gem processor (ensures gems are installed first) |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool processes one file at a time. Each file is checked in a separate invocation.
Mermaid Processor
Purpose
Converts Mermaid diagram files to PNG, SVG, or PDF using mmdc (mermaid-cli).
How It Works
Discovers .mmd files in the project and runs mmdc on each file, generating
output in the configured formats. Each format produces a separate output file.
Source Files
- Input: `**/*.mmd`
- Output: `out/mermaid/{format}/{relative_path}.{format}`
Configuration
[processor.mermaid]
mmdc_bin = "mmdc" # The mmdc command to run
formats = ["png"] # Output formats (png, svg, pdf)
args = [] # Additional arguments to pass to mmdc
output_dir = "out/mermaid" # Output directory
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
mmdc_bin | string | "mmdc" | The mermaid-cli executable to run |
formats | string[] | ["png"] | Output formats to generate (png, svg, pdf) |
args | string[] | [] | Extra arguments passed to mmdc |
output_dir | string | "out/mermaid" | Base output directory |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Mypy Processor
Purpose
Type-checks Python source files using mypy.
How It Works
Discovers .py files in the project (excluding common non-source directories),
runs mypy on each file, and creates a stub file on success.
A non-zero exit code from mypy fails the product.
This processor supports batch mode, allowing multiple files to be checked in a single mypy invocation for better performance.
If a mypy.ini file exists in the project root, it is automatically added as an
extra input so that configuration changes trigger rebuilds.
Source Files
- Input: `**/*.py`
- Output: `out/mypy/{flat_name}.mypy`
Configuration
[processor.mypy]
command = "mypy" # The mypy command to run
args = [] # Additional arguments to pass to mypy
dep_inputs = [] # Additional files that trigger rebuilds (e.g. ["pyproject.toml"])
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "mypy" | The mypy executable to run |
args | string[] | [] | Extra arguments passed to mypy |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Using mypy.ini
Mypy automatically reads configuration from a mypy.ini file in the project
root. This file is detected automatically and added as an extra input, so
changes to it will trigger rebuilds without manual configuration.
Npm Processor
Purpose
Installs Node.js dependencies from package.json files using npm.
How It Works
Discovers package.json files in the project, runs npm install in each
directory, and creates a stamp file on success. Sibling .json, .js, and
.ts files are tracked as inputs so changes trigger reinstallation.
Source Files
- Input: `**/package.json` (plus sibling `.json`, `.js`, `.ts` files)
- Output: `out/npm/{flat_name}.stamp`
Configuration
[processor.npm]
command = "npm" # The npm command to run
args = [] # Additional arguments to pass to npm install
dep_inputs = [] # Additional files that trigger rebuilds when changed
cache_output_dir = true # Cache the node_modules directory for fast restore after clean
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "npm" | The npm executable to run |
args | string[] | [] | Extra arguments passed to npm install |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
cache_output_dir | boolean | true | Cache the node_modules/ directory so rsconstruct clean && rsconstruct build restores from cache |
Batch support
Runs as a single whole-project operation (e.g., cargo build, npm install).
Objdump Processor
Purpose
Disassembles ELF binaries using objdump.
How It Works
Discovers .elf files under out/cc_single_file/, runs objdump to produce
disassembly output, and writes the result to the configured output directory.
Source Files
- Input: `out/cc_single_file/**/*.elf`
- Output: disassembly files in the output directory
Configuration
[processor.objdump]
args = []
dep_inputs = []
output_dir = "out/objdump"
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to objdump |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
output_dir | string | "out/objdump" | Directory for disassembly output |
Batch support
Each input file is processed individually, producing its own output file.
Pandoc Processor
Purpose
Converts documents between formats using pandoc.
How It Works
Discovers .md files in the project and runs pandoc on each file, converting
from the configured source format to the configured output formats.
Source Files
- Input: `**/*.md`
- Output: `out/pandoc/{format}/{relative_path}.{format}`
Configuration
[processor.pandoc]
pandoc = "pandoc" # The pandoc command to run
from = "markdown" # Source format
formats = ["pdf"] # Output formats (pdf, docx, html, etc.)
args = [] # Additional arguments to pass to pandoc
output_dir = "out/pandoc" # Output directory
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
pandoc | string | "pandoc" | The pandoc executable to run |
from | string | "markdown" | Source format |
formats | string[] | ["pdf"] | Output formats to generate |
args | string[] | [] | Extra arguments passed to pandoc |
output_dir | string | "out/pandoc" | Base output directory |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Pdflatex Processor
Purpose
Compiles LaTeX documents to PDF using pdflatex.
How It Works
Discovers .tex files in the project and runs pdflatex on each file. Runs
multiple compilation passes (configurable) to resolve cross-references and the
table of contents. Optionally uses qpdf to linearize the output PDF.
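The resulting command sequence for one `.tex` file looks roughly like this (flag names and the temporary output name are assumptions, not rsconstruct's exact invocation):

```python
def pdflatex_commands(tex, out_dir, runs=2, qpdf=True, command="pdflatex"):
    """Per-file command sequence: `runs` compile passes, then an optional
    qpdf linearization of the resulting PDF."""
    cmds = [[command, "-interaction=nonstopmode",
             f"-output-directory={out_dir}", tex] for _ in range(runs)]
    if qpdf:
        stem = tex.rsplit("/", 1)[-1].rsplit(".", 1)[0]
        pdf = f"{out_dir}/{stem}.pdf"
        # assumed: linearize to a temporary name, then swap in place
        cmds.append(["qpdf", "--linearize", pdf, pdf + ".lin"])
    return cmds
```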
Source Files
- Input: `**/*.tex`
- Output: `out/pdflatex/{relative_path}.pdf`
Configuration
[processor.pdflatex]
command = "pdflatex" # The pdflatex command to run
runs = 2 # Number of compilation passes
qpdf = true # Use qpdf to linearize output PDF
args = [] # Additional arguments to pass to pdflatex
output_dir = "out/pdflatex" # Output directory
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "pdflatex" | The pdflatex executable to run |
runs | integer | 2 | Number of compilation passes (for cross-references) |
qpdf | bool | true | Use qpdf to linearize the output PDF |
args | string[] | [] | Extra arguments passed to pdflatex |
output_dir | string | "out/pdflatex" | Output directory for PDF files |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Pdfunite Processor
Purpose
Merges PDF files from subdirectories into single combined PDFs using pdfunite.
How It Works
Scans subdirectories of the configured source directory for files matching the configured extension. For each subdirectory, it locates the corresponding PDFs (generated by an upstream processor such as marp) and merges them into a single output PDF.
This processor is designed for course/module workflows where slide decks in subdirectories are combined into course bundles.
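The mapping from a course subdirectory to its merged PDF can be sketched as (hypothetical helper; sorting the inputs is an assumed page-ordering rule):

```python
from pathlib import PurePosixPath

def pdfunite_command(subdir, slide_sources,
                     source_output_dir="out/marp/pdf",
                     output_dir="out/courses"):
    """argv merging one subdirectory's upstream PDFs into a course bundle."""
    pdfs = sorted(f"{source_output_dir}/{subdir}/{PurePosixPath(s).stem}.pdf"
                  for s in slide_sources)
    return ["pdfunite", *pdfs, f"{output_dir}/{subdir}.pdf"]
```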
Source Files
- Input: PDFs from the upstream processor (e.g., `out/marp/pdf/{subdir}/*.pdf`)
- Output: `out/courses/{subdir}.pdf`
Configuration
[processor.pdfunite]
command = "pdfunite" # The pdfunite command to run
source_dir = "marp/courses" # Base directory containing course subdirectories
source_ext = ".md" # Extension of source files in subdirectories
source_output_dir = "out/marp/pdf" # Where the upstream processor puts PDFs
args = [] # Additional arguments to pass to pdfunite
output_dir = "out/courses" # Output directory for merged PDFs
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "pdfunite" | The pdfunite executable to run |
source_dir | string | "marp/courses" | Directory containing course subdirectories |
source_ext | string | ".md" | Extension of source files to look for |
source_output_dir | string | "out/marp/pdf" | Directory where the upstream processor outputs PDFs |
args | string[] | [] | Extra arguments passed to pdfunite |
output_dir | string | "out/courses" | Output directory for merged PDFs |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Perlcritic Processor
Purpose
Analyzes Perl code using Perl::Critic.
How It Works
Discovers .pl and .pm files in the project (excluding common build tool
directories), runs perlcritic on each file, and records success in the cache.
A non-zero exit code from perlcritic fails the product.
This processor supports batch mode.
If a .perlcriticrc file exists, it is automatically added as an extra input so
that configuration changes trigger rebuilds.
Source Files
- Input: `**/*.pl`, `**/*.pm`
- Output: none (checker)
Configuration
[processor.perlcritic]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to perlcritic |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
PHP Lint Processor
Purpose
Checks PHP syntax using php -l.
How It Works
Discovers .php files in the project (excluding common build tool directories),
runs php -l on each file, and records success in the cache. A non-zero exit
code fails the product.
This processor supports batch mode.
Source Files
- Input: `**/*.php`
- Output: none (checker)
Configuration
[processor.php_lint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to php |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Pip Processor
Purpose
Installs Python dependencies from requirements.txt files using pip.
How It Works
Discovers requirements.txt files in the project, runs pip install -r on
each, and creates a stamp file on success. The stamp file tracks the install
state so dependencies are only reinstalled when requirements.txt changes.
Source Files
- Input: `**/requirements.txt`
- Output: `out/pip/{flat_name}.stamp`
Configuration
[processor.pip]
command = "pip" # The pip command to run
args = [] # Additional arguments to pass to pip
dep_inputs = [] # Additional files that trigger rebuilds when changed
cache_output_dir = true # Cache the stamp directory for fast restore after clean
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "pip" | The pip executable to run |
args | string[] | [] | Extra arguments passed to pip |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
cache_output_dir | boolean | true | Cache the out/pip/ directory so rsconstruct clean && rsconstruct build restores from cache |
Batch support
Runs as a single whole-project operation (e.g., cargo build, npm install).
Protobuf Processor
Purpose
Compiles Protocol Buffer (.proto) files to generated source code using protoc.
How It Works
Files matching configured extensions in the proto/ directory are compiled using the
Protocol Buffer compiler. Output is written to out/protobuf/:
proto/hello.proto → out/protobuf/hello.pb.cc
The --proto_path is automatically set to the parent directory of each input file.
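The invocation for one input can be sketched as follows. `--proto_path` and `--cpp_out` are real protoc flags, but the exact flags rsconstruct passes beyond `--proto_path` are an assumption here:

```python
from pathlib import Path

# Sketch: derive a protoc invocation for one .proto file, assuming C++ output.
def protoc_argv(proto_file, output_dir="out/protobuf", protoc_bin="protoc"):
    src = Path(proto_file)
    return [
        protoc_bin,
        f"--proto_path={src.parent}",  # parent directory of the input, as described above
        f"--cpp_out={output_dir}",
        str(src),
    ]

print(protoc_argv("proto/hello.proto"))
```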
Source Files
- Input: `proto/**/*.proto`
- Output: `out/protobuf/` with `.pb.cc` extension
Configuration
[processor.protobuf]
protoc_bin = "protoc" # Protoc binary (default: "protoc")
src_extensions = [".proto"] # File extensions to process
output_dir = "out/protobuf" # Output directory (default: "out/protobuf")
dep_inputs = [] # Additional files that trigger rebuilds
| Key | Type | Default | Description |
|---|---|---|---|
protoc_bin | string | "protoc" | Path to protoc compiler |
src_extensions | string[] | [".proto"] | File extensions to discover |
output_dir | string | "out/protobuf" | Output directory |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Pylint Processor
Purpose
Lints Python source files using pylint.
How It Works
Discovers .py files in the project (excluding common non-source directories),
runs pylint on each file, and creates a stub file on success.
A non-zero exit code from pylint fails the product.
This processor supports batch mode, allowing multiple files to be checked in a single pylint invocation for better performance.
If a .pylintrc file exists in the project root, it is automatically added as an
extra input so that configuration changes trigger rebuilds.
Source Files
- Input: `**/*.py`
- Output: `out/pylint/{flat_name}.pylint`
Configuration
[processor.pylint]
args = [] # Additional arguments to pass to pylint
dep_inputs = [] # Additional files that trigger rebuilds (e.g. ["pyproject.toml"])
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to pylint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Pyrefly Processor
Purpose
Type-checks Python source files using pyrefly.
How It Works
Discovers .py files in the project (excluding common non-source directories),
runs pyrefly check on each file, and records success in the cache.
A non-zero exit code from pyrefly fails the product.
This processor supports batch mode, allowing multiple files to be checked in a single pyrefly invocation for better performance.
Source Files
- Input: `**/*.py`
- Output: none (linter)
Configuration
[processor.pyrefly]
command = "pyrefly" # The pyrefly command to run
args = [] # Additional arguments to pass to pyrefly
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "pyrefly" | The pyrefly executable to run |
args | string[] | [] | Extra arguments passed to pyrefly |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Pytest Processor
Purpose
Runs Python test files using pytest to verify they pass.
How It Works
Python test files (.py) in the tests/ directory are run using pytest.
Each test file is checked individually — a failing test causes the build to fail.
Source Files
- Input: `tests/**/*.py`
- Output: none (checker — pass/fail only)
Configuration
[processor.pytest]
src_extensions = [".py"] # File extensions to process (default: [".py"])
src_dirs = ["tests"] # Directories to scan (default: ["tests"])
dep_inputs = [] # Additional files that trigger rebuilds
| Key | Type | Default | Description |
|---|---|---|---|
src_extensions | string[] | [".py"] | File extensions to discover |
src_dirs | string[] | ["tests"] | Directories to scan for test files |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Requirements Processor
Purpose
Generates a requirements.txt file for a Python project by scanning the
project’s .py source files for import statements and listing the
third-party PyPI distributions they reference.
How It Works
- Scans every `.py` file in the project’s source directories.
- Extracts the top-level module name from each `import`/`from` statement.
- Drops imports that resolve to a local project file (intra-project imports).
- Drops imports that are part of the Python standard library.
- Drops imports listed in `exclude`.
- Maps each remaining import name to its PyPI distribution name using the built-in curated table (e.g. `cv2` → `opencv-python`, `yaml` → `PyYAML`). User-supplied `mapping` entries win over the built-in table.
- Writes the deduplicated result to `requirements.txt`.
Import → Distribution Mapping
Most Python packages publish under the same name as their top-level import,
so the default is identity (import requests → requests). A curated table
handles the common exceptions:
| Import | Distribution |
|---|---|
cv2 | opencv-python |
yaml | PyYAML |
PIL | Pillow |
sklearn | scikit-learn |
bs4 | beautifulsoup4 |
dateutil | python-dateutil |
dotenv | python-dotenv |
jwt | PyJWT |
Projects that import an unusual name should add an override:
[processor.requirements.mapping]
internal_tools = "acme-internal-tools"
Limitations
- No version pinning. The generated file lists bare distribution names. Running `pip freeze > requirements.txt` is the right tool if you need pinned versions.
- Static analysis only. Conditional imports inside `try` blocks, runtime `__import__` calls, and string-based imports are not detected.
- Curated mapping is finite. Packages with import/distribution name mismatches not in the built-in table default to identity; add them to `mapping` when needed.
Source Files
- Input: `**/*.py` (configurable via `src_dirs`/`src_extensions`)
- Output: `requirements.txt` (configurable via `output`)
Configuration
[processor.requirements]
output = "requirements.txt" # Output file path
exclude = [] # Import names to never emit
sorted = true # Sort entries alphabetically
header = true # Include a "# Generated by rsconstruct" header
[processor.requirements.mapping]
# Per-project overrides: import_name = "pypi-distribution-name"
# These win over the built-in curated table.
| Key | Type | Default | Description |
|---|---|---|---|
output | string | "requirements.txt" | Output file path |
exclude | string[] | [] | Import names to never emit |
sorted | bool | true | Sort entries alphabetically (false preserves first-seen order) |
header | bool | true | Include a comment header line |
mapping | map | {} | Per-project import→distribution overrides |
Batch support
Runs as a single whole-project operation — all .py files feed into one
requirements.txt output.
Ruff Processor
Purpose
Lints Python source files using ruff.
How It Works
Discovers .py files in the project (excluding common non-source directories),
runs ruff check on each file, and creates a stub file on success.
A non-zero exit code from ruff fails the product.
This processor supports batch mode, allowing multiple files to be checked in a single ruff invocation for better performance.
Source Files
- Input: `**/*.py`
- Output: `out/ruff/{flat_name}.ruff`
Configuration
[processor.ruff]
command = "ruff" # The ruff command to run
args = [] # Additional arguments to pass to ruff
dep_inputs = [] # Additional files that trigger rebuilds (e.g. ["pyproject.toml"])
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "ruff" | The ruff executable to run |
args | string[] | [] | Extra arguments passed to ruff |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Rumdl Processor
Purpose
Lints Markdown files using rumdl.
How It Works
Discovers .md files in the project (excluding common non-source directories),
runs rumdl check on each file, and creates a stub file on success.
A non-zero exit code from rumdl fails the product.
This processor supports batch mode, allowing multiple files to be checked in a single rumdl invocation for better performance.
Source Files
- Input: `**/*.md`
- Output: `out/rumdl/{flat_name}.rumdl`
Configuration
[processor.rumdl]
command = "rumdl" # The rumdl command to run
args = [] # Additional arguments to pass to rumdl
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "rumdl" | The rumdl executable to run |
args | string[] | [] | Extra arguments passed to rumdl |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Rust Single File Processor
Purpose
Compiles single-file Rust programs (.rs) into executables, similar to the cc_single_file
processor but for Rust.
How It Works
Rust source files in the src/ directory are compiled directly to executables using rustc.
This is useful for exercise, example, or utility repositories where each .rs file is a
standalone program.
Output is written to out/rust_single_file/ preserving the directory structure:
src/hello.rs → out/rust_single_file/hello.elf
src/exercises/ex1.rs → out/rust_single_file/exercises/ex1.elf
Source Files
- Input: `src/**/*.rs`
- Output: `out/rust_single_file/` with configured suffix (default: `.elf`)
Configuration
[processor.rust_single_file]
command = "rustc" # Rust compiler (default: "rustc")
flags = [] # Additional compiler flags
output_suffix = ".elf" # Output file suffix (default: ".elf")
output_dir = "out/rust_single_file" # Output directory
dep_inputs = [] # Additional files that trigger rebuilds
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "rustc" | Path to Rust compiler |
flags | string[] | [] | Additional compiler flags |
output_suffix | string | ".elf" | Suffix for output executables |
output_dir | string | "out/rust_single_file" | Output directory |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Sass Processor
Purpose
Compiles SCSS and SASS files into CSS using the Sass compiler.
How It Works
Files matching configured extensions in the sass/ directory are compiled to CSS.
Output is written to out/sass/ preserving the directory structure:
sass/style.scss → out/sass/style.css
sass/components/button.scss → out/sass/components/button.css
Source Files
- Input: `sass/**/*{src_extensions}`
- Output: `out/sass/` mirroring the source structure with `.css` extension
Configuration
[processor.sass]
sass_bin = "sass" # Sass compiler binary (default: "sass")
src_extensions = [".scss", ".sass"] # File extensions to process
output_dir = "out/sass" # Output directory (default: "out/sass")
dep_inputs = [] # Additional files that trigger rebuilds
| Key | Type | Default | Description |
|---|---|---|---|
sass_bin | string | "sass" | Path to sass compiler |
src_extensions | string[] | [".scss", ".sass"] | File extensions to discover |
output_dir | string | "out/sass" | Output directory |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Script Processor
Purpose
Runs a user-configured script or command as a linter on discovered files. This is a generic linter that lets you plug in any script without writing a custom processor.
How It Works
Discovers files matching the configured extensions in the configured scan directory, then runs the configured linter command on each file (or batch of files). A non-zero exit code from the script fails the product.
This processor is disabled by default — you must set enabled = true and
provide a command in your rsconstruct.toml.
This processor supports batch mode, allowing multiple files to be checked in a single invocation for better performance.
Source Files
- Input: configured via `src_extensions` and `src_dirs`
- Output: none (checker)
Configuration
[processor.script]
enabled = true
command = "python"
args = ["scripts/md_lint.py", "-q"]
src_extensions = [".md"]
src_dirs = ["marp"]
| Key | Type | Default | Description |
|---|---|---|---|
enabled | bool | false | Must be set to true to activate |
command | string | (required) | The command to run |
args | string[] | [] | Extra arguments passed before file paths |
src_extensions | string[] | [] | File extensions to scan for |
src_dirs | string[] | [""] | Directories to scan (empty string = project root) |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
dep_auto | string[] | [] | Auto-detected input files |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Shellcheck Processor
Purpose
Lints shell scripts using shellcheck.
How It Works
Discovers .sh and .bash files in the project (excluding common build tool
directories), runs shellcheck on each file, and records success in the cache.
A non-zero exit code from shellcheck fails the product.
This processor supports batch mode, allowing multiple files to be checked in a single shellcheck invocation for better performance.
Source Files
- Input: `**/*.sh`, `**/*.bash`
- Output: none (linter)
Configuration
[processor.shellcheck]
command = "shellcheck" # The shellcheck command to run
args = [] # Additional arguments to pass to shellcheck
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "shellcheck" | The shellcheck executable to run |
args | string[] | [] | Extra arguments passed to shellcheck |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Slidev Processor
Purpose
Builds Slidev presentations.
How It Works
Discovers .md files in the project (excluding common build tool directories),
runs slidev build on each file, and records success in the cache. A non-zero
exit code from slidev fails the product.
This processor supports batch mode.
Source Files
- Input: `**/*.md`
- Output: none (checker)
Configuration
[processor.slidev]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to slidev build |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Zspell Processor
Purpose
Checks documentation files for spelling errors using Hunspell-compatible
dictionaries (via the zspell crate, pure Rust).
How It Works
Discovers files matching the configured extensions, extracts words from markdown content (stripping code blocks, inline code, URLs, and HTML tags), and checks each word against the system Hunspell dictionary and a custom words file (if it exists). Fails with a list of misspelled words on error.
Dictionaries are read from /usr/share/hunspell/.
This processor supports batch mode when auto_add_words is enabled, collecting
all misspelled words across files and writing them to the words file at the end.
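The stripping step can be approximated with a few regular expressions. These patterns are illustrative, not the ones the processor actually uses:

```python
import re

# Sketch: remove fenced code blocks, inline code, URLs, and HTML tags
# from markdown content before extracting words to spell-check.
def extract_words(markdown):
    text = re.sub(r"```.*?```", " ", markdown, flags=re.S)  # fenced code blocks
    text = re.sub(r"`[^`]*`", " ", text)                    # inline code
    text = re.sub(r"https?://\S+", " ", text)               # URLs
    text = re.sub(r"<[^>]+>", " ", text)                    # HTML tags
    return re.findall(r"[A-Za-z']+", text)

md = "See <b>https://example.com</b> and `run_cmd`: misspeled word."
print(extract_words(md))
```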
Source Files
- Input: `**/*{src_extensions}` (default: `**/*.md`)
- Output: none (checker)
Custom Words File
The processor loads custom words from the file specified by words_file
(default: .zspell-words) if the file exists. Format: one word per line,
# comments supported, blank lines ignored.
The words file is also auto-detected as an input via dep_auto, so changes
to it invalidate all zspell products. To disable words file detection, set
dep_auto = [].
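Parsing the words file format described above is straightforward. A minimal sketch (full-line `#` comments and blank lines skipped, as the docs state; anything else is treated as a word):

```python
# Sketch: load a .zspell-words file — one word per line,
# '#' comment lines and blank lines are ignored.
def load_words(text):
    words = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line)
    return words

sample = """\
# project-specific vocabulary
rsconstruct

pdfunite
"""
print(sorted(load_words(sample)))
```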
Configuration
[processor.zspell]
src_extensions = [".md"] # File extensions to check (default: [".md"])
language = "en_US" # Hunspell dictionary language (default: "en_US")
words_file = ".zspell-words" # Path to custom words file (default: ".zspell-words")
auto_add_words = false # Auto-add misspelled words to words_file (default: false)
dep_auto = [".zspell-words"] # Auto-detected config files (default: [".zspell-words"])
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
src_extensions | string[] | [".md"] | File extensions to discover |
language | string | "en_US" | Hunspell dictionary language (requires system package) |
words_file | string | ".zspell-words" | Path to custom words file (relative to project root) |
auto_add_words | bool | false | Auto-add misspelled words to words_file instead of failing (also available as --auto-add-words CLI flag) |
dep_auto | string[] | [".zspell-words"] | Config files auto-detected as inputs |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Sphinx Processor
Purpose
Builds Sphinx documentation projects.
How It Works
Discovers conf.py files indicating Sphinx projects, collects sibling .rst,
.py, and .md files as inputs, and runs sphinx-build to generate output.
A non-zero exit code fails the product.
Source Files
- Input: `**/conf.py` (plus sibling `.rst`, `.py`, `.md` files)
- Output: none (creator — produces output in `_build` directory)
Configuration
[processor.sphinx]
command = "sphinx-build" # The sphinx-build command to run
output_dir = "_build" # Output directory for generated docs
args = [] # Additional arguments to pass to sphinx-build
dep_inputs = [] # Additional files that trigger rebuilds when changed
cache_output_dir = true # Cache the output directory for fast restore after clean
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "sphinx-build" | The sphinx-build executable to run |
output_dir | string | "_build" | Output directory for generated documentation |
args | string[] | [] | Extra arguments passed to sphinx-build |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
cache_output_dir | boolean | true | Cache the _build/ directory so rsconstruct clean && rsconstruct build restores from cache |
Batch support
Runs as a single whole-project operation (e.g., cargo build, npm install).
Standard Processor
Purpose
Checks JavaScript code style using standard.
How It Works
Discovers .js files in the project (excluding common build tool directories),
runs standard on each file, and records success in the cache. A non-zero exit
code from standard fails the product.
This processor supports batch mode.
Source Files
- Input: `**/*.js`
- Output: none (checker)
Configuration
[processor.standard]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to standard |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Stylelint Processor
Purpose
Lints CSS, SCSS, Sass, and Less files using stylelint.
How It Works
Discovers .css, .scss, .sass, and .less files in the project (excluding
common build tool directories), runs stylelint on each file, and records success
in the cache. A non-zero exit code from stylelint fails the product.
This processor supports batch mode.
If a stylelint config file exists (.stylelintrc* or stylelint.config.*), it
is automatically added as an extra input so that configuration changes trigger
rebuilds.
Source Files
- Input: `**/*.css`, `**/*.scss`, `**/*.sass`, `**/*.less`
- Output: none (checker)
Configuration
[processor.stylelint]
command = "stylelint"
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "stylelint" | The stylelint executable to run |
args | string[] | [] | Extra arguments passed to stylelint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Tags Processor
Purpose
Extracts YAML frontmatter tags from markdown files into a searchable database with comprehensive validation.
How It Works
Scans .md files for YAML frontmatter blocks (delimited by ---), parses tag
metadata, and builds a redb database. The
database enables querying files by tags via rsconstruct tags subcommands.
Tag Indexing
Two kinds of frontmatter fields are indexed:
- List fields — each item becomes a tag as-is:
  tags:
    - tools:docker
    - tools:python
  Produces tags: `tools:docker`, `tools:python`.
- Scalar fields — indexed as `key:value` (colon separator):
  level: beginner
  category: big-data
  duration_hours: 24
  Produces tags: `level:beginner`, `category:big-data`, `duration_hours:24`.
Both inline YAML lists (`tags: [a, b, c]`) and multi-line lists are supported.
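The indexing rule above can be sketched as follows. The frontmatter is passed pre-parsed; YAML parsing itself is omitted:

```python
# Sketch: list fields contribute items as-is; scalar fields contribute key:value.
def index_tags(frontmatter):
    tags = []
    for key, value in frontmatter.items():
        if isinstance(value, list):
            tags.extend(str(item) for item in value)  # list items as-is
        else:
            tags.append(f"{key}:{value}")             # scalar as key:value
    return tags

fm = {"tags": ["tools:docker", "tools:python"], "level": "beginner", "duration_hours": 24}
print(index_tags(fm))
```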
The tags_dir Allowlist
The tags_dir directory (default: tags/) contains .txt files that
define the allowed tags. Each file <name>.txt contributes tags as
<name>:<line> pairs. For example:
tags/
├── level.txt # Contains: beginner, intermediate, advanced
├── languages.txt # Contains: python, rust, go, ...
├── tools.txt # Contains: docker, ansible, ...
└── audiences.txt # Contains: developers, architects, ...
level.txt with content beginner produces the allowed tag level:beginner.
The tags processor is only auto-detected when tags_dir contains .txt files.
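Expanding the allowlist is a simple cross-product of filename stems and lines. A sketch that reads from an in-memory mapping rather than disk, to stay self-contained:

```python
from pathlib import Path

# Sketch: each tags_dir/<name>.txt contributes <name>:<line> allowed tags.
def allowed_tags(files):
    """files: {filename: file content}, mimicking tags_dir/*.txt on disk."""
    allowed = set()
    for fname, content in files.items():
        category = Path(fname).stem  # "level.txt" -> "level"
        for line in content.splitlines():
            line = line.strip()
            if line:
                allowed.add(f"{category}:{line}")
    return allowed

print(sorted(allowed_tags({"level.txt": "beginner\nintermediate\n"})))
```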
Build-Time Validation
During every build, the tags processor runs the following checks. Any failure stops the build with a descriptive error message.
Required Frontmatter Fields
When required_fields is configured, every .md file must contain those
frontmatter fields. Empty lists ([]) and empty strings are treated as missing.
Files with no frontmatter block at all also fail:
[processor.tags]
required_fields = ["tags", "level", "category", "duration_hours", "audiences"]
Missing required frontmatter fields:
syllabi/courses/intro.md: category, duration_hours
syllabi/courses/advanced.md: audiences
Required Field Groups
When required_field_groups is configured, every file must satisfy at least
one group (all fields in that group present). This handles cases where files
may have alternative sets of fields:
[processor.tags]
required_field_groups = [
["duration_hours"],
["duration_hours_long", "duration_hours_short"],
]
A file with duration_hours passes. A file with both duration_hours_long and
duration_hours_short passes. A file with only duration_hours_short (partial
group) or none of these fields fails:
Files missing required field groups (must satisfy at least one):
syllabi/courses/intro.md: none of [duration_hours] or [duration_hours_long, duration_hours_short]
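The group rule reduces to: a file passes if at least one group has all of its fields present and non-empty. A sketch of that check:

```python
# Sketch: a file satisfies required_field_groups if any one group
# has every field present; empty lists and strings count as missing.
def satisfies_groups(frontmatter, groups):
    def present(field):
        return frontmatter.get(field) not in (None, "", [])
    return any(all(present(f) for f in group) for group in groups)

groups = [["duration_hours"], ["duration_hours_long", "duration_hours_short"]]
print(satisfies_groups({"duration_hours": 24}, groups))       # complete first group
print(satisfies_groups({"duration_hours_short": 4}, groups))  # partial second group only
```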
Required Values
When required_values is configured, scalar fields must contain a value that
exists in the corresponding tags/<field>.txt file. This catches typos in
scalar values:
[processor.tags]
required_values = ["level", "category"]
Invalid values for validated fields:
syllabi/courses/intro.md: level=begginer (not in tags/level.txt)
Field Types
When field_types is configured, frontmatter fields must have the expected
type. Supported types: "list", "scalar", "number".
[processor.tags.field_types]
tags = "list"
level = "scalar"
duration_hours = "number"
Field type mismatches:
syllabi/courses/intro.md: 'tags' expected list, got scalar
Unique Fields
When unique_fields is configured, no two files may share the same value for
that field:
[processor.tags]
unique_fields = ["title"]
Duplicate values for unique fields:
title='Intro to Docker' in:
- syllabi/courses/docker_intro.md
- syllabi/courses/containers/docker_intro.md
Sorted Tags
When sorted_tags = true, list-type frontmatter fields must have their items
in lexicographic sorted order. This reduces diff noise in version control:
[processor.tags]
sorted_tags = true
List tags are not in sorted order:
syllabi/courses/intro.md field 'tags': 'tools:alpha' should come after 'tools:beta'
Duplicate Tags Within a File
The same tag cannot appear twice in a single file’s frontmatter:
Duplicate tags found within files:
tools:docker in syllabi/courses/containers/intro.md
Duplicate Tags Across Tag Lists
The same category:value tag cannot be defined in multiple tags_dir/*.txt
files. Note that the same value in different categories is fine (tools:docker
and infra:docker are distinct tags):
Duplicate tags found across tags files:
tools:docker in tools.txt and infra.txt
Unknown Tags
Every tag found in frontmatter must exist in tags_dir. Unknown tags cause an
error with a typo suggestion (Levenshtein distance):
Unknown tags found (not in tags):
tools:dockker (did you mean 'tools:docker'?)
- syllabi/courses/containers/intro.md
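The suggestion step picks the closest allowed tag by edit distance. A sketch with a textbook Levenshtein implementation; the cutoff threshold (`max_dist=2`) is an assumption, not documented behavior:

```python
# Sketch: suggest the nearest allowed tag for an unknown tag.
def levenshtein(a, b):
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def suggest(unknown, allowed, max_dist=2):
    best = min(allowed, key=lambda t: levenshtein(unknown, t))
    return best if levenshtein(unknown, best) <= max_dist else None

print(suggest("tools:dockker", {"tools:docker", "tools:python"}))
```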
Unused Tags
Every tag defined in tags_dir/*.txt must be used by at least one .md file.
This catches stale entries that should be cleaned up:
Unused tags in tags (not used by any file):
tools:vagrant
languages:fortran
Source Files
- Input: `**/*.md` (configurable via `src_dirs`/`src_extensions`)
- Output: `out/tags/tags.db`
Configuration
[processor.tags]
output = "out/tags/tags.db" # Output database path
tags_dir = "tags" # Directory containing tag list files
required_fields = ["tags", "level", "category"] # Fields every .md file must have
required_field_groups = [ # At least one group must be fully present
["duration_hours"],
["duration_hours_long", "duration_hours_short"],
]
required_values = ["level", "category"] # Scalar fields validated against tags
unique_fields = ["title"] # Fields that must be unique across files
sorted_tags = true # Require list items in sorted order
dep_inputs = [] # Additional files that trigger rebuilds
[processor.tags.field_types]
tags = "list" # Must be a YAML list
level = "scalar" # Must be a string
duration_hours = "number" # Must be numeric
| Key | Type | Default | Description |
|---|---|---|---|
output | string | "out/tags/tags.db" | Path to the tags database file |
tags_dir | string | "tags" | Directory containing .txt tag list files |
required_fields | string[] | [] | Frontmatter fields that every .md file must have |
required_field_groups | string[][] | [] | Alternative field groups; at least one group must be fully present |
required_values | string[] | [] | Scalar fields whose values must exist in tags/<field>.txt |
unique_fields | string[] | [] | Fields whose values must be unique across all files |
field_types | map | {} | Expected types per field: "list", "scalar", or "number" |
sorted_tags | bool | false | Require list items in sorted order within each file |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Subcommands
All subcommands require a prior `rsconstruct build` to populate the database (except `check`, which reads files directly).
All support --json for machine-readable output.
Querying
| Command | Description |
|---|---|
rsconstruct tags list | List all unique tags (sorted) |
rsconstruct tags files TAG [TAG...] | List files matching all given tags (AND) |
rsconstruct tags files --or TAG [TAG...] | List files matching any given tag (OR) |
rsconstruct tags grep TEXT | Search for tags containing a substring |
rsconstruct tags grep -i TEXT | Case-insensitive tag search |
rsconstruct tags for-file PATH | List all tags for a specific file (supports suffix matching) |
rsconstruct tags frontmatter PATH | Show raw parsed frontmatter for a file |
rsconstruct tags count | Show each tag with its file count, sorted by frequency |
rsconstruct tags tree | Show tags grouped by key (e.g. level= group) vs bare tags |
rsconstruct tags stats | Show database statistics (file count, unique tags, associations) |
Reporting
| Command | Description |
|---|---|
rsconstruct tags matrix | Show a coverage matrix of tag categories per file |
rsconstruct tags coverage | Show percentage of files that have each tag category |
rsconstruct tags orphans | Find files with no tags at all |
rsconstruct tags suggest PATH | Suggest tags for a file based on similarity to other tagged files |
Validation
| Command | Description |
|---|---|
rsconstruct tags check | Run all validations without building (fast lint pass) |
rsconstruct tags unused | List tags in tags_dir that no file uses |
rsconstruct tags unused --strict | Same, but exit with error if any unused tags exist (for CI) |
rsconstruct tags validate | Validate indexed tags against tags_dir without rebuilding |
Terms Processor
Purpose
Checks that technical terms from a terms directory are backtick-quoted in Markdown files, and provides commands to auto-fix and merge term lists across projects.
How It Works
Loads terms from terms/*.txt files (one term per line, organized by category).
For each .md file, simulates what rsconstruct terms fix would produce. If the
result differs from the current content, the product fails.
The processor skips YAML frontmatter and fenced code blocks. Terms are matched case-insensitively with word-boundary detection, longest-first to avoid partial matches (e.g., “Android Studio” matches before “Android”).
Auto-detected when a terms/ directory exists and .md files are present.
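The matching rules above (case-insensitive, word boundaries, longest term first, already-quoted occurrences left alone) can be sketched as follows. This is an illustrative Python sketch of the documented behavior, not the actual Rust implementation; the function name `backtick_terms` is invented for this example.

```python
import re

def backtick_terms(text: str, terms: list[str]) -> str:
    """Wrap unquoted occurrences of known terms in backticks.

    Longest-first ordering ensures "Android Studio" wins over
    "Android"; occurrences already wrapped in backticks are left
    untouched, which makes the transformation idempotent.
    """
    # Longest terms first so multi-word terms beat their prefixes.
    ordered = sorted(terms, key=len, reverse=True)
    pattern = "|".join(re.escape(t) for t in ordered)
    # Optional surrounding backticks are captured so quoted terms
    # can be detected; \b enforces word-boundary matching.
    regex = re.compile(r"(`?)\b(" + pattern + r")\b(`?)", re.IGNORECASE)

    def repl(m: re.Match) -> str:
        if m.group(1) or m.group(3):   # already backtick-quoted
            return m.group(0)
        return "`" + m.group(2) + "`"

    return regex.sub(repl, text)
```

Running the sketch twice over the same text yields the same result, mirroring the idempotency guarantee of `rsconstruct terms fix`.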
Source Files
- Input: **/*.md
- Output: none (checker)
Configuration
[processor.terms]
terms_dir = "terms" # Directory containing term list .txt files
batch = true # Enable batch execution
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
terms_dir | string | "terms" | Directory containing .txt term list files |
batch | bool | true | Enable batch execution |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Term List Format
Each .txt file in the terms directory contains one term per line. Files are
typically organized by category:
terms/
programming_languages.txt
frameworks_and_libraries.txt
databases_and_storage.txt
devops_and_cicd.txt
...
Example programming_languages.txt:
Python
JavaScript
TypeScript
Rust
C++
Go
Commands
rsconstruct terms fix
Add backticks around unquoted terms in all markdown files.
rsconstruct terms fix
rsconstruct terms fix --remove-non-terms # also remove backticks from non-terms
The fix is idempotent: running it twice produces the same result.
rsconstruct terms merge <path>
Merge terms from another project’s terms directory into the current one. For matching filenames, new terms are added (union). Missing files are copied in both directions.
rsconstruct terms merge ../other-project/terms
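The documented merge semantics can be sketched like this. Assumptions of this sketch: for matching filenames the union is written into the current project's terms directory (the doc says new terms are added to the current one), and a file present on only one side is copied to the other. `merge_terms` is an illustrative name, not the real implementation.

```python
from pathlib import Path

def merge_terms(ours: Path, theirs: Path) -> None:
    """Union term lists with matching filenames; copy files that are
    missing on either side. Illustrative sketch only."""
    names = {p.name for p in ours.glob("*.txt")} | {p.name for p in theirs.glob("*.txt")}
    for name in sorted(names):
        a, b = ours / name, theirs / name
        terms: set[str] = set()
        for f in (a, b):
            if f.exists():
                terms |= {t for t in f.read_text().splitlines() if t.strip()}
        merged = "\n".join(sorted(terms)) + "\n"
        a.write_text(merged)        # union goes into the current project
        if not b.exists():
            b.write_text(merged)    # copy files missing on the other side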
Taplo Processor
Purpose
Checks TOML files using taplo.
How It Works
Discovers .toml files in the project (excluding common build tool
directories), runs taplo check on each file, and records success in the cache.
A non-zero exit code from taplo fails the product.
This processor supports batch mode, allowing multiple files to be checked in a single taplo invocation for better performance.
Source Files
- Input: **/*.toml
- Output: none (checker)
Configuration
[processor.taplo]
command = "taplo" # The taplo command to run
args = [] # Additional arguments to pass to taplo
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "taplo" | The taplo executable to run |
args | string[] | [] | Extra arguments passed to taplo |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Tera Processor
Purpose
Renders Tera template files into output files, with support for loading configuration variables from Python or Lua files.
How It Works
Files matching configured extensions in tera.templates/ are rendered and written
to the project root with the extension stripped:
tera.templates/app.config.tera → app.config
tera.templates/sub/readme.txt.tera → sub/readme.txt
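The path mapping above can be sketched in a few lines. This is a sketch of the documented convention only; the function name and parameters are illustrative.

```python
from pathlib import PurePosixPath

def tera_output_path(template: str, src_dir: str = "tera.templates",
                     extension: str = ".tera") -> str:
    """Mirror a template path at the project root with the template
    extension stripped, per the documented convention."""
    rel = PurePosixPath(template).relative_to(src_dir)
    assert rel.suffix == extension, "template must carry the configured extension"
    return str(rel.with_suffix(""))   # strip only the final extension
```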
Templates use the Tera templating engine and can call
load_python(path="...") or load_lua(path="...") to load variables from config files.
Loading Lua config
{% set config = load_lua(path="config/settings.lua") %}
[app]
name = "{{ config.project_name }}"
version = "{{ config.version }}"
Lua configs are executed via the embedded Lua 5.4 interpreter (no external
dependency). All user-defined globals (strings, numbers, booleans, tables) are
exported. Built-in Lua globals and functions are automatically filtered out.
dofile() and require() work relative to the config file’s directory.
Loading Python config
{% set config = load_python(path="config/settings.py") %}
[app]
name = "{{ config.project_name }}"
version = "{{ config.version }}"
Source Files
- Input: tera.templates/**/*{src_extensions}
- Output: project root, mirroring the template path with the extension removed
Configuration
[processor.tera]
src_extensions = [".tera"] # File extensions to process (default: [".tera"])
dep_inputs = ["config/settings.py"] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
src_extensions | string[] | [".tera"] | File extensions to discover |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Tidy Processor
Purpose
Validates HTML files using HTML Tidy.
How It Works
Discovers .html and .htm files in the project (excluding common build tool
directories), runs tidy -errors on each file, and records success in the cache.
A non-zero exit code from tidy fails the product.
This processor supports batch mode.
Source Files
- Input: **/*.html, **/*.htm
- Output: none (checker)
Configuration
[processor.tidy]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to tidy |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
XMLLint Processor
Purpose
Validates XML files using xmllint.
How It Works
Discovers .xml files in the project (excluding common build tool directories),
runs xmllint --noout on each file, and records success in the cache. A non-zero
exit code from xmllint fails the product.
This processor supports batch mode.
Source Files
- Input: **/*.xml
- Output: none (checker)
Configuration
[processor.xmllint]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to xmllint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Yaml2json Processor
Purpose
Converts YAML files to JSON. Native (in-process, no external tools required).
How It Works
Discovers YAML files in the configured directories and converts each to a pretty-printed JSON file.
Source Files
- Input: **/*.yml, **/*.yaml
- Output: out/yaml2json/{relative_path}.json
Configuration
[processor.yaml2json]
src_dirs = ["yaml"]
output_dir = "out/yaml2json" # Output directory (default)
| Key | Type | Default | Description |
|---|---|---|---|
output_dir | string | "out/yaml2json" | Output directory for JSON files |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
Each input file is processed individually, producing its own output file.
Yamllint Processor
Purpose
Lints YAML files using yamllint.
How It Works
Discovers .yml and .yaml files in the project (excluding common build tool
directories), runs yamllint on each file, and records success in the cache.
A non-zero exit code from yamllint fails the product.
This processor supports batch mode, allowing multiple files to be checked in a single yamllint invocation for better performance.
Source Files
- Input: **/*.yml, **/*.yaml
- Output: none (checker)
Configuration
[processor.yamllint]
command = "yamllint" # The yamllint command to run
args = [] # Additional arguments to pass to yamllint
dep_inputs = [] # Additional files that trigger rebuilds when changed
| Key | Type | Default | Description |
|---|---|---|---|
command | string | "yamllint" | The yamllint executable to run |
args | string[] | [] | Extra arguments passed to yamllint |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
Yq Processor
Purpose
Validates YAML files using yq.
How It Works
Discovers .yml and .yaml files in the project (excluding common build tool
directories), runs yq . on each file to validate syntax, and records success
in the cache. A non-zero exit code from yq fails the product.
This processor supports batch mode.
Source Files
- Input: **/*.yml, **/*.yaml
- Output: none (checker)
Configuration
[processor.yq]
args = []
dep_inputs = []
| Key | Type | Default | Description |
|---|---|---|---|
args | string[] | [] | Extra arguments passed to yq |
dep_inputs | string[] | [] | Extra files whose changes trigger rebuilds |
Batch support
The tool accepts multiple files on the command line. When batching is enabled (default), rsconstruct passes all files in a single invocation for better performance.
GitHub Actions
How to run rsconstruct in a GitHub Actions workflow.
Recommended flags
- name: Build
run: rsconstruct build -q -j0
| Flag | Why |
|---|---|
-q (quiet) | Suppresses the progress bar and status messages. The progress bar uses terminal escape codes that produce garbage in CI logs. Only errors are shown. |
-j0 | Auto-detect CPU cores. GitHub-hosted runners have 4 cores (ubuntu-latest) — using them all speeds up the build significantly vs the default of -j1. |
Full workflow example
name: Build
on: [push, pull_request]
jobs:
build:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Install rsconstruct
run: cargo install rsconstruct
- name: Install tools
run: rsconstruct tools install --yes
- name: Build
run: rsconstruct build -q -j0
Runner sizing
| Runner | Cores | RAM | Notes |
|---|---|---|---|
ubuntu-latest | 4 | 16 GB | Good for most projects. Use -j0 or -j4. |
ubuntu-latest (private repo) | 2 | 7 GB | Standard runners for private repos are smaller; -j0 still picks the right count. |
| Large runners | 8-64 | 32-256 GB | For large projects. -j0 scales automatically. |
-j0 always does the right thing — it detects the available cores at runtime.
There is no benefit to setting -j higher than the core count.
Caching
Cache the .rsconstruct/ directory between runs to skip unchanged products:
- uses: actions/cache@v4
with:
path: .rsconstruct
key: rsconstruct-${{ hashFiles('rsconstruct.toml') }}-${{ github.sha }}
restore-keys: |
rsconstruct-${{ hashFiles('rsconstruct.toml') }}-
rsconstruct-
This restores cached build products from previous runs. Only products whose inputs changed will be rebuilt.
Tips
- Don’t use --timings in CI unless you need the data. It adds overhead.
- Use --json instead of -q if you want machine-readable output for downstream processing.
- Use -k (keep-going) to see all failures at once instead of stopping at the first one.
- Use --verify-tool-versions to catch tool version drift between local and CI environments.
Lua Plugins
RSConstruct supports custom processors written in Lua. Drop a .lua file in the plugins/ directory and add a [processor.NAME] section in rsconstruct.toml. The plugin participates in discovery, execution, caching, cleaning, tool listing, and auto-detection just like a built-in processor.
Quick Start
1. Create the plugin file:
plugins/eslint.lua
function description()
return "Lint JavaScript/TypeScript with ESLint"
end
function required_tools()
return {"eslint"}
end
function discover(project_root, config, files)
local products = {}
for _, file in ipairs(files) do
local stub = rsconstruct.stub_path(project_root, file, "eslint")
table.insert(products, {
inputs = {file},
outputs = {stub},
})
end
return products
end
function execute(product)
rsconstruct.run_command("eslint", {product.inputs[1]})
rsconstruct.write_stub(product.outputs[1], "linted")
end
2. Enable it in rsconstruct.toml:
[processor.eslint]
src_dirs = ["src"]
src_extensions = [".js", ".ts"]
3. Run it:
rsconstruct build # builds including the plugin
rsconstruct processors list # shows the plugin
rsconstruct processors files # shows files discovered by the plugin
Lua API Contract
Each .lua file defines global functions. Three are required; the rest have sensible defaults.
Required Functions
description()
Returns a human-readable string describing what the processor does. Called once when the plugin is loaded.
function description()
return "Lint JavaScript files with ESLint"
end
discover(project_root, config, files)
Called during product discovery. Receives:
- project_root (string) — absolute path to the project root
- config (table) — the [processor.NAME] TOML section as a Lua table
- files (table) — list of absolute file paths matching the scan configuration
Must return a table of products. Each product is a table with inputs and outputs keys, both containing tables of absolute file paths.
function discover(project_root, config, files)
local products = {}
for _, file in ipairs(files) do
local stub = rsconstruct.stub_path(project_root, file, "myplugin")
table.insert(products, {
inputs = {file},
outputs = {stub},
})
end
return products
end
execute(product)
Called to build a single product. Receives a table with inputs and outputs keys (both tables of absolute path strings). Must create the output files on success or error on failure.
function execute(product)
rsconstruct.run_command("mytool", {product.inputs[1]})
rsconstruct.write_stub(product.outputs[1], "done")
end
Optional Functions
clean(product)
Called when running rsconstruct clean. Receives the same product table as execute(). Default behavior: removes all output files.
function clean(product)
for _, output in ipairs(product.outputs) do
rsconstruct.remove_file(output)
end
end
auto_detect(files)
Called to determine whether this processor is relevant for the project (when auto_detect = true in config). Receives the list of matching files. Default: returns true if the files list is non-empty.
function auto_detect(files)
return #files > 0
end
required_tools()
Returns a table of external tool names required by this processor. Used by rsconstruct tools list and rsconstruct tools check. Default: empty table.
function required_tools()
return {"eslint", "node"}
end
processor_type()
Returns the type of processor: "generator" or "checker". Generators create real output files (e.g., compilers, transpilers). Checkers validate input files; for checkers, you can choose whether to produce stub files or not. Default: "checker".
Option 1: Checker with stub files
function processor_type()
return "checker"
end
When using stub files, return outputs = {stub} from discover() and call rsconstruct.write_stub() in execute().
Option 2: Checker without stub files
function processor_type()
return "checker"
end
Return outputs = {} from discover() and don’t write stubs in execute(). The cache database entry itself serves as the success record.
The rsconstruct Global Table
Lua plugins have access to an rsconstruct global table with helper functions.
| Function | Description |
|---|---|
rsconstruct.stub_path(project_root, source, suffix) | Compute the stub output path for a source file. Maps project_root/a/b/file.ext to out/suffix/a_b_file.ext.suffix. |
rsconstruct.run_command(program, args) | Run an external command. Errors if the command fails (non-zero exit). |
rsconstruct.run_command_cwd(program, args, cwd) | Run an external command with a working directory. |
rsconstruct.write_stub(path, content) | Write a stub file (creates parent directories as needed). |
rsconstruct.remove_file(path) | Remove a file if it exists. No error if the file is missing. |
rsconstruct.file_exists(path) | Returns true if the file exists. |
rsconstruct.read_file(path) | Read a file and return its contents as a string. |
rsconstruct.path_join(parts) | Join path components. Takes a table: rsconstruct.path_join({"a", "b", "c"}) returns "a/b/c". |
rsconstruct.log(message) | Print a message prefixed with the plugin name. |
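The `rsconstruct.stub_path` mapping can be sketched as follows. This is an illustrative Python sketch of the documented path transformation, not the real (Rust-side) helper; whether the `out/` directory lives under the project root is an assumption of this sketch.

```python
from pathlib import PurePosixPath

def stub_path(project_root: str, source: str, suffix: str) -> str:
    """Sketch of the documented mapping:
    project_root/a/b/file.ext -> out/<suffix>/a_b_file.ext.<suffix>."""
    rel = PurePosixPath(source).relative_to(project_root)
    flat = "_".join(rel.parts)   # flatten path separators into underscores
    return f"{project_root}/out/{suffix}/{flat}.{suffix}"
```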
Configuration
Plugins use the standard scan configuration fields. Any [processor.NAME] section in rsconstruct.toml is passed to the plugin’s discover() function as the config table.
Scan Configuration
These fields control which files are passed to discover():
| Key | Type | Default | Description |
|---|---|---|---|
src_dirs | string[] | [""] | Directories to scan ("" = project root) |
src_extensions | string[] | [] | File extensions to match |
src_exclude_dirs | string[] | [] | Directory path segments to skip |
src_exclude_files | string[] | [] | File names to skip |
src_exclude_paths | string[] | [] | Paths relative to project root to skip |
Custom Configuration
Any additional keys in the [processor.NAME] section are passed through to the Lua config table:
[processor.eslint]
src_dirs = ["src"]
src_extensions = [".js", ".ts"]
max_warnings = 0 # custom key, accessible as config.max_warnings in Lua
fix = false # custom key, accessible as config.fix in Lua
-- config is only passed to discover(); capture it in a script-local
-- variable so execute() can read the custom keys
local cfg = {}

function discover(project_root, config, files)
    cfg = config
    -- ... create products as in the earlier examples ...
end

function execute(product)
    local args = {product.inputs[1]}
    if cfg.max_warnings then
        table.insert(args, "--max-warnings")
        table.insert(args, tostring(cfg.max_warnings))
    end
    rsconstruct.run_command("eslint", args)
    rsconstruct.write_stub(product.outputs[1], "linted")
end
Plugins Directory
The directory where RSConstruct looks for .lua files is configurable:
[plugins]
dir = "plugins" # default
Plugin Name Resolution
The plugin name is derived from the .lua filename (without extension). This name is used for:
- The [processor.NAME] config section in rsconstruct.toml
- The out/NAME/ stub directory
- Display in rsconstruct processors list and build output
A plugin name must not conflict with a built-in processor name (e.g., tera, ruff, pylint, cc_single_file, cppcheck, shellcheck, zspell, make). RSConstruct will error if a conflict is detected.
Incremental Builds
Lua plugins participate in RSConstruct’s incremental build system automatically:
- Products are identified by their inputs, outputs, and a config hash
- If none of the declared inputs have changed since the last build, the product is skipped
- If the [processor.NAME] config section changes, all products are rebuilt
- Outputs are cached and can be restored from cache
For correct incrementality, make sure discover() declares all files that affect the output. If your tool reads additional configuration files, include them in the inputs list.
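The identification scheme above can be sketched as a single digest over input contents, output paths, and the processor config. This is a sketch in the spirit of the description, not rsconstruct's actual cache layout; the function name and field layout are illustrative.

```python
import hashlib
import json

def product_key(input_contents: dict[str, bytes], outputs: list[str],
                config: dict) -> str:
    """Combine input contents, output paths, and processor config
    into one SHA-256 digest; changing any of them changes the key
    and therefore forces a rebuild."""
    h = hashlib.sha256()
    for path in sorted(input_contents):          # deterministic order
        h.update(path.encode())
        h.update(hashlib.sha256(input_contents[path]).digest())
    for out in sorted(outputs):
        h.update(out.encode())
    h.update(json.dumps(config, sort_keys=True).encode())
    return h.hexdigest()
```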
Examples
Linter Without Stub Files (Recommended)
A checker that validates files without producing stub files. Success is recorded in the cache database.
function description()
return "Lint YAML files with yamllint"
end
function processor_type()
return "checker"
end
function required_tools()
return {"yamllint"}
end
function discover(project_root, config, files)
local products = {}
for _, file in ipairs(files) do
table.insert(products, {
inputs = {file},
outputs = {}, -- No output files
})
end
return products
end
function execute(product)
rsconstruct.run_command("yamllint", {"-s", product.inputs[1]})
-- No stub to write; cache entry = success
end
function clean(product)
-- Nothing to clean
end
[processor.yamllint]
src_extensions = [".yml", ".yaml"]
Stub-Based Linter (Legacy)
A linter that creates stub files. Use this if you need the stub file for some reason.
function description()
return "Lint YAML files with yamllint"
end
function processor_type()
return "checker"
end
function required_tools()
return {"yamllint"}
end
function discover(project_root, config, files)
local products = {}
for _, file in ipairs(files) do
table.insert(products, {
inputs = {file},
outputs = {rsconstruct.stub_path(project_root, file, "yamllint")},
})
end
return products
end
function execute(product)
rsconstruct.run_command("yamllint", {"-s", product.inputs[1]})
rsconstruct.write_stub(product.outputs[1], "linted")
end
[processor.yamllint]
src_extensions = [".yml", ".yaml"]
File Transformer (Generator)
A plugin that transforms input files into output files (not stubs). This is a “generator” processor.
function description()
return "Compile Sass to CSS"
end
function processor_type()
return "generator"
end
function required_tools()
return {"sass"}
end
function discover(project_root, config, files)
local products = {}
for _, file in ipairs(files) do
local out = file:gsub("%.scss$", ".css"):gsub("^" .. project_root .. "/src/", project_root .. "/out/sass/")
table.insert(products, {
inputs = {file},
outputs = {out},
})
end
return products
end
function execute(product)
rsconstruct.run_command("sass", {product.inputs[1], product.outputs[1]})
end
[processor.sass]
src_dirs = ["src"]
src_extensions = [".scss"]
Advanced Usage
Parallel builds
RSConstruct can build independent products concurrently. Set the number of parallel jobs:
rsconstruct build -j4 # 4 parallel jobs
rsconstruct build -j0 # Auto-detect CPU cores
Or configure it in rsconstruct.toml:
[build]
parallel = 4 # 0 = auto-detect
The -j flag on the command line overrides the config file setting.
Watch mode
Watch source files and automatically rebuild on changes:
rsconstruct watch
This monitors all source files and triggers an incremental build whenever a file is modified.
Dependency graph
Visualize the build dependency graph in multiple formats:
rsconstruct graph # Default text format
rsconstruct graph --format dot # Graphviz DOT format
rsconstruct graph --format mermaid # Mermaid diagram format
rsconstruct graph --format json # JSON format
rsconstruct graph --view # Open in browser or viewer
The --view flag opens the graph using the configured viewer (set in rsconstruct.toml):
[graph]
viewer = "google-chrome"
Ignoring files
RSConstruct respects .gitignore files automatically. Any file ignored by git is also ignored by all processors. Nested .gitignore files and negation patterns are supported.
For project-specific exclusions that should not go in .gitignore, create a .rsconstructignore file in the project root with glob patterns (one per line):
/src/experiments/**
*.bak
The syntax is the same as .gitignore: # for comments, / prefix to anchor to the project root, / suffix for directories, and */** for globs.
Processor verbosity levels
Control the detail level of build output with -v N:
| Level | Output |
|---|---|
| 0 (default) | Target basename only: main.elf |
| 1 | Target path: out/cc_single_file/main.elf; cc_single_file processor also prints compiler commands |
| 2 | Adds source path: out/cc_single_file/main.elf <- src/main.c |
| 3 | Adds all inputs: out/cc_single_file/main.elf <- src/main.c, src/utils.h |
Dry run
Preview what would be built without executing anything:
rsconstruct build --dry-run
Keep going after errors
By default, RSConstruct stops on the first error. Use --keep-going to continue building other products:
rsconstruct build --keep-going
Build timings
Show per-product and total timing information:
rsconstruct build --timings
Shell completions
Generate shell completions for your shell:
rsconstruct complete bash # Bash completions
rsconstruct complete zsh # Zsh completions
rsconstruct complete fish # Fish completions
Configure which shells to generate completions for:
[completions]
shells = ["bash"]
Extra inputs
By default, each processor only tracks its primary source files as inputs. If a product depends on additional files that aren’t automatically discovered (e.g., a config file read by a linter, a suppressions file used by a static analyzer, or a Python settings file loaded by a template), you can declare them with dep_inputs.
When any file listed in dep_inputs changes, all products from that processor are rebuilt.
[processor.template]
dep_inputs = ["config/settings.py", "config/database.py"]
[processor.ruff]
dep_inputs = ["pyproject.toml"]
[processor.pylint]
dep_inputs = ["pyproject.toml"]
[processor.cppcheck]
dep_inputs = [".cppcheck-suppressions"]
[processor.cc_single_file]
dep_inputs = ["Makefile.inc"]
[processor.zspell]
dep_inputs = ["custom-dictionary.txt"]
Paths are relative to the project root. Missing files cause a build error, so all listed files must exist.
The dep_inputs paths are included in the processor’s config hash, so adding or removing entries triggers a rebuild even if the files themselves haven’t changed. The file contents are also checksummed as part of the product’s input set, so any content change is detected by the incremental build system.
All processors support dep_inputs.
Graceful interrupt
Pressing Ctrl+C during a build stops execution promptly:
- Subprocess termination — All external processes (compilers, linters, etc.) are spawned with a poll loop that checks for interrupts every 50ms. When Ctrl+C is detected, the running child process is killed immediately rather than waiting for it to finish. This keeps response time under 50ms regardless of how long the subprocess would otherwise run.
- Progress preservation — Products that completed successfully before the interrupt are cached. The next build resumes from where it left off rather than starting over.
- Parallel builds — In parallel mode, all in-flight subprocesses are killed when Ctrl+C is detected. Each thread’s poll loop independently checks the global interrupt flag.
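The poll loop described above can be sketched like this. It is an illustrative Python sketch of the pattern (rsconstruct's version is in Rust); the names `interrupted` and `run_interruptible` are invented for this example.

```python
import subprocess
import sys
import threading
import time

interrupted = threading.Event()   # set by the real Ctrl+C handler

def run_interruptible(cmd: list[str], poll_ms: int = 50) -> int:
    """Instead of blocking in wait(), poll the child and the interrupt
    flag every poll_ms; kill the child as soon as an interrupt is seen,
    keeping response time near the poll interval."""
    child = subprocess.Popen(cmd)
    while child.poll() is None:
        if interrupted.is_set():
            child.kill()          # terminate promptly
            child.wait()          # reap the killed process
            return -1
        time.sleep(poll_ms / 1000.0)
    return child.returncode
```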
Environment Variables
The problem
Build tools that inherit the user’s environment variables produce non-deterministic builds. Consider a C compiler invoked by a build tool:
- If the user has CFLAGS=-O2 in their shell, the build produces optimized output.
- If they unset it, the build produces debug output.
- Two developers on the same project get different results from the same source files.
This breaks caching (the cache key doesn’t account for env vars), breaks reproducibility (builds differ across machines), and makes debugging harder (a build failure may depend on an env var the developer forgot they set).
Common examples of environment variables that silently affect build output:
| Variable | Effect |
|---|---|
CC, CXX | Changes which compiler is used |
CFLAGS, CXXFLAGS, LDFLAGS | Changes compiler/linker flags |
PATH | Changes which tool versions are found |
PYTHONPATH | Changes Python module resolution |
LANG, LC_ALL | Changes locale-dependent output (sorting, error messages) |
HOME | Changes where config files are read from |
RSConstruct’s approach
RSConstruct does not use environment variables from the user’s environment to control build behavior. All configuration comes from explicit, versioned sources:
- rsconstruct.toml — all processor configuration (compiler flags, linter args, scan dirs, etc.)
- Source file directives — per-file flags embedded in comments (e.g., // EXTRA_COMPILE_FLAGS_BEFORE=-pthread)
- Tool lock file — .tools.versions locks tool versions so changes are detected
This means:
- The same source tree always produces the same build, regardless of the user’s shell environment.
- Cache keys are computed from file contents and config values, not ambient env vars.
- Remote cache sharing works because two machines with different environments still produce identical cache keys for identical inputs.
Rules for processor authors
When implementing a processor (built-in or Lua plugin):
- Never read std::env::var() to determine build behavior. If a value is configurable, add it to the processor’s config struct in rsconstruct.toml.
- Never call cmd.env() to pass environment variables to external tools, unless the variable is derived from explicit config (not from std::env). The user’s environment is inherited by default — the goal is to avoid adding env-based configuration on top.
- Tool paths come from PATH — RSConstruct does inherit the user’s PATH to find tools like gcc, ruff, etc. This is acceptable because the tool lock file (.tools.versions) detects when tool versions change and triggers rebuilds. Use rsconstruct tools lock to pin versions.
- Config values, not env vars — if a tool needs a flag that varies per project, put it in rsconstruct.toml under the processor’s config section. Config values are hashed into cache keys automatically.
What RSConstruct does inherit
RSConstruct inherits the full parent environment for subprocess execution. This is unavoidable — tools need PATH to be found, HOME to read their own config files, etc. The key design decision is that RSConstruct itself never reads env vars to make build decisions, and processors never add env vars derived from the user’s environment.
The exceptions are:
- NO_COLOR — RSConstruct respects this standard env var to disable colored output, which is a display concern and does not affect build output.
- RSCONSTRUCT_THREADS — Sets the number of parallel jobs (equivalent to -j). Priority: CLI -j flag > RSCONSTRUCT_THREADS env var > [build] parallel config. This is a performance tuning concern and does not affect build correctness or output.
Internal Documentation
This section collects documentation aimed at rsconstruct’s contributors and maintainers — people who modify the codebase itself, not end users who configure rsconstruct for their projects.
If you are using rsconstruct to build a project, you can stop reading now. Everything below is about how rsconstruct works internally: data structures, design decisions, invariants, coding style, and the reasoning behind non-obvious choices.
What belongs here
A chapter belongs in “For Maintainers” if it answers at least one of these questions:
- How is rsconstruct implemented? (Architecture, cache layout, execution model)
- Why did we make this design choice? (Design notes, rejected alternatives, tradeoffs)
- What contract must my code uphold? (Processor contract, invariants, coding standards)
- What’s the right way to extend rsconstruct? (Adding processors, adding analyzers)
- What’s the non-obvious implementation detail I need to know? (Checksum cache layers, descriptor keys, shared-output-directory semantics)
A chapter does NOT belong here if it answers:
- How do I install rsconstruct?
- How do I configure a processor for my project?
- How do I use processor X on file type Y?
Those are user-facing and live in the main section above.
How to use this section
Read in roughly this order if you’re new to the codebase:
- Architecture — 10-minute tour of the major modules and their responsibilities.
- Coding Standards — conventions you’ll be held to in code review.
- Strictness — how the compiler is configured to reject lax code, and the rules for opting out.
- Processor Contract — the interface every processor must satisfy. Read before adding a new processor.
- Testing — how the test suite is structured and how to add new tests.
- Cache System and Checksum Cache — how incremental builds actually work.
After that, read topic-specific chapters as the work demands:
- Building cache features → Cache System, Processor Versioning
- Adding a processor that writes into a shared directory → Shared Output Directory
- Adding cross-processor dependencies → Cross-Processor Dependencies
- Thinking about ordering and enumeration → Processor Ordering, Output Prediction
Links to individual chapters
See the table of contents in the sidebar. Brief one-line summaries:
- Architecture — module map and major data flows.
- Design Notes — collected rationale for design decisions.
- Coding Standards — naming, file layout, error handling conventions.
- Strictness — crate-level #![deny(warnings)], rules for #[allow].
- Testing — integration test structure and philosophy.
- Parameter Naming — canonical names for the same concept in different places.
- Processor Contract — what every processor must implement and uphold.
- Cache System — content-addressed object store, descriptor keys.
- Checksum Cache — mtime-based content hash caching.
- Dependency Caching — caching of source-file dependency scans (e.g. C/C++ headers).
- Processor Versioning — how processors invalidate caches when their behavior changes.
- Cross-Processor Dependencies — how one processor’s outputs become another’s inputs.
- Shared Output Directory — handling multiple processors that write into the same folder.
- Processor Ordering — why rsconstruct does NOT have explicit ordering primitives.
- Output Prediction — the MassGenerator design: tools that enumerate their outputs in advance.
- Per-Processor Statistics — why cache stats can’t group by processor today, options for fixing it.
- Profiling — recorded profiling runs with date + rsconstruct version, plus how-to for rerunning.
- Unreferenced Files — detecting files on disk that no product references.
- Internal Processors — pure-Rust processors that do not shell out.
- Missing Processors — tools we don’t yet wrap but should.
- Crates.io Publishing — release process.
- Per-Processor max_jobs — design note for per-processor parallelism limits.
- Rejected Audit Findings — audit issues deliberately rejected, kept to prevent re-flagging.
- Suggestions — ideas for future work.
- Suggestions Done — archive of completed suggestions.
- TODO — ongoing and completed task list.
Architecture
This page describes RSConstruct’s internal design for contributors and those interested in how the tool works.
Core concepts
Processors
Processors implement the ProductDiscovery trait. Each processor:
- Auto-detects whether it is relevant for the current project
- Scans the project for source files matching its conventions
- Creates products describing what to build
- Executes the build for each product
Run rsconstruct processors list to see all available processors and their auto-detection results.
Auto-detection
Every processor implements auto_detect(), which returns true if the processor appears relevant for the current project based on filesystem heuristics. This allows RSConstruct to guess which processors a project needs without requiring manual configuration.
The ProductDiscovery trait requires four methods:
| Method | Purpose |
|---|---|
| auto_detect(file_index) | Return true if the project looks like it needs this processor |
| discover(graph, file_index) | Query the file index and add products to the build graph |
| execute(product) | Build a single product |
| clean(product) | Remove a product’s outputs |
Both auto_detect and discover receive a &FileIndex — a pre-built index of all non-ignored files in the project (see File indexing below).
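The trait's shape can be sketched with simplified stand-in types (the real definitions in `src/processors/` and the graph module carry more fields and richer signatures; `PyChecker` here is a hypothetical example, not an actual processor):

```rust
// Simplified stand-ins for the real types, for illustration only.
struct FileIndex { files: Vec<String> }          // sorted relative paths
struct Product { inputs: Vec<String>, outputs: Vec<String> }
struct BuildGraph { products: Vec<Product> }

trait ProductDiscovery {
    /// Filesystem heuristic: does this project look like it needs us?
    fn auto_detect(&self, index: &FileIndex) -> bool;
    /// Query the file index and add products to the build graph.
    fn discover(&self, graph: &mut BuildGraph, index: &FileIndex);
    /// Build a single product.
    fn execute(&self, product: &Product) -> Result<(), String>;
    /// Remove a product's outputs.
    fn clean(&self, product: &Product) -> Result<(), String>;
}

// Hypothetical checker: relevant when the project contains .py files,
// one product per Python source file.
struct PyChecker;
impl ProductDiscovery for PyChecker {
    fn auto_detect(&self, index: &FileIndex) -> bool {
        index.files.iter().any(|f| f.ends_with(".py"))
    }
    fn discover(&self, graph: &mut BuildGraph, index: &FileIndex) {
        for f in index.files.iter().filter(|f| f.ends_with(".py")) {
            graph.products.push(Product { inputs: vec![f.clone()], outputs: vec![] });
        }
    }
    fn execute(&self, _product: &Product) -> Result<(), String> { Ok(()) }
    fn clean(&self, _product: &Product) -> Result<(), String> { Ok(()) }
}
```

Note that both discovery methods only read the index; processors never walk the filesystem themselves.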
Detection heuristics per processor:
| Processor | Type | Detected when |
|---|---|---|
| tera | Generator | templates/ directory contains files matching configured extensions |
| ruff | Checker | Project contains .py files |
| pylint | Checker | Project contains .py files |
| mypy | Checker | Project contains .py files |
| pyrefly | Checker | Project contains .py files |
| cc_single_file | Generator | Configured source directory contains .c or .cc files |
| cppcheck | Checker | Configured source directory contains .c or .cc files |
| clang_tidy | Checker | Configured source directory contains .c or .cc files |
| shellcheck | Checker | Project contains .sh or .bash files |
| zspell | Checker | Project contains files matching configured extensions (e.g., .md) |
| aspell | Checker | Project contains .md files |
| ascii | Checker | Project contains .md files |
| rumdl | Checker | Project contains .md files |
| mdl | Checker | Project contains .md files |
| markdownlint | Checker | Project contains .md files |
| make | Checker | Project contains Makefile files |
| cargo | Mass Generator | Project contains Cargo.toml files |
| sphinx | Mass Generator | Project contains conf.py files |
| mdbook | Mass Generator | Project contains book.toml files |
| yamllint | Checker | Project contains .yml or .yaml files |
| jq | Checker | Project contains .json files |
| jsonlint | Checker | Project contains .json files |
| json_schema | Checker | Project contains .json files |
| taplo | Checker | Project contains .toml files |
| pip | Mass Generator | Project contains requirements.txt files |
| npm | Mass Generator | Project contains package.json files |
| gem | Mass Generator | Project contains Gemfile files |
| pandoc | Generator | Project contains .md files |
| markdown2html | Generator | Project contains .md files |
| marp | Generator | Project contains .md files |
| mermaid | Generator | Project contains .mmd files |
| drawio | Generator | Project contains .drawio files |
| a2x | Generator | Project contains .txt (AsciiDoc) files |
| pdflatex | Generator | Project contains .tex files |
| libreoffice | Generator | Project contains .odp files |
| pdfunite | Generator | Source directory contains subdirectories with PDF-source files |
| iyamlschema | Checker | Project contains .yml or .yaml files |
| yaml2json | Generator | Project contains .yml or .yaml files |
| imarkdown2html | Generator | Project contains .md files |
| tags | Generator | Project contains .md files with YAML frontmatter |
Run rsconstruct processors list to see the auto-detection results for the current project.
Products
A product represents a single build unit with:
- Inputs — source files that the product depends on
- Outputs — files that the product generates
- Output directory (optional) — for creators, the directory whose entire contents are cached and restored as a unit
BuildGraph
The BuildGraph manages dependencies between products. It performs a topological sort to determine the correct build order, ensuring that dependencies are built before the products that depend on them.
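In sketch form, a deterministic topological sort over product IDs might look like the following (Kahn-style, using `BTreeMap`/`BTreeSet` so ties resolve in sorted order; the real `BuildGraph` stores full product structs rather than plain strings):

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Sort products so every dependency comes before its dependents.
/// Returns None if the graph contains a cycle.
/// Keys are product IDs; values are the IDs they depend on.
fn topo_sort(deps: &BTreeMap<&str, BTreeSet<&str>>) -> Option<Vec<String>> {
    let mut remaining: BTreeMap<&str, BTreeSet<&str>> =
        deps.iter().map(|(k, v)| (*k, v.clone())).collect();
    let mut order = Vec::new();
    while !remaining.is_empty() {
        // All nodes whose dependencies are already emitted, in sorted order.
        let ready: Vec<&str> = remaining
            .iter()
            .filter(|(_, d)| d.iter().all(|x| !remaining.contains_key(x)))
            .map(|(k, _)| *k)
            .collect();
        if ready.is_empty() {
            return None; // every remaining node waits on another: a cycle
        }
        for n in ready {
            remaining.remove(n);
            order.push(n.to_string());
        }
    }
    Some(order)
}
```

Because the containers iterate in key order, the same dependency set always yields the same build order, which is one ingredient of the determinism guarantee described later.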
Executor
The executor runs products in dependency order. It supports:
- Sequential execution (default)
- Parallel execution of independent products (with the `-j` flag)
- Dry-run mode (show what would be built)
- Keep-going mode (continue after errors)
- Batch execution (group multiple products into one tool invocation)
Incremental rebuild after partial failure
Each product is cached independently after successful execution. If a build is interrupted or fails partway through, the next run only rebuilds products that don’t have valid cache entries:
- Non-batch mode (default fail-fast, `chunk_size=1`): Each product executes and is cached individually. If the build stops after 400 of 800 products, the next run skips the 400 cached successes and rebuilds the remaining 400.
- Batch mode with external tools (`--keep-going` or explicit `--batch-size`): The external tool receives all files in the batch in one invocation. If the tool exits with an error, all products in that batch are marked failed — there is no way to determine which outputs are valid from a single exit code. On the next run, all products from the failed batch are rebuilt.
- Batch mode with internal processors (e.g., `imarkdown2html`, `isass`, `ipdfunite`): These process files sequentially in-process and return per-file results, so partial failure is handled correctly even in batch mode — only the failed products are rebuilt.
Interrupt handling
All external subprocess execution goes through run_command() in src/processors/mod.rs. Instead of calling Command::output() (which blocks until the process finishes), run_command() uses Command::spawn() followed by a poll loop:
- Spawn the child process with piped stdout/stderr
- Every 50ms, call `try_wait()` to check if the process has exited
- Between polls, check the global `INTERRUPTED` flag (set by the Ctrl+C handler)
- If interrupted, kill the child process immediately and return an error
This ensures that pressing Ctrl+C terminates running subprocesses within 50ms, even for long-running compilations or linter invocations.
The global INTERRUPTED flag is an AtomicBool set once by the ctrlc handler in main.rs and checked by all threads.
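A minimal sketch of this spawn-and-poll loop, with simplified error handling (the real `run_command()` in `src/processors/mod.rs` also captures and buffers output for quiet mode and reports richer errors):

```rust
use std::process::{Command, Stdio};
use std::sync::atomic::{AtomicBool, Ordering};
use std::time::Duration;

// Set once by the Ctrl+C handler, checked by all threads.
static INTERRUPTED: AtomicBool = AtomicBool::new(false);

/// Spawn a command and poll it every 50ms instead of blocking in
/// Command::output(), so an interrupt can kill the child promptly.
fn run_command(mut cmd: Command) -> Result<(), String> {
    let mut child = cmd
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()
        .map_err(|e| format!("failed to spawn command: {e}"))?;
    loop {
        match child.try_wait().map_err(|e| e.to_string())? {
            Some(status) if status.success() => return Ok(()),
            Some(status) => return Err(format!("command exited with {status}")),
            None => {
                if INTERRUPTED.load(Ordering::SeqCst) {
                    let _ = child.kill(); // terminate within one poll interval
                    return Err("interrupted".to_string());
                }
                std::thread::sleep(Duration::from_millis(50));
            }
        }
    }
}
```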
File indexing
RSConstruct walks the project tree once at startup and builds a FileIndex — a sorted list of all non-ignored files. The walk is performed by the ignore crate (ignore::WalkBuilder), which natively handles:
- `.gitignore` — standard git ignore rules, including nested `.gitignore` files and negation patterns
- `.rsconstructignore` — project-specific ignore patterns using the same glob syntax as `.gitignore`
Processors never walk the filesystem themselves. Instead, auto_detect and discover receive a &FileIndex and query it with their scan configuration (src_extensions, exclude directories, exclude files). This replaces the previous design where each processor performed its own recursive walk.
Build pipeline
This is the core algorithm — every rsconstruct build follows these phases
in order. Use --phases to see timing for each phase.
Phase 1: File indexing
The project tree is walked once to build the FileIndex — a sorted list of
all non-ignored files. This is the only filesystem walk; all subsequent file
lookups go through the index. See File indexing below.
Phase 2: Discovery (fixed-point loop)
Each enabled processor queries the file index and adds products to the
BuildGraph. Discovery runs in a fixed-point loop to handle
cross-processor dependencies:
```text
file_index = walk filesystem
loop (max 10 passes):
    for each processor:
        processor.discover(graph, file_index)
    if no new products were added → break
    collect outputs from new products
    inject them as virtual files into file_index
```
On each pass, processors may re-declare existing products (silently deduplicated) or discover new products whose inputs are virtual files from upstream generators. The loop converges when a full pass adds nothing new. Most projects converge in 1 pass; projects with generator → checker/generator chains converge in 2.
See Cross-Processor Dependencies for details on deduplication and the virtual file mechanism.
Phase 3: Dependency analysis
Dependency analyzers (e.g., the C/C++ header scanner) run against the graph
to add additional input edges. For example, if main.c includes util.h,
the analyzer adds util.h as an input to the main.c product. Results are
cached in deps.redb for incremental builds.
Phase 4: Tool version hashing
For each processor with a tool lock entry (rsconstruct tools lock), the
locked tool version hash is appended to the product’s config hash. This
ensures that upgrading a tool (e.g., ruff 0.4 → 0.5) triggers rebuilds
even if source files haven’t changed.
Phase 5: Dependency resolution
resolve_dependencies() scans the graph for products whose inputs match
other products’ outputs. When found, it creates a dependency edge — the
producer must complete before the consumer can start. This is how
cross-processor ordering works automatically (e.g., pandoc runs before the
explicit site generator because pandoc’s HTML outputs are the site
generator’s inputs).
After resolution, the graph is topologically sorted to produce the execution order.
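The matching step can be sketched as follows, with products reduced to plain input/output path lists (the real `resolve_dependencies()` operates on the graph's product structs and records edges in the graph itself):

```rust
use std::collections::HashMap;

/// For each product (by index), link it to any product that produces one of
/// its inputs. Returns (producer, consumer) edges.
fn resolve_dependencies(
    inputs: &[Vec<String>],
    outputs: &[Vec<String>],
) -> Vec<(usize, usize)> {
    // Map: output path -> index of the product that produces it.
    let mut producer: HashMap<&str, usize> = HashMap::new();
    for (i, outs) in outputs.iter().enumerate() {
        for o in outs {
            producer.insert(o.as_str(), i);
        }
    }
    let mut edges = Vec::new();
    for (consumer, ins) in inputs.iter().enumerate() {
        for input in ins {
            if let Some(&p) = producer.get(input.as_str()) {
                if p != consumer {
                    edges.push((p, consumer)); // producer must run first
                }
            }
        }
    }
    edges
}
```

In the pandoc example above, the HTML file appears in pandoc's outputs and the site generator's inputs, so an edge from pandoc to the site generator falls out automatically.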
Phase 6: Classify
Each product is classified as one of:
- Skip (up-to-date) — input checksum matches the cache entry and all outputs exist on disk. No work needed.
- Restore — input checksum matches a cache entry but outputs are missing (e.g., after `rsconstruct clean`). Outputs are restored from cache via hardlink or copy.
- Build (stale) — input checksum doesn’t match any cache entry. The product must be rebuilt.
Input checksums are computed by hashing all input files (SHA-256). The mtime
pre-check (mtime_check = true, default) skips rehashing files whose mtime
hasn’t changed since the last build.
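A sketch of the pre-check, with std's `DefaultHasher` standing in for SHA-256 so the example stays dependency-free (the real checksum cache uses SHA-256 and persists its entries to disk between runs):

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};
use std::time::SystemTime;

/// In-memory sketch: path -> (mtime at last hash, content hash).
struct ChecksumCache {
    by_path: HashMap<String, (SystemTime, u64)>,
}

impl ChecksumCache {
    fn checksum(&mut self, path: &str) -> std::io::Result<u64> {
        let mtime = std::fs::metadata(path)?.modified()?;
        if let Some(&(cached_mtime, hash)) = self.by_path.get(path) {
            if cached_mtime == mtime {
                return Ok(hash); // mtime unchanged: skip rereading the file
            }
        }
        // mtime changed (or unseen file): rehash the contents.
        let bytes = std::fs::read(path)?;
        let mut h = DefaultHasher::new();
        bytes.hash(&mut h);
        let hash = h.finish();
        self.by_path.insert(path.to_string(), (mtime, hash));
        Ok(hash)
    }
}
```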
Phase 7: Execute
Products are executed in topological order, respecting dependency edges.
Independent products at the same dependency level run in parallel (controlled
by -j / RSCONSTRUCT_THREADS). Batch-capable processors group their
products into a single tool invocation.
Batch chunk sizing: In fail-fast mode (default), batch chunk size is 1 —
each product executes independently even for batch-capable processors. With
--keep-going, all products are sent in one chunk. With --batch-size N,
chunks are limited to N products. This means fail-fast mode gives the best
incremental recovery after partial failure.
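The chunk-size selection reduces to a small decision, sketched here as a hypothetical helper (the flag names are real; the function name and signature are illustrative):

```rust
/// Pick how many products go into one tool invocation.
fn chunk_size(keep_going: bool, batch_size: Option<usize>, n_products: usize) -> usize {
    match batch_size {
        Some(n) => n,                     // --batch-size N caps each chunk
        None if keep_going => n_products, // --keep-going: one big chunk
        None => 1,                        // fail-fast default: per-product
    }
}
```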
For each product:
- Compute input checksum (if not already done in classify)
- Check cache — skip or restore if possible
- Execute the processor’s command
- On success: store outputs in the cache (content-addressed under `.rsconstruct/objects/`)
- On failure: report error (or continue if `--keep-going`)
Processor source layout
All processor code lives under src/processors/. The folder structure mirrors processor type:
```text
src/processors/
├── mod.rs            # Processor trait, shared helpers (run_command, run_checker,
│                     #   SimpleChecker, SimpleGenerator, ProcessorBase, …)
├── checkers/         # One file per checker (ruff.rs, pylint.rs, cppcheck.rs, …)
│   └── mod.rs        # Re-exports
├── generators/       # One file per generator (generator.rs, marp.rs, sass.rs, …)
│   ├── mod.rs        # Shared helpers: find_templates, output_path, discover_single_format, …
│   └── tags/         # Tags generator (multi-file, has its own subfolder)
├── creators/         # One file per creator (cargo.rs, npm.rs, gem.rs, pip.rs, …)
│   ├── mod.rs        # Re-exports
│   └── creator.rs    # Generic creator processor
├── explicit/         # Explicit processor (user-defined command with declared outputs)
│   ├── mod.rs
│   └── explicit.rs
└── lua/              # Lua plugin host
    ├── mod.rs
    └── lua_processor.rs
```
Conventions
- Every file in `src/processors/` is a real processor — no utility-only files at the top level. Shared helpers live in `mod.rs` or `generators/mod.rs`.
- Checkers use `SimpleChecker` (data-driven, no boilerplate) or implement `Processor` directly for checkers with custom discovery logic (e.g., `clippy`, `script`).
- Generators use `SimpleGenerator` (data-driven with a custom `execute_fn`) or `GeneratorProcessor` for the generic pass-through generator.
- Creators use `CreatorProcessor` for the generic case, or their own struct for creators with special discovery (cargo profiles, npm siblings, etc.).
- Explicit is a singleton processor type with its own folder because it is neither a checker nor a generator.
- Lua is the only processor type that hosts external scripts rather than wrapping a fixed external tool. It has its own folder because it carries significant runtime state (the Lua VM).
- All processors self-register via `inventory::submit!` at the bottom of their file — no central registry table to update.
Determinism
Build order is deterministic:
- File discovery is sorted
- Processor iteration order is sorted
- Topological sort produces a stable ordering
This ensures that the same project always builds in the same order, regardless of filesystem ordering.
Caching
See Cache System for full details on cache keys, storage format, rebuild classification, and per-processor caching behavior.
Subprocess execution
RSConstruct uses two internal functions to run external commands:
- `run_command()` — by default captures stdout/stderr via OS pipes and only prints output on failure (quiet mode). Use the `--show-output` flag to show all tool output. Use for compilers, linters, and any command where errors should be shown.
- `run_command_capture()` — always captures stdout/stderr via pipes. Use only when you need to parse the output (dependency analysis, version checks, Python config loading). Returns the output for processing.
Parallel safety
When running with -j, each thread spawns its own subprocess. Each subprocess gets its own OS-level pipes for stdout/stderr, so there is no interleaving of output between concurrent tools. On failure, the captured output for that specific tool is printed atomically. This design requires no shared buffers or cross-thread output coordination.
Path handling
All paths are relative to project root. RSConstruct assumes it is run from the project root directory (where rsconstruct.toml lives).
Internal paths (always relative)
- `Product.inputs` and `Product.outputs` — stored as relative paths
- `FileIndex` — returns relative paths from `scan()` and `query()`
- Cache keys (`Product.cache_key()`) — use relative paths, enabling cache sharing across different checkout locations
- Cache entries (`CacheEntry.outputs[].path`) — stored as relative paths
Processor execution
- Processors pass relative paths directly to external tools
- Processors set `cmd.current_dir(project_root)` to ensure tools resolve paths correctly
- `fs::read()`, `fs::write()`, etc. work directly with relative paths since cwd is project root
Exception: Processors requiring absolute paths
If a processor absolutely must use absolute paths (e.g., for a tool that doesn’t respect current directory), it should:
- Store the `project_root` in the processor struct
- Join paths with `project_root` only at execution time
- Never store absolute paths in `Product.inputs` or `Product.outputs`
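A sketch of that exception pattern (`ToolProcessor` is a hypothetical name; the point is that the join happens only inside `execute`):

```rust
use std::path::{Path, PathBuf};

/// Hypothetical processor wrapping a tool that ignores current_dir.
struct ToolProcessor {
    project_root: PathBuf, // stored once at construction
}

impl ToolProcessor {
    /// Absolute paths are derived here, at execution time.
    /// Product.inputs/outputs keep the relative form.
    fn absolute_input(&self, relative_input: &Path) -> PathBuf {
        self.project_root.join(relative_input)
    }
}
```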
Why relative paths?
- Cache portability — cache keys don’t include machine-specific absolute paths
- Remote cache sharing — same project checked out to different paths can share cache
- Simpler code — no need to strip prefixes for display or storage
Architecture Observations
Observations about rsconstruct’s high-level structure — the shapes that
determine how the system behaves when you try to change or extend it. Kept
separate from suggestions.md (which is tactical features and bugs) because
these are about how the code is put together, not about what it does.
Each entry has:
- A short title naming the pattern or tension.
- What the current code does.
- What that implies for changes / extensions / users.
- Load-bearing: how much of the system this shape dictates. High = touching it ripples everywhere. Low = localized quirk.
The entries are roughly ordered by how much they shape the rest of the codebase.
The central four
1. The graph is the universal coupling point
Every phase — discovery, analysis, classification, execution — reads and/or
mutates the BuildGraph. Processors receive &mut BuildGraph in their
discover() method and are trusted to add products correctly. There’s no
invariant enforcement at insertion time: empty inputs are allowed, bad dep
references are allowed, duplicate outputs are caught but duplicate inputs
aren’t. Cycles are only detected during topological sort, late.
The graph’s shape also leaks into the executor: the executor knows about
output_dirs (creators), variant (multi-format generators), config_hash
(cache keys), and product IDs. Adding a new product category (say, a
“phantom” product that exists for scheduling but produces no outputs)
requires touching both graph and executor.
Implication: the graph is the lingua franca. Any architectural change
that touches the product model — adding fields, changing what counts as a
dependency, supporting alternate execution orders — ripples into every
consumer. A healthy graph layer would have validation (reject ill-formed
products at insertion), opaque access (consumers see a trait-shaped view,
not the struct), and observer hooks (something watching mutations so
--graph-stats and graph show don’t duplicate traversal logic).
Load-bearing: very high.
2. Plugin registration at link time
Every processor and analyzer submits an inventory::submit! entry. The
registry is populated at binary link time, and enumeration is a runtime
iteration over those entries. This is elegant for modularity — adding a
processor means adding one file, no central list to update — but it has
consequences:
- No compile-time enumeration: you can’t write a match statement over all processor names, so the processor-count gets rediscovered on every run, and static checks (e.g. “every processor has a corresponding config struct”) have to be runtime assertions.
- Lua plugins are second-class: they arrive at runtime after the static registry is frozen. The registry API has to tolerate two populations (static + dynamic) in parallel, which is why `find_registry_entry` and `find_analyzer_plugin` have to fall through both.
- Ordering is alphabetical everywhere: because `inventory` doesn’t preserve submission order, every code path that touches plugins has to sort by name. This is a minor tax but it’s baked in everywhere.
- Testing requires the whole binary: you can’t instantiate a stripped-down registry for tests; they pull the full set. Most tests don’t mind, but ones that want a controlled plugin set have to filter rather than inject.
Implication: the registration model favors modularity over introspectability. If rsconstruct ever wants a “declarative build” representation (think Bazel’s static action graph) the plugin layer will have to expose more schema information than it does today.
Load-bearing: high.
3. Config defaults are scattered, not composed — PARTIALLY ADDRESSED
Three sources of defaults apply in sequence:
- Per-processor defaults (e.g. `ruff` → `command = "ruff"`) in a giant match-or-registry lookup.
- Scan defaults (src_dirs, src_extensions) via a separate mechanism (`ScanDefaultsData`).
- User TOML overrides both.
The order matters, but it’s encoded across apply_processor_defaults,
apply_scan_defaults, and the serde deserialization.
Update: config provenance tracking (src/config/provenance.rs) now
records where each field came from (UserToml { line }, ProcessorDefault,
ScanDefault, OutputDirDefault, SerdeDefault). rsconstruct config show
annotates every field with its source. The defaults pipeline still applies
layers across multiple functions, but the provenance map makes it possible
to answer “where did this value come from?” without tracing the code.
The remaining gap: adding a new defaults layer (env-derived, user-global) still means inserting into the existing function chain rather than a declarative resolver.
Load-bearing: medium.
4. The executor owns too much policy — RESOLVED
Update: a BuildPolicy trait has been extracted to src/executor/policy.rs.
classify_products now delegates per-product decisions to &dyn BuildPolicy.
IncrementalPolicy implements the current skip/restore/rebuild logic.
Alternate policies (dry-run, always-rebuild, time-windowed) are now a single
trait implementation away — no executor changes needed.
Load-bearing: very high, but the tension is resolved.
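The extracted trait's shape can be sketched as follows (the real `BuildPolicy` in `src/executor/policy.rs` works on actual product and cache-entry types; the two boolean inputs here are a deliberate simplification of the classify phase):

```rust
/// Per-product decision, mirroring the Skip / Restore / Build classification.
#[derive(Debug, PartialEq)]
enum Decision { Skip, Restore, Build }

trait BuildPolicy {
    fn decide(&self, checksum_matches: bool, outputs_present: bool) -> Decision;
}

/// Current incremental behavior: skip if cached and present on disk,
/// restore if cached but missing, rebuild otherwise.
struct IncrementalPolicy;
impl BuildPolicy for IncrementalPolicy {
    fn decide(&self, checksum_matches: bool, outputs_present: bool) -> Decision {
        match (checksum_matches, outputs_present) {
            (true, true) => Decision::Skip,
            (true, false) => Decision::Restore,
            (false, _) => Decision::Build,
        }
    }
}

/// An alternate policy is one impl away, e.g. unconditional rebuild.
struct AlwaysRebuild;
impl BuildPolicy for AlwaysRebuild {
    fn decide(&self, _: bool, _: bool) -> Decision { Decision::Build }
}
```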
Structural tensions
5. Processor trait assumes StandardConfig, but allows bypass
The Processor trait has a scan_config() -> &StandardConfig method that
every processor must implement. The default implementations of discover(),
auto_detect(), and supports_batch() use this config. But processors
with richer configs (e.g. ClippyConfig, CcConfig) don’t expose those
richer fields through the trait — they store them privately and access
them internally. The outside world only sees StandardConfig.
Implication: there’s no way to ask “what config does processor X
accept?” through the trait. Introspection goes through the registry
(known_fields, must_fields, field_descriptions) instead, which means
the processor has to register the metadata separately from implementing
the trait. The two representations can drift: someone adds a field to
ClippyConfig and forgets to add it to known_fields.
A healthier shape would have one source of truth per processor — the
config struct itself — with a derive macro or trait-based reflection
generating the known_fields list. Or go the other direction: make the
trait parameterized (Processor<Config>) so introspection goes through the
type system.
Load-bearing: medium. Doesn’t break anything today but is the root cause of several “remembered to update both places?” bugs we’ve fixed.
6. Analyzers are inputs-only; they can’t add products
DepAnalyzer::analyze() walks existing products and adds inputs to them.
It cannot:
- Create new products (the cpp analyzer can’t spawn a product for a header it discovered).
- Remove products.
- Change processor assignments.
This is a deliberate simplification — analyzers run in a single pass after discovery and don’t need fixed-point semantics of their own. But it means the “dependency graph” isn’t really discovered by analyzers; it’s refined by them. The actual discovery of what exists lives entirely in processors.
Implication: if a use case arises where an analyzer legitimately needs
to produce a product — e.g. “for every .proto import I find, ensure
there’s a product for generating the .pb.cc” — the analyzer interface
doesn’t support it. You’d have to turn the analyzer into a processor, or
add a “synthesize” callback. The asymmetry between processors (can add
products) and analyzers (can only add inputs) is currently invisible but
will bite eventually.
Load-bearing: medium. Not a bug, but a limitation that shapes what kinds of features are easy vs. hard.
7. Processor instance ↔ typed processor mapping is one-way — PARTIALLY ADDRESSED
A ProcessorInstance in the config holds (type_name, instance_name, config_toml). Builder::create_processors() deserializes the TOML and
produces a Box<dyn Processor>. Afterwards, the TOML blob is discarded.
Update: ProcessorInstance now carries a provenance: ProvenanceMap
that records where each field came from (user TOML with line number,
processor default, scan default, etc.). This means config show can
annotate fields with their source without reparsing TOML, and smart
commands can distinguish user-set from defaulted fields.
The remaining gap: a running Box<dyn Processor> still can’t navigate
back to its ProcessorInstance or the originating TOML section. The
provenance lives on the config side, not the runtime processor side.
Load-bearing: medium.
8. Global state in the processor runtime — RESOLVED
Update: all mutable process globals have been moved into BuildContext
(src/build_context.rs):
- The three processor globals (`INTERRUPTED`, `RUNTIME`, `INTERRUPT_SENDER`) are replaced and deleted. `run_command` takes `&BuildContext` explicitly.
- The three checksum globals (`CACHE`, `MTIME_DB`, `MTIME_ENABLED`) are moved into `BuildContext`. `combined_input_checksum` and `checksum_fast` take `&BuildContext`.
Remaining process-wide state is all immutable or correctly scoped:
- `RuntimeFlags` — immutable after startup, doesn’t vary between contexts.
- `DECLARED_TOOLS` — `thread_local!`, debug-only.
- Compiled regexes — `LazyLock<Regex>`, stateless.
Load-bearing: resolved. Multiple BuildContext instances can now run
independently (daemon mode, LSP, testing).
Broader patterns
9. Supply-driven model everywhere
The whole pipeline — discover, classify, execute — walks every product
unconditionally. There’s no demand-driven path (like make foo which
visits only the subgraph producing foo). The --target <glob> flag
filters after discovery; it doesn’t trim the work that discovery itself
does.
This is a deliberate design — rsconstruct’s typical workload is “build everything incrementally,” and supply-driven matches that well. But it means a user asking “just build X” still pays the cost of discovering all 5000 other products.
Implication: for projects at a certain scale, or for tooling that wants to quickly answer “which products would I run for this file?” (IDE integration, pre-commit hooks), the supply-driven model becomes a bottleneck. A demand-driven shortcut would require either pre-built reverse indexes (input path → product) persisted between runs, or an analytical model of each processor’s output paths (hard — processor output is computed procedurally).
Load-bearing: very high. Changing this means a fundamentally different build-system shape.
10. “Run on every build” is the default stance
Every configured processor discovers and classifies on every invocation.
There’s no concept of “processor X is slow, only run when asked.” The
-p/-x mechanism works per-invocation but not as a declarative
property. See suggestions.md for the proposed build_by_default = false
pattern — that’s a tactical fix. The architectural observation is that
rsconstruct’s model biases hard toward “all processors together,”
whereas the user mental model often has lifecycle phases (lint vs.
package vs. deploy).
Implication: adding a “goals” layer (cargo-style subcommands, or npm-style named scripts) is a natural extension direction. It would introduce a new concept — a goal is a named selection of processors — and likely requires CLI reorganization. Bigger than it sounds.
Load-bearing: medium. Shapes the CLI surface and user mental model.
11. Object store as a multi-responsibility module — RESOLVED
Update: ObjectStore has been decomposed into focused submodules:
- `blobs.rs` — content-addressed blob storage (store, read, restore, checksum)
- `descriptors.rs` — cache descriptor CRUD (store_marker, store_blob, store_tree)
- `restore.rs` — cache query and restoration (restore_from_descriptor, needs_rebuild, can_restore, explain)
- `management.rs` — cache management (size, trim, remove_stale, list, stats)
- `operations.rs` — remote cache push/fetch
- `config_diff.rs` — processor config change tracking
mod.rs went from ~664 to ~223 lines (struct definition, types, constructor).
Each concern is now a focused 100–150 line file.
Load-bearing: very high, but the monolith is resolved.
What’s absent that one might expect
12. No abstraction for “tool invocation”
Every processor that shells out to a subprocess rolls its own Command
building: env vars, arg construction, timeout, output capture, error
classification. Shared helpers (run_command, check_command_output)
exist but are minimal. Processor implementations still have to know about:
- How to pass files (positional args vs. `--file=X` vs. stdin vs. response file when argv is too long).
- How to interpret exit codes (some tools return 1 for “found issues”, some return 0 and print to stderr, some return 2 for config errors).
- How to parse output for structured errors.
Implication: processor implementations have roughly 30-80 lines of
boilerplate each, and they’re inconsistent. A ToolInvocation abstraction
with pluggable arg-passing strategies would shrink most processors to a
few lines of declaration. This also makes adding a new processor harder
than it needs to be.
Load-bearing: medium.
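To make the idea concrete, a hypothetical `ToolInvocation` with pluggable arg-passing and exit-code interpretation might look like this (none of these types exist in the codebase today; they are a sketch of the proposed abstraction):

```rust
/// How files are handed to the tool.
#[derive(Clone, Copy)]
enum ArgStrategy {
    Positional,           // tool file1 file2 …
    Flag(&'static str),   // tool --file=X per file
    Stdin,                // file list written to stdin instead of argv
}

struct ToolInvocation {
    program: &'static str,
    args: ArgStrategy,
    /// Exit codes that mean "found issues" rather than "tool broke".
    issue_exit_codes: &'static [i32],
}

impl ToolInvocation {
    fn build_args(&self, files: &[&str]) -> Vec<String> {
        match self.args {
            ArgStrategy::Positional => files.iter().map(|f| f.to_string()).collect(),
            ArgStrategy::Flag(flag) => files.iter().map(|f| format!("{flag}={f}")).collect(),
            ArgStrategy::Stdin => Vec::new(), // files go on stdin, not argv
        }
    }
    fn classify_exit(&self, code: i32) -> &'static str {
        if code == 0 {
            "clean"
        } else if self.issue_exit_codes.contains(&code) {
            "issues found"
        } else {
            "tool error"
        }
    }
}
```

With such a declaration, most checker processors would shrink to a `ToolInvocation` value plus a config struct.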
13. No pluggable reporting / event stream
Today reporting is hardcoded: println! during execution, colored summary
at the end, --json mode emits structured events, --trace emits Chrome
tracing format. Each reporting path is a separate code path threading
through the executor.
Implication: adding a new output format (JUnit XML for CI, GitHub Actions annotations, custom Slack webhook) means threading another code path through the executor. A proper event-bus model — executor emits events, subscribers render them — would make this a two-file change (subscribe + format).
Load-bearing: medium.
14. No formal dry-run execution
There’s --stop-after classify, which stops after classification, and
there’s dry_run() (different from --dry-run which is a flag on build),
and there’s --explain which annotates per-product decisions. Three
partially-overlapping mechanisms. The user-facing story is “to see what
would happen, use X or Y or Z depending on what you want.”
Implication: these evolved separately. A unified “simulation mode” that fully runs the classify pipeline and outputs what would happen — including what cache entries would be produced — would subsume the three. Likely a small refactor, but requires aligning on the output shape.
Load-bearing: low-medium.
Summary of architectural recommendations
All four highest-leverage refactors are now complete:
- Extract a `BuildPolicy` trait from the executor — done. `classify_products` delegates per-product skip/restore/rebuild decisions to a `&dyn BuildPolicy`. `IncrementalPolicy` implements the current logic. Future policies (dry-run, always-rebuild, time-windowed) are a single trait impl. See `src/executor/policy.rs`.
- Decompose `ObjectStore` — done. `mod.rs` split from 664 → 223 lines into focused submodules: `blobs.rs` (content-addressed storage), `descriptors.rs` (cache descriptor CRUD), `restore.rs` (restore/needs_rebuild/can_restore/explain). Existing `management.rs`, `operations.rs`, `config_diff.rs` unchanged.
- Consolidate config resolution with provenance tracking — done. Config fields now carry `FieldProvenance` (user TOML with line number, processor default, scan default, serde default). `config show` annotates every field with its source. See `src/config/provenance.rs`.
- Introduce a `BuildContext` struct replacing process globals — done. The three process globals (`INTERRUPTED`, `RUNTIME`, `INTERRUPT_SENDER`) are replaced by a `BuildContext` struct threaded through the `Processor` trait, executor, analyzers, and remote cache. See `src/build_context.rs`.
Entries 3, 7, and 8 are partially addressed — the core issues are resolved but minor gaps remain (see individual entries above).
Entries 1, 2, 5, 6, 9, 10, 12, 13, 14 are observations about the code’s shape — not necessarily problems to fix, but constraints a new contributor should understand before making structural changes.
The technical observations (code duplication in discovery helpers, dead
fields in ProcessorPlugin, scattered error handling) are recorded in
suggestions.md as tactical items.
Design Notes
This page has been merged into Architecture. See that page for RSConstruct’s internal design, subprocess execution, path handling, and caching behavior.
Coding Standards
Rules that apply to the RSConstruct codebase and its documentation.
Always add context to errors
Every ? on an IO operation must have .with_context() from anyhow::Context. A bare ? on fs::read, fs::write, fs::create_dir_all, Command::spawn, or any other syscall-wrapping function is a bug. It produces error messages like “No such file or directory” with no indication of which file or which operation failed.
Good:
```rust
fs::read(&path)
    .with_context(|| format!("Failed to read config file: {}", path.display()))?;
```
Bad:
```rust
fs::read(&path)?; // useless error message
```
The error chain should read like a stack trace of intent: “Failed to build project > Failed to execute ruff on src/main.py > Failed to spawn command: ruff > No such file or directory”.
Fail hard, never degrade gracefully
When something fails, it must fail the entire build. Do not try-and-fallback, do not silently substitute defaults for missing resources, do not swallow errors. If a processor is configured to use a file and that file does not exist, that is an error. The user must fix their configuration or their project, not the code.
Optional features must be opt-in via explicit configuration (default off). When the user enables a feature, all resources it requires must exist.
Processor naming conventions
Every processor has a single identity string (e.g. ruff, clang_tidy,
mdbook). All artifacts derived from a processor must use that same string
consistently:
| Artifact | Convention | Example (clang_tidy) |
|---|---|---|
| Name constant | pub const UPPER: &str = "name"; in processors::names | CLANG_TIDY: &str = "clang_tidy" |
| Source file | src/processors/checkers/{name}.rs or generators/{name}.rs | checkers/clang_tidy.rs |
| Processor struct | {PascalCase}Processor | ClangTidyProcessor |
| Config struct | {PascalCase}Config | ClangTidyConfig |
| Field on ProcessorConfig | pub {name}: {PascalCase}Config | pub clang_tidy: ClangTidyConfig |
| Match arm in processor_enabled_field() | "{name}" => self.{name}.enabled | "clang_tidy" => self.clang_tidy.enabled |
| Entry in default_processors() | names::UPPER.into() | names::CLANG_TIDY.into() |
| Entry in validate_processor_fields() | processor_names::UPPER => {PascalCase}Config::known_fields() | processor_names::CLANG_TIDY => ClangTidyConfig::known_fields() |
| Entry in expected_field_type() | ("{name}", "field") => Some(FieldType::...) | ("clang_tidy", "compiler_args") => ... |
| Entry in src_dirs() | &self.{name}.scan | &self.clang_tidy.scan |
| Entry in resolve_scan_defaults() | self.{name}.scan.resolve(...) | self.clang_tidy.scan.resolve(...) |
| Registration in create_builtin_processors() | Builder::register(..., proc_names::UPPER, {PascalCase}Processor::new(cfg.{name}.clone())) | Builder::register(..., proc_names::CLANG_TIDY, ClangTidyProcessor::new(cfg.clang_tidy.clone())) |
| Re-export in processors/mod.rs | pub use checkers::{PascalCase}Processor | pub use checkers::ClangTidyProcessor |
| Install command in tool_install_command() | "{tool}" => Some("...") | "clang-tidy" => Some("apt install clang-tidy") |
When adding a new processor, use the identity string everywhere. Do not
abbreviate, rename, or add suffixes (Gen, Bin, etc.) to any of the
derived names.
Never use a _check suffix in processor names. Name the processor after the
tool or library it wraps — do not abstract or rename it (e.g. zspell not
spellcheck, ruff not python_lint).
Processor new() must be infallible
Every processor’s fn new(config: XxxConfig) -> Self must return Self, not
Result<Self>. This is enforced at compile time by the registry macro. If
construction can fail, defer the failure to execute() or discover().
Processor directory layout
Each processor category directory (src/processors/checkers/,
src/processors/generators/, src/processors/creators/) must contain
only processor implementation files — one processor per .rs file (plus
mod.rs). Shared utilities, helpers, or supporting code used by multiple
processors must live in src/processors/ directly, not inside a category
subdirectory. This keeps each category directory a flat, scannable list of
processors.
Test naming for processors
Test functions for a processor must be prefixed with the processor name.
For example, tests for the cc_single_file processor must be named
cc_single_file_compile, cc_single_file_incremental_skip, etc.
No indented output
All println! output must start at column 0. Never prefix output with spaces
or tabs for visual indentation, except when printing structured data whose layout requires it.
Suppress tool output on success
External tool output (compilers, linters, etc.) must be captured and only
shown when a command fails. On success, only rsconstruct’s own status messages appear.
Users who want to always see tool output can use --show-output. This keeps
build output clean while still showing errors when something goes wrong.
Never hard-code counts of dynamic sets
Documentation and code must never state the number of processors, commands, or any other set that changes as the project evolves. Use phrasing like “all processors” instead of “all seven processors”. Enumerating the members of a set is acceptable; stating the cardinality is not.
Use well-established crates
Prefer well-established crates over hand-rolled implementations for common functionality (date/time, parsing, hashing, etc.). The Rust ecosystem has mature, well-tested libraries for most tasks. Writing custom implementations introduces unnecessary bugs and maintenance burden. If a crate exists for it, use it.
No trailing newlines in output
Output strings passed to println!, pb.println(), or similar macros must not
contain trailing newlines. These macros already append a newline. Adding \n
inside the string produces unwanted blank lines in the output.
Include processor name in error messages
Error messages from processor execution must identify the processor so the
user can immediately tell which processor failed. The executor’s
record_failure() method automatically wraps every error with
[processor_name] before printing or storing it, so processors do not need
to manually prefix their bail! messages. Just write the error naturally
(e.g. bail!("Misspelled words in {}", path)) and the executor will produce
[aspell] Misspelled words in README.md.
Never silently ignore user configuration
Every field a user can write in rsconstruct.toml (or in any YAML/TOML
manifest we load: cc.yaml, linux-module.yaml, etc.) must produce an
observable effect in the engine. The two failure modes to prevent are:
- Schema-level silent-ignore — serde accepts an unknown field because the struct doesn’t reject it. A user typos `enabeld = false`, we accept it, nothing happens, and they wonder why their setting had no effect.
- Runtime silent-ignore — serde stores the field in a struct, but no code in the engine ever reads it. This is exactly how the `[analyzer.X] enabled = false` bug shipped: the CLI subcommand wrote the field, the config loader happily deserialized it, and the analyzer runner ignored it. A half-wired feature is worse than no feature.
Rule 1: reject unknown fields at the schema level
Every struct that deserializes user input must use one of:
- `#[serde(deny_unknown_fields)]` — preferred for plain structs (no `#[serde(flatten)]`). Serde enforces the rejection at deserialize time.
- `KnownFields` trait + `validate_processor_fields()` — for top-level processor configs that use `#[serde(flatten)]` to embed `StandardConfig`. Serde’s `deny_unknown_fields` doesn’t see through `flatten` (a known limitation), so we implement the check ourselves in `Config::load()`.
Nested structs inside a flattened parent (e.g. CcLibraryDef inside
CcManifest) must use deny_unknown_fields — they don’t flatten, so the
direct mechanism works.
The only legitimate exception: structs that intentionally capture unknown
fields (ProcessorConfig.extra for Lua plugins). These are rare and must
be documented at the field.
Rule 2: every accepted field must be read
When you add a field to any config struct, add the engine code that consumes it in the same change. Don’t ship the schema first and the behaviour “soon.” If the field is a toggle, the runner must check it. If it’s a path, something must open or scan that path. If it’s a value, a code path must branch on it.
When you add a CLI subcommand that writes a field (like analyzers disable
writing enabled = false), verify the runtime reads it by writing an
integration test that exercises the toggle end-to-end — config → build →
observable effect. A passing write-the-config test is not enough; the effect
must be asserted.
When you remove or rename a field, grep the codebase and docs to catch
stragglers. A field that exists in defconfig_toml but no longer affects
behaviour is a regression of Rule 2, even if no user reports it.
When reviewing
Reject a patch that adds a new Deserialize struct without either
deny_unknown_fields or a KnownFields impl. Reject a patch that adds a
config field without the runtime code that reads it. Both failure modes
cost users time in exactly the same way — they write something sensible,
get no feedback, and conclude the tool is broken.
Rule 3: validate before constructing
Schema validation must run inside Config::load(), before any processor or
analyzer is instantiated. Builder::new() should never be the first place
that surfaces an unknown-field or unknown-type error, because by the time
Builder::new() runs it has already opened redb databases, walked the
filesystem to build the FileIndex, and created CPU-bound infrastructure
the user doesn’t need just to see “you typoed a field name.”
The validators are validate_processor_fields_raw and
validate_analyzer_fields_raw in src/config/mod.rs. They return
Vec<String> so Config::load() can surface errors from both validators
together under a single Invalid config: header. If you add a new config
surface (a new top-level section with its own registered plugins), add a
matching validator and call it from Config::load() alongside the
existing two.
Unit-test the validators directly (see src/config/tests.rs) — not only
through rsconstruct toml check. Direct tests pin down the contract that
validation is a pure function of the parsed TOML, independent of
filesystem or plugin instantiation.
No “latest” git tag
Never create a git tag named latest. Use only semver tags (e.g. v0.3.0).
A latest tag causes confusion with container registries and package managers
that use the word “latest” as a moving pointer, and it conflicts with GitHub’s
release conventions.
Book layout mirrors the filesystem
The book (docs/src/) is divided into two sections by SUMMARY.md:
- A top-level user-facing section (introduction, commands, configuration, processors, etc.) — for people who use rsconstruct to build their projects.
- A “For Maintainers” section — for contributors modifying rsconstruct itself: architecture, design decisions, coding standards, cache internals, and so on.
The filesystem must mirror this split. A reader glancing at a path should be able to tell which audience the document is for:
- User-facing chapters live at the top level of `docs/src/` — e.g. `docs/src/configuration.md`, `docs/src/commands.md`.
- Maintainer chapters live under `docs/src/internal/` — e.g. `docs/src/internal/architecture.md`, `docs/src/internal/cache.md`.
- Per-processor reference docs live under `docs/src/processors/` — these are user-facing (they document how to configure each processor).
When adding a new doc, decide first whether it’s user-facing or internal, then place it accordingly. Moving a doc across the boundary requires moving the file too — don’t leave an internal document at the top level just because its links would break.
When cross-referencing:
- Inside `internal/` → link to sibling files directly (`[X](other.md)`).
- From a top-level doc to an internal doc → `[X](internal/other.md)`.
- From `processors/` to an internal doc → `[X](../internal/other.md)`.
- From `internal/` to a user-facing doc → `[X](../other.md)`.
This rule is enforced by convention, not by tooling. Reviewers should reject PRs that add a maintainer-only document at the top level (or vice versa).
Strictness
This project holds itself to a strict compiler baseline and treats every relaxation as a deliberate, documented choice. This chapter explains the baseline, the rules for opting out, and the history of the most recent strictness pass.
Crate-level baseline
src/main.rs starts with:
```rust
#![deny(clippy::all)]
#![deny(warnings)]
```
Effect:
- Every warning is a compile error. Unused imports, dead code, unused variables, deprecated APIs — all stop the build. There is no “warning fatigue” because there are no warnings.
- All of Clippy’s default lint group (`clippy::all`) is enforced at `deny` level. This covers ~500 lints spanning correctness, complexity, style, and performance.
This is one step short of forbid. forbid cannot be overridden per-item; deny allows a per-item #[allow(...)] escape hatch. We chose deny so that principled exceptions remain possible, but each one is an obvious, grep-able act.
The rule for #[allow(...)]
Every #[allow(...)] in the codebase MUST:
- Be necessary. If the compiler accepts the code without the allow, remove the allow. The compiler is cleverer than you think — dead code that looks dead to a human is sometimes reachable, and vice versa.
- Be scoped minimally. Attach the allow to the smallest item (a single field, a single function, a single import) that requires it — never to a whole struct or module when one member is the culprit.
- Carry a comment explaining why. The comment answers: “what feature/workflow keeps this thing around despite looking dead?” A silent `#[allow(dead_code)]` is a bug.
- Be periodically re-audited. Scaffolding becomes production code (allow removed) or is abandoned (code deleted). Long-lived allows are a code smell.
Current #[allow] attributes (at time of writing)
After the strictness pass, 5 allows remain. Each is documented in the source and reproduced here with rationale.
src/object_store/mod.rs — remote_pull field
```rust
/// Whether to pull from remote cache.
/// Wired into the constructor but not yet consulted by any read path —
/// remote-pull integration is scaffolded in `operations.rs` (the
/// `try_fetch_*` helpers) but not yet called from the executor.
#[allow(dead_code)]
remote_pull: bool,
```
Why kept: remote-pull is a real, partially-implemented feature. The try_fetch_* helpers exist; they’re just not wired into classify_products / the restore path yet. Removing the field now would mean re-adding it when we wire up the feature. Keeping it with a comment documents what’s missing.
When to remove: when remote-pull read paths are wired up, or when we formally abandon remote-pull.
src/object_store/operations.rs — three try_fetch_* / try_push_descriptor_* helpers
```rust
// Scaffolding for remote-pull: wired into the API surface but not yet
// called from any read path. Intentional; tracked under remote-pull WIP.
#[allow(dead_code)]
pub(super) fn try_fetch_object_from_remote(&self, checksum: &str) -> Result<bool> { ... }

// Scaffolding for remote-pull (for paired fetch-after-push semantics).
// Not yet called from any write path; tracked under remote-pull WIP.
#[allow(dead_code)]
pub(super) fn try_push_descriptor_to_remote(&self, descriptor_key: &str, data: &[u8]) -> Result<()> { ... }

/// Try to fetch a descriptor from remote cache.
/// Scaffolding for remote-pull; not yet called from any read path.
#[allow(dead_code)]
pub(super) fn try_fetch_descriptor_from_remote(&self, descriptor_key: &str) -> Result<Option<Vec<u8>>> { ... }
```
Why kept: same feature as above. These are the building blocks the eventual remote-pull implementation will call. They compile and type-check, and they work when called — they just aren’t called yet.
When to remove: same trigger as the remote_pull field.
src/registries/processor.rs — ProcessorPlugin.processor_type field
```rust
pub struct ProcessorPlugin {
    pub name: &'static str,
    /// Processor type. Declared by every plugin but not yet queried by any
    /// runtime code path — kept as plugin metadata so future features
    /// (e.g. `processors list --type=checker`) can filter without touching
    /// every registration.
    #[allow(dead_code)]
    pub processor_type: ProcessorType,
    ...
}
```
Why kept: Every inventory::submit! for a processor declares a type (Checker, Generator, Creator, Explicit). The runtime currently reads processor_type() from the Processor trait, never from the plugin. But the static plugin metadata is the right place for filtering features like rsconstruct processors list --type=checker. Removing the field now would mean adding 93 processor_type: ... lines back later when we want the filter.
When to remove: the field itself should stay; the allow comes off once the first feature queries it. Until then, the allow is the cheap price of preserving optionality.
What the pass removed
Seven allows were removed during the most recent strictness sweep. Three of them masked genuine dead code, which was then deleted:
- `checksum::invalidate()` — never called; deleted.
- `checksum::clear_cache()` — never called; deleted.
- `ProcessorBase.name` field + `ProcessorBase::auto_detect()` helper — never read, never called; deleted.
Four were stale — the code they guarded was actually used, and the allow no longer made the compiler quieter:
- `remote_cache::RemoteCache::download` — used by `operations.rs`; allow removed.
- `exit_code::IoError` — used in match arms and by the `errors` CLI command; allow removed.
- `ProcessorPlugin` struct-level `#[allow(dead_code)]` — only the `processor_type` field needed it; scoped down.
- `builder/mod.rs` — `#[allow(unused_imports)]` on `use crate::config::*;` — the compiler wasn’t flagging the glob at all; allow removed.
What this pass did NOT change
The sweep was focused on #[allow] attributes. Broader strictness knobs were left as-is, by choice:
- `.unwrap()` and `.expect()` counts. Many are on internal invariants where panicking is correct (contract violation, not user error). An audit could tighten some to `?`, but this is a separate pass with its own judgment calls.
- `missing_docs`, `missing_debug_implementations`, etc. Enabling these would require documenting every public item — a much larger change.
- `clippy::pedantic`, `clippy::nursery`, `clippy::cargo`. These add ~200 more lints beyond `clippy::all`. Many are noisy or stylistic. Enabling them is worth considering but outside the scope of “remove unnecessary allows.”
- The `use crate::config::*;` glob import in `builder/mod.rs`. Narrowing it would require enumerating ~15 symbols and risks churn. Left as-is.
Adding a new #[allow]
When you find yourself wanting to add an #[allow(...)], follow this checklist:
- Can the compiler complaint be fixed instead? Remove the unused import, inline the unused function, prove the variable is live. Most of the time the answer is yes.
- Is this the minimum scope? Put the allow on the single field, not the whole struct. On the single function, not the whole impl. On the single import, not the whole `use` block.
- Did you write a comment? One sentence answering “what feature / workflow justifies this?” is enough. “Reserved for future use” is NOT enough — say what future use, and what would trigger the deletion.
- Did you open a tracking concern? If the allow is for WIP scaffolding, the WIP should be tracked somewhere (a TODO comment with a `// wip:` tag, an issue, a feature flag) so future maintainers know it’s temporary.
A reviewer who sees a new #[allow] should read the comment, check the rationale, and ask “could we just fix this instead?” before approving.
Running the audit
A quick sweep to find all current allows:
```sh
grep -rn '#\[allow(' src/
```
For each hit, read the surrounding context and the comment. If the comment is missing or weak, or the code it guards has become truly used, the allow should come out.
See also
- Coding Standards — the style rules beyond strictness.
- Processor Contract — the invariants each processor must uphold.
- `src/main.rs` — the crate-level `#![deny(...)]` directives.
Testing
RSConstruct uses two kinds of tests:
- Integration tests in `tests/` — the primary test suite. These exercise the compiled `rsconstruct` binary as a black box, building fake projects in temp directories and asserting on CLI output and side effects.
- Unit tests in `src/` (`#[cfg(test)] mod tests`) — used sparingly, only for self-contained modules whose internals cannot be exercised adequately through the CLI. Currently this is `src/graph.rs` (dedup and topological-sort logic).
Running tests
```sh
cargo test                      # Run all tests
cargo test rsconstructignore    # Run tests matching a name
cargo test -- --nocapture       # Show stdout/stderr from tests
```
Why unit tests live in src/ (not tests/)
There is a recurring question: should unit tests move to tests/ to keep source files shorter and more readable? The short answer is no, for a structural reason specific to this crate.
This crate is a binary only — there is no src/lib.rs. Integration tests under tests/ can only link against a library crate; against a binary crate they can only do what tests/main.rs does today: spawn the rsconstruct binary as a subprocess and assert on its output. So there are only three real options for testing internal logic like BuildGraph:
| Option | Cost |
|---|---|
| Unit tests inline in src/ (current) | Longer source files (mitigated by #[cfg(test)] stripping them from release builds, and by editor folding) |
| Move tests to tests/ as end-to-end tests | Far more code per test, much slower, indirect — can’t isolate a specific dedup branch without building a whole fake project |
| Add a src/lib.rs exposing modules | Architectural change — the crate becomes both a library and a binary. Forces decisions about what is public API |
The third option is the “clean” fix but it has ongoing costs (API surface to maintain, semver implications if we ever publish the library). The first option has only a readability cost, and it’s the idiomatic Rust approach for binary crates.
Rule: default to writing integration tests in tests/. Only add a #[cfg(test)] mod tests block in src/ when the thing under test is genuinely hard to exercise through the CLI (e.g. a specific branch of a dedup helper that requires setting up graph state that would take dozens of real products to reproduce end-to-end). When a source file grows large enough that its inline test module dominates the file, split the tests into a sibling file via #[cfg(test)] mod tests; + src/MODULE/tests.rs, rather than moving them out of src/ entirely.
Test directory layout
```text
tests/
├── common/
│   └── mod.rs            # Shared helpers (not a test binary)
├── build.rs              # Build command tests
├── cache.rs              # Cache operation tests
├── complete.rs           # Shell completion tests
├── config.rs             # Config show/show-default tests
├── dry_run.rs            # Dry-run flag tests
├── graph.rs              # Dependency graph tests
├── init.rs               # Project initialization tests
├── processor_cmd.rs      # Processor list/auto/files tests
├── rsconstructignore.rs  # .rsconstructignore / .gitignore exclusion tests
├── status.rs             # Status command tests
├── tools.rs              # Tools list/check tests
├── watch.rs              # File watcher tests
├── processors.rs         # Module root for processor tests
└── processors/
    ├── cc_single_file.rs # C/C++ compilation tests
    ├── zspell.rs         # Zspell processor tests
    └── template.rs       # Template processor tests
```
Each top-level .rs file in tests/ is compiled as a separate test binary by Cargo. The processors.rs file acts as a module root that declares the processors/ subdirectory modules:
```rust
mod common;
mod processors {
    pub mod cc_single_file;
    pub mod zspell;
    pub mod template;
}
```
This is the standard Rust pattern for grouping related integration tests into subdirectories without creating a separate binary per file.
Shared helpers
tests/common/mod.rs provides utilities used across all test files:
| Helper | Purpose |
|---|---|
| setup_test_project() | Create an isolated project in a temp directory with rsconstruct.toml and basic directories |
| setup_cc_project(path) | Create a C project structure with the cc_single_file processor enabled |
| run_rsconstruct(dir, args) | Execute the rsconstruct binary in the given directory and return its output |
| run_rsconstruct_with_env(dir, args, env) | Same as run_rsconstruct but with extra environment variables (e.g., NO_COLOR=1) |
All helpers use env!("CARGO_BIN_EXE_rsconstruct") to locate the compiled binary, ensuring tests run against the freshly built version.
Every test creates a fresh TempDir for isolation. The directory is automatically cleaned up when the test ends.
Test categories
Command tests
Tests in build.rs, clean, dry_run.rs, init.rs, status.rs, and watch.rs exercise CLI commands end-to-end:
```rust
#[test]
fn force_rebuild() {
    let temp_dir = setup_test_project();
    // ... set up files ...
    let output = run_rsconstruct_with_env(temp_dir.path(), &["build", "--force"], &[("NO_COLOR", "1")]);
    assert!(output.status.success());
    let stdout = String::from_utf8_lossy(&output.stdout);
    assert!(stdout.contains("[template] Processing:"));
}
```
These tests verify exit codes, stdout messages, and side effects (files created or removed).
Processor tests
Tests under processors/ verify individual processor behavior: file discovery, compilation, linting, incremental skip logic, and error handling. Each processor test module follows the same pattern:
1. Set up a temp project with appropriate source files
2. Run `rsconstruct build`
3. Assert outputs exist and contain expected content
4. Optionally modify a file and rebuild to test incrementality
Ignore tests
rsconstructignore.rs tests .rsconstructignore pattern matching: exact file patterns, glob patterns, leading / (anchored), trailing / (directory), comments, blank lines, and interaction with multiple processors.
Common assertion patterns
Exit code:
```rust
assert!(output.status.success());
```
Stdout content:
```rust
let stdout = String::from_utf8_lossy(&output.stdout);
assert!(stdout.contains("Processing:"));
assert!(!stdout.contains("error"));
```
File existence:
```rust
assert!(path.join("out/cc_single_file/main.elf").exists());
```
Incremental builds:
```rust
// First build
run_rsconstruct(path, &["build"]);
// Second build should skip
let output = run_rsconstruct_with_env(path, &["build"], &[("NO_COLOR", "1")]);
let stdout = String::from_utf8_lossy(&output.stdout);
assert!(stdout.contains("Skipping (unchanged):"));
```
Mtime-dependent rebuilds:
```rust
// Modify a file and wait for mtime to differ
std::thread::sleep(std::time::Duration::from_millis(100));
fs::write(path.join("src/header.h"), "// changed\n").unwrap();
let output = run_rsconstruct(path, &["build"]);
let stdout = String::from_utf8_lossy(&output.stdout);
assert!(stdout.contains("Processing:"));
```
Writing a new test
1. Add a test function in the appropriate file (or create a new `.rs` file under `tests/` for a new feature area)
2. Use `setup_test_project()` or `setup_cc_project()` to create an isolated environment
3. Write source files and configuration into the temp directory
4. Run `rsconstruct` with `run_rsconstruct()` or `run_rsconstruct_with_env()`
5. Assert on exit code, stdout/stderr content, and output file existence
If adding a new processor test module, declare it in tests/processors.rs:
```rust
mod processors {
    pub mod cc_single_file;
    pub mod zspell;
    pub mod template;
    pub mod my_new_processor; // add here
}
```
Test coverage by area
| Area | File | Tests |
|---|---|---|
| Build command | build.rs | Force rebuild, incremental skip, clean, deterministic order, keep-going, timings, parallel -j flag, parallel keep-going, parallel all-products, parallel timings, parallel caching |
| Cache | cache.rs | Clear, size, trim, list operations |
| Complete | complete.rs | Bash/zsh/fish generation, config-driven completion |
| Config | config.rs | Show merged config, show defaults, annotation comments |
| Dry run | dry_run.rs | Preview output, force flag, short flag |
| Graph | graph.rs | DOT, mermaid, JSON, text formats, empty project |
| Init | init.rs | Project creation, duplicate detection, existing directory preservation |
| Processor command | processor_cmd.rs | List, all, auto-detect, files, unknown processor error |
| Status | status.rs | UP-TO-DATE / STALE / RESTORABLE reporting |
| Tools | tools.rs | List tools, list all, check availability |
| Watch | watch.rs | Initial build, rebuild on change |
| Ignore | rsconstructignore.rs | Exact match, globs, leading slash, trailing slash, comments, cross-processor |
| Template | processors/template.rs | Rendering, incremental, dep_inputs |
| CC | processors/cc_single_file.rs | Compilation, headers, per-file flags, mixed C/C++, config change detection |
| Zspell | processors/zspell.rs | Correct/misspelled words, code block filtering, custom words, incremental |
Parameter Naming Conventions
This document establishes the canonical names for configuration parameters across all processors, and the reasoning behind each name. Use this as the reference when adding new processors or renaming existing ones.
Taxonomy
Parameters fall into four categories:
| Category | Purpose |
|---|---|
| Source discovery | Which files are the primary targets to process |
| Dependency tracking | Which additional files affect the checksum / trigger rebuilds |
| Tool configuration | What command/tool to run and how |
| Execution control | Batching, parallelism, output location |
Source Discovery Parameters
These parameters determine which files are the primary inputs — the files that get processed, linted, or transformed.
| Parameter | Type | Description |
|---|---|---|
| src_dirs | string[] | Directories to scan recursively for source files. |
| src_extensions | string[] | File extensions to match during scanning (e.g. [".py", ".pyi"]). |
| src_exclude_dirs | string[] | Directory path segments to skip during scanning. |
| src_exclude_files | string[] | File names to skip during scanning. |
| src_exclude_paths | string[] | Exact relative paths to skip during scanning. |
| src_files | string[] | Explicit list of source files to process. When set, bypasses src_dirs, src_extensions, and all exclude filters entirely. |
src_files vs scanning
src_dirs + src_extensions is the default discovery mechanism — the processor
walks directories and finds matching files automatically.
src_files is for when you know exactly which files you want processed and
don’t want any scanning. Setting src_files disables all scan-based
discovery for that processor instance.
Dependency Tracking Parameters
These parameters declare files that the processor depends on but does not process directly. A change to any of these files invalidates the cache and triggers a rebuild, but the files are not passed as arguments to the tool.
| Parameter | Type | Description |
|---|---|---|
| dep_inputs | string[] | Explicit dependency files (e.g. config files, schema files). Globs are supported. Fails if a listed file does not exist. |
| dep_auto | string[] | Like dep_inputs but silently ignored when the file does not exist. Used for optional config files (e.g. .pylintrc, pyproject.toml). |
Why two parameters?
dep_inputs is strict — it errors if a file is missing, which catches
mistakes in configuration. dep_auto is lenient — it is for well-known
config files that may or may not be present in a given project.
Tool Configuration Parameters
command and args always appear together. Every processor that has command
must also have args. They are treated as a unit: both participate in the
config checksum (computed from each processor’s checksum_fields()), so
changing either the command or any argument invalidates the cache and triggers
a rebuild.
| Parameter | Type | Description |
|---|---|---|
| command | string | The executable to run. Required when the processor is active. If the value is a path to a local file, its content checksum is also tracked as a dependency. |
| args | string[] | Arguments passed to the command before file paths. Always present alongside command. Both command and args values are included in the config checksum. |
command dependency tracking
For the script and generator processors, if command points to a file that
exists on disk (e.g. command = "scripts/my_linter.sh"), rsconstruct
automatically adds it as an input dependency. This means that if the script
itself changes, all affected products are rebuilt. System tools (e.g. bash,
python3) are not files in the project and are not tracked.
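A rough sketch of that decision, using a hypothetical helper name; the real logic may differ in details such as path normalization:

```rust
use std::path::{Path, PathBuf};

/// Hypothetical sketch: decide whether `command` should be tracked as a file
/// dependency. Local script paths that exist on disk are tracked; system
/// tools resolved via PATH (bare names like "bash") are not.
fn command_dependency(command: &str) -> Option<PathBuf> {
    let path = Path::new(command);
    if path.is_file() {
        Some(path.to_path_buf()) // project-local script: track its content
    } else {
        None // system tool or missing path: nothing to track
    }
}

fn main() {
    // A bare tool name is not a file in the project, so it is not tracked.
    assert_eq!(command_dependency("definitely-not-a-real-tool-name"), None);
}
```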
Execution Control Parameters
| Parameter | Type | Description |
|---|---|---|
| `batch` | bool | When true, pass all files to the command in a single invocation. When false, invoke once per file. Default: true for most processors. |
| `max_jobs` | int | Maximum parallel jobs for this processor. Overrides the global `--jobs` flag. |
| `output_dir` | string | Directory where output files are written (generator processors). |
| `output_extension` | string | File extension for generated output files. |
Processor Contract
Rules that all processors must follow.
Fail hard, never degrade gracefully
When something fails, it must fail the entire build. Do not try-and-fallback, do not silently substitute defaults for missing resources, do not swallow errors. If a processor is configured to use a file and that file does not exist, that is an error. The user must fix their configuration or their project, not the code.
Optional features must be opt-in via explicit configuration (default off). When the user enables a feature, all resources it requires must exist.
No work without source files
An enabled processor must not fail the build if no source files match its file patterns. Zero matching files means zero products discovered; the processor simply does nothing. This is not an error — it is the normal state for a freshly initialized project.
Single responsibility
Each processor handles one type of transformation or check. A processor discovers its own products and knows how to execute, clean, and report on them.
Deterministic discovery
discover() receives an instance_name parameter identifying the processor
instance (e.g., "ruff" or "script.lint_a" for multi-instance processors).
Use this name when calling graph.add_product() — do not use hardcoded
processor type constants.
discover() must return the same products given the same filesystem state.
File discovery, processor iteration, and topological sort must all produce
sorted, deterministic output so builds are reproducible.
Incremental correctness
Products must declare all their inputs. If any declared input changes, the product is rebuilt. If no inputs change, the cached result is reused. Processors must not rely on undeclared side inputs for correctness (support files read at execution time but excluded from the input list are acceptable only when changes to those files can never cause a previously-passing product to fail).
Execution isolation
A processor’s execute() must only write to the declared output paths
(or, for creators, to the expected output directory).
It must not modify source files, other products’ outputs, or global state.
Output directory caching (creators)
Creators that set output_dir on their products get automatic
directory-level caching. After successful execution, the executor walks
the output directory, stores every file as a content-addressed object,
and records a manifest with paths, checksums, and Unix permissions.
On restore, the entire directory is recreated from cache.
The cache_output_dir config option (default true) controls this.
When disabled, creators fall back to stamp-file or empty-output
caching (no directory restore on rsconstruct clean && rsconstruct build).
Creators that use output_dir caching must implement clean() to
remove the output directory so it can be restored from cache.
Error reporting
On failure, execute() returns an Err with a clear message including
the relevant file path and the nature of the problem. The executor
decides whether to abort or continue based on --keep-going.
Batch execution and partial failure
Batch-capable processors implement supports_batch() and execute_batch().
The execute_batch() method receives multiple products and must return one
Result per product, in the same order as the input.
External tool processors that invoke a single subprocess for the entire
batch typically use execute_generator_batch(), which maps a single exit code
to all-success or all-failure. If the tool fails, all products in the batch
are marked failed — there is no way to determine which outputs are valid.
Internal processors (e.g., imarkdown2html, isass, ipdfunite) that process
files in-process should return per-file results so that partial failure is
handled correctly — only the actually-failed products are rebuilt on the next run.
Chunk sizing: In fail-fast mode (default), the executor uses chunk_size=1
even for batch-capable processors, so each product is cached individually. This
gives the best incremental recovery. Larger chunks are used only with
--keep-going or explicit --batch-size.
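The all-or-nothing mapping described above can be sketched as follows (hypothetical function; the real `execute_generator_batch` signature may differ):

```rust
/// Hypothetical sketch of the all-or-nothing mapping used when one subprocess
/// handles a whole batch: a single exit status becomes one Result per product,
/// in the same order as the input.
fn map_batch_exit(products: &[&str], success: bool) -> Vec<Result<(), String>> {
    products
        .iter()
        .map(|p| {
            if success {
                Ok(())
            } else {
                // One exit code for the whole batch: every product is marked
                // failed, since we cannot tell which outputs are valid.
                Err(format!("batch tool failed (product {p})"))
            }
        })
        .collect()
}

fn main() {
    let results = map_batch_exit(&["a.py", "b.py"], false);
    assert!(results.iter().all(|r| r.is_err()));
}
```

In-process processors would instead build this vector element by element, so a single bad file does not poison its siblings.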
Cache System
RSConstruct uses a content-addressed cache to enable fast incremental builds. This page describes the cache architecture, storage format, and rebuild logic.
Overview
The cache lives in .rsconstruct/ and consists of:
- `objects/` — content-addressed object store (all cache data)
- `deps.redb` — source file dependency cache (see Dependency Caching)
There is no separate database. The object store is the cache.
Data model
The object store contains three kinds of objects, inspired by git:
Blobs
A blob is a file’s raw content, addressed by its SHA-256 content hash. Blobs are optionally zstd-compressed and made read-only to prevent corruption when restored via hardlinks.
Blobs are stored content-addressed — two products producing identical output share the same blob. This enables deduplication and hardlink-based restoration.
Why blobs don’t store output paths
A blob is pure content — it has no knowledge of where it will be restored. This is critical for two reasons:
- **Rename survival.** If you rename `foo.md` to `bar.md` without changing its content, the cache key (which is content-addressed) is the same. The blob is reused and restored to the new output path (`bar.txt` instead of `foo.txt`). If the blob stored its output path, this wouldn’t work.
- **Deduplication across trees.** Multiple tree entries can point to the same blob under different paths. For example, if two files in a creator’s output have identical content, they share the same blob object in the store. The tree records the path; the blob just holds the content.
Trees
A tree is a serialized list of (path, mode, blob_checksum) entries describing a set of output files. Trees are stored in the object store, addressed by the cache key (not by content hash). A tree maps relative file paths to content-addressed blobs. Multiple trees can point to the same blobs — deduplication happens at the blob level.
Markers
A marker is a zero-byte object indicating that a check passed. Markers are stored in the object store, addressed by the cache key.
Cache entries
A cache entry is a small descriptor stored in the object store at the path derived from the cache key. It contains:
    {"type": "blob", "checksum": "abc123...", "mode": 493}
Note: the blob descriptor has no path — the product knows where its output goes.
Or:

    {"type": "tree", "entries": [{"path": "dir/file.txt", "checksum": "def456...", "mode": 493}]}

Or:

    {"type": "marker"}
The actual file content lives in separate content-addressed blob objects. The cache entry is just a pointer (for generators) or a manifest (for creators).
Object store layout
    .rsconstruct/objects/
      a1/b2c3d4...   # could be a blob (raw file content)
      ff/0011aa...   # could be a cache entry (JSON descriptor)
      cd/ef5678...   # could be another blob
Cache entries and blobs share the same object store. Cache entries are addressed by cache key hash; blobs are addressed by content hash.
Cache keys
The cache key identifies a product. It is computed as:
hash(processor_name, config_hash, input_content_hash)
Where:
- `processor_name` — the processor type (e.g., `pandoc`, `ruff`)
- `config_hash` — hash of the processor configuration (compiler flags, args, etc.)
- `input_content_hash` — combined SHA-256 hash of all input file contents
The key is content-addressed: it depends on what the inputs contain, not what they’re named. Renaming a file without changing its content produces the same cache key.
Multi-format processors
For processors that produce multiple output formats from the same input (e.g., pandoc producing PDF, HTML, and DOCX), each format is a separate product with a separate cache key. The output format is part of the config hash, so each format gets its own key naturally.
Output depends on input name
Most processors produce output that depends only on input content. However, some processors embed the input filename in the output (e.g., a // Generated from foo.c header). For these processors, the output_depends_on_input_name property is set to true, and the input file path is included in the cache key:
hash(processor_name, config_hash, input_content_hash, input_path)
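A sketch of the key derivation. The arguments mirror the formulas above, but `DefaultHasher` from the standard library stands in for SHA-256 so the example stays dependency-free:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Sketch of cache-key derivation. The real tool uses SHA-256; DefaultHasher
/// is a stand-in here. `input_path` is Some(..) only for processors with
/// output_depends_on_input_name set.
fn cache_key(
    processor_name: &str,
    config_hash: u64,
    input_content_hash: u64,
    input_path: Option<&str>,
) -> u64 {
    let mut h = DefaultHasher::new();
    processor_name.hash(&mut h);
    config_hash.hash(&mut h);
    input_content_hash.hash(&mut h);
    if let Some(path) = input_path {
        path.hash(&mut h); // name-sensitive processors mix the path in
    }
    h.finish()
}

fn main() {
    // Content-addressed: same content, same config, same key.
    assert_eq!(cache_key("pandoc", 1, 42, None), cache_key("pandoc", 1, 42, None));
    // With output_depends_on_input_name, the path participates in the key.
    assert_ne!(
        cache_key("cc", 1, 42, Some("src/foo.c")),
        cache_key("cc", 1, 42, Some("src/bar.c"))
    );
}
```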
Flows
Lookup
- Compute the cache key from processor name + config + input contents
- Look up the object at that key in the object store
- If not found: cache miss, product must be built
- If found: read the descriptor, act based on type
Cache (after successful build)
Checker:

- Store a `{"type": "marker"}` entry at the cache key

Generator (single output):

- Store the output file content as a content-addressed blob
- Store a `{"type": "blob", "checksum": "..."}` entry at the cache key

Creator (multiple outputs):

- Walk all output directories and files
- Store each file as a content-addressed blob
- Build the tree entries: `[{"path": "...", "checksum": "...", "mode": ...}, ...]`
- Store a `{"type": "tree", "entries": [...]}` entry at the cache key
Restore
Checker: Nothing to restore. Cache entry exists = check passed.
Generator:
- Read the cache entry, get the blob checksum
- Hardlink or copy the blob to the output path
Creator:

- Read the cache entry, get the tree entries
- For each `(path, checksum, mode)`: restore the blob to the path, set permissions
Skip
If the cache entry exists AND all output files are present on disk, no work is needed.
Rebuild classification
| Classification | Condition | Action |
|---|---|---|
| Skip | Cache key found AND all outputs exist on disk | No work needed |
| Restore | Cache key found BUT some outputs are missing | Restore from object store |
| Build | No cache entry for this key | Execute the processor |
Because the cache key incorporates input content, a changed input produces a different key. There’s no “stale entry” — either the key exists or it doesn’t.
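The table reduces to a small decision function. A sketch with hypothetical names:

```rust
#[derive(Debug, PartialEq)]
enum Classification {
    Skip,    // cache hit and all outputs already on disk
    Restore, // cache hit but some outputs missing
    Build,   // no cache entry for this key
}

/// Sketch of the rebuild classification. There is no "stale" case, because a
/// changed input already produces a different cache key.
fn classify(cache_hit: bool, all_outputs_exist: bool) -> Classification {
    match (cache_hit, all_outputs_exist) {
        (true, true) => Classification::Skip,
        (true, false) => Classification::Restore,
        (false, _) => Classification::Build,
    }
}

fn main() {
    assert_eq!(classify(true, true), Classification::Skip);
    assert_eq!(classify(true, false), Classification::Restore);
    assert_eq!(classify(false, true), Classification::Build);
}
```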
Config-aware caching
Processor configuration is hashed into cache keys. Changing a config value triggers rebuilds even if source files haven’t changed.
Cache restoration methods
| Method | Behavior | Best for |
|---|---|---|
| `hardlink` | Links output to cached blob (same inode, read-only) | Local development (fast, uses no extra disk space) |
| `copy` | Copies cached blob to output path (writable) | CI runners, cross-filesystem setups |
| `auto` (default) | Uses copy when `CI=true`, hardlink otherwise | Most setups |
Hardlinks work because blob objects contain raw file content (not wrapped in a descriptor). Only cache entries (which point to blobs) contain JSON metadata.
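A minimal sketch of the restore step (hypothetical helper; the real restoration also honors the configured method and file modes): try a hardlink first, fall back to a plain copy when linking fails, e.g. across filesystems:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Sketch: restore a cached blob to its output path. Hardlink when possible
/// (same inode, zero copy), otherwise copy the bytes.
fn restore_blob(blob: &Path, output: &Path) -> io::Result<()> {
    if output.exists() {
        fs::remove_file(output)?; // hard_link fails if the target exists
    }
    fs::hard_link(blob, output).or_else(|_| fs::copy(blob, output).map(|_| ()))
}

fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let blob = dir.join("rsc_demo_blob");
    let out = dir.join("rsc_demo_out.txt");
    fs::write(&blob, b"cached content")?;
    restore_blob(&blob, &out)?;
    assert_eq!(fs::read(&out)?, b"cached content".to_vec());
    Ok(())
}
```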
Cache commands
| Command | Description |
|---|---|
| `rsconstruct cache size` | Show cache size and object count |
| `rsconstruct cache list` | List all cache entries as JSON |
| `rsconstruct cache stats` | Show per-processor cache statistics |
| `rsconstruct cache trim` | Remove unreferenced objects |
| `rsconstruct cache clear` | Delete the entire cache |
Clean vs Clear
rsconstruct clean removes build outputs but preserves the cache:
- Generators: Output files deleted. Next build restores via hardlink/copy.
- Checkers: Nothing to delete. Next build skips.
- Creators: Output directories deleted. Next build restores from tree.
rsconstruct cache clear wipes the whole object store — cache entries and blobs alike. A cleared cache means “forget everything, rebuild from scratch.” If only blobs were cleared but cache entries survived, the cache would think outputs are available but fail to restore them. Clearing both together avoids this inconsistency.
Incremental rebuild after partial failure
Each product is cached independently after successful execution. If a build fails partway through, the next run only rebuilds products without valid cache entries.
Remote caching
See Remote Caching for sharing cache between machines and CI.
Checksum Cache
RSConstruct uses a centralized checksum system (src/checksum.rs) for all file hashing. It has two layers of caching to avoid redundant I/O and computation.
Architecture
All file checksum operations go through a single entry point: checksum::file_checksum(path). This function never computes the same hash twice.
Layer 1: In-memory cache (per build run)
A global HashMap<PathBuf, String> stores checksums computed during the current build. When a file is checksummed for the first time, the result is cached. Any subsequent request for the same file returns the cached value without reading the file again.
This handles the common case where the same file appears as an input to multiple products (e.g., a shared header file), or when the checksum is needed both for classification (skip/restore/build) and for cache storage.
The in-memory cache lives for the duration of the process and is not persisted.
Layer 2: Mtime database (across builds)
A persistent redb database at .rsconstruct/mtime.redb maps file paths to (mtime, checksum) pairs. Before reading a file to compute its checksum, the system checks:
- Has this file been checksummed in a previous build?
- Has the file’s modification time changed since then?
If the mtime matches, the cached checksum is returned without reading the file. This avoids I/O for files that haven’t been modified between builds — the common case in incremental builds where most files are unchanged.
When the mtime differs (file was modified), the file is read, the new checksum is computed, and both the in-memory cache and the mtime database are updated.
Dirty mtime entries are flushed to the database in a single batch transaction at the end of each checksum computation pass, minimizing database writes.
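The layer-2 lookup can be sketched with a `HashMap` standing in for the redb database (hypothetical helper; the real code also batches its writes):

```rust
use std::collections::HashMap;

/// Stand-in for the mtime database: path -> (mtime, checksum).
type MtimeDb = HashMap<String, (u64, String)>;

/// Sketch of the layer-2 lookup: if the stored mtime matches, return the
/// cached checksum without reading the file; otherwise recompute and update.
fn checksum_with_mtime(
    db: &mut MtimeDb,
    path: &str,
    current_mtime: u64,
    compute: impl Fn() -> String, // the expensive read-and-hash step
) -> String {
    if let Some((mtime, sum)) = db.get(path) {
        if *mtime == current_mtime {
            return sum.clone(); // unchanged since last build: no file read
        }
    }
    let sum = compute();
    db.insert(path.to_string(), (current_mtime, sum.clone()));
    sum
}

fn main() {
    let mut db = MtimeDb::new();
    // First build: the file is read and hashed.
    let a = checksum_with_mtime(&mut db, "src/main.c", 100, || String::from("abc"));
    // Same mtime: the expensive closure is never called.
    let b = checksum_with_mtime(&mut db, "src/main.c", 100, || unreachable!());
    assert_eq!(a, b);
}
```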
Why two layers
| Layer | Scope | Avoids | Cost |
|---|---|---|---|
| In-memory cache | Single build run | Re-reading + re-hashing the same file | HashMap lookup |
| Mtime database | Across builds | Reading unchanged files from disk | stat() + DB lookup |
For the first build, every file must be read and hashed. The mtime database is populated as a side effect. On subsequent builds, most files are unchanged — the mtime check skips reading them entirely, and the in-memory cache prevents redundant lookups within the run.
Configuration
The persistent mtime database can be disabled via rsconstruct.toml:
    [cache]
    mtime_check = false
Or via the command-line flag:
    rsconstruct build --no-mtime-cache
When disabled, every file is read and hashed on every build. The in-memory cache still prevents redundant reads within a single run, but there is no cross-build benefit.
When to disable: In CI/CD environments with a fresh checkout, the mtime database has nothing cached from previous builds and just adds write overhead. The in-memory cache is sufficient. Use --no-mtime-cache (or mtime_check = false in config) to skip the database entirely.
The rsconstruct status command also disables mtime checking internally to ensure accurate classification.
Database location
The mtime database is stored at `.rsconstruct/mtime.redb`, separate from the build cache (`objects/`) and the config tracking database. This separation means:
- `rsconstruct cache clear` removes the build cache but preserves the mtime database (the next build will still benefit from mtime-based skipping)
- The mtime database can be deleted independently without affecting cached build outputs
Combined input checksum
The combined_input_checksum(inputs) function computes a single hash representing all input files for a product. It:
- Checksums each input file (using the two-layer cache)
- Joins all checksums with `:`
- Hashes the combined string to produce a fixed-length result
Missing files get a MISSING:<path> sentinel so that different sets of missing files produce different combined checksums.
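A sketch of this scheme, with `DefaultHasher` standing in for SHA-256 and per-file checksums passed in as precomputed strings (`None` marks a missing file):

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Sketch of combined_input_checksum: per-file checksums (or a MISSING
/// sentinel) are joined with ':' and hashed to a fixed-length result.
fn combined_input_checksum(inputs: &[(&str, Option<&str>)]) -> u64 {
    let parts: Vec<String> = inputs
        .iter()
        .map(|(path, checksum)| match checksum {
            Some(sum) => sum.to_string(),
            // Missing files stay distinguishable from each other.
            None => format!("MISSING:{path}"),
        })
        .collect();
    let mut h = DefaultHasher::new();
    parts.join(":").hash(&mut h);
    h.finish()
}

fn main() {
    // Different sets of missing files yield different combined checksums.
    let a = combined_input_checksum(&[("a.c", None), ("b.c", Some("f00"))]);
    let b = combined_input_checksum(&[("a.c", Some("f00")), ("b.c", None)]);
    assert_ne!(a, b);
}
```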
Dependency Caching
RSConstruct includes a dependency cache that stores source file dependencies (e.g., C/C++ header files) to avoid re-scanning files that haven’t changed. This significantly speeds up the graph-building phase for projects with many source files.
Overview
When processors like cc_single_file discover products, they need to scan source files to find dependencies (header files). This scanning can be slow for large projects. The dependency cache stores the results so subsequent builds can skip the scanning step.
The cache is stored in .rsconstruct/deps.redb using redb, an embedded key-value database.
Cache Structure
Each cache entry consists of:
- Key: Source file path (e.g., `src/main.c`)
- Value:
  - `source_checksum` — SHA-256 hash of the source file content
  - `dependencies` — list of dependency paths (header files)
Cache Lookup Algorithm
When looking up dependencies for a source file:
- Look up the entry by source file path
- If not found → cache miss, scan the file
- If found, compute the current SHA-256 checksum of the source file
- Compare with the stored checksum:
  - If different → cache miss (file changed), re-scan
  - If same → verify all cached dependencies still exist:
    - If any dependency file is missing → cache miss, re-scan
    - Otherwise → cache hit, return cached dependencies
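The algorithm above, sketched against an in-memory map standing in for the redb table (the types here are hypothetical):

```rust
use std::collections::HashMap;
use std::path::Path;

/// Stand-in for one row of the dependency cache.
struct DepsEntry {
    source_checksum: String,
    dependencies: Vec<String>,
}

/// Sketch of the lookup: keyed by path, validated by checksum; only a hit
/// pays for the checksum computation and the existence checks.
fn lookup(
    cache: &HashMap<String, DepsEntry>,
    source: &str,
    current_checksum: &str,
) -> Option<Vec<String>> {
    let entry = cache.get(source)?; // miss: never cached
    if entry.source_checksum != current_checksum {
        return None; // miss: file content changed, re-scan
    }
    if entry.dependencies.iter().any(|d| !Path::new(d).exists()) {
        return None; // miss: a cached header vanished, re-scan
    }
    Some(entry.dependencies.clone()) // hit
}

fn main() {
    let mut cache = HashMap::new();
    cache.insert(
        "src/main.c".to_string(),
        DepsEntry { source_checksum: "abc".into(), dependencies: vec![] },
    );
    assert!(lookup(&cache, "src/main.c", "abc").is_some()); // hit
    assert!(lookup(&cache, "src/main.c", "changed").is_none()); // content changed
    assert!(lookup(&cache, "src/other.c", "abc").is_none()); // never cached
}
```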
Why Path as Key (Not Checksum)?
An alternative design would use the source file’s checksum as the cache key instead of its path. This seems appealing because you could look up dependencies directly by content hash. However, this approach has significant drawbacks:
Problems with Checksum as Key
- **Mandatory upfront computation.** With checksum as key, you must compute the SHA-256 hash of every source file before you can even check the cache. This means reading every file on every build, even when nothing has changed. With path as key, you do a fast O(1) lookup first; only on a cache hit do you compute the checksum to validate freshness.
- **Orphaned entries accumulate.** When a file changes, its old checksum entry becomes orphaned garbage. You’d need periodic garbage collection to clean up stale entries. With path as key, the entry is naturally updated in place when the file changes.
- **No actual benefit.** The checksum is still needed for validation regardless of the key choice. Using it as the key just moves when you compute it, without reducing total work.
Current Design
The current design is optimal:
Path (key) → O(1) lookup → Checksum validation (only on hit)
This minimizes work in the common case where files haven’t changed.
Cache Statistics
During graph construction, RSConstruct displays cache statistics:
[cc_single_file] Dependency cache: 42 hits, 3 recalculated
This shows how many source files had their dependencies retrieved from cache (hits) versus re-scanned (recalculated).
Viewing Dependencies
Use the rsconstruct deps command to view the dependencies stored in the cache:
rsconstruct deps all # Show all cached dependencies
rsconstruct deps for src/main.c # Show dependencies for a specific file
rsconstruct deps for src/a.c src/b.c # Show dependencies for multiple files
rsconstruct deps clean # Clear the dependency cache
Example output:
src/main.c: (no dependencies)
src/test.c:
src/utils.h
src/config.h
The rsconstruct deps command reads directly from the dependency cache without building the graph. If the cache is empty (e.g., after rsconstruct deps clean or on a fresh checkout), run a build first to populate it.
This is useful for debugging rebuild behavior or understanding the include structure of your project.
Cache Invalidation
The cache automatically invalidates entries when:
- The source file content changes (checksum mismatch)
- Any cached dependency file no longer exists
You can manually clear the entire dependency cache by removing the .rsconstruct/deps.redb file, or by running rsconstruct clean all which removes the entire .rsconstruct/ directory.
Processors Using Dependency Caching
Currently, the following processors use the dependency cache:
- cc_single_file — caches C/C++ header dependencies discovered by the include scanner
Implementation
The dependency cache is implemented in src/deps_cache.rs:
    pub struct DepsCache {
        db: redb::Database,
        stats: DepsCacheStats,
    }

    impl DepsCache {
        pub fn open() -> Result<Self>;
        pub fn get(&mut self, source: &Path) -> Option<Vec<PathBuf>>;
        pub fn set(&self, source: &Path, dependencies: &[PathBuf]) -> Result<()>;
        pub fn flush(&self) -> Result<()>;
        pub fn stats(&self) -> &DepsCacheStats;
    }
The cache is opened once per processor discovery phase, queried for each source file, and flushed to disk at the end.
Processor Versioning and Cache Invalidation
When a processor’s implementation changes in a way that produces different output for the same input, every cached entry it produced becomes potentially stale. This chapter documents the problem, the design alternatives we considered, and the chosen approach.
The problem
rsconstruct’s cache is content-addressed on a key derived from:
- Primary input file checksums
- `dep_inputs` / `dep_auto` file checksums
- `output_config_hash` (the processor’s relevant config fields)
- Tool version hash (optional — e.g. `ruff --version` output)
Crucially absent: the implementation of the processor itself.
Consider: a user upgrades rsconstruct to a version where the ruff wrapper now passes a new flag by default. Inputs haven’t changed. Config hasn’t changed. Ruff’s binary version hasn’t changed. But the output is different — the new flag changes behavior.
rsconstruct sees a cache hit on the old descriptor and restores the stale result. The user gets incorrect output from “fresh” caches.
Design alternatives considered
Option A: Hash the binary at startup
Compute a SHA of the rsconstruct binary itself at program start. Mix that hash into every product’s cache key.
How it works: Any change to any part of rsconstruct — processors, core executor, cache code, even comments — invalidates every cache entry.
Pros:
- Trivially correct. If any code changed, caches are invalidated.
- Zero developer action.
- No risk of forgotten invalidation.
Cons:
- Massively over-invalidates. Fixing a typo in a docstring or reformatting the `clean` command wipes every user’s cache across every processor.
- Makes iterating on rsconstruct itself painful — developers constantly rebuild everything.
- Version bumps of unrelated dependencies (regex bumps, anyhow bumps) change the binary and also invalidate.
Option B: Per-file source hash (automatic)
build.rs hashes each processor’s .rs file at compile time. The hash is embedded as a &'static str into that processor’s plugin entry. Cache key includes this hash.
How it works: Modify src/processors/checkers/ruff.rs, next build picks up a new hash, ruff’s caches invalidate. Other processors are unaffected.
Pros:
- Zero developer action — hashes are automatic.
- More precise than Option A — only the changed processor invalidates.
- Never forget to bump.
Cons:
- Too sensitive. Whitespace changes, comment fixes, rustfmt reformats, renames of private helpers — all invalidate the cache even though behavior is identical.
- Doesn’t catch indirect changes. If a processor calls shared helpers in `processors/mod.rs` and those change, the processor’s file hash hasn’t changed but its behavior has. We would need to hash transitive dependencies, and Rust doesn’t give us an easy way.
- Non-deterministic sources of churn: different rustfmt versions produce different hashes for the same intent, and CI vs. local editor differences cause spurious invalidation.
- Signal dilution: users stop paying attention to “this rebuilt” because it happens even for cosmetic changes. The signal loses meaning.
Option C: Whole src/processors/ subtree hash
Hash the entire processors directory at compile time. Any change to anything under src/processors/ invalidates every processor’s cache.
How it works: Middle ground between A and B.
Pros:
- Catches shared-helper changes automatically (since helpers are in the same subtree).
- Less aggressive than A — core-executor tweaks don’t invalidate.
Cons:
- Still over-invalidates — a fix to processor X wipes processor Y’s cache.
- Still vulnerable to formatting/comment churn.
Option D: Explicit per-processor version (manual)
Each processor declares a version: u32 in its plugin entry. The developer bumps it when making a behavior-changing modification. Cache key includes the version.
How it works:
    inventory::submit! { ProcessorPlugin {
        name: "ruff",
        version: 1, // bump when behavior changes
        ...
    }}
The commit “Processor ruff: change default flags” then carries the `version: 1` → `version: 2` bump in the same change.
Pros:
- Precise. Only bumps when the developer decides behavior actually changed.
- Stable. Reformats, comment edits, renames do not invalidate caches.
- Auditable. Every version bump is visible in git history as a deliberate one-line change with its own rationale.
- Cross-platform deterministic — a number, not a hash sensitive to file encoding.
- Signal stays meaningful — users see a rebuild only when something actually changed.
Cons:
- Relies on developer discipline. Forgetting to bump after a behavior change leaves stale caches surviving — a silent correctness bug, arguably worse than no invalidation (because it creates a false sense of safety).
- Requires a documented bump rule so the convention is followed.
- Can be mitigated by code review (diffs show version bumps) and optional CI checks (warn when a processor file changes without a version bump).
Option E: Hybrid — manual version OR automatic hash, whichever is larger
Both fields exist. The cache key includes max(manual version, auto hash). Belt-and-suspenders.
Pros: Catches both forgotten bumps and behavior changes.
Cons: Complexity. Two systems doing nearly the same thing. Users don’t know which one is “the” trigger. Debugging cache misses becomes harder. Loses the “explicit and predictable” property of Option D.
Decision: Option D (explicit per-processor version)
For a build system that cares about cache correctness, deliberate is better than automatic:
- Cache stability is a feature. Users expect their caches to survive a refactor, a `cargo fmt`, a whitespace cleanup. An automatic hash violates this expectation constantly.
- A version bump documents intent. `git blame` on the `version:` line shows why behavior changed. An auto hash leaves no such record.
- The discipline cost is low. Each behavior-changing commit already requires care — adding a one-line version bump to that care is trivial. Forgetting to bump is caught by code review, same as forgetting a changelog entry or a test.
- The discipline failure mode is recoverable. Worst case: a version bump is forgotten, users report stale caches, we bump the version retroactively in the next release. This is better than the Option B failure mode (constant spurious invalidation drives users to distrust the system).
The bump rule
Bump a processor’s version when ANY of:
- The processor would produce different output files for the same inputs.
- The processor would include different content in an output file for the same inputs.
- The processor changes which inputs are discovered (e.g. a new glob pattern, a changed default).
- The processor changes which paths are declared as outputs.
- The processor’s interpretation of a config field changes (e.g. what a flag means, how a default is resolved).
Do NOT bump for:
- Refactors with identical behavior.
- Comment / docstring changes.
- Reformatting.
- Renaming of internal helpers.
- Performance improvements that don’t change output.
- Bug fixes in error messages (but DO bump if the fix changes which inputs succeed/fail).
When in doubt, bump. A bump is cheap (rebuild all products of one processor once); a missed bump is a correctness bug.
Implementation outline
- Add a required `version: u32` field to `ProcessorPlugin` (no default — every processor must declare it).
- Include the version in the cache key via `output_config_hash` or `descriptor_key`.
- Initialize all existing processors to `version: 1`.
- Document the bump rule in a prominent comment near the field definition.
- (Optional, future) CI check: if a processor file’s git diff touches logic but not the `version:` line, post a warning comment on the PR.
Migration
On the first release after this change ships, every existing cache entry is invalidated (the cache key schema changed). This is a one-time cost, same as any cache-key schema evolution. Users will see a full rebuild once, then cache behavior resumes normally.
See also
- Cache System — how the cache is organized and keyed
- Checksum Cache — the mtime-based content checksum layer
- Processor Contract — the broader contract each processor must uphold
Cross-Processor Dependencies
This chapter discusses the problem of one processor’s output being consumed as input by another processor, and the design options for solving it.
The Problem
Consider a template that generates a Python file:
tera.templates/config.py.tera → (template processor) → config.py
Ideally, ruff should then lint the generated config.py. Or a template might
generate a C++ source file that needs to be compiled by cc_single_file and
linted by cppcheck. Chains can be arbitrarily deep:
template → generates foo.sh → shellcheck lints foo.sh
template → generates bar.c → cc_single_file compiles bar.c → cppcheck lints bar.c
Currently this does not work. Each processor discovers its inputs by querying
the FileIndex, which is built once at startup by scanning the filesystem.
Files that do not exist yet (because they will be produced by another processor)
are invisible to downstream processors. No product is created for them, and no
dependency edge is formed.
Why It Breaks
The build pipeline today is:
- Walk the filesystem once to build
FileIndex - Each processor runs
discover()against that index resolve_dependencies()matches product inputs to product outputs by path- Topological sort and execution
Step 3 already handles cross-processor edges correctly: if product A declares
output foo.py and product B declares input foo.py, a dependency edge from
A to B is created automatically. The problem is that step 2 never creates
product B in the first place, because foo.py is not in the FileIndex.
How Other Build Systems Handle This
Bazel
Bazel uses BUILD files where rules explicitly declare their inputs and outputs.
Dependencies are specified by label references, not by filesystem scanning.
However, Bazel does use glob() to discover source files during its loading
phase. The key insight is that during the analysis phase, both source files
(from globs) and generated files (from rule declarations) are visible in a
unified view. A rule’s declared outputs are known before any action executes.
Buck2
Buck2 takes a similar approach with a single unified dependency graph (no
separate phases). Rules call declare_output() to create artifact references
and return them via providers. Downstream rules receive these references through
their declared dependencies. For cases where the dependency structure is not
known statically, Buck2 provides dynamic_output — a rule can read an artifact
at build time to discover additional dependencies.
Common Pattern
In both systems, the core principle is the same: a rule’s declared outputs are visible to the dependency resolver before execution begins. The dependency graph is fully resolved at analysis time.
Proposed Solutions
A. Multi-Pass Discovery (Iterative Build-Scan Loop)
Run discovery, build what is ready, re-scan the filesystem, discover again. Repeat until nothing new is found.
- Pro: Simple mental model, handles arbitrary chain depth
- Con: Slow (re-scans filesystem each pass), hard to detect infinite loops, execution is interleaved with discovery
B. Virtual Files from Declared Outputs (Two-Pass)
After the first discovery pass, collect all declared outputs from the graph and inject them as “virtual files” visible to processors. Run discovery a second time so downstream processors can find the generated files.
- Pro: No filesystem re-scan, single build execution phase, deterministic
- Con: Limited to chains of depth 1 (producer → consumer). A three-step chain (template → compile → lint) would require three passes, making the fixed two-pass design insufficient.
C. Fixed-Point Discovery Loop
Generalization of Approach B. Run discovery in a loop: after each pass, collect newly declared outputs and feed them back as known files for the next pass. Stop when a full pass adds no new products. Add a maximum iteration limit to catch cycles.
```text
known_files = FileIndex (real files on disk)
loop {
    run discover() for all processors, with known_files visible
    new_outputs = outputs declared in this pass that were not in known_files
    if new_outputs is empty → break
    known_files = known_files + new_outputs
}
resolve_dependencies()
execute()
```
A chain of depth N requires N iterations. Most projects would converge in 1-2 iterations.
- Pro: Fully general, handles arbitrary chain depth, no filesystem re-scan, deterministic, path-based matching (no reliance on file extensions)
- Con: Processors must be able to discover products for files that do not exist on disk yet (they only know the path). This works for stub-based processors and compilers but might be an issue for processors that inspect file contents during discovery.
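The fixed-point loop can be exercised on a toy model. This is an illustrative sketch only, not rsconstruct's actual code: a "processor" is reduced to a function mapping a known file path to the outputs it would declare.

```rust
use std::collections::BTreeSet;

/// Run discovery until a full pass adds no new outputs, with an iteration cap.
fn fixed_point_discover(
    real_files: &[&str],
    processors: &[fn(&str) -> Vec<String>],
    max_iters: usize,
) -> BTreeSet<String> {
    let mut known: BTreeSet<String> = real_files.iter().map(|s| s.to_string()).collect();
    for _ in 0..max_iters {
        let new_outputs: BTreeSet<String> = known
            .iter()
            .flat_map(|f| processors.iter().flat_map(move |p| p(f.as_str())))
            .filter(|o| !known.contains(o))
            .collect();
        if new_outputs.is_empty() {
            break; // fixed point reached
        }
        known.extend(new_outputs);
    }
    known
}

// Toy template processor: *.tera → the same path without the extension.
fn template(f: &str) -> Vec<String> {
    f.strip_suffix(".tera").map(|s| vec![s.to_string()]).unwrap_or_default()
}

// Toy compiler: *.c → *.o.
fn compile(f: &str) -> Vec<String> {
    f.strip_suffix(".c").map(|s| vec![format!("{s}.o")]).unwrap_or_default()
}

fn main() {
    // A chain of depth 2: main.c.tera → main.c → main.o converges in 2 passes.
    let known = fixed_point_discover(&["main.c.tera"], &[template, compile], 10);
    println!("{known:?}");
}
```

BTreeSet keeps the pass output deterministic, matching the document's determinism requirement.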
D. Explicit Cross-Processor Wiring in Config
Let users declare chains in rsconstruct.toml:
```toml
[[pipeline]]
from = "template"
to = "ruff"
```
rsconstruct then knows that template outputs matching ruff’s scan configuration should become ruff inputs.
- Pro: Explicit, no magic, user controls what gets chained
- Con: More configuration burden, loses the “convention over configuration” philosophy
E. Make out/ Visible to FileIndex
The simplest mechanical fix: stop excluding out/ from the FileIndex. Since
.gitignore contains /out/, the ignore crate skips it. This could be
overridden in the WalkBuilder configuration.
- Pro: Minimal code change, works on subsequent builds (files already exist from previous build)
- Con: Does not work on the first clean build (files do not exist yet). Processors would also see stale outputs from deleted processors, and stub files from other processors (though extension filtering would exclude most of these).
F. Two-Phase Processor Trait (Declarative Forward Tracing)
Split the ProductDiscovery trait so that each processor can declare what
output paths it would produce for a given input path, without performing full
discovery:
```rust
trait ProductDiscovery {
    /// Given an input path, return the output paths this processor would
    /// produce. Called even for files that don't exist on disk yet.
    fn would_produce(&self, input_path: &Path) -> Vec<PathBuf>;

    /// Full discovery (as today)
    fn discover(&self, graph: &mut BuildGraph, file_index: &FileIndex) -> Result<()>;

    // ...
}
```
The build system first runs discover() on all processors to get the initial
set of products and their outputs. Then, for each declared output, it calls
would_produce() on every other processor to trace the chain forward. This
repeats transitively until no new outputs are produced. Finally, discover()
runs once more with the complete set of known paths (real + virtual).
Unlike Approach C, this does not require a loop over full discovery passes. The chain is traced declaratively by asking each processor “if this file existed, what would you produce from it?” — a lightweight query that does not modify the graph.
- Pro: Single discovery pass plus lightweight forward tracing. No loop, no convergence check, no iteration limit. Each processor defines its output naming convention in one place. The full transitive closure of outputs is known before the main discovery runs.
- Con: Adds a method to the `ProductDiscovery` trait that every processor must implement. Some processors have complex output path logic (e.g., `cc_single_file` changes the extension and directory), so `would_produce()` must replicate that logic — meaning the output path computation exists in two places (in `would_produce()` and in `discover()`). Keeping these in sync is a maintenance risk.
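As a sketch of what such a method might look like for a compiler-style processor — the `src/*.c → out/*.o` mapping below is an assumption for illustration, not `cc_single_file`'s actual logic:

```rust
use std::path::{Path, PathBuf};

// Hypothetical would_produce() for a single-file C compiler: declare the
// object file this input would compile to, without touching the disk.
fn would_produce(input: &Path) -> Vec<PathBuf> {
    match input.extension().and_then(|e| e.to_str()) {
        Some("c") | Some("cc") | Some("cpp") => {
            let stem = input.file_stem().unwrap_or_default();
            // Changes both the directory and the extension — exactly the
            // logic that would be duplicated between the two trait methods.
            vec![Path::new("out").join(stem).with_extension("o")]
        }
        _ => Vec::new(), // not our input type; nothing to trace
    }
}

fn main() {
    println!("{:?}", would_produce(Path::new("src/foo.c")));
}
```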
G. Hybrid: Visible out/ + Fixed-Point Discovery
Combine Approach E (make out/ visible) with Approach C (fixed-point loop) or
Approach F (forward tracing).
On subsequent builds, existing files in out/ are already in the index. On
clean builds, the fixed-point loop discovers them from declared outputs.
- Pro: Most robust — works for both clean and incremental builds
- Con: Combines complexity of two approaches, risk of discovering stale outputs
Recommendation
Approach C (fixed-point discovery loop) is the most principled solution. It is fully general, handles arbitrary chain depth, requires no configuration, and matches the core insight from Bazel and Buck2: declared outputs should be visible during dependency resolution before execution begins.
The main implementation requirement is extending the FileIndex (or creating a
wrapper) to accept “virtual” entries for paths that are declared as outputs but
do not yet exist on disk. Processors already declare their outputs during
discover(), so the information needed to populate these virtual entries is
already available.
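A minimal sketch of such a wrapper, with assumed names (`VirtualFileIndex`, `add_virtual_files`, `scan`) that only loosely mirror the real API:

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Layers "virtual" entries — declared outputs that do not exist on disk
/// yet — on top of the real file list. Illustrative sketch only.
struct VirtualFileIndex {
    real: BTreeSet<PathBuf>, // files found by the filesystem walk
    virt: BTreeSet<PathBuf>, // declared outputs injected between passes
}

impl VirtualFileIndex {
    fn new(real: impl IntoIterator<Item = PathBuf>) -> Self {
        Self { real: real.into_iter().collect(), virt: BTreeSet::new() }
    }

    /// Inject declared outputs; returns how many were actually new,
    /// so the discovery loop can detect the fixed point.
    fn add_virtual_files(&mut self, outputs: impl IntoIterator<Item = PathBuf>) -> usize {
        let mut added = 0;
        for p in outputs {
            if !self.real.contains(&p) && self.virt.insert(p) {
                added += 1;
            }
        }
        added
    }

    /// Scan by extension over real + virtual entries; BTreeSet keeps
    /// iteration order deterministic.
    fn scan(&self, ext: &str) -> Vec<&Path> {
        self.real
            .iter()
            .chain(self.virt.iter())
            .filter(|p| p.extension().map_or(false, |e| e == ext))
            .map(|p| p.as_path())
            .collect()
    }
}

fn main() {
    let mut index = VirtualFileIndex::new([PathBuf::from("gen.py.tera")]);
    // Pass 1: a template processor declares out/gen.py as an output.
    assert_eq!(index.add_virtual_files([PathBuf::from("out/gen.py")]), 1);
    // Re-injection is a no-op, which is what lets the loop terminate.
    assert_eq!(index.add_virtual_files([PathBuf::from("out/gen.py")]), 0);
    // Pass 2: a linter scanning for .py files now sees the virtual entry.
    assert_eq!(index.scan("py"), vec![Path::new("out/gen.py")]);
}
```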
Current Status
Cross-processor dependencies are implemented using Approach C (fixed-point
discovery loop). After each discovery pass, newly declared outputs are injected
as virtual files into the FileIndex. Discovery re-runs with the expanded index
until no new products are found (up to 10 iterations).
Key implementation details:
- `FileIndex::add_virtual_files()` inserts declared output paths into the index so downstream processors can discover them via `scan()`.
- `BuildGraph::add_product()` handles re-declarations during multi-pass discovery (see below).
- The loop runs in all three discovery sites: the main build graph builder, `build_graph_filtered`, and the deps builder.
- `--phases` output shows per-pass statistics when multiple passes are needed.
- Most projects converge in 1 pass (no cross-processor chains). Projects with generator → checker chains converge in 2 passes.
Deduplication during multi-pass discovery
When processors re-run on subsequent passes, they may try to add products that
already exist. add_product() detects this via two separate dedup paths,
depending on whether the product declares outputs:
Products with outputs (generators)
Dedup is keyed on output paths. When a product with the same outputs is re-declared by the same processor:
1. Identical re-declaration — same inputs. The product is silently skipped.
2. Expanded inputs — the new inputs are a superset of the existing inputs. This happens when a processor like `tags` collects all matching files into a single product. On pass 2, virtual files from generator outputs are now in the `FileIndex`, so `tags` discovers the same product with additional inputs. The existing product’s inputs are updated to the expanded set, and the `input_to_products` index is updated accordingly.
Both cases account for instance name remapping: a product may have been
remapped from cc_single_file to cc_single_file.clang after pass 1, but
discover() still passes the type name cc_single_file on pass 2. The
dedup check accepts processor names where one is a qualified instance of the
other (e.g., cc_single_file matches cc_single_file.clang).
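The qualified-name check can be sketched as follows (hypothetical helper name; the real dedup code may differ):

```rust
// A processor name is compatible with another if they are equal, or if one
// is a qualified instance of the other: `cc_single_file` matches
// `cc_single_file.clang`, but a bare prefix like `cc` does not.
fn names_compatible(a: &str, b: &str) -> bool {
    a == b
        || a.strip_prefix(b).map_or(false, |rest| rest.starts_with('.'))
        || b.strip_prefix(a).map_or(false, |rest| rest.starts_with('.'))
}

fn main() {
    assert!(names_compatible("cc_single_file", "cc_single_file.clang"));
    assert!(names_compatible("ruff", "ruff"));
    // A string prefix without the `.` separator is NOT a qualified instance.
    assert!(!names_compatible("cc", "cc_single_file"));
}
```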
Genuinely conflicting products — different processors (or the same processor
with different inputs that are not a superset) declaring the same output —
still produce an Output conflict error.
Products without outputs (checkers, explicit processors with output_dirs)
Products with no declared output files (e.g., checkers, or explicit processors
that only declare output_dirs) cannot be deduped by output path. Instead,
they are deduped by the tuple (processor_name, primary_input, variant) via
the checker_dedup index.
This path also supports expanded inputs. When a later pass re-declares the
same product with a superset of inputs, the existing product’s inputs are
updated. This is critical for processors like explicit that use input_globs:
on pass 0, the globs may match nothing (the target files don’t exist yet); on
pass 1, virtual files from upstream generators are available and the globs
resolve to additional inputs. Without the input update, the product would be
frozen with its pass-0 inputs, no dependency edges would be created to the
upstream producers, and the product would execute too early (before its actual
inputs exist).
Shared Output Directory
Multiple processors can write into the same directory — a website _site/, a dist/, a build/ folder. This document explains how rsconstruct keeps each processor’s cache correct when they share an output directory, and the exact rules that make it work.
The scenario
A common case:
- mkdocs (a Creator) builds a whole site. It produces many files under `_site/` and declares the directory as its `output_dir`. It cannot enumerate individual outputs in advance.
- pandoc (a Generator / Explicit) converts one specific markdown file into `_site/about.html`. It declares that file explicitly as its `output_files`.
Both contribute to the same directory. A website IS a single folder by design.
```toml
[processor.creator.mkdocs]
command = "mkdocs build --site-dir _site"
output_dirs = ["_site"]

[processor.explicit.pandoc]
command = "./pandoc-page.sh"
inputs = ["about.md"]
output_files = ["_site/about.html"]
```
The problem
Naive implementations break in at least three places:
1. Over-claiming at cache store time. If mkdocs’s cache entry walks `_site/` and records every file, it will wrongly claim `about.html` as its own. On cache restore, pandoc’s file gets restored from mkdocs’s cache — with whatever content mkdocs last saw there — even if pandoc hasn’t run.
2. Clobbering at build time. If mkdocs wipes `_site/` before running (so stale outputs from a previous build don’t linger), it will also delete pandoc’s `about.html` whenever mkdocs runs after pandoc.
3. Clobbering at restore time. If restoring mkdocs’s cache wipes `_site/` before writing cached files, it will again destroy pandoc’s output.
Each problem leads to silent cache corruption: stale content appears to be fresh, or recently-built files vanish.
Ownership rule
Every declared output path has exactly one owner — the single product that lists it in `outputs`, `output_files`, or produces it as a named product output.

A directory declared as `output_dir` is not an ownership claim on the whole subtree. The Creator only owns the files it itself produces that no other product has declared.
This is enforced by a single graph query, BuildGraph::path_owner(path) -> Option<usize>, which returns the id of the unique product that declares path as one of its outputs (or None if nobody does).
Pseudocode:
```text
path_owner(path):
    for each product P in graph:
        if path in P.outputs:
            return P.id
    return None
```
A declared output path has at most one owner by construction — if two products declare the same literal output, that is detected as an output conflict at graph-build time and the build aborts.
How each of the three hazards is handled
1. Over-claiming at cache store time
When a Creator’s tree descriptor is being built in ObjectStore::store_tree_descriptor, the walker visits every file under each output_dir. For each file, it asks the graph: “Is this path owned by a different product?”
is_foreign(path) = graph.path_owner(path) is Some(owner) and owner != my_product_id
If is_foreign(path) is true, the file is skipped — it does not appear as a tree entry. The Creator’s cache then contains only files the Creator actually created and that nobody else has laid claim to.
When pandoc writes _site/about.html and mkdocs later caches _site/, mkdocs’s tree will not contain about.html because path_owner("_site/about.html") == pandoc.id != mkdocs.id.
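A condensed sketch of the ownership query and the foreign-file predicate, over a flattened path→owner view (an illustrative structure, not the real `BuildGraph`):

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};

struct Graph {
    // Declared output path → owning product id. At most one owner exists
    // by construction: duplicate declarations abort at graph-build time.
    outputs: HashMap<PathBuf, usize>,
}

impl Graph {
    fn path_owner(&self, path: &Path) -> Option<usize> {
        self.outputs.get(path).copied()
    }

    /// True if `path` is owned by a product other than `me` — the predicate
    /// the tree walker uses to skip foreign files when caching a Creator.
    fn is_foreign(&self, path: &Path, me: usize) -> bool {
        matches!(self.path_owner(path), Some(owner) if owner != me)
    }
}

fn main() {
    let pandoc = 0;
    let mkdocs = 1;
    let mut outputs = HashMap::new();
    // pandoc explicitly declares _site/about.html; mkdocs declares nothing
    // per-file (it only has an output_dir).
    outputs.insert(PathBuf::from("_site/about.html"), pandoc);
    let graph = Graph { outputs };

    // mkdocs's tree walker skips about.html but keeps index.html.
    assert!(graph.is_foreign(Path::new("_site/about.html"), mkdocs));
    assert!(!graph.is_foreign(Path::new("_site/index.html"), mkdocs));
}
```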
2. Clobbering at build time
Before a product’s command runs, `remove_stale_outputs` clears the product’s own stale outputs so the command can rewrite them fresh (important when a cache restore left read-only hardlinks in place).
The rule for Creators:
- Do NOT wipe `output_dir` wholesale.
- Read the previous tree descriptor from the object store.
- Remove only the files recorded in that previous tree.
- Re-create the `output_dir` (so the command can assume it exists).
- Leave any file not in the previous tree alone — it belongs to somebody else.
Pseudocode:
```text
remove_stale_outputs(product, input_checksum):
    if product has output_dirs:
        previous = object_store.previous_tree_paths(descriptor_key(product, input_checksum))
        for file in previous:
            if file exists: remove it
        for dir in product.output_dirs:
            create dir if missing
    for file in product.outputs:
        if file exists: remove it
```
Because the previous tree only ever contained paths the Creator owned, this removal cannot touch files owned by other processors.
3. Clobbering at restore time
Cache restore for a tree descriptor iterates entries and writes each one in place. It never calls remove_dir_all on the output_dir. If a file already exists with the correct checksum, the restore skips it (saving I/O).
When mkdocs restores its tree:
- `_site/index.html` and `_site/assets/style.css` are written from the object store.
- `_site/about.html` is NOT in mkdocs’s tree, so it is neither written nor removed.
- If pandoc has also restored, pandoc’s blob descriptor wrote `_site/about.html` separately.
The two restores compose correctly regardless of order.
Invariants
The system relies on these invariants; each is enforced in code:
| # | Invariant | Where enforced |
|---|---|---|
| 1 | Every declared output path has at most one owner. | add_product / graph validation (output conflict check) |
| 2 | A Creator’s tree descriptor contains only paths not owned by any other product. | store_tree_descriptor with is_foreign predicate |
| 3 | Pre-run cleanup removes only files the Creator previously owned. | remove_stale_outputs reads previous_tree_paths |
| 4 | Cache restore never deletes files it did not cache. | restore_tree_descriptor writes in place; no remove_dir_all |
When all four hold, processors can freely share an output directory.
Worked example
Starting from an empty project, both processors are declared as above and both get to run on a fresh build.
First build
1. pandoc runs first.
   - `remove_stale_outputs`: pandoc has no `output_dirs`; removes `_site/about.html` if it exists (it doesn’t). No-op.
   - Runs `./pandoc-page.sh`, which creates `_site/about.html`.
   - Caches a blob descriptor for `_site/about.html`.
2. mkdocs runs next.
   - `remove_stale_outputs`: mkdocs has `output_dirs`; looks up its previous tree (none — first build). Creates `_site/` to ensure it exists.
   - Runs `mkdocs build`, which writes `_site/index.html`, `_site/assets/style.css`, and may (harmlessly) touch `_site/about.html`.
   - Caches a tree descriptor. The walker skips `_site/about.html` because `path_owner` says pandoc owns it. Tree = `[index.html, assets/style.css]`.
Final state on disk: index.html, assets/style.css, about.html. All three files exist with correct content.
Incremental build, no changes
- pandoc: input checksum matches; descriptor already exists; skipped.
- mkdocs: input checksum matches; descriptor already exists; skipped.
Clean outputs + rebuild
1. `rsconstruct clean outputs` deletes `_site/` entirely.
2. Next build:
   - pandoc’s input checksum matches its cached descriptor → restore blob → writes `_site/about.html`.
   - mkdocs’s input checksum matches its cached descriptor → restore tree → writes only the files in the tree (`index.html`, `assets/style.css`), leaves `about.html` alone.
Final state is the same as after the first build, without either tool having actually run.
Building only the Creator (-p creator.mkdocs)
- pandoc is not in the run set; `_site/about.html` stays wherever it was (absent if cleaned, present otherwise).
- mkdocs runs or restores its tree.

If `_site/` was clean, `about.html` remains absent — which is correct, because the Creator does not claim to produce it. The regression test `creator_tree_does_not_include_foreign_outputs` verifies exactly this.
Non-goals
- Runtime conflict detection for paths the Creator actually wrote but didn’t declare. If a Creator happens to write a file that another Generator also declares, the declared owner wins; the Creator’s tree simply won’t include that file. We do not error on this.
- Ordering constraints. rsconstruct does not enforce “Generators run before Creator” or vice versa. The snapshot/walk is done after each product finishes, and `path_owner` is a static graph query independent of run order.
- Partial-directory caching like git trees with subtrees. The tree descriptor is a flat list of `(path, checksum)` entries, which is enough for this use case.
Quick reference for processor authors
If you are writing a new processor:
- Generator / Explicit: declare every output file in `output_files`. rsconstruct keeps each of your files safe from Creators that share the directory.
- Creator: declare the shared directory in `output_dirs`. Do NOT assume the directory is empty when your command runs — other processors may have already contributed files to it. Your command should overwrite only what it produces; it should not wipe the directory.
- Conflict: never declare the same path as an output in two different products. That is a graph-build-time error regardless of directory sharing.
Processor Ordering
When two processors touch the same files or cooperate on a shared workspace, the question of “which runs first?” inevitably comes up. This chapter explains how rsconstruct answers that question today, how other build systems approach it, the dilemmas that show up in practice, and why rsconstruct has deliberately avoided adding explicit ordering knobs so far.
How rsconstruct orders today
rsconstruct has no explicit cross-processor ordering configuration. Ordering is derived entirely from the data-flow graph:
- Each product (a unit of work from a processor) declares `inputs` and `outputs`.
- If product A’s `inputs` contains a path that product B’s `outputs` also contains, A depends on B — B runs first.
- Products with no such relationship are considered independent and may run in parallel (within the same topological level).
That’s the whole mechanism. The BuildGraph performs a topological sort on this implicit graph and the executor processes levels in order. See Cross-Processor Dependencies for the data-flow story.
There is no depends_on, mustRunAfter, before, after, priority, or stage field anywhere in rsconstruct.toml. If two processors write into the same directory without any file dep between them, their order is undefined and may vary between runs.
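The whole mechanism fits in a short sketch — assumed data shapes, not rsconstruct's internals: products are `(id, inputs, outputs)` triples, and execution levels are derived with a Kahn-style pass.

```rust
use std::collections::{HashMap, HashSet};

/// Group products into topological levels derived purely from declared
/// inputs/outputs. Products within a level may run in parallel.
fn levels(products: &[(usize, Vec<&str>, Vec<&str>)]) -> Vec<Vec<usize>> {
    // Map each declared output path to its producer.
    let producer: HashMap<&str, usize> = products
        .iter()
        .flat_map(|(id, _, outs)| outs.iter().map(move |o| (*o, *id)))
        .collect();
    // deps[b] = products whose outputs b consumes (B runs before b).
    let deps: HashMap<usize, HashSet<usize>> = products
        .iter()
        .map(|(id, ins, _)| {
            let d = ins
                .iter()
                .filter_map(|i| producer.get(i).copied())
                .filter(|p| p != id)
                .collect();
            (*id, d)
        })
        .collect();
    // Peel off "ready" products level by level.
    let mut level: HashMap<usize, usize> = HashMap::new();
    let mut out: Vec<Vec<usize>> = Vec::new();
    let mut remaining: Vec<usize> = products.iter().map(|(id, _, _)| *id).collect();
    while !remaining.is_empty() {
        let ready: Vec<usize> = remaining
            .iter()
            .copied()
            .filter(|id| deps[id].iter().all(|d| level.contains_key(d)))
            .collect();
        assert!(!ready.is_empty(), "dependency cycle");
        for id in &ready {
            level.insert(*id, out.len());
        }
        remaining.retain(|id| !ready.contains(id));
        out.push(ready);
    }
    out
}

fn main() {
    // template produces gen.py; ruff lints it; a doc checker is independent.
    let products = vec![
        (0, vec!["gen.py.tera"], vec!["gen.py"]), // template
        (1, vec!["gen.py"], vec![]),              // ruff
        (2, vec!["README.md"], vec![]),           // independent checker
    ];
    println!("{:?}", levels(&products)); // template and checker first, then ruff
}
```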
How other tools handle it
Bazel, Buck2
No explicit ordering. Rules declare srcs, deps, and outs. The scheduler orders actions strictly by the DAG of declared inputs/outputs. Hermeticity is a first-class value — if you need something to run before something else, you model it as a data dependency. If a rule B needs rule A’s side effect but not its output, you fabricate a marker file: A outputs a.done, B takes a.done as an input.
Bazel’s design intent: if you need ordering without data flow, you’re modeling the problem wrong. The graph should tell the truth about what depends on what.
Make, Ninja
Data-flow ordering via rules (foo.o: bar.h). Ninja adds order-only dependencies — the || separator in build.ninja. An order-only dep means “run A before B” without “rebuild B when A changes”. This is useful for things like “create out/ before any rule tries to write into it”. It’s the minimum viable ordering primitive: pure ordering, no rebuild semantics.
Gradle
Has explicit ordering primitives, three of them:
- `dependsOn` — real dependency: running B automatically runs A first (even if A would otherwise be skipped).
- `mustRunAfter` — ordering constraint: if both A and B are in the scheduled set, A runs first; but running B does NOT pull A in.
- `shouldRunAfter` — soft ordering hint: honored when possible, may be violated to enable parallelism.
Gradle’s ecosystem (Android, JVM tooling, packaging/signing pipelines) has more real-world “unrelated tasks that still need ordering” cases — e.g., signing must happen after packaging even though they don’t share a file output. The three-level hierarchy lets users pick the right strength.
CMake
add_dependencies(targetA targetB) enforces ordering at the target level, beyond file-level rules. Used mostly for custom targets that don’t produce tracked output files — the bridge when file-based ordering isn’t sufficient.
Cargo, SBT
No explicit cross-crate ordering. Everything flows from [dependencies] / library deps → data flow → topological sort. Same posture as Bazel.
Summary table
| Tool | Explicit ordering knobs | Philosophy |
|---|---|---|
| Make / Ninja | Order-only deps (||) | Bridge when file deps aren’t enough |
| Bazel, Buck2 | None | Hermeticity; all ordering comes from data flow |
| Cargo, SBT | None | Same as Bazel |
| Gradle | dependsOn, mustRunAfter, shouldRunAfter | Real-world tasks have non-data ordering needs |
| CMake | add_dependencies | Bridge for “phantom” custom targets |
| rsconstruct | None (currently) | Same as Bazel |
The dilemmas
Adding explicit ordering feels useful but carries real risks. Here are the tradeoffs.
Dilemma 1: does ordering imply rebuild?
Say [processor.b] after = ["a"]. If A’s output changes, should B rebuild?
- If yes, `after` is just `dependsOn` — which we already have through data flow. It’s redundant.
- If no, `after` is pure ordering (`mustRunAfter`). But then it silently lies about the true dependency graph: a user might add `after = ["a"]` because they “know” B consumes A’s side effect, but rsconstruct won’t invalidate B’s cache when A changes. Stale caches follow.
Gradle copes because it has three flavors. Adding one flavor is usually wrong; adding three is complexity creep.
Dilemma 2: declared vs. inferred
rsconstruct already infers ordering from inputs/outputs. Adding another channel means:
- Two sources of truth for the dependency graph.
- Debugging “why did B run after A?” now requires checking both the data flow AND the explicit config.
- Mistakes compound: a user adds `after = ["a"]` but forgets that they ALSO removed the data dep; now B runs after A but doesn’t actually consume anything from it.
Dilemma 3: encourages side-effects
If ordering knobs exist, they become the path of least resistance for modeling side effects:
> “My script also writes to `/tmp/cache_seed.json`, just declare `after` and it’ll work.”
Side-effectful processors are an anti-pattern in any incremental build system — the cache can’t know when they changed, when to rerun them, or what invalidates them. Every ordering primitive that doesn’t touch the cache makes side effects easier to introduce.
Dilemma 4: the “fix-up pass” case
The one case where data flow struggles: a processor that runs after everything else has written to a shared directory and modifies the result. Examples:
- Minification: take everything in `dist/` and minify it after all generators have produced their outputs.
- Post-processing: add cache-busting hashes to filenames, rewrite links, compress.
In Bazel, you model this as a rule with srcs = glob(["dist/**"]). But with lazy generators (outputs that didn’t exist when the scan ran), globs can miss things.
Reasonable fixes without adding ordering knobs:
- Have the fix-up processor declare its inputs explicitly as the output files of the generators. Works but requires enumeration.
- Re-scan globs after each dependency level so the fix-up step sees newly-generated files. Correct, but costlier.
- Make the fix-up a Creator with the whole `dist/` as its `output_dir`. Our shared-output-directory logic handles this cleanly (see that chapter), but now the fix-up operates in-place on files owned by others, which touches the “files owned by other products” rule.
None of these is wonderful, but none requires a new ordering primitive.
Dilemma 5: parallelism is already constrained
If ordering becomes a first-class concept, users will sprinkle after = [...] for safety and the scheduler will serialize work that could have run in parallel. Bazel’s aggressive parallelism comes partly from refusing to accept unprincipled ordering constraints.
Why rsconstruct hasn’t added ordering
The posture we’ve picked (for now):
- Data flow is the truth. Every time ordering matters, there is a real data dependency. Expose it as an input/output rather than as a separate ordering rule.
- Shared output directories are handled without ordering. The Shared Output Directory design lets multiple processors contribute to one folder in any order; the cache stays correct per-processor.
- The cost of adding explicit ordering is high: it creates a second channel for dependencies, invites side-effect-oriented thinking, and rarely solves a problem that couldn’t be solved by modeling the data flow properly.
When we would add explicit ordering
If a real use case appears where:
- Data flow genuinely cannot express the dependency (no file is consumed, only a side effect).
- The alternative (adding a marker file or input_glob re-scan) is significantly worse than adding a knob.
- The feature can be specified with clear rebuild semantics (pick one of: forces rerun / does not force rerun; do not leave it ambiguous).
Then the most likely shape is a single after = ["processor_name"] field with Gradle’s mustRunAfter semantics:
- Affects ordering only when both processors are already scheduled.
- Does NOT add a rebuild trigger.
- Does NOT force the referenced processor to run.
This is the smallest, most honest knob. It doesn’t pretend to be a data dependency; it doesn’t change cache invalidation; it only constrains scheduling.
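If it were ever added, the config might look like this — hypothetical, since no `after` field exists in rsconstruct today:

```toml
# Hypothetical sketch — rsconstruct.toml has no such field currently.
[processor.explicit.minify]
command = "./minify.sh dist/"
# mustRunAfter semantics: order-only, no rebuild trigger, does not pull
# creator.mkdocs into the run set.
after = ["creator.mkdocs"]
```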
Until that case is concrete, the answer is: model ordering through data flow. The graph should tell the truth.
Alternative: Output Prediction
Another way to close the gap without adding ordering knobs: make opaque Creators (mkdocs, Sphinx, Jekyll) transparent by discovering their outputs in advance.
Instead of the Creator declaring output_dirs = ["_site"] (opaque — “something goes in here”), it would declare (or generate) the exact file list it will produce:
```toml
[processor.creator.mkdocs]
command = "mkdocs build --site-dir _site"
predict_command = "./list-mkdocs-outputs.sh" # prints one output path per line
output_dirs = ["_site"]
```
rsconstruct would run predict_command at graph-build time, turn each printed path into a declared outputs entry, and promote the Creator to a per-file Mass Generator. After that, the entire “how do we order two processors that both write into _site/?” question dissolves — every file has exactly one declared owner, and the normal Generator/data-flow rules apply.
Why this is an alternative to ordering knobs:
- Explicit ordering says “we can’t model this; let the user pin the order manually.”
- Output prediction says “we can model this if we know the outputs; let’s discover them.”
Prediction is the more principled answer — the graph ends up telling the truth about what depends on what — but it is far more expensive to do well (predictor drift, plugin ecosystems, partial-build support, validation). Ordering knobs are cheap but lie about the dependency graph.
The full tradeoff is explored in the Output Prediction chapter. Short version: neither is obviously better; they solve different problems and could coexist.
See also
- Cross-Processor Dependencies — how data-flow dependencies work between processors
- Shared Output Directory — how multiple processors can cooperate on one directory without ordering
- Output Prediction — a different approach that makes opaque Creators transparent
- Design Notes — broader design principles
Output Prediction & MassGenerator
A Creator (mkdocs, Sphinx, Jekyll, Hugo, etc.) declares output_dirs = ["_site"] — “I produce something in here, don’t ask me what until I’ve run.” This chapter specifies a new processor type, MassGenerator, that makes those tools transparent: the tool is asked in advance what it will produce, and each planned file is promoted to a declared product output.
Once outputs are known up front, per-file caching, precise incremental rebuilds, cross-processor dependencies on generated files, and safe output-conflict detection all come for free.
Status
Designed, not yet implemented. This document is the design spec that guides the implementation.
Related designs:
- Shared Output Directory — the fallback mechanism for tools that can’t predict outputs.
- Processor Ordering — the sibling design discussion about explicit ordering knobs.
The core idea
Today we treat tools like mkdocs as a black box:
```toml
[processor.creator.mkdocs]
command = "mkdocs build --site-dir _site"
output_dirs = ["_site"] # opaque — we only know the directory
```
The new approach asks the tool to emit a manifest before running:
```toml
[processor.mass_generator.mkdocs]
command = "mkdocs build --site-dir _site"
predict_command = "mkdocs-plan" # prints a JSON manifest on stdout
output_dirs = ["_site"]
```
rsconstruct invokes predict_command at graph-build time, parses its JSON output, and creates one product per planned file. Each product has its own inputs (taken from the manifest’s sources field) and a single outputs entry (the planned path). From that point on, the product is a regular per-file Generator — caching, dependency tracking, and cross-processor wiring all work uniformly.
Manifest format
predict_command must print a single JSON document to stdout in this shape:
```json
{
  "version": 1,
  "outputs": [
    {
      "path": "_site/index.html",
      "sources": ["docs/index.md", "templates/default.html", "mysite.toml"]
    },
    {
      "path": "_site/about/index.html",
      "sources": ["docs/about.md", "templates/default.html", "mysite.toml"]
    },
    {
      "path": "_site/assets/style.css",
      "sources": ["assets/style.scss", "assets/_vars.scss"]
    }
  ]
}
```
- `version` — integer. Schema version (1 for now). Allows future evolution without breaking existing tools.
- `outputs` — array, one entry per file the tool will produce.
- `outputs[].path` — output file path relative to the project root. Must fall within one of the processor’s `output_dirs` (enforced).
- `outputs[].sources` — array of input paths whose changes should trigger rebuilding this output. Used as the product’s `inputs`, which feed into cache-key computation.
Order within outputs must be deterministic (sorted by path). The sources array should be minimal — only the files whose content genuinely affects this specific output.
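The containment check behind the “must fall within `output_dirs`” rule can be sketched with std only (hypothetical helper name, not part of any existing API):

```rust
use std::path::{Component, Path};

/// True if `path` is a relative path that stays inside one of the declared
/// output directories. Illustrative sketch of the manifest validation rule.
fn path_within_dirs(path: &str, output_dirs: &[&str]) -> bool {
    let p = Path::new(path);
    // Reject absolute paths and `..` traversal outright.
    if p.is_absolute() || p.components().any(|c| matches!(c, Component::ParentDir)) {
        return false;
    }
    // starts_with compares whole path components, not string prefixes,
    // so "_sitex/foo" does not match the dir "_site".
    output_dirs.iter().any(|d| p.starts_with(d))
}

fn main() {
    let dirs = ["_site"];
    assert!(path_within_dirs("_site/index.html", &dirs));
    assert!(!path_within_dirs("src/main.rs", &dirs)); // outside output_dirs
    assert!(!path_within_dirs("_site/../etc/passwd", &dirs)); // traversal rejected
}
```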
Lifecycle
1. Plan phase (at graph-build time)
Once per MassGenerator instance declared in rsconstruct.toml:
- Run `predict_command`. Capture stdout and exit status.
- Exit status non-zero → fail the graph build with the tool’s stderr in the error message.
- Parse stdout as JSON. Malformed → fail the graph build.
- Reject the manifest if any `outputs[].path` falls outside the declared `output_dirs`.
- For each manifest entry, add one product to the build graph:
  - `inputs` = entry’s `sources`
  - `outputs` = [entry’s `path`]
  - `processor` = this instance’s name
- Cache the manifest itself in the object store, keyed on a hash of `(config + input_checksum_of(source_tree))`. Re-planning is skipped when the hash matches.
The plan phase runs BEFORE the existing product-discovery phase, so predicted outputs are known to all downstream processors (linters, compressors, etc.) via the normal file-index/cross-processor-dependency mechanisms.
2. Build phase
When one or more MassGenerator products are dirty:
- rsconstruct groups all dirty products belonging to the same MassGenerator instance into a single execution batch.
- It invokes `command` exactly once per batch (not per product).
- The tool produces all its output files in that one invocation.
- Each product caches its own output file as a blob descriptor, independently.
- In strict mode (default): after the tool exits, rsconstruct verifies that every predicted file in the batch was produced and no unexpected files appeared in `output_dirs`. A mismatch fails the build.
- In loose mode (`--loose-manifest` CLI flag): divergence is a warning only.
The “one invocation, many products” idiom is this type’s defining execution shape — distinct from both Generator (one invocation per product) and Creator (one invocation, one product).
3. Restore phase
When all MassGenerator products for an instance are cache-clean:
- Each product is restored from its blob descriptor independently — no tool invocation at all.
- Partial restoration is natural: if 47 of 50 files are clean, only 3 products go through the build phase (which still triggers one tool invocation, but the 47 unchanged files are either untouched on disk or silently overwritten with identical content).
4. Verification (strict mode)
After build:
- Every manifest entry → the file exists at the right path.
- Every file in `output_dirs` → appears in the manifest OR belongs to another processor (via the existing `path_owner` query).
Violations are hard errors; partial output is left on disk for debugging.
Graph shape
With a MassGenerator producing N planned files, the graph looks like this:
```
source files (markdown, templates, config)
        |
        | (as inputs to each planned file's product)
        v
[product: _site/index.html]
[product: _site/about/index.html]
[product: _site/assets/style.css]
... (N products, all with processor = "mass_generator.mkdocs")
```
Each product is a first-class citizen in the graph. A downstream linter can depend on _site/index.html like any other generated file.
Execution: one tool invocation for many products
Today’s executor assumes “one product = one invocation of processor.execute(product).” MassGenerator violates that. The cleanest implementation (per the design discussion) uses a two-level graph:
- Phase product (internal, not user-visible): one synthetic product per MassGenerator instance whose `execute` is the actual tool invocation. It has no declared outputs; its job is to populate the `output_dir`.
- File products (the N planned files): each depends on the phase product, meaning the tool must have run before any file product can be cached/restored. Each file product’s `execute` is a no-op (the tool already ran); it just caches its output.
The dependency system then naturally orders: phase product runs once (if any file product is dirty), then every dirty file product caches its output. Clean file products skip both phases.
This shape keeps the executor simple and reuses all existing caching, skipping, and restore logic without modification.
Config reference
```toml
[processor.mass_generator.<INSTANCE>]

# The tool's build command. Runs once per batch of dirty file products.
command = "mkdocs build --site-dir _site"

# The tool's plan command. Must print the JSON manifest to stdout.
# May be the same binary with a different flag or a separate script.
predict_command = "mkdocs-plan"

# Where the tool will produce its outputs. Every manifest entry's path
# must fall inside one of these directories. Used for verification.
output_dirs = ["_site"]

# Standard scan fields still apply — they bound which source changes
# trigger a replan.
src_dirs = ["docs", "templates"]
src_extensions = [".md", ".html", ".yaml"]

# Optional: skip strict output verification for this instance.
# Useful during development of the tool itself. Default: false.
loose_manifest = false
```
Interaction with the shared-output-directory design
This new processor type does not replace the Creator / shared-output-directory mechanism. Both coexist:
| User declares | Treated as | Caching | Cross-processor deps |
|---|---|---|---|
| `output_dirs` only | Creator (opaque) | One tree per build | Only via declared files |
| `output_dirs` + `predict_command` | MassGenerator | Per file | Full — all files known |
Choose Creator when the tool can’t enumerate its outputs. Choose MassGenerator when it can.
Design invariants (for tool authors)
For a tool to be consumed as a MassGenerator, predict_command must uphold:
- Pure function of config + source tree. Same inputs → same manifest, bit for bit.
- Cheap or cached. rsconstruct calls this on every graph build; a slow `predict_command` means slow rsconstruct invocations.
- Matches the build command’s actual outputs. Predicted paths = actual paths. Violations are hard errors in strict mode.
- Deterministic variable outputs. If the tool produces tag pages, archive pages, or anything else content-derived, `predict_command` must compute them from the same source-inspection pass.
The rssite README spells out a concrete contract that meets these invariants.
Advantages
1. Shared-directory ownership becomes trivial
Every generated file has a declared owner at graph-build time. The existing output-conflict check catches overlaps instantly:
```
Output conflict: _site/about.html is produced by both [mass_generator.mkdocs] and [explicit.pandoc]
```
The complex path_owner + tree filtering + previous-tree cleanup mechanism (see Shared Output Directory) is still there as a safety net, but for MassGenerators it’s mostly unnecessary.
2. True cross-processor dependencies
Downstream processors (linters, compressors, sitemap builders) can declare the MassGenerator’s outputs as inputs. The graph connects properly. Impossible with opaque Creators.
3. Per-file caching
Change docs/tutorial.md → rebuild only _site/tutorial.html. On a large site this is the difference between “rebuild in 50ms” and “rebuild in 30s.”
Note: the per-file caching on the rsconstruct side only saves the tool invocation when ALL file products are clean. If any one is dirty, the tool runs once and produces everything — then clean files are still cached individually (useful across different invocations). True per-file build speed requires the tool itself to support partial builds. rssite will; most existing tools won’t.
4. Parallel file caching
With per-file products, different files can be cached to the object store in parallel after the build. Minor win, but free.
5. Precise clean, precise restore, real dep graphs
Every downstream feature that relies on declared outputs — clean outputs <path>, graph visualization, dry-run, watch mode — works correctly for MassGenerator outputs without special cases.
Disadvantages
1. Predictor drift
If predict_command lies (or gets out of sync with the tool), the cache can be corrupted silently: predicted paths get restored, actual build produces different paths, orphan files accumulate. Strict-mode verification after each build is the guardrail — it catches drift at build time rather than at next-restore time.
2. Predict-time cost
Every graph build runs predict_command. For large sites this may mean parsing every source file to enumerate outputs. The manifest cache (keyed on source-tree hash) mitigates but doesn’t eliminate this.
3. Partial build support
The per-product caching model wants “rebuild just this one file” but most tools rebuild everything per invocation. With mkdocs, hugo, jekyll, you pay full build cost whenever anything is dirty, regardless of how many files changed. rssite is being designed to support partial builds from day one; existing tools would need patches.
4. Engineering cost
The MassGenerator type is a new processor class with new execution semantics (“one invocation for many products”). That’s real implementation work in the executor, plus a new config schema, plus manifest parsing, plus verification logic.
5. Variable outputs may require heavy parsing
Tag pages, archive indices, RSS feeds — all content-derived. predict_command has to do enough source parsing to enumerate them. For well-designed tools this is cheap (the same parsing feeds both plan and build). For retrofitted tools it’s often duplicate work.
Open questions
These should be resolved during implementation:
- Single-pass mode: should we support a `--print-manifest` flag on `command` itself, so one invocation does both plan and build? Faster for full rebuilds, slightly uglier config. Probably yes, as an option.
- Manifest schema evolution: how do we handle `version: 2`? Support both for a transition period, or hard-require an upgrade? Probably both-for-N-releases.
- Incremental invalidation: when the manifest changes between builds (e.g., a new page is added), how is the old cache cleaned? The existing descriptor-based cache handles this automatically (unreferenced cache entries are eventually pruned), but the behavior deserves explicit documentation.
- Interaction with `file_index`: predicted outputs need to appear in the file index so downstream processors can discover them during their own scan phases. They must be registered before `discover_products` runs.
- Watch mode: when a source file changes, do we re-run `predict_command` or reuse the last manifest? The hash-based cache mostly handles this, but edge cases around plugin-rewritten outputs need thought.
Recommendation
Build this once rssite (or any other cooperating tool) is far enough along to drive concrete requirements. Implementing it against a hypothetical tool wastes work — we’d guess at features. Implementing against rssite (where we control both sides) grounds the design in reality.
When implemented, do it in this order:
1. New processor type `mass_generator` registered in the plugin registry.
2. Config schema (`predict_command`, `loose_manifest`).
3. Plan phase: invoke `predict_command`, parse the JSON, create products.
4. Execution phase: batching logic — one invocation per instance, per build.
5. Strict verification after build.
6. Manifest caching (skip re-planning when the source tree is unchanged).
7. Documentation in `docs/src/processors/mass_generator.md` once it’s real.
See also
- MassGenerator processor type — processor-type documentation (forthcoming)
- Shared Output Directory — how we handle opaque Creators today
- Processor Ordering — the sibling discussion about explicit ordering
- Cross-Processor Dependencies — why per-file outputs enable proper dependency graphs
- rssite — static site generator built to implement the MassGenerator contract
Per-Processor Statistics
rsconstruct shows several “per-processor” or “per-analyzer” statistics tables
(cache stats, analyzers stats, graph stats, build summaries). These all
look similar on the surface, but the data source differs, and that changes
what we can cheaply show.
This document explains:
- The three data sources that feed per-X statistics.
- The per-processor grouping problem in `cache stats`.
- Options for fixing it, with tradeoffs.
- Secondary cleanup — graph-level helpers.
The three data sources
| Question | Lives where | Cost of grouping by X |
|---|---|---|
| “How many products does pylint have in this build config?” | graph (in-memory) | free |
| “How many products were built / skipped / restored this run?” | executor stats (in-memory) | free |
| “How many files did each analyzer find?” | .rsconstruct/deps.redb (on disk, keyed by analyzer) | fast — single DB scan, key is already the analyzer name |
| “How big is my on-disk cache, per processor?” | .rsconstruct/cache/descriptors/ (on disk) | see below — this is the problem |
Graph (in-memory, rebuilt each run)
Every `Product` carries its `processor: String` field. Grouping is a simple iteration over `Vec<Product>`, constructing a `HashMap<String, T>` on the spot. Every caller that wants per-processor stats does this inline — see `builder/graph.rs:111`, `builder/build.rs:323,436,467`, and `executor/execution.rs:180,479,524,540`.
Analyzer dependency cache (deps.redb)
The redb schema stores each entry keyed by (source path → dependencies) and tagged with the analyzer that produced it. `DepsCache::stats_by_analyzer()` scans the DB once and returns `HashMap<analyzer, (file_count, dep_count)>`. Grouping is effectively free because the analyzer name is a first-class field.
Object-store descriptors (.rsconstruct/cache/descriptors/)
Each descriptor file is a small JSON blob describing one cached product — its outputs, their checksums, etc. The filename is a hash of the product’s cache key; the file’s location tells us nothing about which processor created it.
Today’s code in object_store/management.rs:169:
```rust
pub fn stats_by_processor(&self) -> BTreeMap<String, ProcessorCacheStats> {
    // walk every file in descriptors_dir
    // read the file
    // parse the JSON
    // ...
    // "We can't extract processor name from a hashed descriptor key.
    //  Use 'all' as a single bucket for now."
    let processor = "all".to_string();
    // ...
}
```
Two things are wrong with this:
- It’s a white lie. The function is named `stats_by_processor`, but it returns a single `"all"` bucket. There is no per-processor grouping.
- It’s slow. Even to produce that single bucket, it reads and parses every descriptor file. For 10,000 cached products that’s 10,000 syscalls and 10,000 JSON parses, just to count entries.
Why this matters: declared-but-empty processors
In `analyzers stats`, if a user declares `[analyzer.cpp]` in `rsconstruct.toml` but the analyzer never matches anything, the table shows a `cpp 0 0` row (implemented 2026-04-12). This is a useful signal: “you configured it, but it is silently doing nothing.”
We’d like the same in cache stats: show every enabled processor, including
those with zero cached entries, so that users notice mis-configurations.
We cannot implement this today. If we listed declared processors with
zeros, real entries would still be lumped into "all", so the table would
show:
```
all:    50 entries, 58 outputs, 3.2 MiB
ruff:    0 entries,  0 outputs, 0 bytes   ← misleading
pylint:  0 entries,  0 outputs, 0 bytes   ← misleading
Total:  50 entries, 58 outputs, 3.2 MiB
```
That’s worse than the current output — it tells the user “pylint produced nothing” when pylint may actually have plenty. Fixing the 0-rows UX requires first fixing the grouping itself.
Options to fix per-processor cache grouping
Option A — embed the processor name inside each descriptor
Add a `processor: String` field to `CacheDescriptor`. The cache-insert path populates it (the name is already known at that point). `stats_by_processor` reads the field instead of hard-coding `"all"`.
- ✅ Small, localized change — ~100–150 lines including a backward-compat fallback for old descriptors.
- ❌ Does not fix the slowness. We still read and parse every descriptor to learn the grouping.
- ❌ A cache format change requires either a migration step, a “legacy entries show up as `unknown`” fallback, or a cache wipe on upgrade.
Option B — encode the processor name in the descriptor’s path
Layout changes from:
```
.rsconstruct/cache/descriptors/
  ab/
    cd/
      abcd1234…json
```

to:

```
.rsconstruct/cache/descriptors/
  ruff/
    abcd.json
    ef01.json
  pylint/
    9876.json
```
`stats_by_processor` becomes:

```
for each subdir of descriptors/:
    name  = subdir.file_name()         // free — already a String in the dir entry
    count = number of files in subdir  // one readdir per processor
```
- ✅ Fixes grouping and speed simultaneously. 30 `readdir`s instead of 10,000 reads is two to three orders of magnitude faster.
- ✅ Trivially answers “does this processor have any cached entries at all?” with `exists(descriptors/NAME/)`.
- ❌ Changes the on-disk cache layout. Requires migration.
Since descriptors are a cache by definition (regenerable from a build), the simplest migration is: detect the old layout on startup and wipe it. Next build repopulates under the new layout. No data loss beyond a slower first build post-upgrade.
Option C — maintain a processor→count index in a redb sidecar
Keep a small redb database (e.g. .rsconstruct/cache/stats.redb) with a table
mapping processor_name → (entry_count, output_count, output_bytes). The
cache insert / evict paths update this index transactionally alongside the
descriptor write.
`stats_by_processor` becomes:

```rust
let db = redb::Database::open("cache/stats.redb")?;
let table = db.begin_read()?.open_table(STATS_TABLE)?;
// One DB read per processor — counts are pre-aggregated.
```
- ✅ Answers `cache stats` in O(P), where P = number of processors, independent of cache size. Even faster than Option B at scale.
- ✅ No on-disk layout change to the descriptors themselves — the sidecar sits alongside the existing directory structure.
- ✅ Bytes / output counts are maintained eagerly, so the “bytes” axis is also free (unlike Option B, which still needs to `stat` each blob for bytes).
- ❌ Two sources of truth. If the sidecar and the descriptor directory ever disagree (crash mid-write, manual `rm` of a descriptor, remote-cache sync, a bug in an insert path), the UI lies. Requires either transactional atomicity across two stores (hard — redb transaction + filesystem write) or a periodic reconciliation pass.
- ❌ Every cache-insert path needs to update the sidecar. Miss one, and the counts drift silently. Options A and B put the source of truth physically next to the cache entry, so there’s no drift to manage.
- ❌ Cache invalidation logic gets more complex: evicting a descriptor now means “delete the file AND decrement the counter AND handle the decrement failing.” More moving parts, more places for bugs.
- ❌ Doesn’t help with any future “list all entries for processor X” query — you’d still need Option B’s path layout for that, or fall back to a full walk.
Verdict: Option C is the fastest for this one specific query, but it pays for it with a consistency problem that didn’t exist before. Options A and B keep the cache self-describing — the descriptor itself (or its path) IS the fact — so they’re immune to drift.
Option comparison
| Aspect | A (field in descriptor) | B (processor in path) | C (redb sidecar) |
|---|---|---|---|
| Grouping correctness | yes | yes | yes (if kept in sync) |
| Scan cost | O(N) reads | O(P) readdirs | O(P) DB reads |
| Bytes count free | no | no (still stat blobs) | yes (pre-aggregated) |
| On-disk layout change | descriptor format | directory layout | new sidecar file |
| Source of truth | descriptor | descriptor path | two stores |
| Drift risk | none | none | real — needs reconciliation |
| Migration cost | wipe or dual-read | wipe | initial scan to populate |
| Code complexity | low | low | medium-high |
| Helps other queries | no | yes (list-by-processor) | no |
Recommendation
Option B. The extra invasiveness is one-time (migration). The speed and correctness wins are permanent; the path layout is self-describing, so no drift risk; and it also unlocks fast “list entries for processor X” queries that Options A and C don’t.
Option C is attractive if the only query we cared about was a single summary, but the sidecar’s consistency burden is real and tends to surface as bugs in edge cases (remote-cache sync, partial writes, manual cleanup).
On Option B’s “cost”
The only new artifact on disk is N extra directory entries at the top level
of descriptors/, where N is the number of distinct processors that have
ever cached anything. In practice that’s 10–30 directories. Filesystems handle
that trivially — both ext4 and btrfs are fine with thousands of top-level
entries, let alone tens.
In return we get:
- `stats_by_processor` in O(N readdirs) instead of O(cache_size reads).
- Honest “declared-but-empty” rows in `cache stats` (an empty dir = 0 entries, and there is no drift to reconcile).
- Fast “list cache entries for processor X” — a single `readdir`.
- A self-describing cache: `ls .rsconstruct/cache/descriptors/` tells you at a glance which processors have cached anything.
The cost is negligible; the payoff is across the board.
Implementation plan (Option B)
1. Cache insert path. Change the descriptor write to `descriptors/<processor>/<hash>.json` (replacing the current `descriptors/<hash-prefix>/<hash-suffix>/<hash>.json` sharding). The processor name is already known at insert time — it’s on the product.
2. Cache read path. Descriptor lookups happen by cache key. If the lookup caller already has the processor name, read directly. Otherwise scan the processor subdirs (a rare path — most lookups come from a build graph where the processor is known).
3. `stats_by_processor` rewrite. Iterate the subdirs of `descriptors/`; each subdir name is a processor. Count the files within. For the “bytes” axis, continue to stat the corresponding blob objects.
4. Migration. On startup, if old-layout descriptor files exist (files directly under sharded `ab/cd/` subdirs, or anywhere that isn’t a recognized processor name), wipe `descriptors/`. The cache is regenerable by definition; the next build repopulates under the new layout. Users pay one slower build post-upgrade, with no data loss.
5. `cache stats` UX. Once grouping is real, enumerate declared processors from `rsconstruct.toml` and union them with the processors present in `descriptors/`. Show a 0-row for anything declared-but-empty (mirrors the `analyzers stats` treatment already implemented in `builder/analyzers.rs`).
Scope
Most of the work lives in `src/object_store/`:

- `management.rs` — the `stats_by_processor` rewrite.
- The insert/read paths (split across `object_store.rs` and neighbors) — the path-construction change.
- The cache-clean / trim paths — updated to walk the new layout.
This is followed by a small change in `src/main.rs` (`CacheAction::Stats`) to consume the new grouped output and render a table with the declared-union treatment.
Estimated: a couple hundred lines, concentrated in a single module.
Secondary cleanup — graph-level helpers
Every caller that wants per-processor grouping over the current graph
currently writes the same HashMap pattern inline:
```rust
let mut per_processor: HashMap<&str, _> = HashMap::new();
for product in graph.products() {
    *per_processor.entry(product.processor.as_str()).or_default() += ...;
}
```
We could add `BuildGraph::products_by_processor() -> &HashMap<String, Vec<ProductId>>` as a lazily-computed cached view (computed on first access, invalidated only when the graph is mutated).
- Benefit: de-duplicates the pattern in ~5 call sites.
- Cost: caching / invalidation logic.
- Priority: low. The inline grouping is O(N) over RAM iteration and is not a performance bottleneck.
Don’t do this unless a sixth call site shows up.
Current state (2026-04-12)
- `analyzers stats`: fixed. Shows declared-but-empty rows, with a separator between the data rows and the Total.
- `cache stats`: unchanged. Still uses the single-bucket `"all"` grouping. Documented as a known limitation here; the fix is pending Option B.
- Graph helpers: not added. The inline pattern remains across call sites.
See also
- Cache System — object-store layout, descriptor keys.
- Checksum Cache — mtime-based content-hash caching.
- Dependency Caching — analyzer dependency cache (which does have per-analyzer grouping built in).
Profiling
This chapter records concrete profiling runs on rsconstruct, with methodology and findings pinned to a specific version. Add new runs as new sections with date + version headers so historical data stays intact.
How to profile locally
Build a profile-friendly binary
The default release profile strips symbols, so stack traces come out as raw
addresses. Cargo.toml defines a profiling profile that inherits release
but keeps full debug info:
```toml
[profile.profiling]
inherits = "release"
strip = false
debug = true
```

Build with:

```shell
cargo build --profile profiling
# binary lands in target/profiling/rsconstruct
```
Prerequisite: relax perf_event_paranoid
Kernel sampling (perf, samply) requires `kernel.perf_event_paranoid <= 1`. On a personal dev machine, persist it:
```shell
echo 'kernel.perf_event_paranoid = 1' | sudo tee /etc/sysctl.d/60-perf.conf
sudo sysctl --system
```
Record with perf (text-pipeline-friendly)
On CPUs without LBR (most laptops), DWARF unwinding is very slow to post-process — don’t use `--call-graph dwarf` unless you’re patient. Without a call graph you still get reliable self-time attribution:
```shell
perf record -F 999 -o /tmp/rsc.perf.data -- \
  target/profiling/rsconstruct --quiet --color=never status

perf report -i /tmp/rsc.perf.data --stdio --no-children \
  --sort symbol --percent-limit 0.1
```
Alternative: samply (Firefox-Profiler UI)
```shell
cargo install samply
samply record -r 4000 -o /tmp/rsc.json.gz -- \
  target/profiling/rsconstruct --quiet --color=never status
```
The default behavior opens a local UI. Use `--save-only` to just write the file.
Hardware counters
```shell
perf stat -d -- target/profiling/rsconstruct --quiet --color=never status
```
Gives IPC, cache miss rates, branch miss rates — useful for “is this CPU-bound, memory-bound, or branch-mispredict-bound.”
Run: 2026-04-12 — rsconstruct 0.8.1 — status on teaching-slides
Target
- Command: `rsconstruct --quiet --color=never status`
- Project: `../teaching-slides` (10,027 products across 10 processors).
- Product breakdown: explicit (1), ipdfunite (55), markdownlint (824), marp (824), ruff (19), script.check_md (824), script.check_svg (3327), svglint (3327), tera (2), zspell (824).
Methodology
- Binary: `target/profiling/rsconstruct` (release + debug info).
- Sampler: `perf record -F 999` (no call graph — LBR unavailable, DWARF too slow to post-process on this host).
- Counters: `perf stat -d`.
Wall-clock and counters
| Metric | Value |
|---|---|
| Wall time | 1.08 s |
| User time | 0.99 s |
| System time | 0.08 s |
| CPU utilization | 98.7 % of 1 core |
| RSS peak | 28 MB |
| Instructions | 21.10 B |
| Cycles | 5.30 B |
| IPC | 3.98 (very high) |
| Frontend stall | 12.8 % |
| Branches | 5.11 B |
| Branch miss rate | 0.60 % |
| L1-dcache loads | 7.03 B |
| L1-dcache miss rate | 4.13 % |
Interpretation: high IPC, low miss rates, low branch mispredictions. The CPU pipeline is fully utilized — slowness comes from doing too many instructions, not from cache thrash or branch mispredicts.
Hot spots (self-time)
| % of CPU | Function |
|---|---|
| 48.79 % | std::path::Components::parse_next_component_back |
| 12.90 % | <std::path::Components as DoubleEndedIterator>::next_back |
| 10.84 % | rsconstruct::graph::BuildGraph::add_product_with_variant |
| 8.43 % | <std::path::Components as PartialEq>::eq |
| 1.41 % | __memcmp_evex_movbe |
| 1.04 % | core::str::converts::from_utf8 |
| 0.89 % | _int_malloc |
| 0.78 % | std::fs::DirEntry::file_type |
| 0.61 % | <std::path::Path as Hash>::hash |
| 0.60 % | <std::path::Components as Iterator>::next |
| 0.38 % | std::sys::fs::metadata |
| 0.38 % | <sip::Hasher as Hasher>::write |
| 0.37 % | sha2::sha256::x86::digest_blocks |
| 0.34 % | <core::str::lossy::Utf8Chunks as Iterator>::next |
| 0.31 % | _int_realloc |
| 0.29 % | _int_free_chunk |
| 0.19 % | rsconstruct::graph::Product::cache_key |
| 0.19 % | std::path::compare_components |
| 0.19 % | serde_json::read::SliceRead::parse_str |
| 0.19 % | statx |
| 0.19 % | malloc |
| 0.19 % | cfree |
| 0.18 % | core::hash::BuildHasher::hash_one |
| rest | scattered < 0.15 % each |
Findings
~70 % of CPU is in `PathBuf` iteration / comparison. Specifically `parse_next_component_back` + `next_back` + `Components::eq`, all invoked from `PathBuf` equality and hashing. Filesystem I/O (readdir, stat, open) is under 2 %. Hashing (SHA-256 + SipHash) is under 1 %.

The call site is `BuildGraph::add_product_with_variant` in `src/graph.rs` (lines 221–307). It contains three loops whose path-equality cost dominates the whole run:
1. Lines 232–242 — checker dedup loop. For every checker product (outputs empty), it scans every existing product and compares `existing.inputs[0] == inputs[0]` (full `PathBuf` equality, which iterates components). With 7,000+ checker products in teaching-slides (script.check_md + script.check_svg + svglint + markdownlint + zspell), this is an O(P²) pass per processor over the course of discovery.
2. Lines 252–253 — superset check for generator re-declarations. Includes `existing.inputs.iter().all(|i| inputs.contains(i))` — an O(M²) call, again per insertion, again comparing `PathBuf`s component by component.
3. Lines 246–285 — output conflict check. Fast path (HashMap lookup); not the bottleneck.
Graph mutation itself (`add_product_with_variant` self-time, 10.84 %) is modest. The quadratic scans inside it are where the time goes — they just happen to be attributed to the stdlib path-iteration functions.
Suggested fix (not yet implemented)
Index the checker-dedup and generator-superset lookups via a HashMap keyed on `(processor, primary_input, variant)` so the linear scans become O(1). For 10,027 products, the expected improvement is ~3×–5× on `status` wall time.
Scope: additions to `BuildGraph` (two new HashMap index fields, kept in sync with `add_product_*`), plus a small change to `add_product_with_variant` to do HashMap lookups instead of linear scans. No cache-layout or on-disk-format changes.
Raw data
- `/tmp/rsc.perf.data` was recorded and analyzed to produce the tables above, then removed. Regenerate via the methodology section if needed.
Run: 2026-04-12 (later) — HEAD after HashMap dedup fix
Wall-clock and counters
| Metric | Value | vs. 0.8.1 tag |
|---|---|---|
| Wall time | 0.265 s | 4.1× faster |
| Instructions | 2.05 B | -90 % |
| Cycles | 0.88 B | -83 % |
| IPC | 2.34 | was 3.98 |
| L1-dcache miss rate | 1.34 % | was 4.13 % |
The quadratic path-equality peak is gone. What remains is the normal cost
of using PathBuf as HashMap keys.
Hot spots (self-time, user-space, 9,948 samples, 10 iterations)
| % | Function | Category |
|---|---|---|
| 5.42 | core::str::converts::from_utf8 | UTF-8 validation |
| 3.52 | sip::Hasher::write | HashMap hashing |
| 3.51 | <Path as Hash>::hash | HashMap hashing |
| 3.39 | sha2::sha256::digest_blocks | Checksumming |
| 2.09 | Components::next | Path iteration |
| 2.00 | _int_malloc | Allocator |
| 1.92 | parse_next_component_back | Path iteration |
| 1.60 | compare_components | Path comparison |
| 1.19 | combined_input_checksum | Checksumming |
| 1.12 | Product::cache_key | Cache keys |
Run: 2026-04-12 (later still) — HEAD after path interning
Context
`BuildGraph`’s three hot HashMaps (`output_to_product`, `input_to_products`, `checker_dedup`) switched from `PathBuf` keys to a private `PathId(u32)` backed by an in-memory `PathInterner`. See Path Interning for the design.
Wall-clock and counters
| Metric | Value | vs. previous |
|---|---|---|
| Wall time | 0.245 s | -8 % |
| Instructions | 2.04 B | ~flat |
| Cycles | 0.91 B | ~flat |
Hot spots (self-time, user-space, 9,925 samples, 10 iterations)
| % | Function | Notes |
|---|---|---|
| 4.34 | core::str::converts::from_utf8 | unchanged |
| 3.21 | sha2::sha256::digest_blocks | unchanged |
| 2.60 | Components::next | unchanged |
| 2.39 | sip::Hasher::write | down from 3.52 % |
| 2.25 | <Path as Hash>::hash | down from 3.51 % |
| 2.06 | _int_malloc | unchanged |
| 1.52 | resolve_dependencies | new — attribution shift |
| 1.19 | compare_components | down from 1.60 % |
| 1.09 | combined_input_checksum | unchanged |
Interning paid off exactly where predicted — the hashing/compare columns
dropped, and resolve_dependencies appears because its inner loop is now
small enough to self-attribute rather than vanish inside the stdlib path
functions. The total gain is modest (~8 %) because after the HashMap dedup
fix, HashMap key cost was only ~7 % of total, and interning cuts that in
half.
Candidate next targets (not yet implemented)
- UTF-8 validation (~6 %) — from `display().to_string()` in cache-key building. Cache the string form per product, or build keys from raw bytes.
- `Product::cache_key` + hex encoding (~2 % combined) — precompute and memoize per product.
- SHA-256 (~3 %) — already hardware-accelerated; the only lever is fewer calls, via a memoized `input_checksum` or better batch reuse.
See also
- Path Interning — the optimization applied in the most recent run.
- Per-Processor Statistics — the previous perf discussion; describes why `cache stats` is slow (O(N) descriptor reads). That’s independent of this graph-construction finding.
- Architecture — overview of the graph and how products are added.
Path Interning
Interning is a data-structure optimization that replaces PathBuf HashMap
keys with small integer IDs. It exists to cut the cost of hashing, comparing,
and cloning paths during graph construction.
Motivation
The Profiling run on teaching-slides (10,027 products)
pointed at three quadratic scans inside BuildGraph::add_product_with_variant.
Replacing those scans with HashMap<PathBuf, _> indexes took status from
1.08 s to 0.26 s.
The remaining 0.26 s is dominated, by category:
| Category | % of CPU |
|---|---|
| Path iteration (`Components`) | ~10 % |
| HashMap hashing (SipHash + Path) | ~7 % |
| Allocator churn (malloc/free) | ~6 % |
| UTF-8 validation/decoding | ~7 % |
| Checksumming (SHA-256 + keys) | ~6 % |
A lot of that is the cost of using PathBuf as a HashMap key. Every insert
and lookup does:
- Hash the path — walks every component, hashes each byte. O(path length).
- On collision, compare paths — walks both paths component by component.
- Clone the path to store as a key — a `PathBuf` allocation + copy.
With ~10,000 products participating in multiple maps (output_to_product,
input_to_products, checker_dedup), this work dominates what remains.
The idea
Assign each unique path a u32 ID once, then use the ID everywhere the path
is used as a HashMap key or for comparison. Hashing a u32 is one
instruction. Comparing two u32s is one instruction. No allocation.
```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::sync::Arc;

#[derive(Copy, Clone, Eq, PartialEq, Hash)]
pub struct PathId(u32);

pub struct PathInterner {
    to_id: HashMap<PathBuf, u32>,  // used during insertion
    from_id: Vec<Arc<PathBuf>>,    // id -> path (for display / FS ops)
}

impl PathInterner {
    pub fn intern(&mut self, p: &Path) -> PathId {
        if let Some(&id) = self.to_id.get(p) {
            return PathId(id);
        }
        let id = self.from_id.len() as u32;
        self.to_id.insert(p.to_path_buf(), id);
        self.from_id.push(Arc::new(p.to_path_buf()));
        PathId(id)
    }
    pub fn get(&self, id: PathId) -> &Path {
        &self.from_id[id.0 as usize]
    }
}
```
Every hot HashMap that currently keys on PathBuf switches to PathId.
In-memory only
Interned IDs are per-process. They are assigned fresh at the start of
every rsconstruct invocation and dropped when the process exits. They
never touch disk.
| Data | Lives in | IDs used? |
|---|---|---|
| BuildGraph HashMaps | RAM, this process | Yes |
| On-disk cache (redb descriptors, etc.) | Disk, persistent | No |
| Config files, discovered files | Disk | No |
The path `foo/bar.md` might be `PathId(42)` today and `PathId(17)` tomorrow. That is fine because nothing persistent ever referred to 42.
The boundary rule: PathId must not leak into anything persistent.
Specifically:
- Cache keys on disk (`Product::cache_key`, `descriptor_key`) must keep using real paths or content checksums.
- Logs and error messages must print real paths, not IDs.
- Nothing serializes the interner state.
Why it helps here
- Paths are reused heavily. One `.md` file feeds `markdownlint`, `zspell`, `script.check_md`, `marp`. Interning collapses four HashMap key clones into one.
- The same path appears as a lookup key in every dedup map during graph construction. Each lookup becomes `hash(u32) + compare(u32)` instead of walking a path’s components.
- Product inputs/outputs can still be stored as `PathBuf` publicly — the optimization targets the HashMap keys, not the product data itself. This keeps the refactor’s blast radius small.
Scope of the change
Narrow scope — only the three hot HashMaps in BuildGraph:
- `output_to_product: HashMap<PathBuf, usize>` → `HashMap<PathId, usize>`
- `input_to_products: HashMap<PathBuf, Vec<usize>>` → `HashMap<PathId, Vec<usize>>`
- `checker_dedup: HashMap<(String, PathBuf, Option<String>), usize>` → `HashMap<(String, PathId, Option<String>), usize>`
The interner lives on BuildGraph. Callers still pass PathBuf/&Path to
add_product* — the interner is a private implementation detail. Public
access to Product.inputs/outputs/output_dirs remains unchanged.
Non-goals
- No on-disk format change. Cache entries keep using real paths.
- No API change to `Product`. Inputs and outputs stay as `Vec<PathBuf>`.
- No plugin-facing change. Lua processors keep seeing paths.
Risks
- The interner’s own `to_id` map still hashes a `PathBuf` once per unique path. Unavoidable — this is the cost of asking “have I seen this path before?”
- Every call site that hashes a `&Path` into a `BuildGraph` map now calls `interner.intern()` or `interner.get_id()`. Must be careful not to call `intern()` (mutating) on read-only paths, or lookups may create spurious entries.
See also
- Profiling — the measurement that motivated this.
- Architecture — how `BuildGraph` fits into the overall design.
Unreferenced Files
Purpose
Find files on disk that are not referenced by any product in the build graph. This helps identify forgotten assets, stale files, or files accidentally excluded from the build configuration.
How It Works
When rsconstruct builds its graph, every product has an inputs list. This list
contains all files the product depends on:
- Primary inputs — the source files being processed (e.g. a `foo.svg` that mermaid converts to a PNG)
- Dependency inputs — files that affect the output but are not the primary source (e.g. a C header file `utils.h` that `main.c` includes, a config file like `.ruff.toml`, or a script passed via `dep_inputs`)
A file is unreferenced if it does not appear in the inputs list of any
product in the graph — neither as a primary input nor as a dependency input.
Why both primary and dependency inputs?
Consider a C header file utils.h. It is not a primary input (the compiler does
not produce output directly from it), but it appears in dep_inputs because
changes to it must trigger a rebuild of any .c file that includes it. Such a
file is clearly referenced and should not be reported as unreferenced.
Only files that appear in no product’s inputs list — not primary, not dependency — are reported.
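The rule above is a set subtraction: collect every path that appears in any product's inputs list, then report the disk files outside that set. A minimal sketch with illustrative names:

```rust
use std::collections::HashSet;
use std::path::PathBuf;

// A file is unreferenced when no product lists it among its inputs,
// primary or dependency. `product_inputs` holds each product's full
// inputs list (primary + dep_inputs merged).
fn unreferenced(disk_files: &[PathBuf], product_inputs: &[Vec<PathBuf>]) -> Vec<PathBuf> {
    let referenced: HashSet<&PathBuf> = product_inputs.iter().flatten().collect();
    disk_files
        .iter()
        .filter(|f| !referenced.contains(f))
        .cloned()
        .collect()
}
```

The real command additionally filters `disk_files` by the `--extensions` list before the subtraction.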
Usage
```
rsconstruct graph unreferenced --extensions .svg[,.png,...] [--rm]
```
Options
| Option | Description |
|---|---|
| `--extensions` | Comma-separated list of file extensions to check (required) |
| `--rm` | Delete the unreferenced files immediately (no confirmation) |
Examples
Find unreferenced SVG files:

```
rsconstruct graph unreferenced --extensions .svg
```

Find unreferenced images of any type:

```
rsconstruct graph unreferenced --extensions .svg,.png,.jpg
```

Delete unreferenced SVG files:

```
rsconstruct graph unreferenced --extensions .svg --rm
```
Output
Plain list of file paths, one per line, relative to the project root:

```
assets/old_diagram.svg
docs/unused_figure.svg
scratch/test.svg
```
Design Notes
- Extensions are required — defaulting to all files would produce excessive noise (READMEs, Makefiles, config files, etc. are intentionally not in the graph).
- Finding unreferenced files does not mean they are useless. The user decides what to do. Common reasons a file might be unreferenced:
  - It was part of a processor whose `src_dirs` or `src_extensions` excludes it
  - It was intentionally left out of the build
  - It is a leftover from a renamed or deleted processor instance
  - It is a scratch/draft file
- `--rm` deletes without confirmation. Use with care.
- The command requires a `rsconstruct.toml` (the graph must be buildable).
Distributed Execution
This document explores what distributed execution would mean for RSConstruct — the problems it solves, the problems it creates, how other build tools approach it, and what a design might look like.
What distributed execution means
Today RSConstruct runs all products on the local machine, optionally in parallel
across multiple cores (-j). Distributed execution means offloading individual
products to remote workers — other machines on a network — so that the build
exploits more CPU than any single machine has.
This is distinct from remote caching (which RSConstruct already has). Remote caching avoids re-running a product whose result was already computed by someone else. Distributed execution runs products remotely even when no cached result exists. The two features compose: a distributed build that also has remote caching can share results across runs and across users.
The problems it solves
- Slow builds on large codebases. When thousands of C files need checking or hundreds of PDFs need rendering, a single machine is the bottleneck even with `-j`. A cluster of workers can run all of them truly in parallel.
- CI latency. CI machines are often single-core or have limited parallelism. Distributing work across a pool of CI agents cuts wall-clock time.
- Memory pressure. Some tools (Chromium, LibreOffice, heavy linters) are memory-hungry. Spreading them across machines avoids OOM conditions.
The problems it creates
Input availability
Every product needs its inputs on the worker. For a checker that reads a single
source file, this means uploading that file to the worker (or having it available
via a shared filesystem). For a generator with many dep_inputs, it may mean
uploading dozens of files. This is a non-trivial data transfer problem.
The content-addressed object store already solves this at the output side — outputs are stored by SHA-256. The same mechanism can serve inputs: if the worker has a local object store, the coordinator only needs to send checksums, and the worker fetches missing objects from the remote cache. Products whose inputs are already cached require zero transfer.
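The checksum-first handoff described above reduces input transfer to a set difference: of the checksums a product requires, only those missing from the worker's local object store need to be fetched. A sketch with hypothetical names:

```rust
use std::collections::HashSet;

// Given a product's required input checksums and the set the worker's local
// object store already holds, return only the checksums that must be fetched
// from the remote cache before execution.
fn missing_objects<'a>(required: &'a [String], local: &HashSet<String>) -> Vec<&'a str> {
    required
        .iter()
        .filter(|sum| !local.contains(sum.as_str()))
        .map(|sum| sum.as_str())
        .collect()
}
```

A product whose inputs are all cached produces an empty result and requires zero transfer.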
Output collection
After execution, the worker’s outputs must be pushed back to the coordinator (or directly to the remote cache) so local build phases and downstream products can use them. This is essentially the existing remote cache push path.
Hermeticity
Distributed workers only produce correct results if builds are hermetic — the
product’s output depends only on its declared inputs, not on ambient machine
state (installed tools, environment variables, filesystem layout). RSConstruct
does not enforce hermeticity today. A worker with a different version of ruff
or cppcheck than the local machine will produce different results.
This is the hardest problem. Options:
- Ignore it — document that workers must have identical tool versions; use tool locking (`rsconstruct tools lock`) to detect divergence.
- Containers — run each product in a container image that includes all required tools. Bazel and BuildBuddy do this. Heavy but correct.
- Nix/flakes — pin tools via Nix derivations on all workers. Correct but requires Nix infrastructure.
Scheduling and load balancing
Which products go to which worker? A central coordinator must:
- Know the graph (dependency order).
- Dispatch products whose dependencies are already satisfied.
- Avoid overloading any single worker.
- Handle worker failure (retry on another worker).
This is a distributed systems problem. Even a simple greedy scheduler requires a reliable heartbeat, a work queue, and failure detection.
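The core of even a greedy scheduler can be sketched in a few lines: ready products enter a queue, idle workers drain it. All names here are hypothetical, and a real implementation would add the heartbeats, retries, and failure detection noted above.

```rust
use std::collections::VecDeque;

// Minimal greedy dispatch: one slot per worker, FIFO queue of ready products.
struct Scheduler {
    ready: VecDeque<usize>,        // products whose dependencies are satisfied
    in_flight: Vec<Option<usize>>, // one slot per worker
}

impl Scheduler {
    fn new(workers: usize) -> Self {
        Self { ready: VecDeque::new(), in_flight: vec![None; workers] }
    }

    fn mark_ready(&mut self, product: usize) {
        self.ready.push_back(product);
    }

    // Assign ready products to idle workers; returns (worker, product) pairs.
    fn dispatch(&mut self) -> Vec<(usize, usize)> {
        let mut assigned = Vec::new();
        for (worker, slot) in self.in_flight.iter_mut().enumerate() {
            if slot.is_none() {
                match self.ready.pop_front() {
                    Some(product) => {
                        *slot = Some(product);
                        assigned.push((worker, product));
                    }
                    None => break,
                }
            }
        }
        assigned
    }

    // A finished worker frees its slot; its product's dependents can then be marked ready.
    fn complete(&mut self, worker: usize) {
        self.in_flight[worker] = None;
    }
}
```

Worker failure would re-enqueue the failed slot's product via `mark_ready`, which this sketch omits.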
Latency overhead
For fast products (a Python lint check on a 50-line file takes ~50ms), the overhead of serializing inputs, sending them over the network, waiting for the worker, and receiving results can exceed the actual execution time. Distributed execution only pays off for products that take seconds or more, or when there are so many products that local parallelism is saturated.
How other tools do it
Bazel (Remote Execution API)
Bazel defines the Remote Execution API
(REAPI), a gRPC protocol for distributed execution. Workers implement the
Execution service; the coordinator submits Action objects (a command +
input digest tree). Workers fetch inputs from a Content Addressable Storage
(CAS) service, execute the action, and push outputs back to CAS.
Strengths: hermetic by design (actions are pure functions of their inputs),
well-specified protocol, many implementations (BuildBuddy, EngFlow, NativeLink,
self-hosted buildfarm).
Weaknesses: requires all actions to be declared with precise input sets; dynamic dependencies (header includes discovered at compile time) need special handling; heavy infrastructure to stand up.
RSConstruct’s object store is conceptually similar to CAS. The Product struct
already declares all inputs explicitly. Implementing REAPI would make
RSConstruct compatible with the existing Bazel remote execution ecosystem
without building a proprietary scheduler.
distcc
distcc distributes C/C++ compilation by intercepting gcc/clang invocations
and forwarding the preprocessed source to a pool of workers. It works at the
invocation level, not the build graph level — the local machine still runs the
build tool (make/ninja) and distcc is transparent to it.
Strengths: simple, no build tool integration required, widely deployed.
Weaknesses: only works for compilation (not linters, generators, etc.); requires preprocessing locally (partial hermeticity); no caching.
Incredibuild / Xtensa
Commercial tools that intercept process spawning at the OS level (Windows job
objects, Linux LD_PRELOAD) to virtualize and distribute arbitrary commands.
No build tool integration required; any tool that runs a subprocess can be
distributed.
Strengths: transparent to the build tool; works with any compiler or tool.
Weaknesses: proprietary; expensive; the OS-level interception is fragile.
Pants / Buck2
Both use a daemon-based architecture with a local scheduler that knows the full build graph. Distributed execution is an extension of local execution — the scheduler dispatches actions to remote workers using REAPI or a proprietary protocol. Input digests and output digests flow through a central CAS.
Pants calls this “remote execution”; Buck2 calls it “remote actions”. Both require the build rules to declare all inputs precisely (no dynamic deps).
Ninja + a distributed wrapper
Some teams wrap Ninja with distributed backends (ninja-build + icecc,
ninja + sccache, or ninja + a custom scheduler). The wrapper intercepts
compiler invocations from the Ninja process. This is similar to the distcc
approach but can handle caching (sccache) alongside distribution.
A possible design for RSConstruct
A minimal distributed execution design that fits RSConstruct’s architecture:
1. Worker protocol
Workers expose a simple HTTP API:
```
POST /execute
body: { product_id, command, args, input_checksums: {path: sha256, ...} }
response: { exit_code, stdout, stderr, output_checksums: {path: sha256, ...} }
```
Before executing, the worker fetches any inputs it doesn’t already have from the shared remote cache. After executing, it pushes outputs to the remote cache and returns their checksums.
2. Input availability via shared cache
The coordinator (local RSConstruct) ensures all inputs are in the remote cache before dispatching a product to a worker. For source files, this means uploading them once at build start. For intermediate outputs (products that are inputs to other products), they flow through the cache automatically — the producer pushes to remote, the consumer fetches from remote.
This avoids a separate “input upload” step for most products: source files are small and stable; once uploaded they stay cached across builds.
3. Coordinator changes
The executor’s product dispatch loop currently runs products locally. With distributed execution:
- Each dispatchable product is classified as local or remote based on a configurable predicate (e.g., processor type, estimated duration, worker availability).
- Remote products are submitted to a work queue.
- A pool of worker connections consumes the queue, tracking in-flight products.
- When a remote product completes, its outputs are pulled from cache and the downstream products are unblocked.
The dependency graph and topological sort are unchanged — distribution is purely an execution-layer concern.
4. Hermeticity via tool locking
Without containers, workers must have the same tool versions as the local
machine. rsconstruct tools lock already records tool version hashes.
Distributed execution should verify that each worker’s tool hashes match the
lock file before accepting products of that type. A worker with a mismatched
ruff version refuses ruff products and logs a warning.
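The handshake could be a simple map comparison: the worker advertises its tool hashes, and the coordinator accepts only the tool types whose hashes match the lock file. A sketch with hypothetical names:

```rust
use std::collections::HashMap;

// Accept only tools whose worker-side hash matches the coordinator's lock
// file. Products of any other tool type stay local (or go to another worker).
fn accepted_tools(
    lock: &HashMap<String, String>,   // tool name -> locked hash
    worker: &HashMap<String, String>, // tool name -> worker's reported hash
) -> Vec<String> {
    let mut ok: Vec<String> = lock
        .iter()
        .filter(|(tool, hash)| worker.get(tool.as_str()) == Some(*hash))
        .map(|(tool, _)| tool.clone())
        .collect();
    ok.sort(); // deterministic order for logging
    ok
}
```

A tool missing from the worker's report fails the comparison the same way a mismatched hash does.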
5. What stays local
Some products cannot or should not be distributed:
- Products with `cache = false` (always-rebuild, e.g., timestamp generators).
- Products that depend on the local filesystem state beyond declared inputs (e.g., `git log`-style operations).
- Creators that manage local directories (`npm install`, `cargo build`) — their outputs are directory trees, not files, and their side effects are local.
- Products faster than the round-trip overhead (most lint checks on small files).

A `distributed = false` config field (analogous to `enabled`) would let users pin specific processors to local execution.
Current status
Not implemented. RSConstruct runs all products locally. Remote caching (push/pull of outputs) is the only cross-machine feature today.
The design above is a sketch for future consideration. The most natural first step would be implementing a minimal REAPI-compatible worker, since that would make RSConstruct interoperable with existing distributed build infrastructure (BuildBuddy, EngFlow, self-hosted buildfarm) without requiring RSConstruct-specific worker deployments.
Internal Processors
Processors that can be reimplemented in pure Rust, eliminating external tool dependencies. Internal processors are faster (no subprocess overhead), require no installation, and work on any platform with rsconstruct.
The naming convention is to prefix with i (for internal), e.g., ipdfunite replaces pdfunite.
Both the original and internal variants coexist — users choose which to use.
Implemented
ipdfunite
Replaces: pdfunite (external pdfunite binary from poppler-utils)
Merges PDFs from subdirectories into course bundles using lopdf in-process.
Same config as pdfunite minus the pdfunite_bin field. Batch-capable.
Crate: lopdf
Candidates
ijq / ijsonlint — JSON validation
Replaces: jq (checks JSON parses) and jsonlint (Python JSON linter)
Both tools ultimately just validate that files are well-formed JSON.
serde_json is already a dependency — parse each file and report errors.
Crate: serde_json (already in deps)
Complexity: Low — parse file, report error with line/column
iyamllint — YAML validation
Replaces: yamllint (Python YAML linter)
Validate that YAML files parse correctly. yamllint also checks style rules
(line length, indentation, etc.) which would need to be reimplemented if desired,
but basic validity checking is trivial.
Crate: serde_yaml
Complexity: Low for validation only, medium if style rules are needed
itaplo — TOML validation
Replaces: taplo (TOML formatter/linter)
Validate that TOML files parse correctly. The toml crate is already a dependency.
taplo also reformats — a pure validation-only internal processor covers the common case.
Crate: toml (already in deps)
Complexity: Low
ijson_schema — JSON Schema validation
Replaces: json_schema (Python jsonschema)
Validate JSON files against JSON Schema definitions. The jsonschema Rust crate
supports JSON Schema draft 2020-12, draft 7, and draft 4.
Crate: jsonschema
Complexity: Medium — need to load schema files and validate against them
imarkdown2html — Markdown to HTML
Replaces: markdown2html (external markdown CLI)
Convert Markdown files to HTML. pulldown-cmark is a fast, CommonMark-compliant
Markdown parser written in Rust.
Crate: pulldown-cmark
Complexity: Low — parse and render to HTML string, write to output file
iyamlschema — YAML Schema Validation
Validates YAML files against JSON schemas referenced by $schema URLs.
Fetches and caches schemas via the webcache, validates data against the schema
(including remote $ref resolution), and checks property ordering.
Crate: jsonschema, ureq, serde_yml
Complexity: Medium — HTTP fetching, schema compilation, recursive ordering checks
yaml2json — YAML to JSON Conversion
Convert YAML files to pretty-printed JSON.
Crate: serde_yml, serde_json
Complexity: Low — parse YAML, serialize as JSON
isass — Sass/SCSS to CSS
Replaces: sass (Dart Sass CLI)
Compile Sass/SCSS files to CSS. The grass crate is a pure-Rust Sass compiler
with good compatibility.
Crate: grass
Complexity: Low — compile input file, write CSS output
Not Suitable for Internal Implementation
These processors wrap tools with complex, evolving behavior that would be impractical to reimplement:
- ruff, pylint, mypy, pyrefly — Python linters/type checkers with deep language understanding
- eslint, jshint, stylelint — JavaScript/CSS linters with plugin ecosystems
- clippy, cargo — Rust toolchain components
- marp — Presentation framework (spawns Chromium)
- sphinx, mdbook, jekyll — Full documentation/site generators
- shellcheck — Shell script analyzer with extensive rule set
- aspell — Spell checker with language dictionaries
- chromium, libreoffice, drawio — GUI applications used for rendering
- protobuf — Protocol buffer compiler
- pdflatex — LaTeX to PDF (entire TeX distribution)
Binary Plugin System
As of now, rsconstruct does not have a binary plugin system. This section documents the approach for future consideration.
Rust applications can dynamically load plugins written in Rust via dlopen/dlsym on shared libraries (.so on Linux, .dylib on macOS, .dll on Windows). The plugin compiles as a cdylib crate, exports extern "C" functions, and the host loads them at runtime using a crate like libloading.
The main constraint is that Rust has no stable ABI. You cannot use Rust traits, generics, or standard library types across the dynamic library boundary. The plugin interface must be C-compatible: extern "C" functions returning opaque pointers, with a vtable or function-pointer struct defining the plugin API.
Crates like abi_stable attempt to provide a stable ABI layer for Rust-to-Rust dynamic loading, but they add significant complexity.
The current Lua plugin system avoids this problem entirely — Lua has a stable, simple FFI. A binary plugin system would offer better performance but at the cost of a much more complex plugin interface and build process (plugins would need to be compiled separately and matched to the host’s ABI).
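The C-compatible boundary described above could look like the following sketch. All names (`PluginVTable`, `rsconstruct_plugin_entry`) are hypothetical; the host would resolve the single exported symbol at runtime with a crate like libloading and read the function-pointer table.

```rust
use std::ffi::c_void;
use std::os::raw::c_char;

// Function-pointer table defining the plugin API across the dylib boundary.
// Only C-compatible types appear; no Rust traits or std types cross over.
#[repr(C)]
pub struct PluginVTable {
    pub abi_version: u32,
    pub pname: extern "C" fn() -> *const c_char,
    pub create: extern "C" fn() -> *mut c_void,
    pub destroy: extern "C" fn(*mut c_void),
}

extern "C" fn pname() -> *const c_char {
    b"demo\0".as_ptr() as *const c_char
}
extern "C" fn create() -> *mut c_void {
    std::ptr::null_mut() // a real plugin would box its processor state here
}
extern "C" fn destroy(_instance: *mut c_void) {}

// The one symbol the host looks up via dlsym/libloading.
#[no_mangle]
pub extern "C" fn rsconstruct_plugin_entry() -> *const PluginVTable {
    static VTABLE: PluginVTable = PluginVTable { abi_version: 1, pname, create, destroy };
    &VTABLE
}
```

The `abi_version` field is the conventional guard: the host rejects a plugin whose version it does not recognize before calling through any other pointer.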
Missing Processors
Tools found in Makefiles across ../*/ sibling projects that rsconstruct does not yet have processors for.
Organized by category, with priority based on breadth of usage.
High Priority — Linters and Validators
eslint
- What it does: JavaScript/TypeScript linter (industry standard).
- Projects: demos-lang-js
- Invocation: `eslint $(ALL_JS)` or `node_modules/.bin/eslint $<`
- Processor type: Checker
jshint
- What it does: JavaScript linter — detects errors and potential problems.
- Projects: demos-lang-js, gcp-gemini-cli, gcp-machines, gcp-miflaga, gcp-nikuda, gcp-randomizer, schemas, veltzer.github.io
- Invocation: `node_modules/.bin/jshint $<`
- Processor type: Checker
tidy (HTML Tidy)
- What it does: HTML/XHTML validator and formatter.
- Projects: demos-lang-js, gcp-gemini-cli, gcp-machines, gcp-miflaga, gcp-nikuda, gcp-randomizer, openbook, riddles-book
- Invocation: `tidy -errors -quiet -config .tidy.config $<`
- Processor type: Checker
check-jsonschema
- What it does: Validates YAML/JSON files against JSON Schema (distinct from rsconstruct’s json_schema, which validates JSON against schemas found via the `$schema` key).
- Projects: data, schemas, veltzer.github.io
- Invocation: `check-jsonschema --schemafile $(yq -r '.["$schema"]' $<) $<`
- Processor type: Checker
cpplint
- What it does: C++ linter enforcing Google C++ style guide.
- Projects: demos-os-linux
- Invocation: `cpplint $<`
- Processor type: Checker
checkpatch.pl
- What it does: Linux kernel coding style checker.
- Projects: kcpp
- Invocation: `$(KDIR)/scripts/checkpatch.pl --file $(C_SOURCES) --no-tree`
- Processor type: Checker
standard (StandardJS)
- What it does: JavaScript style guide, linter, and formatter — zero config.
- Projects: demos-lang-js
- Invocation: `node_modules/.bin/standard $<`
- Processor type: Checker
jslint
- What it does: JavaScript code quality linter (Douglas Crockford).
- Projects: demos-lang-js
- Invocation: `node_modules/.bin/jslint $<`
- Processor type: Checker
jsl (JavaScript Lint)
- What it does: JavaScript lint tool.
- Projects: keynote, myworld-php
- Invocation: `jsl --conf=support/jsl.conf --quiet --nologo --nosummary --nofilelisting $(SOURCES_JS)`
- Processor type: Checker
gjslint (Google Closure Linter)
- What it does: JavaScript style checker following Google JS style guide.
- Projects: keynote, myworld-php
- Invocation: `$(TOOL_GJSLINT) --flagfile support/gjslint.cfg $(JS_SRC)`
- Processor type: Checker
checkstyle
- What it does: Java source code style checker.
- Projects: demos-lang-java, keynote
- Invocation: `java -cp $(scripts/cp.py) $(MAINCLASS_CHECKSTYLE) -c support/checkstyle_config.xml $(find . -name "*.java")`
- Processor type: Checker
pyre
- What it does: Python type checker from Facebook/Meta.
- Projects: archive.apiiro.TrainingDataLaboratory, archive.work-amdocs-py
- Invocation: `pyre check`
- Processor type: Checker
High Priority — Formatters
black
- What it does: Opinionated Python code formatter.
- Projects: archive.apiiro.TrainingDataLaboratory, archive.work-amdocs-py
- Invocation: `black --target-version py36 $(ALL_PACKAGES)`
- Processor type: Checker (using `--check` mode) or Formatter
uncrustify
- What it does: C/C++/Java source code formatter.
- Projects: demos-os-linux, xmeltdown
- Invocation: `uncrustify -c support/uncrustify.cfg --no-backup -l C $(ALL_US_C)`
- Processor type: Formatter
astyle (Artistic Style)
- What it does: C/C++/Java source code indenter and formatter.
- Projects: demos-os-linux
- Invocation: `astyle --verbose --suffix=none --formatted --preserve-date --options=support/astyle.cfg $(ALL_US)`
- Processor type: Formatter
indent (GNU Indent)
- What it does: C source code formatter (GNU style).
- Projects: demos-os-linux
- Invocation: `indent $(ALL_US)`
- Processor type: Formatter
High Priority — Testing
pytest
- What it does: Python test framework.
- Projects: 50+ py* projects (pyanyzip, pyapikey, pyapt, pyawskit, pyblueprint, pybookmarks, pyclassifiers, pycmdtools, pyconch, pycontacts, pycookie, pydatacheck, pydbmtools, pydmt, pydockerutils, pyeventroute, pyeventsummary, pyfakeuse, pyflexebs, pyfoldercheck, pygcal, pygitpub, pygooglecloud, pygooglehelper, pygpeople, pylogconf, pymakehelper, pymount, pymultienv, pymultigit, pymyenv, pynetflix, pyocutil, pypathutil, pypipegzip, pypitools, pypluggy, pypowerline, pypptkit, pyrelist, pyscrapers, pysigfd, pyslider, pysvgview, pytagimg, pytags, pytconf, pytimer, pytsv, pytubekit, pyunique, pyvardump, pyweblight, and archive.*)
- Invocation: `pytest tests` or `python -m pytest tests`
- Processor type: Checker (mass, per-directory)
High Priority — YAML/JSON Processing
yq
- What it does: YAML/JSON processor (like jq but for YAML).
- Projects: data, demos-lang-yaml, schemas, veltzer.github.io
- Invocation: `yq < $< > $@` (format/validate) or `yq -r '.key' $<` (extract)
- Processor type: Checker or Generator
Medium Priority — Compilers
javac
- What it does: Java compiler.
- Projects: demos-lang-java, jenable, keynote
- Invocation: `javac -Werror -Xlint:all $(JAVA_SOURCES) -d out/classes`
- Processor type: Generator
go build
- What it does: Go language compiler.
- Projects: demos-lang-go
- Invocation: `go build -o $@ $<`
- Processor type: Generator (single-file, like cc_single_file)
kotlinc
- What it does: Kotlin compiler.
- Projects: demos-lang-kotlin
- Invocation: `kotlinc $< -include-runtime -d $@`
- Processor type: Generator (single-file)
ghc
- What it does: Glasgow Haskell Compiler.
- Projects: demos-lang-haskell
- Invocation: `ghc -v0 -o $@ $<`
- Processor type: Generator (single-file)
ldc2
- What it does: D language compiler (LLVM-based).
- Projects: demos-lang-d
- Invocation: `ldc2 $(FLAGS) $< -of=$@`
- Processor type: Generator (single-file)
nasm
- What it does: Netwide Assembler (x86/x64).
- Projects: demos-lang-nasm
- Invocation: `nasm -f $(ARCH) -o $@ $<`
- Processor type: Generator (single-file)
rustc
- What it does: Rust compiler for single-file programs (as opposed to cargo for projects).
- Projects: demos-lang-rust
- Invocation: `rustc $(FLAGS_DBG) $< -o $@`
- Processor type: Generator (single-file)
dotnet
- What it does: .NET SDK CLI — builds C#/F# projects.
- Projects: demos-lang-cs
- Invocation: `dotnet build --nologo --verbosity quiet`
- Processor type: MassGenerator
dtc (Device Tree Compiler)
- What it does: Compiles device tree source (.dts) to device tree blob (.dtb) for embedded Linux.
- Projects: clients-heqa (8 subdirectories)
- Invocation: `dtc -I dts -O dtb -o $@ $<`
- Processor type: Generator (single-file)
Medium Priority — Build Systems
cmake
- What it does: Cross-platform build system generator.
- Projects: demos-build-cmake
- Invocation: `cmake -B $@ && cmake --build $@`
- Processor type: MassGenerator
mvn (Apache Maven)
- What it does: Java project build and dependency management.
- Projects: demos-lang-java/maven
- Invocation: `mvn compile`
- Processor type: MassGenerator
ant (Apache Ant)
- What it does: Java build tool (XML-based).
- Projects: demos-lang-java, keynote
- Invocation: `ant checkstyle`
- Processor type: MassGenerator
Medium Priority — Converters and Generators
pygmentize
- What it does: Syntax highlighter — converts source code to HTML, SVG, PNG.
- Projects: demos-misc-highlight
- Invocation: `pygmentize -f html -O full -o $@ $<`
- Processor type: Generator (single-file)
slidev
- What it does: Markdown-based presentation tool — exports to PDF.
- Projects: demos-lang-slidev
- Invocation: `node_modules/.bin/slidev export $< --with-clicks --output $@`
- Processor type: Generator (single-file)
jekyll
- What it does: Static site generator (Ruby-based, used by GitHub Pages).
- Projects: site-personal-jekyll
- Invocation: `jekyll build --source $(SOURCE_FOLDER) --destination $(DESTINATION_FOLDER)`
- Processor type: MassGenerator
lilypond
- What it does: Music engraving program — compiles .ly files to PDF sheet music.
- Projects: demos-lang-lilypond, openbook
- Invocation: `scripts/wrapper_lilypond.py ... $<`
- Processor type: Generator (single-file)
wkhtmltoimage
- What it does: Renders HTML to image using WebKit engine.
- Projects: demos-misc-highlight
- Invocation: `wkhtmltoimage $(WK_OPTIONS) $< $@`
- Processor type: Generator (single-file)
Medium Priority — Documentation
jsdoc
- What it does: API documentation generator for JavaScript.
- Projects: jschess, keynote
- Invocation: `node_modules/.bin/jsdoc -d $(JSDOC_FOLDER) -c support/jsdoc.json out/src`
- Processor type: MassGenerator
Low Priority — Minifiers
jsmin
- What it does: JavaScript minifier (removes whitespace and comments).
- Projects: jschess
- Invocation: `node_modules/.bin/jsmin < $< > $(JSMIN_JSMIN)`
- Processor type: Generator (single-file)
yuicompressor
- What it does: JavaScript/CSS minifier and compressor (Yahoo).
- Projects: jschess
- Invocation: `node_modules/.bin/yuicompressor $< -o $(JSMIN_YUI)`
- Processor type: Generator (single-file)
closure compiler
- What it does: JavaScript optimizer and minifier (Google Closure).
- Projects: keynote
- Invocation: `tools/closure.jar $< --js_output_file $@`
- Processor type: Generator (single-file)
Low Priority — Preprocessors
gpp (Generic Preprocessor)
- What it does: General-purpose text preprocessor with macro expansion.
- Projects: demos/gpp
- Invocation: `gpp -o $@ $<`
- Processor type: Generator (single-file)
m4
- What it does: Traditional Unix macro processor.
- Projects: demos/m4
- Invocation: `m4 $< > $@`
- Processor type: Generator (single-file)
Low Priority — Binary Analysis
objdump
- What it does: Disassembles object files (displays assembly code).
- Projects: demos-os-linux
- Invocation: `objdump --disassemble --source $< > $@`
- Processor type: Generator (single-file, post-compile)
Low Priority — Packaging
dpkg-deb
- What it does: Builds Debian .deb packages.
- Projects: archive.myrepo
- Invocation: `dpkg-deb --build deb/mypackage ~/packages`
- Processor type: Generator
reprepro
- What it does: Manages Debian APT package repositories.
- Projects: archive.myrepo
- Invocation: `reprepro --basedir $(config.apt.service_dir) export $(config.apt.codename)`
- Processor type: Generator
Low Priority — Profiling
pyinstrument
- What it does: Python profiler with HTML output.
- Projects: archive.apiiro.TrainingDataLaboratory, archive.work-amdocs-py
- Invocation: `pyinstrument --renderer=html -m $(MAIN_MODULE)`
- Processor type: Generator
Low Priority — Code Metrics
sloccount
- What it does: Counts source lines of code and estimates development cost.
- Projects: demos-lang-java, demos-lang-r, demos-os-linux, jschess
- Invocation: `sloccount .`
- Processor type: Checker (whole-project)
Low Priority — Dependency Generation
makedepend
- What it does: Generates C/C++ header dependency rules for Makefiles.
- Projects: xmeltdown
- Invocation: `makedepend -I... -- $(CFLAGS) -- $(SRC)`
- Notes: rsconstruct’s built-in C/C++ dependency analyzer already handles this.
Low Priority — Embedded
fdtoverlay
- What it does: Applies device tree overlays to a base device tree blob.
- Projects: clients-heqa/come_overlay
- Invocation: `fdtoverlay -i $@ -o $@.tmp $$overlay && mv $@.tmp $@`
- Processor type: Generator
fdtput
- What it does: Modifies properties in a device tree blob.
- Projects: clients-heqa/come_overlay
- Invocation: `fdtput -r $@ $$node`
- Processor type: Generator
Requirements Generator — Design
A processor that scans Python source files and produces a requirements.txt
listing the third-party distributions the project imports. Fills the gap
between the Python analyzer (which discovers local dep edges) and the pip
processor (which consumes requirements.txt).
Problem
Users have Python projects with import statements. They want the set of
PyPI distributions their code needs, written out to requirements.txt.
Today they maintain this file by hand, which drifts from the actual imports.
Shape
A whole-project Generator processor named requirements:
- Inputs: every `.py` file in the project (same scan as the Python analyzer — `file_index.scan(&self.config.standard, true)`).
- Output: a single `requirements.txt` (path configurable).
- Discovery: one `Product` with all `.py` files as inputs, one output path. Structurally identical to the `tags` processor.
The classification problem
Every import X lands in one of three buckets:
- Local — a module that resolves to a file in the project. Skip.
- Stdlib — a module shipped with Python (`os`, `sys`, `json`, …). Skip.
- Third-party — a PyPI distribution. Emit to `requirements.txt`.
The Python analyzer already resolves bucket 1 via
PythonDepAnalyzer::resolve_module. The new processor needs buckets 2 and 3.
Stdlib detection
Python 3.10+ ships sys.stdlib_module_names — a frozenset of every stdlib
top-level module name. We bake this list into a static table
(src/processors/generators/python_stdlib.rs) rather than probing python3
at build time. Reasons:
- The list is stable across 3.10+ with a handful of additions per minor release.
- No tool dependency at build time — keeps the processor offline and hermetic.
- The list is ~300 names, a few KB of source.
A refresh script regenerates the table from python3 -c 'import sys; print(sorted(sys.stdlib_module_names))' when we bump Python support. The
list lives alongside the processor, not in a user-facing config.
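As a sketch of what the baked table could look like (the names `python_stdlib.rs` and `is_stdlib` come from the plan above; the excerpt of module names here is illustrative, standing in for the full ~300-name generated table). Keeping the table sorted lets the helper binary-search:

```rust
// Sketch of src/processors/generators/python_stdlib.rs (assumed layout).
// A tiny excerpt; the real table is regenerated by the refresh script
// and kept sorted so lookup can binary-search.
static PYTHON_STDLIB: &[&str] = &[
    "abc", "argparse", "asyncio", "collections", "dataclasses",
    "functools", "itertools", "json", "logging", "os",
    "pathlib", "re", "subprocess", "sys", "typing",
];

/// True if `module` is a Python stdlib top-level module name.
pub fn is_stdlib(module: &str) -> bool {
    PYTHON_STDLIB.binary_search(&module).is_ok()
}

fn main() {
    assert!(is_stdlib("os"));
    assert!(is_stdlib("json"));
    assert!(!is_stdlib("requests")); // third-party, not stdlib
    println!("ok");
}
```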
Import → distribution mapping
The import name is not always the PyPI distribution name:
| Import | Distribution |
|---|---|
| cv2 | opencv-python |
| yaml | PyYAML |
| PIL | Pillow |
| sklearn | scikit-learn |
| bs4 | beautifulsoup4 |
We bake a curated table of the common ~40 mismatches into the processor and
default everything else to identity (import X → distribution X). Users
override via config:
[processor.requirements.mapping]
cv2 = "opencv-python"
custom_internal = "our-private-dist"
User entries win over the built-in table. This is lossy by design — we accept that unusual packages need a config entry — in exchange for:
- No dependency on an installed Python environment.
- `requirements.txt` generation works on a clean checkout (no chicken-and-egg with `pip install`).
- Deterministic output regardless of the caller’s environment.
The alternative — probing importlib.metadata.packages_distributions() —
is more accurate but requires packages to already be installed. Rejected
for now; can be added later as an opt-in resolve = "probe" mode if users
hit the mapping ceiling.
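The fallthrough can be sketched as follows (helper and table names are the planned ones from the file list below; the entries shown stand in for the full ~40-entry curated table):

```rust
use std::collections::HashMap;

// Sketch of the curated import → distribution fallthrough. User entries
// from [processor.requirements.mapping] win over the built-in table;
// anything unmapped defaults to identity.
static BUILTIN_MAP: &[(&str, &str)] = &[
    ("PIL", "Pillow"),
    ("bs4", "beautifulsoup4"),
    ("cv2", "opencv-python"),
    ("sklearn", "scikit-learn"),
    ("yaml", "PyYAML"),
];

pub fn resolve_distribution<'a>(
    import: &'a str,
    user_map: &'a HashMap<String, String>,
) -> &'a str {
    if let Some(dist) = user_map.get(import) {
        return dist; // user entries win over the built-in table
    }
    BUILTIN_MAP
        .iter()
        .find(|(imp, _)| *imp == import)
        .map(|(_, dist)| *dist)
        .unwrap_or(import) // identity fallthrough
}

fn main() {
    let mut user = HashMap::new();
    user.insert("cv2".to_string(), "opencv-python-headless".to_string());
    assert_eq!(resolve_distribution("yaml", &user), "PyYAML");
    assert_eq!(resolve_distribution("cv2", &user), "opencv-python-headless");
    assert_eq!(resolve_distribution("requests", &user), "requests");
    println!("ok");
}
```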
Configuration
[processor.requirements]
output = "requirements.txt" # Output file path
exclude = [] # Import names to never emit (e.g. internal vendored modules)
sorted = true # Sort output alphabetically (vs. discovery order)
header = true # Emit a "# Generated by rsconstruct" header line
[processor.requirements.mapping]
cv2 = "opencv-python" # User-provided import → distribution overrides
| Key | Type | Default | Description |
|---|---|---|---|
| `output` | string | `"requirements.txt"` | Output file path |
| `exclude` | string[] | `[]` | Import names to never emit |
| `sorted` | bool | `true` | Sort entries alphabetically |
| `header` | bool | `true` | Include a comment header line |
| `mapping` | map | `{}` | Per-project import→distribution overrides |
Pinning (pkg==1.2.3) is deferred. The first iteration emits bare names.
Adding pinning later means probing pip show or parsing a lockfile —
separate concern.
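The struct behind this table might look like the following sketch (the real `RequirementsConfig` in `src/config/processor_configs.rs` would presumably derive serde `Deserialize` with per-field defaults; this version hand-rolls `Default` to stay dependency-free):

```rust
use std::collections::HashMap;

// Sketch of RequirementsConfig mirroring the config table above.
#[derive(Debug, Clone)]
pub struct RequirementsConfig {
    pub output: String,
    pub exclude: Vec<String>,
    pub sorted: bool,
    pub header: bool,
    pub mapping: HashMap<String, String>,
}

impl Default for RequirementsConfig {
    fn default() -> Self {
        Self {
            output: "requirements.txt".to_string(),
            exclude: Vec::new(),
            sorted: true,
            header: true,
            mapping: HashMap::new(),
        }
    }
}

fn main() {
    let cfg = RequirementsConfig::default();
    assert_eq!(cfg.output, "requirements.txt");
    assert!(cfg.sorted && cfg.header);
    println!("ok");
}
```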
Code organization
Shared import scanner
Factor the regex scanning out of src/analyzers/python.rs into a module
function shared by the analyzer and the generator:
// src/analyzers/python.rs
pub(crate) fn scan_python_imports(path: &Path) -> Result<Vec<String>> { ... }
Returns the raw top-level module names found in import X and from X import ... lines. The analyzer then runs this through resolve_module to
keep local ones; the generator runs it through the stdlib table and
mapping to produce the final list.
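A self-contained sketch of the scanner's behavior (this version is line-based rather than regex-based, takes source text instead of a file path, and uses an illustrative name, so it differs from the real `scan_python_imports` in those respects):

```rust
// Pull top-level module names out of `import X` and `from X import ...`
// lines. Simplified: no regex, no handling of parenthesized imports.
fn push_top_level(raw: &str, names: &mut Vec<String>) {
    // Keep only the top-level package: `os.path` → `os`; drop `as x` aliases.
    let first = raw.trim().split('.').next().unwrap_or("");
    let top = first.split_whitespace().next().unwrap_or("");
    if !top.is_empty() && !names.iter().any(|n| n.as_str() == top) {
        names.push(top.to_string()); // dedupe on insert
    }
}

pub fn scan_imports(source: &str) -> Vec<String> {
    let mut names = Vec::new();
    for line in source.lines() {
        let line = line.trim_start();
        if let Some(rest) = line.strip_prefix("import ") {
            for part in rest.split(',') {
                push_top_level(part, &mut names);
            }
        } else if let Some(rest) = line.strip_prefix("from ") {
            if let Some(module) = rest.split_whitespace().next() {
                push_top_level(module, &mut names);
            }
        }
    }
    names
}

fn main() {
    let src = "import os.path\nimport numpy as np, pandas\nfrom collections import deque\nx = 1\n";
    assert_eq!(scan_imports(src), vec!["os", "numpy", "pandas", "collections"]);
    println!("ok");
}
```

Relative imports (`from . import x`) yield an empty top-level name and are skipped, which matches the bucket model: they can only be local.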
This fixes architecture-observations #6 (analyzers can’t hand data to processors) at the scope of this one feature: instead of building a cross-processor channel, we share a pure function.
Files
- `src/processors/generators/requirements.rs` — the processor, ~150 lines.
- `src/processors/generators/python_stdlib.rs` — the stdlib names table (static `&[&str]`) and an `is_stdlib(module: &str) -> bool` helper.
- `src/processors/generators/distribution_map.rs` — the curated import→distribution mapping, a `resolve_distribution(import: &str) -> &str` helper that falls through to identity.
- `src/config/processor_configs.rs` — add `RequirementsConfig`.
- `src/processors/mod.rs` — add `pub const REQUIREMENTS = "requirements"` to the `names` module.
- `docs/src/processors/requirements.md` — user-facing processor doc.
Processor structure
Mirrors tags (whole-project generator with one output):
pub struct RequirementsProcessor {
    base: ProcessorBase,
    config: RequirementsConfig,
}

impl Processor for RequirementsProcessor {
    fn discover(&self, graph, file_index, instance_name) -> Result<()> {
        // Scan for .py files; if none, no product.
        // Add one product: inputs=all .py files, outputs=[output_path].
    }
    fn supports_batch(&self) -> bool { false }
    fn execute(&self, _ctx, product) -> Result<()> {
        // 1. Scan each input .py for imports.
        // 2. For each top-level module name:
        //    - Skip if local (resolves to a project file).
        //    - Skip if stdlib.
        //    - Skip if in user's `exclude`.
        //    - Map import → distribution name.
        // 3. Dedupe, sort if configured, write to output.
    }
}
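Steps 2-3 of `execute` can be sketched as a pure function (the name, signature, and closure-based local/stdlib checks are illustrative, chosen to keep the example self-contained):

```rust
use std::collections::HashMap;

// Classification + emit: filter imports down to third-party distributions
// and render the requirements.txt body.
pub fn render_requirements(
    imports: &[&str],
    is_local: impl Fn(&str) -> bool,
    is_stdlib: impl Fn(&str) -> bool,
    exclude: &[String],
    mapping: &HashMap<String, String>,
    sorted_output: bool,
    header: bool,
) -> String {
    let mut dists: Vec<String> = Vec::new();
    for &imp in imports {
        if is_local(imp) || is_stdlib(imp) || exclude.iter().any(|e| e == imp) {
            continue; // buckets 1 and 2, plus user excludes
        }
        let dist = mapping.get(imp).cloned().unwrap_or_else(|| imp.to_string());
        if !dists.contains(&dist) {
            dists.push(dist); // dedupe
        }
    }
    if sorted_output {
        dists.sort();
    }
    let mut out = String::new();
    if header {
        out.push_str("# Generated by rsconstruct\n");
    }
    for d in &dists {
        out.push_str(d);
        out.push('\n');
    }
    out
}

fn main() {
    let mapping = HashMap::from([("yaml".to_string(), "PyYAML".to_string())]);
    let text = render_requirements(
        &["os", "yaml", "mypkg", "requests"],
        |m| m == "mypkg", // local
        |m| m == "os",    // stdlib
        &[],
        &mapping,
        true,
        true,
    );
    assert_eq!(text, "# Generated by rsconstruct\nPyYAML\nrequests\n");
    println!("ok");
}
```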
Cache behavior
Falls naturally out of the descriptor-based cache:
- Inputs: every `.py` file + config hash.
- Output: `requirements.txt`.
- Adding/removing an import changes file contents, triggers rebuild.
- Changing config (new mapping entry, new exclude) changes config hash, triggers rebuild.
- Code changes inside a function that don’t affect imports still trigger a rebuild, since we can’t cheaply know which lines matter. Acceptable — the regeneration is fast.
Auto-detection
auto_detect returns true when the file index contains any .py files.
Same criterion as the Python analyzer.
Out of scope (first cut)
- Version pinning.
- Multiple output files (`requirements-dev.txt`, `requirements-test.txt`).
- Optional dependencies / extras (`pkg[extra]`).
- Reading existing `requirements.txt` to preserve comments or pins.
- `pyproject.toml` or `setup.py` output — `requirements.txt` only.
Each is a clean follow-up if users ask.
Crates.io Publishing
Notes on publishing rsconstruct to crates.io.
Version Limits
There is no limit on how many versions can be published to crates.io. You can publish as many releases as needed without worrying about quota or cleanup.
Pruning Old Releases
Crates.io does not support deleting published versions. Once a version is uploaded, it exists permanently.
The only removal mechanism is yanking (cargo yank --version 0.1.0), which:
- Prevents new projects from adding a dependency on the yanked version
- Does not break existing projects that already depend on it (they continue to download it via their lockfile)
- Does not delete the crate data from the registry
Yanking should only be used for versions with security vulnerabilities or serious bugs, not for general housekeeping.
Publishing a New Version
- Update the version in `Cargo.toml`
- Run `cargo publish --dry-run` to verify
- Run `cargo publish` to upload
Feature: Per-Processor max_jobs
Problem
When running rsconstruct build -j 20, all processors run with the same parallelism.
Processors like marp spawn heavyweight subprocesses (headless Chromium via Puppeteer),
and 20 concurrent Chromium instances cause non-deterministic TargetCloseError crashes
due to resource exhaustion.
Desired Behavior
Allow each processor to declare a max_jobs limit in rsconstruct.toml:
[processor.marp]
formats = ["pdf"]
max_jobs = 4
With -j 20, marp would run at most 4 concurrent jobs while other processors use the full 20.
max_jobs unset or 0 means “use the global -j value” (current behavior).
Implementation Plan
1. Add max_jobs field to processor configs
File: src/config/processor_configs.rs
Add to the generator_config! macro (all variants) and checker config structs:
#[serde(default)]
pub max_jobs: Option<usize>,
Add to Default impl (max_jobs: None) and KnownFields list.
2. Expose max_jobs() on the ProductDiscovery trait
File: src/processors/mod.rs
fn max_jobs(&self) -> Option<usize> { None }
Each processor implementation returns self.config.max_jobs.
3. Build a per-processor semaphore map in the executor
File: src/executor/mod.rs
Add to ExecutorOptions or build during executor construction:
pub processor_max_jobs: HashMap<String, usize>,
Constructed from the processor map by calling max_jobs() on each processor.
4. Use semaphores in the dispatch loop
File: src/executor/execution.rs (lines 177-203)
Create an Arc<Semaphore> per processor that has a max_jobs limit.
In the execution loop:
- Batch groups: If the processor has `max_jobs`, the batch thread acquires a permit before executing each chunk, limiting concurrent Chromium (or similar) processes.
- Non-batch items: Instead of dividing all non-batch items into `parallel` chunks regardless of processor, group by processor first. Items from limited processors get their own chunking (min of `max_jobs` and `parallel`); others use the global `parallel`.
5. Config display
Ensure rsconstruct processors config marp and rsconstruct config show display
the max_jobs field.
Files to Modify
- `src/config/processor_configs.rs` - add `max_jobs` field to macros and manual configs
- `src/processors/mod.rs` - add `max_jobs()` to `ProductDiscovery` trait
- `src/processors/*.rs` - implement `max_jobs()` for each processor
- `src/executor/mod.rs` - add semaphore map to `ExecutorOptions`
- `src/executor/execution.rs` - semaphore-based dispatch in the level loop
- `src/builder/build.rs` - build the processor limits map and pass to executor
Alternatives Considered
- `batch_size` workaround: Setting `batch_size` limits items per batch invocation, but batch mode runs sequentially within one process, making it slow.
- Global lower `-j`: Works but penalizes lightweight processors unnecessarily.
Plugin Registry: Ecosystem Survey
rsconstruct uses a hand-built plugin registry where processors self-register
at link time via inventory::submit!, declare their config schema, and are
instantiated from TOML config at runtime. This page documents the search for
existing Rust crates that could replace this machinery.
What rsconstruct needs
The plugin system combines four responsibilities:
- Link-time self-registration — each processor file submits a plugin entry. No central list to maintain. Adding a processor = adding one file.
- Per-plugin TOML config — each plugin declares known fields, required
fields, defaults, and a
`create(toml::Value) -> Box<dyn Processor>` factory. The framework deserializes the matching `[processor.NAME]` section and passes it to the factory.
- Name-to-factory mapping — the registry maps processor names to their
plugin entries for creation, introspection (`processors list`), and config display.
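Distilled to a sketch, responsibilities (2) and (4) amount to the following (illustrative names throughout; plugin entries are collected into a plain `Vec` here rather than inventory's link-time registry, and config is a plain string instead of `toml::Value`, to keep the example self-contained):

```rust
use std::collections::HashMap;

pub trait Processor {
    fn name(&self) -> &str;
}

// A plugin entry: pname plus a factory from config to processor.
pub struct Plugin {
    pub pname: &'static str,
    pub create: fn(config: &str) -> Box<dyn Processor>,
}

struct Ruff { iname: String }
impl Processor for Ruff {
    fn name(&self) -> &str { &self.iname }
}

fn create_ruff(_config: &str) -> Box<dyn Processor> {
    Box::new(Ruff { iname: "ruff".to_string() })
}

fn main() {
    // Stand-in for the inventory-collected plugin list.
    let plugins = vec![Plugin { pname: "ruff", create: create_ruff }];
    // Responsibility (4): map pname → plugin entry.
    let registry: HashMap<&str, &Plugin> =
        plugins.iter().map(|p| (p.pname, p)).collect();
    // Responsibility (2): look up the factory, create from config.
    let proc = (registry["ruff"].create)("[processor.ruff]");
    assert_eq!(proc.name(), "ruff");
    println!("created {}", proc.name());
}
```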
Crates evaluated
inventory / linkme
The foundation rsconstruct already uses. inventory provides link-time
collection of typed values into a global iterator. linkme does the same
via distributed slices. Neither has any config awareness — they solve (1)
only.
- Verdict: already in use; does its job well.
spring-rs
The closest match conceptually. A Spring Boot-style Rust framework that
combines inventory-based plugin registration with TOML config via
#[derive(Configurable)] and #[config_prefix = "..."] attributes. Each
plugin declares its config struct with the derive macro, and the framework
auto-deserializes the matching TOML section.
However, spring-rs is a full application framework for web services (integrates axum, sqlx, OpenTelemetry, etc.). Pulling it in for a build tool would add a massive, opinionated dependency tree for ~50 lines of glue code savings.
- Verdict: right pattern, wrong scope. Not suitable.
config (crate)
Handles layered config loading from multiple sources (TOML, YAML, JSON, env vars) with type-safe deserialization. No plugin registration awareness at all — it’s a config library, not a plugin framework.
- Verdict: solves config layering, not plugin registration.
extism
A WebAssembly plugin runtime. Plugins are compiled to WASM and loaded at runtime with sandboxing. Completely different problem — runtime-loaded external plugins vs. compile-time self-registering internal plugins.
- Verdict: wrong problem domain.
plugin-interfaces
Designed for chat-client applications with FFI and inter-plugin messaging. Not relevant to build tools.
- Verdict: not applicable.
toml-cfg
Provides compile-time config macros (#[toml_cfg::toml_config]) that
embed config values from a TOML file at build time. No runtime registry,
no plugin awareness.
- Verdict: compile-time only; not what we need.
Conclusion
No existing crate provides the combination of link-time registration + per-plugin TOML config deserialization + defaults/validation + name-to-factory mapping. This is a genuine gap in the Rust ecosystem.
rsconstruct’s manual approach (~50 lines of glue in src/registries/processor.rs
using inventory::submit! + serde + the ProcessorPlugin struct) is the
standard Rust pattern for this. It is well-understood, has no external
framework dependency, and is unlikely to be improved upon by a third-party
crate without bringing in unrelated complexity.
Decision: keep the current hand-built registry. Revisit if a focused plugin-config crate emerges in the ecosystem.
Survey conducted: April 2026.
Rejected Audit Findings
Issues flagged during code audits (rounds 7-12) that were assessed and deliberately rejected. Documented here to prevent re-flagging in future audits.
Duration u128-to-u64 overflow in JSON output
File: src/json_output.rs (lines 130, 151)
Flagged in: rounds 9, 10, 11, 12
Duration::as_millis() returns u128, cast to u64 without bounds checking. Overflows after ~584 million years. No real build will ever hit this. Not fixing.
Pre-1970 mtime cache collision
File: src/object_store/checksums.rs (lines 25-27)
Flagged in: rounds 9, 10, 11, 12
Files with mtime before Unix epoch (1970) get unwrap_or_default() mapping to (0, 0). Two such files could share a cached mtime entry. Pre-1970 timestamps don’t occur on real build inputs. The mtime cache is only an optimization — the actual input checksum comparison catches real changes. Not fixing.
Dependency unchanged logic — no-dep products
File: src/executor/execution.rs (line 587)
Flagged in: round 9
Agent claimed !deps.is_empty() && deps.iter().all(...) should be deps.is_empty() || deps.iter().all(...). Wrong — products with no dependencies should NOT reuse cached checksums. The optimization is specifically for products whose upstream deps produced identical output, meaning transitive inputs are unchanged. No-dep products have no such guarantee.
Batch handle_success return value ignored
File: src/executor/execution.rs (line 339)
Flagged in: round 10
handle_success() return value is not checked in batch processing. This is correct — handle_success already calls record_failure internally when caching fails, properly marking the product as failed. In non-batch, the return value triggers a break from the retry loop, but batch has no retry loop. Stats are correct either way.
record_failure ignores mark_processor_failed in keep-going mode
File: src/executor/handlers.rs (lines 20-39)
Flagged in: rounds 11, 12
In keep-going mode, mark_processor_failed parameter is ignored. This is by design — failed_processors is only checked in non-keep-going mode to skip subsequent products from the same processor. In keep-going mode, all products run regardless, so tracking failed processors is unnecessary.
Arc reference leak — failed_processors not unwrapped
File: src/executor/execution.rs (collect_build_stats)
Flagged in: round 11
Agent claimed not unwrapping failed_processors Arc prevents other Arc::try_unwrap calls from succeeding. Wrong — each Arc has its own independent reference count. Not unwrapping one has zero effect on others.
Tera output paths lose directory structure
File: src/processors/generators/tera.rs (lines 100-106)
Flagged in: round 10
Templates in subdirectories produce output at project root (e.g., tera.templates/sub/README.md.tera → README.md). This is intentional — the comment on line 105 explicitly says “Output is at project root with the .tera extension stripped.” By design.
Lua stub_path uses suffix as directory name
File: src/processors/lua_processor.rs (line 126)
Flagged in: round 10
rsconstruct.stub_path(source, suffix) uses suffix to construct the output directory (out/{suffix}). This is the designed Lua API — plugins control their own output directory naming via the suffix parameter.
Lua clean count masking with saturating_sub
File: src/processors/lua_processor.rs (lines 450-468)
Flagged in: rounds 10, 11
Custom Lua clean functions report removal count via existed_before.saturating_sub(exist_after). If the Lua function doesn’t remove files, that’s the plugin’s responsibility. The count accurately reflects what was actually removed. Not a bug.
file_index src_exclude_dirs substring matching
File: src/file_index.rs (lines 76-80)
Flagged in: rounds 9, 10
src_exclude_dirs uses path_str.contains(dir) for filtering. The documented convention uses slash-delimited patterns like "/kernel/", which prevents false positives on path substrings. This is the configured behavior.
Object store trim path reconstruction
File: src/object_store/management.rs (lines 86-103)
Flagged in: round 9
Reconstructing checksums from filesystem paths (prefix + rest) during cache trim. The path structure is fixed (objects/[2-char]/[rest]), set by store_object(). Unexpected files in the objects directory are silently ignored during trim, which is the correct behavior.
Partial output caching (before the fix)
File: src/object_store/operations.rs (lines 144-147)
Flagged in: round 9
Originally flagged as a design choice. User overruled — missing outputs are now an error (anyhow::ensure!). This was accepted and fixed in a later commit, not rejected.
Zspell read-modify-write race
File: src/processors/checkers/zspell.rs (lines 192-229)
Flagged in: round 11
Agent claimed file read-modify-write isn’t protected. Wrong — self.words_to_add.lock() on line 193 acquires the mutex, which is held for the entire function (not dropped until return). The lock prevents concurrent threads from interleaving. Cross-process races are not a concern for RSConstruct.
Duplicate dependency edges in resolve_dependencies
File: src/graph.rs (lines 227-230)
Flagged in: round 12
Agent claimed duplicate edges cause incorrect topological sort. The scenario requires a product to list the same input file twice, which doesn’t happen — FileIndex.scan() returns unique paths. Even if it did, duplicate edges would increment and decrement in_degree the same number of times, netting out correctly.
Python string injection in load_python_config
File: src/processors/generators/tera.rs (lines 205-208)
Flagged in: round 12
Agent claimed newlines in file paths could inject Python code. File paths come from FileIndex (filesystem scan) or Tera templates written by the project author — both are trusted input. Linux file paths from filesystem scans don’t contain newlines.
Batch assert_eq should be error return
File: src/executor/execution.rs (lines 323-325)
Flagged in: round 12
Agent suggested replacing assert_eq! with anyhow::bail! for batch result count validation. The assert is deliberate — a processor returning the wrong number of results is a contract violation (programming error), not a recoverable runtime condition. Assertions are appropriate for invariant violations.
Platform portability (Windows, macOS)
Flagged in: rounds 9, 10, 11, 12
Multiple agents flagged std::os::unix usage without #[cfg(unix)] guards, and missing #[cfg(windows)]/#[cfg(target_os = "macos")] blocks. RSConstruct is Linux-only. No platform compatibility code will be added.
DB recovery — file might not exist
File: src/db.rs
Flagged in: round 12
Agent re-flagged db.rs recovery, claiming fs::remove_file could fail if the file doesn’t exist. This was already fixed in round 8 — let _ = fs::remove_file() was changed to fs::remove_file()? which properly propagates errors.
Suggestions
Ideas for future improvements, organized by category. Completed items have been moved to suggestions-done.md.
Grades:
- Urgency: `high` (users need this), `medium` (nice to have), `low` (speculative/future)
- Complexity: `low` (hours), `medium` (days), `high` (weeks+)
Build Execution
Distributed builds
- Run builds across multiple machines, similar to distcc or icecream for C/C++.
- A coordinator node distributes work to worker nodes, each running rsconstruct in worker mode.
- Workers execute products and return outputs to the coordinator, which caches them locally.
- Challenges: network overhead for small products, identical tool versions across workers, local filesystem access.
- Urgency: low | Complexity: high
Sandboxed execution
- Run each processor in an isolated environment where it can only access its declared inputs.
- Prevents accidental undeclared dependencies.
- On Linux, namespaces can provide lightweight sandboxing.
- Urgency: low | Complexity: high
Content-addressable outputs (unchanged output pruning)
- Hash outputs too to skip downstream rebuilds when an input changes but produces identical output.
- Bazel calls this “unchanged output pruning.”
- Urgency: medium | Complexity: medium
Persistent daemon mode
- Keep rsconstruct running as a background daemon to avoid startup overhead.
- Benefits: instant file index via inotify, warm Lua VMs, connection pooling, faster incremental builds.
- Daemon listens on Unix socket (`.rsconstruct/daemon.sock`).
- `rsconstruct watch` becomes a client that triggers rebuilds on file events.
- Urgency: low | Complexity: high
Persistent workers
- Keep long-running tool processes alive to avoid startup overhead.
- Instead of spawning `ruff` or `pylint` per invocation, keep one process alive and feed it files.
- Bazel gets 2-4x speedup for Java this way. Could benefit pylint/mypy, which have heavy startup.
- Multiplex variant: multiple requests to a single worker process via threads.
- Urgency: medium | Complexity: high
Dynamic execution (race local vs remote)
- Start both local and remote execution of the same product; use whichever finishes first and cancel the other.
- Useful when remote cache is slow or flaky.
- Configurable per-processor via execution strategy.
- Urgency: low | Complexity: high
Execution strategies per processor
- Map each processor to an execution strategy: local, remote, sandboxed, or dynamic.
- Different processors may benefit from different execution models.
- Config: `[processor.ruff] execution = "remote"`, `[processor.cc_single_file] execution = "sandboxed"`.
- Urgency: low | Complexity: medium
Build profiles
- Named configuration sets for different build scenarios (ci, dev, release).
- Profiles inherit from base configuration and override specific values.
- Usage: `rsconstruct build --profile=ci`
- Urgency: medium | Complexity: medium
Conditional processors
- Enable or disable processors based on conditions (environment variables, file existence, git branch, custom commands).
- Multiple conditions can be combined with `all`/`any` logic.
- Urgency: low | Complexity: medium
Target aliases
- Define named groups of processors for easy invocation.
- Usage: `rsconstruct build @lint`, `rsconstruct build @test`
- Special aliases: `@all`, `@changed`, `@failed`
- File-based targeting: `rsconstruct build src/main.c`
- Urgency: medium | Complexity: medium
Graph & Query
Build graph query language
- Support queries like `rsconstruct query deps out/foo`, `rsconstruct query rdeps src/main.c`, `rsconstruct query processor:ruff`.
- Useful for debugging builds and CI systems that want to build only affected targets.
- Urgency: low | Complexity: medium
Affected analysis
- Given changed files (from `git diff`), determine which products are affected and only build those.
- Useful for large projects where a full build is expensive.
- Urgency: medium | Complexity: medium
Critical path analysis
- Identify the longest sequential chain of actions in a build.
- Helps users optimize their slowest builds by showing what’s actually on the critical path.
- Display with `rsconstruct build --critical-path` or include in `--timings` output.
- Urgency: medium | Complexity: medium
Extensibility
Plugin registry
- A central repository of community-contributed Lua plugins.
- Install with `rsconstruct plugin install eslint`.
- Registry could be a GitHub repository with a JSON index.
- Version pinning in `rsconstruct.toml`.
- Urgency: low | Complexity: high
Project templates
- Initialize new projects with pre-configured processors and directory structure.
- `rsconstruct init --template=python`, `rsconstruct init --template=cpp`, etc.
- Custom templates from local directories or URLs.
- Urgency: low | Complexity: medium
Rule composition / aspects
- Attach cross-cutting behavior to all targets of a certain type (e.g., “add coverage analysis to every C++ compile”).
- Urgency: low | Complexity: high
Output groups / subtargets
- Named subsets of a target’s outputs that can be requested selectively.
- E.g., `rsconstruct build --output-group=debug` or per-product subtarget selection.
- Useful for targets that produce multiple output types (headers, binaries, docs).
- Urgency: low | Complexity: medium
Visibility / access control
- Restrict which processors can consume which files or directories.
- Prevents accidental cross-boundary dependencies in large repos.
- Config: per-processor `visibility` rules or directory-level `.rsconstruct-visibility` files.
- Urgency: low | Complexity: medium
Developer Experience
Build Event Protocol / structured event stream
- rsconstruct already has `--json` on stdout with JSON Lines events (BuildEvent, ProductStart, ProductComplete, BuildSummary) and `--trace` for Chrome trace format.
- Write events to a file (`--build-event-log=events.pb`) or stream to a remote service.
- Richer event types: action graph, configuration, progress, test results.
- Urgency: medium | Complexity: medium
Build notifications
- Desktop notifications when builds complete, especially for long builds.
- Platform-specific: `notify-send` (Linux), `osascript` (macOS).
- Config: `notify = true`, `notify_on_success = false`.
- Urgency: low | Complexity: low
Parallel dependency analysis
- The cpp analyzer scans files sequentially, which can be slow for large codebases.
- Parallelize header scanning using rayon or tokio.
- Urgency: low | Complexity: medium
IDE / LSP integration
- Language Server Protocol server for IDE integration.
- Features: diagnostics, code actions, hover info, file decorations.
- Plugins for VS Code, Neovim, Emacs.
- Urgency: low | Complexity: high
Build log capture
- Save stdout/stderr from each product execution to a log file.
- Config: `log_dir = ".rsconstruct/logs"`, `log_retention = 10`.
- `rsconstruct log ruff:main.py` to view logs.
- Urgency: low | Complexity: medium
Build timing history
- Store timing data to `.rsconstruct/timings.json` after each build.
- `rsconstruct timings` shows slowest products, trends, time per processor.
- Urgency: low | Complexity: medium
Remote cache authentication
- S3 and HTTP/HTTPS remote caches are already supported.
- Still needed: explicit bearer token support, GCS backend, and environment variable substitution for secrets in config.
- Urgency: medium | Complexity: medium
rsconstruct lint — Run only checkers
- Convenience command to run only checker processors.
- Equivalent to `rsconstruct build -p ruff,pylint,...` but shorter.
- Urgency: low | Complexity: low
Watch mode keyboard commands
- During `rsconstruct watch`, support `r` (rebuild), `c` (clean), `q` (quit), `Enter` (rebuild now), `s` (status).
- Only activate when stdin is a TTY.
- Urgency: low | Complexity: medium
Layered config files
- Support config file layering: system (`/etc/rsconstruct/config.toml`), user (`~/.config/rsconstruct/config.toml`), project (`rsconstruct.toml`).
- Lower layers provide defaults, higher layers override.
- Per-command overrides via `[build]`, `[watch]` sections.
- Similar to Bazel’s `.bazelrc` layering.
- Urgency: low | Complexity: low
Test sharding
- Split large test targets across multiple parallel shards.
- Set `TEST_TOTAL_SHARDS` and `TEST_SHARD_INDEX` environment variables for test runners.
- Config: `shard_count = 4` per processor or product.
- Useful for pytest/doctest processors when added.
- Urgency: low | Complexity: medium
Runfiles / runtime dependency trees
- Track runtime dependencies (shared libs, config files, data files) separately from build dependencies.
- Generate a runfiles directory per executable with symlinks to all transitive runtime deps.
- Useful for deployment, packaging, and containerization.
- Urgency: low | Complexity: high
On-demand processors (build_by_default = false)
- Today every declared processor runs on every `rsconstruct build`. The only per-invocation escape hatches are `-x name` (remember every time) or `enabled = false` in the config (remember to flip back). Neither fits the “this processor exists, don’t run it unless I ask” use case — common for slow lifecycle processors like `python_package`, `docker_build`, `publish`, `release_tarball`.
- Add a per-processor boolean field defaulting to true: `build_by_default = false` on a processor means it’s discovered and classified like any other, but its products are filtered out of the default run.
- Prior art: meson’s `build_by_default: false`, Bazel’s `tags = ["manual"]`, buck2’s `tags = ["manual"]`. All use the same shape — declarative opt-out on the rule, per-invocation opt-in via target naming.
- CLI semantics map cleanly onto existing `-p`/`-x` machinery:
  - `rsconstruct build` → excludes `build_by_default = false` processors (new behaviour).
  - `rsconstruct build -p python_package` → includes only `python_package`; the `-p` explicit inclusion overrides the default-off flag.
  - `rsconstruct build -p ruff,python_package` → includes both, including the opt-in one.
  - `rsconstruct build --all` (new flag) → includes everything including on-demand processors. Useful for CI that wants to verify the opt-in path doesn’t bitrot.
- Example config: `[processor.python_package] build_by_default = false src_dirs = ["."]`
- Design considerations:
  - `@all` meta-shortcut: the existing `@checkers`/`@generators` aliases should continue to mean “all of that type, subject to the default-off filter.” Users who want “all checkers including on-demand ones” would say `rsconstruct build --all -p @checkers` — rare enough that the composition is fine.
  - Error on contradiction: `-p X -x X` already errors; `-p X` where X has `build_by_default = false` should just work (explicit opt-in wins over declarative opt-out).
  - Watch mode: `rsconstruct watch` should honour the same default — don’t rebuild the package processor on every file save. Users who want watch-mode packaging can add `-p python_package` to the watch invocation.
  - Discovery cost: on-demand processors still run discovery every build, because we need to know what their products would be (for output-conflict detection, graph completeness, and `--all` support). This is negligible — discovery is O(files matched), not O(cost of running).
- Follow-up idea: named goals (meson-style aggregated targets or npm-style scripts) for the “I want a lint goal / deploy goal / ci goal” pattern. That’s Pattern B, layered above per-processor config — not needed to solve the basic on-demand case.
- Urgency: medium | Complexity: low
Decomposed cache key for richer --explain
- Today every product has a single descriptor key that mixes input checksum + config hash + tool-version hash + variant. A miss tells us "the key changed" but not which component. `--explain` can only say `BUILD (no cache entry)` / `BUILD (output missing)` — not "your cflags changed" or "an input file changed".
- Store the three sub-hashes (input, config, tool) in a new redb table keyed by stable product identity — `(processor_iname, primary_path)`, where `primary_path` is the first output for generators or the first input for checkers.
- Schema: `product_components: (processor, primary_path) -> { input_hash, config_hash, tool_hash, timestamp }`. ~100 bytes per product, so ~500KB extra disk for a 5000-product project.
- Reads happen only on `--explain`. `classify_products` already routes through `explain_descriptor`; extend that to look up the prior components row, recompute the current components, diff the three, and return a richer reason like `BUILD (config changed: cflags, include_paths)`.
- Writes happen only when explicitly tracking. Two reasonable gates:
  - Option A (single flag): `--explain` enables both write and read. CI runs without `--explain` → zero overhead. Trade-off: the first explain run after enabling has no prior row → reports "no prior state" generically. Subsequent runs work fully.
  - Option B (separate `--track-changes` / `[build] track_changes = true`): decouples capture from query. CI omits the flag → zero overhead. Devs opt in permanently via config.
  - Lean towards Option A: fewer flags, the existing `--explain` carries both ends of the lifecycle, and CI/CD pays nothing by default since neither flag is set.
- Tier 1 only. Says "input bucket changed" but not which file. For a `.cc` file with 100 headers, the user still doesn't know which header. A future Tier 2 (per-input-file checksums) would resolve that at ~5-10x storage cost; defer until users ask.
- Caveats: adds a third source of truth (alongside `descriptors` and the in-memory graph) to keep in sync. Stale entries (products dropped from config) accumulate harmlessly until `cache clear`.
- Urgency: medium | Complexity: medium
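The three-way diff above can be sketched in a few lines. This is a hypothetical illustration: `ProductComponents` and `explain_reason` are illustrative names, not rsconstruct's actual types.

```rust
// Hypothetical sketch of the component diff behind a richer `--explain`
// reason. The names here are illustrative, not rsconstruct internals.
struct ProductComponents {
    input_hash: u64,
    config_hash: u64,
    tool_hash: u64,
}

fn explain_reason(prior: Option<&ProductComponents>, current: &ProductComponents) -> String {
    let prior = match prior {
        // First run after enabling tracking: no stored row to diff against.
        None => return "BUILD (no prior state)".to_string(),
        Some(p) => p,
    };
    let mut changed = Vec::new();
    if prior.input_hash != current.input_hash {
        changed.push("input");
    }
    if prior.config_hash != current.config_hash {
        changed.push("config");
    }
    if prior.tool_hash != current.tool_hash {
        changed.push("tool version");
    }
    if changed.is_empty() {
        "SKIP (up to date)".to_string()
    } else {
        format!("BUILD ({} changed)", changed.join(", "))
    }
}

fn main() {
    let prior = ProductComponents { input_hash: 1, config_hash: 2, tool_hash: 3 };
    let current = ProductComponents { input_hash: 1, config_hash: 9, tool_hash: 3 };
    println!("{}", explain_reason(Some(&prior), &current)); // BUILD (config changed)
}
```

The real implementation would additionally expand "config changed" into the specific changed fields by diffing the stored config against the current one.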
Caching & Performance
Deferred materialization
- Don’t write cached outputs to disk until they’re actually needed by a downstream product.
- Urgency: low | Complexity: high
Garbage collection policy
- Time-based or size-based cache policies: “keep cache under 1GB” or “evict entries older than 30 days.”
- Config: `max_size = "1GB"`, `max_age = "30d"`, `gc_policy = "lru"`.
- `rsconstruct cache gc` for manual garbage collection.
- Urgency: low | Complexity: medium
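A possible config shape for this, using the field names proposed above (hypothetical until implemented):

```toml
# Hypothetical [cache] GC settings -- not yet implemented.
[cache]
max_size = "1GB"   # evict until the object store fits under this
max_age = "30d"    # drop entries not touched in 30 days
gc_policy = "lru"  # eviction order when max_size is exceeded
```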
Shared cache across branches
- Surface in `rsconstruct status` when products are restorable from another branch.
- Already works implicitly via input hash matching.
- Urgency: low | Complexity: low
Merkle tree input hashing
- Hash inputs as a Merkle tree rather than flat concatenation.
- More efficient for large input sets — changing one file only rehashes its branch, not all inputs.
- Also enables efficient transfer of input trees to remote execution workers.
- Urgency: low | Complexity: medium
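The branch-rehashing property can be illustrated with a toy Merkle root over per-file leaf hashes. This sketch substitutes std's `DefaultHasher` for SHA-256 purely for brevity (RSConstruct hashes with SHA-256 per the README), and the function names are illustrative:

```rust
// Toy Merkle-root computation over per-file leaf hashes. Changing one input
// only invalidates the hashes on that leaf's path to the root, so an
// incremental implementation rehashes O(log n) nodes instead of all inputs.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn leaf_hash(content: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    content.hash(&mut h);
    h.finish()
}

fn merkle_root(mut level: Vec<u64>) -> u64 {
    if level.is_empty() {
        return 0;
    }
    while level.len() > 1 {
        // Combine adjacent pairs; an odd tail chunk is hashed alone.
        level = level
            .chunks(2)
            .map(|pair| {
                let mut h = DefaultHasher::new();
                pair.hash(&mut h);
                h.finish()
            })
            .collect();
    }
    level[0]
}

fn main() {
    let files: [&[u8]; 3] = [b"main.c", b"util.c", b"util.h"];
    let leaves: Vec<u64> = files.iter().map(|f| leaf_hash(f)).collect();
    println!("root = {:016x}", merkle_root(leaves));
}
```

For remote execution, the same tree lets a worker fetch only the subtrees whose hashes it doesn't already have.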
Reproducibility
Hermetic builds
- Control all inputs beyond tool versions: isolate env vars, control timestamps, sandbox network, pin system libraries.
- Config: `hermetic = true`, `allowed_env = ["HOME", "PATH"]`.
- Verification: `rsconstruct build --verify` builds twice and compares outputs.
- Urgency: low | Complexity: high
Determinism verification
- `rsconstruct build --verify` mode that builds each product twice and compares outputs.
- Urgency: low | Complexity: medium
CI & Reporting
CI config generator
- `rsconstruct ci generate` outputs a GitHub Actions or GitLab CI config that runs the build.
- Detects enabled processors and required tools, generates install steps and build commands.
- Supports `--format=github|gitlab|circleci`.
- Urgency: medium | Complexity: medium
HTML build report
- Generate a visual HTML dashboard of build times, cache hit rates, and processor statistics.
- `rsconstruct build --report=build.html` or `rsconstruct report`.
- Include charts for timing trends, per-processor breakdown, and cache efficiency.
- Urgency: low | Complexity: medium
PR comment bot
- Post build results (pass/fail, timing, warnings) as a GitHub PR comment.
- `rsconstruct ci comment` reads build output and posts via the GitHub API.
- Urgency: low | Complexity: medium
Content & Documentation
rsconstruct init --detect
- `rsconstruct smart auto` already scans and enables processors, but a dedicated `init --detect` could go further.
- Generate a complete `rsconstruct.toml` with processor-specific config (`src_dirs`, extensions, tool paths).
- Urgency: medium | Complexity: medium
rsconstruct fmt — Auto-format rsconstruct.toml
- Sort `[processor.*]` sections alphabetically, align values, and remove redundant defaults.
- Urgency: low | Complexity: low
Cross-project term sync
- Automatically keep terms directories in sync across multiple repos.
- Could run as a daemon or a periodic CI job.
- `rsconstruct terms sync --repos=repo1,repo2`, or config-driven.
- Urgency: low | Complexity: medium
Glossary generator
- `rsconstruct terms glossary` generates a Markdown glossary from the terms directory.
- Optionally pulls definitions from context in the Markdown files where terms are used.
- Urgency: low | Complexity: medium
Link checker processor
- Validate that URLs in markdown files are not broken (HTTP HEAD requests).
- Configurable timeout, retry, and allow/blocklist patterns.
- Cache results to avoid re-checking unchanged URLs.
- Urgency: medium | Complexity: medium
Image optimizer processor
- Compress and resize images referenced in markdown files.
- Uses tools like `optipng`, `jpegoptim`, and `svgo`.
- Config: quality levels, max dimensions, output format.
- Urgency: low | Complexity: medium
HTML+JS compression and packaging
- Minify and bundle HTML, CSS, and JavaScript files for deployment.
- Could use tools like `terser` (JS), `csso` (CSS), and `html-minifier` (HTML).
- Bundle multiple JS/CSS files into single outputs, generate source maps.
- Integrate with existing eslint/stylelint processors for a full web frontend pipeline.
- Urgency: medium | Complexity: medium
Processor Ecosystem
WASM processor plugins
- Beyond Lua, allow processors written in any language compiled to WebAssembly.
- Provides sandboxing, portability, and language flexibility.
- WASI for filesystem access within the sandbox.
- Urgency: low | Complexity: high
Processor marketplace / registry
- A central repository of community-contributed processor configs and Lua plugins.
- Install with `rsconstruct plugin install prettier`.
- Registry as a GitHub repository with a JSON index. Version pinning in `rsconstruct.toml`.
- Urgency: low | Complexity: high
Cleaning & Cache
Time-based cache purge
- `rsconstruct cache purge --older-than=7d` to remove cache entries older than a given duration.
- Currently only `cache clear` exists, which removes everything.
- Walk the object store, check file mtimes, and remove old entries.
- Urgency: medium | Complexity: low
Enhanced cache statistics
- `rsconstruct cache stats` currently shows minimal info.
- Add: hit-rate percentage, bytes saved vs. rebuild time, per-processor breakdown, slowest processors.
- Helps users identify optimization opportunities.
- Urgency: medium | Complexity: medium
CLI & UX
Configuration
Environment variable expansion in config
- Allow `${env:HOME}` or `${env:CI}` in `rsconstruct.toml` to reference environment variables.
- The variable-substitution system already exists for `[vars]`; extending it to env vars is natural.
- Useful for CI/CD systems that pass secrets or paths via the environment.
- Urgency: medium | Complexity: low
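A minimal sketch of what the substitution pass could look like, with the lookup injected so the same code serves real env vars and tests. All names here (`expand_env`, the `${env:NAME}` handling details) are illustrative, not rsconstruct internals:

```rust
// Hypothetical ${env:NAME} expansion pass. The lookup is a parameter so the
// function is testable without mutating the process environment.
fn expand_env(input: &str, lookup: &dyn Fn(&str) -> Option<String>) -> String {
    let mut out = String::new();
    let mut rest = input;
    while let Some(start) = rest.find("${env:") {
        out.push_str(&rest[..start]);
        let after = &rest[start + "${env:".len()..];
        match after.find('}') {
            Some(end) => {
                // Unset variables expand to the empty string in this sketch;
                // a real implementation might error instead.
                out.push_str(&lookup(&after[..end]).unwrap_or_default());
                rest = &after[end + 1..];
            }
            None => {
                // Unterminated placeholder: keep it verbatim.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

fn main() {
    let lookup = |name: &str| match name {
        "HOME" => Some("/home/ci".to_string()),
        _ => None,
    };
    let line = expand_env("wordlist = \"${env:HOME}/words.txt\"", &lookup);
    println!("{line}"); // wordlist = "/home/ci/words.txt"
}
```

In rsconstruct itself this would presumably run in the same place the existing `[vars]` substitution runs, with `std::env::var` as the lookup.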
Per-processor batch size
- Each processor config has a `batch` boolean, but batch size is global (`[build] batch_size`).
- Different tools have different startup costs — fast tools benefit from large batches, slow tools from small ones.
- Add a `batch_size` field to individual processor configs, overriding the global default.
- Urgency: medium | Complexity: low
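The override could look like this. The per-processor `batch_size` field is the proposal here, not an existing option:

```toml
[build]
batch_size = 50    # existing global default

[processor.ruff]
batch = true       # fast startup: the global batch size is fine

[processor.pylint]
batch = true
batch_size = 10    # proposed per-processor override, per the rationale above
```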
Processor Ecosystem
Flake8 (Python linter)
- Many projects still use flake8 over ruff. Widely adopted.
- Checker processor using `flake8`. Batch-capable.
- Urgency: medium | Complexity: low
Security
Shell command execution from source file comments
- `EXTRA_*_SHELL` directives execute arbitrary shell commands parsed from source-file comments.
- Document the security implications clearly.
- Urgency: medium | Complexity: low
Internal Cleanups
These are code-quality items surfaced by an architecture audit. Each is
localized; none block features. See architecture-observations.md for
larger structural items.
Consolidate processor discovery helpers
- `src/processors/mod.rs` exposes `discover_checker_products`, `discover_directory_products`, `checker_discover`, `checker_auto_detect`, `checker_auto_detect_with_scan_root`, and `scan_or_skip` — all similar, with subtle differences (some auto-apply `dep_auto`, some don't; some validate scan roots, some don't).
- Choosing the wrong helper is a silent correctness issue: a processor that picks `discover_checker_products` when it needed `checker_discover` loses `dep_auto` merging and never finds out.
- Collapse to one or two helpers with explicit flags for the variations. Document the contract each helper commits to.
- Urgency: medium | Complexity: low
Remove / complete remote_pull scaffold in ObjectStore
- `src/object_store/mod.rs` has a `remote_pull` field, and `try_fetch_*` helpers in `operations.rs`, that nothing calls.
- Either finish the feature (wire the fetch helpers into the classify path) or delete the scaffold. Unused public-ish surface rots.
- Urgency: low | Complexity: medium (complete) / low (delete)
Drop or use processor_type on ProcessorPlugin
- `src/registries/processor.rs` has `processor_type` marked `#[allow(dead_code)]`, with a comment about a future `processors list --type=checker` filter.
- Either ship the filter or drop the field until it's needed. Dead fields with comments accumulate.
- Urgency: low | Complexity: low
TOOLS registry is monolithic and unsorted
- `src/processors/mod.rs` has ~170 entries in a static array mixing Python, Node, Ruby, Rust, Perl, and System categories, with no alphabetic ordering within groups.
- Hard to find a tool when adding one; hard to audit for gaps (a tool with no install command makes `doctor` silently unhelpful).
- Split per-runtime into separate files, or sort alphabetically within each section. Add a unit test that every processor's `required_tools()` entries have a matching `TOOLS` row (this test exists — keep it; make the table easier to satisfy).
Centralize alias expansion
- `expand_aliases` in `src/builder/build.rs` handles the `@checkers` / `@generators` / `@toolname` / bare-name syntaxes. It's called once for `-p` and once for `-x`. Any new alias shortcut has to be added there.
- No duplication today, but the function lives in `build.rs` despite being useful elsewhere (completion, `processors list`, `analyzers used`). Move it to a dedicated module and make it the canonical expander.
Inconsistent error-handling idioms in processors
- Some processors use `anyhow::bail!`, some `anyhow::Context::with_context()`, and some construct custom messages. The coding-standards doc already calls for `with_context` on every I/O operation, but processor-level error shape varies.
- Pick one idiom per category (tool failure vs. config error vs. internal error) and retrofit. Makes `--json` error events more uniform too.
- Urgency: low | Complexity: low
Config validation timing
- Unknown-field and must-field validation runs inside `Config::load`, which is correct. However, some cross-field validations (e.g. "`cc_single_file` needs `include_paths` if compiling C++") happen later, during processor creation or build.
- Either pull all semantic validation into `Config::load` (so `toml check` catches everything), or accept that semantic errors surface later and document which is which.
- Urgency: low | Complexity: medium
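A sketch of what pulling such a cross-field check into load time might look like. `CcSingleFileConfig` and `validate` are illustrative stand-ins for whatever rsconstruct's real config types are, and the specific rule is the one quoted above:

```rust
// Hypothetical load-time cross-field validation, so `toml check` can report
// the error before any build work starts. Names are illustrative.
struct CcSingleFileConfig {
    compile_cpp: bool,
    include_paths: Vec<String>,
}

fn validate(cfg: &CcSingleFileConfig) -> Result<(), String> {
    // Cross-field rule: C++ compilation needs include paths configured.
    if cfg.compile_cpp && cfg.include_paths.is_empty() {
        return Err("cc_single_file: include_paths required when compiling C++".to_string());
    }
    Ok(())
}

fn main() {
    let bad = CcSingleFileConfig { compile_cpp: true, include_paths: vec![] };
    match validate(&bad) {
        Ok(()) => println!("config ok"),
        Err(e) => println!("toml check: {e}"),
    }
}
```

Each processor could expose such a `validate` hook that `Config::load` calls after deserialization, keeping the rule next to the config struct it constrains.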
products list CLI
- Users can run `rsconstruct graph show` (full graph) or `rsconstruct status` (per-processor summary), but there's no flat list of "every product that would execute, with its primary input and output."
- Add `rsconstruct products list` (parallel to `processors list` and `analyzers used`). Respects `-p`/`-x`/`--target` filters.
- Urgency: low | Complexity: low
ProductTiming.start_offset not populated for batch execution
- `src/processors/mod.rs` defines `start_offset` on `ProductTiming`; it's populated for non-batch execution but may be `None` for batch paths.
- Trace visualizations (`--trace`) look jagged or incomplete when batches are involved.
- Urgency: low | Complexity: low
Completed Suggestions
Items from suggestions.md that have been implemented.
Completed Features
- Remote caching — See Remote Caching. Share build artifacts across machines via S3, HTTP, or filesystem.
- Lua plugin system — See Lua Plugins. Define custom processors in Lua without forking rsconstruct.
- Tool version locking —
rsconstruct tools locklocks and verifies external tool versions. Tool versions are included in cache keys. - JSON output mode —
--jsonflag for machine-readable JSON Lines output (build_start, product_start, product_complete, build_summary events). - Native C/C++ include scanner — Default
include_scanner = "native"uses regex-based scanning. Falls back toinclude_scanner = "compiler"(gcc -MM). --processorsflag —rsconstruct build -p tera,ruffandrsconstruct watch -p terafilter which processors run.- Colored diff on config changes — When processor config changes trigger rebuilds, rsconstruct shows what changed with colored diff output.
- Batch processing — ruff, pylint, shellcheck, zspell, mypy, and rumdl all support batch execution via
execute_batch(). - Progress bar — Uses
indicatifcrate. Progress bar sized to actual work (excludes instant skips), hidden in verbose/JSON mode. - Emit
ProductStartJSON events — Emitted before each product starts executing, pairs withProductCompletefor per-product timing. - mypy processor — Python type checking with mypy. Batch-capable. Auto-detects
mypy.inias extra input. - Explain commands —
--explainflag shows skip/restore/rebuild reasons for each product during build.
Completed Code Consolidation
- Collapsed `checker_config!` macro variants — Merged `@basic`, `@with_auto_inputs`, and `@with_linter` into two internal variants (`@no_linter` and `@with_linter`).
- Added a `batch` field to all manually defined processor configs — All processor configs now support `batch = false` to disable batching per-project.
- Replaced trivial checker files with the `simple_checker!` macro — 25 trivial checkers reduced from ~35 lines each to 3-5 lines (~800 lines eliminated).
- Unified `lint_files`/`check_files` naming — All checkers now use `check_files` consistently.
- Moved the `should_process` guard into the macro — Added a `guard: scan_root` built-in to `impl_checker!`, removing boilerplate `should_process()` from 7 processors.
- Simplified `KnownFields` — Scan config fields are auto-appended by the validation layer via the `SCAN_CONFIG_FIELDS` constant; `KnownFields` impls only list their own fields.
- Extracted `WordManager` for spellcheck/aspell — Shared word-file management (loading, collecting, flushing, execute/batch patterns) in `word_manager.rs`.
Completed New Processors
- mypy — Python type checking using `mypy`. Batch-capable. Config: `checker`, `args`, `dep_inputs`, `scan`.
- yamllint — Lint YAML files using `yamllint`. `src/processors/checkers/yamllint.rs`.
- jsonlint — Validate JSON files for syntax errors. `src/processors/checkers/jsonlint.rs`.
- taplo (toml-lint) — Validate TOML files using `taplo`. `src/processors/checkers/taplo.rs`.
- markdownlint — Lint Markdown files for structural issues. Uses `mdl` or `markdownlint-cli`.
- pandoc — Convert Markdown to other formats (PDF, HTML, EPUB). Generator processor.
- jinja2 — Render Jinja2 templates (`.j2`) via the Python jinja2 library. `src/processors/generators/jinja2.rs`.
- black — Python formatting verification using `black --check`. `src/processors/checkers/black.rs`.
- rust_single_file — Compile single-file Rust programs to executables. `src/processors/generators/rust_single_file.rs`.
- sass — Compile SCSS/SASS files to CSS. `src/processors/generators/sass.rs`.
- protobuf — Compile `.proto` files to generated code using `protoc`. `src/processors/generators/protobuf.rs`.
- pytest — Run Python test files with pytest. `src/processors/checkers/pytest.rs`.
- doctest — Run Python doctests via `python3 -m doctest`. `src/processors/checkers/doctest.rs`.
Completed Test Coverage
- Ruff/pylint processor tests — `tests/processors/ruff.rs` and `tests/processors/pylint.rs` with integration tests.
- Make processor tests — `tests/processors/make.rs` with Makefile discovery and execution tests.
- All generator processor tests — Integration tests for all 14 previously untested generators: a2x, drawio, gem, libreoffice, markdown, marp, mermaid, npm, pandoc, pdflatex, pdfunite, pip, sphinx.
- All checker processor tests — Integration tests for all 5 previously untested checkers: ascii, aspell, markdownlint, mdbook, mdl.
Completed Caching & Performance
- Lazy file hashing (mtime-based) — `mtime_check` config (default `true`); `fast_checksum()` with `MTIME_TABLE`. Stores `(path, mtime, checksum)` tuples. Disable with `--no-mtime`.
- Compressed cache objects — Optional zstd compression for `.rsconstruct/objects/`. Config: `compression = true` in `[cache]`. Incompatible with hardlink restore (must use `restore_method = "copy"`). Checksums are computed on the original content for stable cache keys.
Completed Developer Experience
`--quiet` flag — `-q`/`--quiet` suppresses all output except errors. Useful for CI scripts that only care about the exit code.
- Flaky product detection / retry — `--retry=N` retries failed products up to N times. Reports FLAKY (passed on retry) vs. FAILED status in the build summary.
- Actionable error messages — `rsconstruct tools check` shows install hints for missing tools (e.g., "install with: pip install ruff").
- Build profiling / tracing — `--trace=file.json` generates Chrome trace format output viewable in `chrome://tracing` or the Perfetto UI.
- `rsconstruct build <target>` — Build specific targets by name or pattern via `--target` glob patterns and `-d`/`--dir` flags.
- `rsconstruct why <file>` / Explain rebuilds — `--explain` flag shows why each product is skipped, restored, or rebuilt.
- `rsconstruct doctor` — Diagnose the build environment: checks config, tools, and versions. Full implementation in `src/builder/doctor.rs`.
- `rsconstruct sloc` — Source lines of code statistics with COCOMO effort/cost estimation. `src/builder/sloc.rs`.
Completed Quick Wins
- Batch processing for more processors — All checker processors that support multiple file arguments now use batching.
- Progress bar for long builds — Implemented with `indicatif`; shows `[elapsed] [bar] pos/len message`.
- `--processors` flag for build and watch — Filter processors with the `-p` flag.
- Emit `ProductStart` JSON events — Wired up and emitted before execution.
- Colored diff on config changes — Shows a colored JSON diff when processor config changes.
Completed Features (v0.3.7)
`RSCONSTRUCT_THREADS` env var — Set parallelism via an environment variable instead of `-j`. Priority: CLI `-j` > `RSCONSTRUCT_THREADS` > config `parallel`.
- Global `output_dir` in `[build]` — Global output directory prefix (default: `"out"`). Processor defaults like `out/marp` are remapped when the global is changed (e.g., `output_dir = "build"` makes marp output to `build/marp`). Individual processors can still override their `output_dir` explicitly.
- Named processor instance output directories — When multiple instances of the same processor are declared (e.g., `[processor.marp.slides]` and `[processor.marp.docs]`), each instance defaults to `out/{instance_name}` (e.g., `out/marp.slides`, `out/marp.docs`) instead of sharing the same output directory.
- Named processor instance names in error reporting — When multiple instances of the same processor exist, error messages, build progress, and statistics use the full instance name (e.g., `[pylint.core]`, `[pylint.tests]`). Single instances continue to use just the processor type name.
- `processors config` without a config file — `rsconstruct processors config <name>` now works without an `rsconstruct.toml`, showing the default configuration (same as `defconfig`).
- `tags collect` command — `rsconstruct tags collect` scans the tags database for tags that are not in the tag collection (`tags_dir`) and adds them to the appropriate `.txt` files. Key:value tags go to `{key}.txt`; bare tags go to `tags.txt`.
- `rsconstruct status` shows 0-file processors — Processors declared in the config that match no files are now shown in `status` output and the `--breakdown` summary, making it easy to spot misconfigured or unnecessary processors.
- `smart remove-no-file-processors` — New command `rsconstruct smart remove-no-file-processors` removes `[processor.*]` sections from `rsconstruct.toml` for processors that don't match any files. Handles both single and named instances.
- `cc_single_file` output_dir from config — The `cc_single_file` processor now reads its output directory from the config `output_dir` field instead of hardcoding `out/cc_single_file`. This fixes named instances (e.g., `cc_single_file.gcc` and `cc_single_file.clang`), which previously collided on the same output directory.
- `clean unknown` respects .gitignore — `rsconstruct clean unknown` now skips gitignored files. Previously it disabled .gitignore handling, causing intentionally ignored files (IDE configs, virtualenvs, `*.pyc`, etc.) to be flagged as unknown. RSConstruct outputs are still correctly identified via the build graph, so nothing is missed. Use `--no-gitignore` to include gitignored files.
- Cross-processor dependencies (fixed-point discovery) — Generator outputs are now visible to downstream processors on the first build. Discovery runs in a fixed-point loop: after each pass, declared outputs are injected as virtual files into the FileIndex, and discovery re-runs until no new products are found. This means a generator that creates `.md` files can feed pandoc/tags/spell-checkers in a single build, without needing a second build.
Completed Architecture Refactors
- Config provenance tracking — Every config field now carries `FieldProvenance` (`UserToml` with line number, `ProcessorDefault`, `ScanDefault`, `OutputDirDefault`, `SerdeDefault`). `rsconstruct config show` annotates every field with its source. Uses `toml_edit::Document` for span capture.
- `BuildContext` replacing process globals — All mutable process globals moved into `BuildContext`: the three processor globals (`INTERRUPTED`, `RUNTIME`, `INTERRUPT_SENDER`) and the three checksum globals (`CACHE`, `MTIME_DB`, `MTIME_ENABLED`). Threaded through the `Processor` trait, executor, analyzers, remote cache, checksum functions, and deps cache. The signal handler uses `Arc<BuildContext>`.
- `BuildPolicy` trait — Extracted from the executor. `classify_products` delegates per-product skip/restore/rebuild decisions to a `&dyn BuildPolicy`. `IncrementalPolicy` implements the current logic. Future policies (dry-run, always-rebuild, time-windowed) are a single trait impl.
- `ObjectStore` decomposition — `mod.rs` split from 664 → 223 lines into focused submodules: `blobs.rs` (content-addressed storage), `descriptors.rs` (cache descriptor CRUD), `restore.rs` (restore/needs_rebuild/can_restore/explain).
Completed Features (latest)
`rsconstruct status --json` — JSON output with per-processor counts (`up_to_date`, `restorable`, `stale`, `new`, `total`, `native`) and totals. Activated by the `--json` flag.
- Selective processor cleaning — `rsconstruct clean outputs -p ruff,pylint` cleans only those processors' outputs. Without `-p`, cleans everything.
- Prettier processor — Checker using `prettier --check`. Batch-capable. Scans `.js`/`.jsx`/`.ts`/`.tsx`/`.mjs`/`.cjs`/`.css`/`.scss`/`.less`/`.html`/`.json`/`.md`/`.yaml`/`.yml`. `src/processors/checkers/prettier.rs`.
- Bare `clean` requires a subcommand — `rsconstruct clean` now errors with a usage hint instead of silently defaulting to `clean outputs`.
- Nondeterministic test race fix — Fixed a TOCTOU race in `store_descriptor` where parallel writers could get `Permission denied`. Now retries after forcing writable on first failure.
- Suppress status line for non-build commands — The `Exited with SUCCESS/ERROR` footer only shows for `build`, `watch`, and `clean`.
- Configurable graph validation — Four checks run after `resolve_dependencies()`: (1) reject empty inputs (default on), (2) validate dep references (default on), (3) detect duplicate inputs within the same processor (default off), (4) early cycle detection (default off). Config: `[graph]` section fields `validate_empty_inputs`, `validate_dep_references`, `validate_duplicate_inputs`, `validate_early_cycles`.
- Checksum globals moved to BuildContext — `CACHE`, `MTIME_DB`, and `MTIME_ENABLED` moved from `src/checksum.rs` statics into `BuildContext`. `combined_input_checksum`, `checksum_fast`, and `file_checksum` all take `&BuildContext`. Completes the isolated-build-context story.
- `rsconstruct fix` command — Runs fixers (auto-format, auto-fix) on source files. Checkers declare fix capability via `fix_subcommand`/`fix_prepend_args` on `SimpleCheckerParams`. `processors list` shows a `Fix` column. Supports `-p` filtering, batch execution, and `--json`. Fix-capable processors: ruff, black, prettier, eslint, stylelint, standard, taplo, rumdl, markdownlint.
- `processors search` — `rsconstruct processors search <query>` searches by name, description, and keywords. All 91 processors have keywords covering language, tool category, file extensions, and ecosystem terms. Supports `--json` output.
TODO
StandardConfig refactoring (DONE)
All config structs now embed StandardConfig via #[serde(flatten)].
Cache cleanup
- Remove old DB cache code: `CacheEntry`, `OutputEntry`, `get_entry`, `has_cache_entry`, `get_cached_input_checksum`, `CACHE_TABLE`. These are legacy from the pre-descriptor system. `has_cache_entry` (used in the status display to distinguish "stale" vs "new") should use the descriptor system instead. ~80 lines of dead code.
- Remove the `cache_key()` method from `Product`. Only used by `has_cache_entry` and `remove_stale`. Once `has_cache_entry` is migrated to descriptors, it may become fully unused.
- Split db.redb: the configs table (`CONFIGS_TABLE`) is still in the same DB as the now-unused cache table. Give configs its own file (`configs.redb`), then delete `db.redb` entirely.
Cache correctness
- Implement the `output_depends_on_input_name` flag. Documented in `docs/src/cache.md` but not implemented. Needed for processors that embed the input filename in their output (e.g., a `// Generated from foo.c` header). Without it, renaming such a file would produce a cache hit with stale content.
- Write a test for identical content processed by different processors. Verify that two different processors processing the same file get separate cache entries (the processor name is in the descriptor key).
Code consolidation
- Inline single-use `names` constants. 20+ constants in `processors::names` are used in exactly one place each (their processor's `new()` call). Inline them as string literals.
- Clean `processor_configs.rs`. Still 2,100+ lines. Check for:
  - `ClangTidyConfig` is nearly identical to `StandardConfig` — could it become a type alias?
  - Unused `default_*` helper functions left over from the cppcheck removal.
  - Other config structs that are structurally identical to `StandardConfig`.
Documentation
- Add `docs/src/processors/creator.md` — per-processor documentation for the Creator processor, like the other processor docs.
Housekeeping
- Remove the `tar` lockfile entries. The crate was added and removed, but `Cargo.lock` may still reference it.
- Commit everything. There is a large amount of uncommitted work spanning:
  - `HasScanConfig` trait elimination
  - SimpleGenerator (14 generators collapsed to data-driven)
  - Creator processor (new processor type with multi-dir caching)
  - Cache redesign (descriptor-based, content-addressed keys, no DB for cache data)
  - Checksum cache centralization (moved mtime logic to `checksum.rs` with its own DB)
  - MassGenerator → Creator type rename
  - `ProcessorType` enum with strum iteration
  - `processors types` CLI command
  - `--no-mtime-cache` CLI flag
  - Mandatory `supports_batch` on all processors
  - Checker consolidation (5 checkers → SimpleChecker entries)
  - Removed unused `dirs` crate
  - New documentation: cache.md, checksum-cache.md, processor-types.md