Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Suggestions

Ideas for future improvements, organized by category. Completed items have been moved to suggestions-done.md.

Grades:

  • Urgency: high (users need this), medium (nice to have), low (speculative/future)
  • Complexity: low (hours), medium (days), high (weeks+)

Build Execution

Distributed builds

  • Run builds across multiple machines, similar to distcc or icecream for C/C++.
  • A coordinator node distributes work to worker nodes, each running rsconstruct in worker mode.
  • Workers execute products and return outputs to the coordinator, which caches them locally.
  • Challenges: network overhead for small products, identical tool versions across workers, local filesystem access.
  • Urgency: low | Complexity: high

Sandboxed execution

  • Run each processor in an isolated environment where it can only access its declared inputs.
  • Prevents accidental undeclared dependencies.
  • On Linux, namespaces can provide lightweight sandboxing.
  • Urgency: low | Complexity: high

Content-addressable outputs (unchanged output pruning)

  • Hash outputs too to skip downstream rebuilds when an input changes but produces identical output.
  • Bazel calls this “unchanged output pruning.”
  • Urgency: medium | Complexity: medium

Persistent daemon mode

  • Keep rsconstruct running as a background daemon to avoid startup overhead.
  • Benefits: instant file index via inotify, warm Lua VMs, connection pooling, faster incremental builds.
  • Daemon listens on Unix socket (.rsconstruct/daemon.sock).
  • rsconstruct watch becomes a client that triggers rebuilds on file events.
  • Urgency: low | Complexity: high

Persistent workers

  • Keep long-running tool processes alive to avoid startup overhead.
  • Instead of spawning ruff or pylint per invocation, keep one process alive and feed it files.
  • Bazel gets 2-4x speedup for Java this way. Could benefit pylint/mypy which have heavy startup.
  • Multiplex variant: multiple requests to a single worker process via threads.
  • Urgency: medium | Complexity: high

Dynamic execution (race local vs remote)

  • Start both local and remote execution of the same product; use whichever finishes first and cancel the other.
  • Useful when remote cache is slow or flaky.
  • Configurable per-processor via execution strategy.
  • Urgency: low | Complexity: high

Execution strategies per processor

  • Map each processor to an execution strategy: local, remote, sandboxed, or dynamic.
  • Different processors may benefit from different execution models.
  • Config: [processor.ruff] execution = "remote", [processor.cc_single_file] execution = "sandboxed".
  • Urgency: low | Complexity: medium

Build profiles

  • Named configuration sets for different build scenarios (ci, dev, release).
  • Profiles inherit from base configuration and override specific values.
  • Usage: rsconstruct build --profile=ci
  • Urgency: medium | Complexity: medium

Conditional processors

  • Enable or disable processors based on conditions (environment variables, file existence, git branch, custom commands).
  • Multiple conditions can be combined with all/any logic.
  • Urgency: low | Complexity: medium

Target aliases

  • Define named groups of processors for easy invocation.
  • Usage: rsconstruct build @lint, rsconstruct build @test
  • Special aliases: @all, @changed, @failed
  • File-based targeting: rsconstruct build src/main.c
  • Urgency: medium | Complexity: medium

Graph & Query

Build graph query language

  • Support queries like rsconstruct query deps out/foo, rsconstruct query rdeps src/main.c, rsconstruct query processor:ruff.
  • Useful for debugging builds and CI systems that want to build only affected targets.
  • Urgency: low | Complexity: medium

Affected analysis

  • Given changed files (from git diff), determine which products are affected and only build those.
  • Useful for large projects where a full build is expensive.
  • Urgency: medium | Complexity: medium

Critical path analysis

  • Identify the longest sequential chain of actions in a build.
  • Helps users optimize their slowest builds by showing what’s actually on the critical path.
  • Display with rsconstruct build --critical-path or include in --timings output.
  • Urgency: medium | Complexity: medium

Extensibility

Plugin registry

  • A central repository of community-contributed Lua plugins.
  • Install with rsconstruct plugin install eslint.
  • Registry could be a GitHub repository with a JSON index.
  • Version pinning in rsconstruct.toml.
  • Urgency: low | Complexity: high

Project templates

  • Initialize new projects with pre-configured processors and directory structure.
  • rsconstruct init --template=python, rsconstruct init --template=cpp, etc.
  • Custom templates from local directories or URLs.
  • Urgency: low | Complexity: medium

Rule composition / aspects

  • Attach cross-cutting behavior to all targets of a certain type (e.g., “add coverage analysis to every C++ compile”).
  • Urgency: low | Complexity: high

Output groups / subtargets

  • Named subsets of a target’s outputs that can be requested selectively.
  • E.g., rsconstruct build --output-group=debug or per-product subtarget selection.
  • Useful for targets that produce multiple output types (headers, binaries, docs).
  • Urgency: low | Complexity: medium

Visibility / access control

  • Restrict which processors can consume which files or directories.
  • Prevents accidental cross-boundary dependencies in large repos.
  • Config: per-processor visibility rules or directory-level .rsconstruct-visibility files.
  • Urgency: low | Complexity: medium

Developer Experience

Build Event Protocol / structured event stream

  • rsconstruct already has --json on stdout with JSON Lines events (BuildEvent, ProductStart, ProductComplete, BuildSummary) and --trace for Chrome trace format.
  • A proper Build Event Protocol (file or gRPC stream) would enable external dashboards, CI integrations, and build analytics services beyond what JSON Lines provides.
  • Write events to a file (--build-event-log=events.pb) or stream to a remote service.
  • Richer event types: action graph, configuration, progress, test results.
  • Urgency: medium | Complexity: medium

Build notifications

  • Desktop notifications when builds complete, especially for long builds.
  • Platform-specific: notify-send (Linux), osascript (macOS).
  • Config: notify = true, notify_on_success = false.
  • Urgency: low | Complexity: low

Parallel dependency analysis

  • The cpp analyzer scans files sequentially, which can be slow for large codebases.
  • Parallelize header scanning using rayon or tokio.
  • Urgency: low | Complexity: medium

IDE / LSP integration

  • Language Server Protocol server for IDE integration.
  • Features: diagnostics, code actions, hover info, file decorations.
  • Plugins for VS Code, Neovim, Emacs.
  • Urgency: low | Complexity: high

Build log capture

  • Save stdout/stderr from each product execution to a log file.
  • Config: log_dir = ".rsconstruct/logs", log_retention = 10.
  • rsconstruct log ruff:main.py to view logs.
  • Urgency: low | Complexity: medium

Build timing history

  • Store timing data to .rsconstruct/timings.json after each build.
  • rsconstruct timings shows slowest products, trends, time per processor.
  • Urgency: low | Complexity: medium

Remote cache authentication

  • S3 and HTTP/HTTPS remote caches are already supported.
  • Still needed: explicit bearer token support, GCS backend, and environment variable substitution for secrets in config.
  • Urgency: medium | Complexity: medium

rsconstruct lint — Run only checkers

  • Convenience command to run only checker processors.
  • Equivalent to rsconstruct build -p ruff,pylint,... but shorter.
  • Urgency: low | Complexity: low

Watch mode keyboard commands

  • During rsconstruct watch, support r (rebuild), c (clean), q (quit), Enter (rebuild now), s (status).
  • Only activate when stdin is a TTY.
  • Urgency: low | Complexity: medium

Layered config files

  • Support config file layering: system (/etc/rsconstruct/config.toml), user (~/.config/rsconstruct/config.toml), project (rsconstruct.toml).
  • Lower layers provide defaults, higher layers override.
  • Per-command overrides via [build], [watch] sections.
  • Similar to Bazel’s .bazelrc layering.
  • Urgency: low | Complexity: low

Test sharding

  • Split large test targets across multiple parallel shards.
  • Set TEST_TOTAL_SHARDS and TEST_SHARD_INDEX environment variables for test runners.
  • Config: shard_count = 4 per processor or product.
  • Useful for pytest/doctest processors when added.
  • Urgency: low | Complexity: medium

Runfiles / runtime dependency trees

  • Track runtime dependencies (shared libs, config files, data files) separately from build dependencies.
  • Generate a runfiles directory per executable with symlinks to all transitive runtime deps.
  • Useful for deployment, packaging, and containerization.
  • Urgency: low | Complexity: high

On-demand processors (build_by_default = false)

  • Today every declared processor runs on every rsconstruct build. The only per-invocation escape hatches are -x name (remember every time) or enabled = false in the config (remember to flip back). Neither fits the “this processor exists, don’t run it unless I ask” use case — common for slow lifecycle processors like python_package, docker_build, publish, release_tarball.
  • Add a per-processor boolean field defaulting to true: build_by_default = false on a processor means it’s discovered and classified like any other, but its products are filtered out of the default run.
  • Prior art: meson’s build_by_default: false, Bazel’s tags = ["manual"], buck2’s tags = ["manual"]. All use the same shape — declarative opt-out on the rule, per-invocation opt-in via target naming.
  • CLI semantics map cleanly onto existing -p/-x machinery:
    • rsconstruct build → excludes build_by_default = false processors (new behaviour).
    • rsconstruct build -p python_package → includes only python_package; the -p explicit inclusion overrides the default-off flag.
    • rsconstruct build -p ruff,python_package → includes both, including the opt-in one.
    • rsconstruct build --all (new flag) → includes everything including on-demand processors. Useful for CI that wants to verify the opt-in path doesn’t bitrot.
  • Example config:
    [processor.python_package]
    build_by_default = false
    src_dirs = ["."]
    
  • Design considerations:
    • @all meta-shortcut: the existing @checkers / @generators aliases should continue to mean “all of that type, subject to the default-off filter.” Users who want “all checkers including on-demand ones” would say rsconstruct build --all -p @checkers — rare enough that the composition is fine.
    • Error on contradiction: -p X -x X already errors; -p X where X has build_by_default = false should just work (explicit opt-in wins over declarative opt-out).
    • Watch mode: rsconstruct watch should honour the same default — don’t rebuild the package processor on every file save. Users who want watch-mode packaging can add -p python_package to the watch invocation.
    • Discovery cost: on-demand processors still run discovery every build, because we need to know what their products would be (for output-conflict detection, graph completeness, and --all support). This is negligible — discovery is O(files matched), not O(cost of running).
  • Follow-up idea: named goals (meson-style aggregated targets or npm-style scripts) for the “I want a lint goal / deploy goal / ci goal” pattern. That’s Pattern B, layered above per-processor config — not needed to solve the basic on-demand case.
  • Urgency: medium | Complexity: low

Decomposed cache key for richer --explain

  • Today every product has a single descriptor key that mixes input checksum + config hash + tool-version hash + variant. A miss tells us “the key changed” but not which component. --explain can only say BUILD (no cache entry) / BUILD (output missing) — not “your cflags changed” or “an input file changed”.
  • Store the three sub-hashes (input, config, tool) in a new redb table keyed by stable product identity — (processor_iname, primary_path) where primary_path is the first output for generators or the first input for checkers.
  • Schema: product_components: (processor, primary_path) -> { input_hash, config_hash, tool_hash, timestamp }. ~100 bytes per product, so ~500KB extra disk for a 5000-product project.
  • Reads only on --explain. classify_products already routes through explain_descriptor; extend that to look up the prior components row, recompute current components, diff the three, and return a richer reason like BUILD (config changed: cflags, include_paths).
  • Writes only when explicitly tracking. Two reasonable gates:
    • Option A (single flag): --explain enables both write and read. CI runs without --explain → zero overhead. Trade-off: the first explain run after enabling has no prior row → reports “no prior state” generically. Subsequent runs work fully.
    • Option B (separate --track-changes / [build] track_changes = true): decouples capture from query. CI omits the flag → zero overhead. Devs opt in permanently via config.
    • Lean Option A: fewer flags, the existing --explain carries both ends of the lifecycle, and CI/CD pays nothing by default since neither flag is set.
  • Tier 1 only. Says “input bucket changed” but not which file. For a .cc file with 100 headers, the user still doesn’t know which header. A future Tier 2 (per-input-file checksums) would resolve that at ~5-10x storage cost; defer until users ask.
  • Caveats: adds a third source of truth (alongside descriptors and the in-memory graph) to keep in sync. Stale entries (products dropped from config) accumulate harmlessly until cache clear.
  • Urgency: medium | Complexity: medium

Caching & Performance

Deferred materialization

  • Don’t write cached outputs to disk until they’re actually needed by a downstream product.
  • Urgency: low | Complexity: high

Garbage collection policy

  • Time-based or size-based cache policies: “keep cache under 1GB” or “evict entries older than 30 days.”
  • Config: max_size = "1GB", max_age = "30d", gc_policy = "lru".
  • rsconstruct cache gc for manual garbage collection.
  • Urgency: low | Complexity: medium

Shared cache across branches

  • Surface in rsconstruct status when products are restorable from another branch.
  • Already works implicitly via input hash matching.
  • Urgency: low | Complexity: low

Merkle tree input hashing

  • Hash inputs as a Merkle tree rather than flat concatenation.
  • More efficient for large input sets — changing one file only rehashes its branch, not all inputs.
  • Also enables efficient transfer of input trees to remote execution workers.
  • Urgency: low | Complexity: medium

Reproducibility

Hermetic builds

  • Control all inputs beyond tool versions: isolate env vars, control timestamps, sandbox network, pin system libraries.
  • Config: hermetic = true, allowed_env = ["HOME", "PATH"].
  • Verification: rsconstruct build --verify builds twice and compares outputs.
  • Urgency: low | Complexity: high

Determinism verification

  • rsconstruct build --verify mode that builds each product twice and compares outputs.
  • Urgency: low | Complexity: medium

CI & Reporting

CI config generator

  • rsconstruct ci generate outputs a GitHub Actions or GitLab CI config that runs the build.
  • Detects enabled processors and required tools, generates install steps and build commands.
  • Supports --format=github|gitlab|circleci.
  • Urgency: medium | Complexity: medium

HTML build report

  • Generate a visual HTML dashboard of build times, cache hit rates, and processor statistics.
  • rsconstruct build --report=build.html or rsconstruct report.
  • Include charts for timing trends, per-processor breakdown, cache efficiency.
  • Urgency: low | Complexity: medium

PR comment bot

  • Post build results (pass/fail, timing, warnings) as a GitHub PR comment.
  • rsconstruct ci comment reads build output and posts via GitHub API.
  • Urgency: low | Complexity: medium

Content & Documentation

rsconstruct init --detect

  • rsconstruct smart auto already scans and enables processors, but a dedicated init --detect could go further.
  • Generate a complete rsconstruct.toml with processor-specific config (src_dirs, extensions, tool paths).
  • Urgency: medium | Complexity: medium

rsconstruct fmt — Auto-format rsconstruct.toml

  • Sort [processor.*] sections alphabetically, align values, remove redundant defaults.
  • Urgency: low | Complexity: low

Cross-project term sync

  • Automatically keep terms directories in sync across multiple repos.
  • Could run as a daemon or a periodic CI job.
  • rsconstruct terms sync --repos=repo1,repo2 or config-driven.
  • Urgency: low | Complexity: medium

Glossary generator

  • rsconstruct terms glossary generates a markdown glossary from the terms directory.
  • Optionally pulls definitions from context in the markdown files where terms are used.
  • Urgency: low | Complexity: medium
  • Validate that URLs in markdown files are not broken (HTTP HEAD requests).
  • Configurable timeout, retry, and allow/blocklist patterns.
  • Cache results to avoid re-checking unchanged URLs.
  • Urgency: medium | Complexity: medium

Image optimizer processor

  • Compress and resize images referenced in markdown files.
  • Uses tools like optipng, jpegoptim, svgo.
  • Config: quality levels, max dimensions, output format.
  • Urgency: low | Complexity: medium

HTML+JS compression and packaging

  • Minify and bundle HTML, CSS, and JavaScript files for deployment.
  • Could use tools like terser (JS), csso (CSS), html-minifier (HTML).
  • Bundle multiple JS/CSS files into single outputs, generate source maps.
  • Integrate with existing eslint/stylelint processors for a full web frontend pipeline.
  • Urgency: medium | Complexity: medium

Processor Ecosystem

WASM processor plugins

  • Beyond Lua, allow processors written in any language compiled to WebAssembly.
  • Provides sandboxing, portability, and language flexibility.
  • WASI for filesystem access within the sandbox.
  • Urgency: low | Complexity: high

Processor marketplace / registry

  • A central repository of community-contributed processor configs and Lua plugins.
  • Install with rsconstruct plugin install prettier.
  • Registry as a GitHub repository with a JSON index. Version pinning in rsconstruct.toml.
  • Urgency: low | Complexity: high

Cleaning & Cache

Time-based cache purge

  • rsconstruct cache purge --older-than=7d to remove cache entries older than a given duration.
  • Currently only cache clear exists which removes everything.
  • Walk the object store, check file mtimes, remove old entries.
  • Urgency: medium | Complexity: low

Enhanced cache statistics

  • rsconstruct cache stats currently shows minimal info.
  • Add: hit rate percentage, bytes saved vs rebuild time, per-processor breakdown, slowest processors.
  • Helps users identify optimization opportunities.
  • Urgency: medium | Complexity: medium

CLI & UX

Configuration

Environment variable expansion in config

  • Allow ${env:HOME} or ${env:CI} in rsconstruct.toml to reference environment variables.
  • The variable substitution system already exists for [vars]; extending it to env vars is natural.
  • Useful for CI/CD systems that pass secrets or paths via environment.
  • Urgency: medium | Complexity: low

Per-processor batch size

  • Each processor config has a batch boolean, but batch size is global ([build] batch_size).
  • Different tools have different startup costs — fast tools benefit from large batches, slow tools from small ones.
  • Add batch_size field to individual processor configs, overriding the global default.
  • Urgency: medium | Complexity: low

Processor Ecosystem

Flake8 (Python linter)

  • Many projects still use flake8 over ruff. Widely adopted.
  • Checker processor using flake8. Batch-capable.
  • Urgency: medium | Complexity: low

Security

Shell command execution from source file comments

  • EXTRA_*_SHELL directives execute arbitrary shell commands parsed from source file comments.
  • Document the security implications clearly.
  • Urgency: medium | Complexity: low

Internal Cleanups

These are code-quality items surfaced by an architecture audit. Each is localized; none block features. See architecture-observations.md for larger structural items.

Consolidate processor discovery helpers

  • src/processors/mod.rs exposes discover_checker_products, discover_directory_products, checker_discover, checker_auto_detect, checker_auto_detect_with_scan_root, scan_or_skip — all similar, with subtle differences (some auto-apply dep_auto, some don’t; some validate scan roots, some don’t).
  • Choosing the wrong helper is a silent correctness issue: a processor that picks discover_checker_products when it needed checker_discover loses dep_auto merging and never finds out.
  • Collapse to one or two helpers with explicit flags for the variations. Document the contract each helper commits to.
  • Urgency: medium | Complexity: low

Remove / complete remote_pull scaffold in ObjectStore

  • src/object_store/mod.rs has a remote_pull field and try_fetch_* helpers in operations.rs that nothing calls.
  • Either finish the feature (wire the fetch helpers into the classify path) or delete the scaffold. Unused public-ish surface rots.
  • Urgency: low | Complexity: medium (complete) / low (delete)

Drop or use processor_type on ProcessorPlugin

  • src/registries/processor.rs has processor_type marked #[allow(dead_code)] with a comment about a future processors list --type=checker filter.
  • Either ship the filter or drop the field until it’s needed. Dead fields with comments accumulate.
  • Urgency: low | Complexity: low

TOOLS registry is monolithic and unsorted

  • src/processors/mod.rs has ~170 entries in a static array mixing Python, Node, Ruby, Rust, Perl, System categories with no alphabetic ordering within groups.
  • Hard to find a tool when adding one; hard to audit for gaps (a tool with no install command makes doctor silently unhelpful).
  • Split per-runtime into separate files or sort alphabetically within a section. Add a unit test that every processor’s required_tools() entries have a matching TOOLS row (this test exists — keep it; make the table easier to satisfy).
  • Urgency: low | Complexity: low

Centralize alias expansion

  • expand_aliases in src/builder/build.rs handles @checkers / @generators / @toolname / bare-name syntaxes. It’s called once for -p and once for -x. Any new alias shortcut has to be added there.
  • No duplication today, but the function is in build.rs despite being useful elsewhere (completion, processors list, analyzers used). Move to a dedicated module and make it the canonical expander.
  • Urgency: low | Complexity: low

Inconsistent error-handling idioms in processors

  • Some processors use anyhow::bail!, some anyhow::Context::with_context(), some construct custom messages. The coding-standards doc already calls for with_context on every I/O operation, but processor-level error shape varies.
  • Pick one idiom per category (tool-failure vs. config-error vs. internal-error) and retrofit. Makes --json error events more uniform too.
  • Urgency: low | Complexity: low

Config validation timing

  • Unknown-field and must-field validation runs inside Config::load, which is correct. However, some cross-field validations (e.g. “cc_single_file needs include_paths if compiling C++”) happen later during processor creation or build.
  • Either pull all semantic validation into Config::load (so toml check catches everything) or accept that semantic errors surface later and document which is which.
  • Urgency: low | Complexity: medium

products list CLI

  • Users can run rsconstruct graph show (full graph) or rsconstruct status (per-processor summary), but there’s no flat list of “here are every product that would execute, with its primary input and output.”
  • Add rsconstruct products list (parallel to processors list and analyzers used). Respects -p/-x/--target filters.
  • Urgency: low | Complexity: low

ProductTiming.start_offset not populated for batch execution

  • src/processors/mod.rs defines start_offset on ProductTiming; it’s populated for non-batch execution but may be None for batch paths.
  • Trace visualizations (--trace) look jagged or incomplete when batches are involved.
  • Urgency: low | Complexity: low