Cross-Processor Dependencies
This chapter discusses the problem of one processor’s output being consumed as input by another processor, and the design options for solving it.
The Problem
Consider a template that generates a Python file:
tera.templates/config.py.tera → (template processor) → config.py
Ideally, ruff should then lint the generated config.py. Or a template might
generate a C++ source file that needs to be compiled by cc_single_file and
linted by cppcheck. Chains can be arbitrarily deep:
template → generates foo.sh → shellcheck lints foo.sh
template → generates bar.c → cc_single_file compiles bar.c → cppcheck lints bar.c
Currently this does not work. Each processor discovers its inputs by querying
the FileIndex, which is built once at startup by scanning the filesystem.
Files that do not exist yet (because they will be produced by another processor)
are invisible to downstream processors. No product is created for them, and no
dependency edge is formed.
Why It Breaks
The build pipeline today is:
1. Walk the filesystem once to build FileIndex
2. Each processor runs discover() against that index
3. resolve_dependencies() matches product inputs to product outputs by path
4. Topological sort and execution
Step 3 already handles cross-processor edges correctly: if product A declares
output foo.py and product B declares input foo.py, a dependency edge from
A to B is created automatically. The problem is that step 2 never creates
product B in the first place, because foo.py is not in the FileIndex.
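The gap can be reproduced with a toy model. The sketch below is not rsconstruct's code: `Product`, `discover_template`, `discover_lint`, `plan`, and the simplified `resolve_dependencies` signature are hypothetical stand-ins, and the file index is reduced to a set of path strings.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified product: one processor run with its inputs and outputs.
#[derive(Debug, Clone, PartialEq)]
struct Product {
    processor: &'static str,
    inputs: Vec<String>,
    outputs: Vec<String>,
}

/// Toy template processor: `tera.templates/X.tera` declares output `X`.
fn discover_template(index: &BTreeSet<String>) -> Vec<Product> {
    index
        .iter()
        .filter_map(|p| {
            let out = p.strip_prefix("tera.templates/")?.strip_suffix(".tera")?;
            Some(Product {
                processor: "template",
                inputs: vec![p.clone()],
                outputs: vec![out.to_string()],
            })
        })
        .collect()
}

/// Toy linter: one product per `.py` path present in the index, no outputs.
fn discover_lint(index: &BTreeSet<String>) -> Vec<Product> {
    index
        .iter()
        .filter(|p| p.ends_with(".py"))
        .map(|p| Product {
            processor: "ruff",
            inputs: vec![p.clone()],
            outputs: vec![],
        })
        .collect()
}

/// Path-based edge matching: returns (producer index, consumer index) pairs.
fn resolve_dependencies(products: &[Product]) -> Vec<(usize, usize)> {
    let mut edges = Vec::new();
    for (a, producer) in products.iter().enumerate() {
        for (b, consumer) in products.iter().enumerate() {
            if a != b && producer.outputs.iter().any(|o| consumer.inputs.contains(o)) {
                edges.push((a, b));
            }
        }
    }
    edges
}

/// Runs both toy processors against one index, then resolves edges.
fn plan(index: &BTreeSet<String>) -> (Vec<Product>, Vec<(usize, usize)>) {
    let mut products = discover_template(index);
    products.extend(discover_lint(index));
    let edges = resolve_dependencies(&products);
    (products, edges)
}
```

With only `tera.templates/config.py.tera` in the index, `plan` returns the template product and no edges, because the lint product for `config.py` is never created. Adding `config.py` to the index by hand yields both products and the expected edge, which shows that the edge matching itself is not the weak point.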
How Other Build Systems Handle This
Bazel
Bazel uses BUILD files where rules explicitly declare their inputs and outputs.
Dependencies are specified by label references, not by filesystem scanning.
However, Bazel does use glob() to discover source files during its loading
phase. The key insight is that during the analysis phase, both source files
(from globs) and generated files (from rule declarations) are visible in a
unified view. A rule’s declared outputs are known before any action executes.
Buck2
Buck2 takes a similar approach with a single unified dependency graph (no
separate phases). Rules call declare_output() to create artifact references
and return them via providers. Downstream rules receive these references through
their declared dependencies. For cases where the dependency structure is not
known statically, Buck2 provides dynamic_output — a rule can read an artifact
at build time to discover additional dependencies.
Common Pattern
In both systems, the core principle is the same: a rule’s declared outputs are visible to the dependency resolver before execution begins. The dependency graph is fully resolved at analysis time.
Proposed Solutions
A. Multi-Pass Discovery (Iterative Build-Scan Loop)
Run discovery, build what is ready, re-scan the filesystem, discover again. Repeat until nothing new is found.
- Pro: Simple mental model, handles arbitrary chain depth
- Con: Slow (re-scans filesystem each pass), hard to detect infinite loops, execution is interleaved with discovery
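To make the interleaving concrete, here is a minimal simulation of the build-scan loop. It assumes an in-memory set of paths in place of the real filesystem, and `Rule`, `build_scan_loop`, and the `max_passes` cap are hypothetical names introduced only for this sketch.

```rust
use std::collections::BTreeSet;

/// Hypothetical rule: a path ending in `from_ext` yields the same path with
/// that suffix replaced by `to_ext`.
struct Rule {
    from_ext: &'static str,
    to_ext: &'static str,
}

/// Simulated build-scan loop over an in-memory "filesystem".
/// Returns the number of passes taken, or None if `max_passes` is exhausted.
fn build_scan_loop(fs: &mut BTreeSet<String>, rules: &[Rule], max_passes: usize) -> Option<usize> {
    for pass in 1..=max_passes {
        // "Re-scan": snapshot whatever exists right now.
        let snapshot: Vec<String> = fs.iter().cloned().collect();
        let mut built_something = false;
        for path in &snapshot {
            for rule in rules {
                if let Some(stem) = path.strip_suffix(rule.from_ext) {
                    // "Execute": writing the output makes it visible to the next scan.
                    built_something |= fs.insert(format!("{}{}", stem, rule.to_ext));
                }
            }
        }
        if !built_something {
            return Some(pass);
        }
    }
    None // likely a cycle or an ever-growing chain
}
```

In this sketch the only defense against a rule that keeps generating new files is the pass cap, and every pass pays for a full scan even when a single file was added.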
B. Virtual Files from Declared Outputs (Two-Pass)
After the first discovery pass, collect all declared outputs from the graph and inject them as “virtual files” visible to processors. Run discovery a second time so downstream processors can find the generated files.
- Pro: No filesystem re-scan, single build execution phase, deterministic
- Con: Limited to chains of depth 1 (producer → consumer). A three-step chain (template → compile → lint) would require three passes, making the fixed two-pass design insufficient.
C. Fixed-Point Discovery Loop
Generalization of Approach B. Run discovery in a loop: after each pass, collect newly declared outputs and feed them back as known files for the next pass. Stop when a full pass adds no new products. Add a maximum iteration limit to catch cycles.
known_files = FileIndex (real files on disk)
repeat at most MAX_PASSES times {
    run discover() for all processors, with known_files visible
    new_outputs = outputs declared in this pass that were not in known_files
    if new_outputs is empty → break
    known_files = known_files + new_outputs
}
if the limit was reached without a break → report an error (possible cycle)
resolve_dependencies()
execute()
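A chain of depth N requires about N passes: each pass exposes one more link of the chain, and the loop stops at the first pass that declares no new output. Most projects would converge in one or two passes.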
- Pro: Fully general, handles arbitrary chain depth, no filesystem re-scan, deterministic, path-based matching (no reliance on file extensions)
- Con: Processors must be able to discover products for files that do not exist on disk yet (they only know the path). This works for stub-based processors and compilers but might be an issue for processors that inspect file contents during discovery.
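The loop can be sketched in Rust as follows. This is an illustration under stated assumptions, not the project's API: the `Processor` trait, `Product`, `ExtRule`, and `discover_fixed_point` are hypothetical, paths are plain strings, and each pass rebuilds the product set from scratch so that repeated discovery does not create duplicates.

```rust
use std::collections::BTreeSet;

/// Hypothetical product: the path one processor run consumes and the paths it declares.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct Product {
    processor: String,
    input: String,
    outputs: Vec<String>,
}

/// Hypothetical processor interface: discovery sees only paths, never contents.
trait Processor {
    fn discover(&self, known_files: &BTreeSet<String>) -> Vec<Product>;
}

/// Toy processor mapping `<stem><from_ext>` to `<stem><to_ext>`; `None` declares no output.
struct ExtRule {
    name: &'static str,
    from_ext: &'static str,
    to_ext: Option<&'static str>,
}

impl Processor for ExtRule {
    fn discover(&self, known_files: &BTreeSet<String>) -> Vec<Product> {
        known_files
            .iter()
            .filter_map(|path| {
                let stem = path.strip_suffix(self.from_ext)?;
                Some(Product {
                    processor: self.name.to_string(),
                    input: path.clone(),
                    outputs: self.to_ext.iter().map(|ext| format!("{}{}", stem, ext)).collect(),
                })
            })
            .collect()
    }
}

/// Repeats discovery until a pass declares no unseen output.
/// Returns the products and the pass count, or an error at the iteration cap.
fn discover_fixed_point(
    processors: &[Box<dyn Processor>],
    file_index: &BTreeSet<String>,
    max_passes: usize,
) -> Result<(BTreeSet<Product>, usize), String> {
    let mut known_files = file_index.clone();
    for pass in 1..=max_passes {
        let mut products = BTreeSet::new();
        for processor in processors {
            products.extend(processor.discover(&known_files));
        }
        let new_outputs: Vec<String> = products
            .iter()
            .flat_map(|p| p.outputs.iter().cloned())
            .filter(|o| !known_files.contains(o))
            .collect();
        if new_outputs.is_empty() {
            return Ok((products, pass));
        }
        known_files.extend(new_outputs);
    }
    Err(format!("discovery did not converge within {} passes", max_passes))
}
```

For a toy chain in which `bar.c.tera` yields `bar.c`, which is both compiled (to a made-up `bar.elf`) and linted, the function returns three products after three passes. The third pass only confirms that nothing new was declared.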
D. Explicit Cross-Processor Wiring in Config
Let users declare chains in rsconstruct.toml:
[[pipeline]]
from = "template"
to = "ruff"
rsconstruct then knows that template outputs matching ruff’s scan configuration should become ruff inputs.
- Pro: Explicit, no magic, user controls what gets chained
- Con: More configuration burden, loses the “convention over configuration” philosophy
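One way such entries might be interpreted is sketched below. TOML parsing is left out, and `PipelineRule`, `wire_pipelines`, and the idea of representing a scan configuration as a list of file extensions are assumptions made for this example only.

```rust
use std::collections::BTreeMap;

/// One `[[pipeline]]` entry, assumed to be already parsed from the config.
struct PipelineRule {
    from: String,
    to: String,
}

/// For each rule, forwards the producer's declared outputs that match the
/// consumer's scan extensions. Returns (consumer, path) pairs.
fn wire_pipelines(
    rules: &[PipelineRule],
    declared_outputs: &BTreeMap<String, Vec<String>>, // processor name -> declared outputs
    scan_extensions: &BTreeMap<String, Vec<String>>,  // processor name -> accepted extensions
) -> Vec<(String, String)> {
    let mut extra_inputs = Vec::new();
    for rule in rules {
        let pair = (declared_outputs.get(&rule.from), scan_extensions.get(&rule.to));
        if let (Some(outputs), Some(extensions)) = pair {
            for path in outputs {
                if extensions.iter().any(|ext| path.ends_with(ext.as_str())) {
                    extra_inputs.push((rule.to.clone(), path.clone()));
                }
            }
        }
    }
    extra_inputs
}
```

Only processors named in a rule are connected, so a template that also emits `foo.sh` would not reach shellcheck unless a second `[[pipeline]]` entry says so.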
E. Make out/ Visible to FileIndex
The simplest mechanical fix: stop excluding out/ from the FileIndex. The
directory is skipped today because .gitignore contains /out/ and the ignore
crate honors .gitignore by default. This could be overridden in the
WalkBuilder configuration.
- Pro: Minimal code change, works on subsequent builds (files already exist from previous build)
- Con: Does not work on the first clean build (files do not exist yet). Processors would also see stale outputs left behind by processors that have since been removed, as well as stub files written by other processors (though extension filtering would exclude most of these).
F. Two-Phase Processor Trait (Declarative Forward Tracing)
Split the ProductDiscovery trait so that each processor can declare what
output paths it would produce for a given input path, without performing full
discovery:
use std::path::{Path, PathBuf};

trait ProductDiscovery {
    /// Given an input path, return the output paths this processor would
    /// produce. Called even for files that don't exist on disk yet.
    fn would_produce(&self, input_path: &Path) -> Vec<PathBuf>;

    /// Full discovery (as today)
    fn discover(&self, graph: &mut BuildGraph, file_index: &FileIndex) -> Result<()>;

    // ...
}
The build system first runs discover() on all processors to get the initial
set of products and their outputs. Then, for each declared output, it calls
would_produce() on every other processor to trace the chain forward. This
repeats transitively until no new outputs are produced. Finally, discover()
runs once more with the complete set of known paths (real + virtual).
Unlike Approach C, this does not require a loop over full discovery passes. The chain is traced declaratively by asking each processor “if this file existed, what would you produce from it?” — a lightweight query that does not modify the graph.
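- Pro: A fixed number of full discovery passes (one initial, one final) plus lightweight forward tracing, instead of a loop over full discovery passes. The tracing step still repeats until no new outputs appear, but it only computes paths and never touches the graph. Each processor defines its output naming convention in one place. The full transitive closure of outputs is known before the final discovery pass runs.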
- Con: Adds a method to the ProductDiscovery trait that every processor must implement. Some processors have complex output path logic (e.g., cc_single_file changes the extension and directory), so would_produce() must replicate that logic. The output path computation then exists in two places (in would_produce() and in discover()), and keeping them in sync is a maintenance risk.
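The forward-tracing step could look roughly like the worklist below. The sketch assumes a reduced trait, here called `WouldProduce`, that carries only the `would_produce()` query; `SwapExt` and `trace_forward` are hypothetical. For simplicity it queries every processor for every path, including the one that declared it.

```rust
use std::collections::{BTreeSet, VecDeque};
use std::path::{Path, PathBuf};

/// Reduced, hypothetical version of the trait: only the forward-tracing query.
trait WouldProduce {
    fn would_produce(&self, input_path: &Path) -> Vec<PathBuf>;
}

/// Toy processor: swaps one extension for another (given without the dot).
struct SwapExt {
    from: &'static str,
    to: &'static str,
}

impl WouldProduce for SwapExt {
    fn would_produce(&self, input_path: &Path) -> Vec<PathBuf> {
        match input_path.extension().and_then(|e| e.to_str()) {
            Some(ext) if ext == self.from => vec![input_path.with_extension(self.to)],
            _ => Vec::new(),
        }
    }
}

/// Worklist traversal: starting from the outputs declared by the first
/// discovery pass, collect every path that would transitively be produced.
fn trace_forward(processors: &[Box<dyn WouldProduce>], declared: &[PathBuf]) -> BTreeSet<PathBuf> {
    let mut virtual_files: BTreeSet<PathBuf> = declared.iter().cloned().collect();
    let mut queue: VecDeque<PathBuf> = declared.iter().cloned().collect();
    while let Some(path) = queue.pop_front() {
        for processor in processors {
            for output in processor.would_produce(&path) {
                // The visited set prevents revisits (so a simple A -> B -> A
                // cycle terminates); it does not bound a processor that keeps
                // inventing new paths.
                if virtual_files.insert(output.clone()) {
                    queue.push_back(output);
                }
            }
        }
    }
    virtual_files
}
```

The returned set is what the final `discover()` run would see in addition to the real files. Tracing is cheap because it never builds products, but it is still a loop that ends only when no new path appears.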
G. Hybrid: Visible out/ + Fixed-Point Discovery
Combine Approach E (make out/ visible) with Approach C (fixed-point loop) or
Approach F (forward tracing).
On subsequent builds, existing files in out/ are already in the index. On
clean builds, the fixed-point loop discovers them from declared outputs.
- Pro: Most robust — works for both clean and incremental builds
- Con: Combines complexity of two approaches, risk of discovering stale outputs
Recommendation
Approach C (fixed-point discovery loop) is the most principled solution. It is fully general, handles arbitrary chain depth, requires no configuration, and matches the core insight from Bazel and Buck2: declared outputs should be visible during dependency resolution before execution begins.
The main implementation requirement is extending the FileIndex (or creating a
wrapper) to accept “virtual” entries for paths that are declared as outputs but
do not yet exist on disk. Processors already declare their outputs during
discover(), so the information needed to populate these virtual entries is
already available.
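A wrapper of that kind might look like the following sketch. `VirtualFileIndex` and all of its methods are hypothetical; the real FileIndex API is not shown in this chapter, so the example models it as a set of paths.

```rust
use std::collections::BTreeSet;
use std::path::{Path, PathBuf};

/// Hypothetical wrapper: real paths from the filesystem scan plus declared outputs.
struct VirtualFileIndex {
    on_disk: BTreeSet<PathBuf>,
    declared: BTreeSet<PathBuf>,
}

impl VirtualFileIndex {
    /// Builds the index from the paths found by the startup scan.
    fn from_scan<I: IntoIterator<Item = PathBuf>>(paths: I) -> Self {
        VirtualFileIndex {
            on_disk: paths.into_iter().collect(),
            declared: BTreeSet::new(),
        }
    }

    /// Registers a declared output. Returns true if the path was not known before.
    fn add_virtual(&mut self, path: PathBuf) -> bool {
        !self.on_disk.contains(&path) && self.declared.insert(path)
    }

    /// True for both real and virtual entries.
    fn contains(&self, path: &Path) -> bool {
        self.on_disk.contains(path) || self.declared.contains(path)
    }

    /// True only for declared outputs that were not on disk at scan time.
    fn is_virtual(&self, path: &Path) -> bool {
        self.declared.contains(path)
    }

    /// All known paths with the given extension, real or virtual, sorted.
    fn with_extension(&self, ext: &str) -> Vec<&Path> {
        let mut found: Vec<&Path> = self
            .on_disk
            .iter()
            .chain(self.declared.iter())
            .filter(|p| p.extension().and_then(|e| e.to_str()) == Some(ext))
            .map(|p| p.as_path())
            .collect();
        found.sort();
        found
    }
}
```

The boolean returned by `add_virtual` is the signal a fixed-point loop needs: a pass in which every call returns false has added nothing new. Keeping `is_virtual` separate lets a processor that must read file contents during discovery skip or defer virtual entries.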
Current Status
Cross-processor dependencies are not yet implemented. The dependency graph
machinery (resolve_dependencies(), topological sort, executor ordering) is
correct and would handle cross-processor edges properly once downstream products
are discovered. The gap is purely in the discovery phase.