Lineage Graph

Visualizing dependencies and impact

[!WARNING] Status: Planned. The lineage graph is part of the governance layer currently under development. The design below describes the planned behavior.

The Lineage Graph will map the relationships between all artifacts in the ChooChoo ecosystem. It answers the question: "If I change this, what breaks?" The graph is a core component of The Map and will power the Impact Radius calculation used in Risk Scoring.

Node Types

  • product: A Data Product defined using the ODPS standard.
  • contract: A Data Contract defined using the ODCS standard.
  • workflow: An Arazzo workflow that orchestrates multi-step operations.
  • agent: An AI Agent registered in the Agent Registry.
  • human: A developer or approver tracked in the Audit Trail.

Edge Types

Edge          Description
produces      A Product outputs data via a Contract.
consumes      A Product reads data from another Product.
depends-on    A Product relies on another Product's uptime.
implements    An API implements a Contract.
validates     A Workflow step validates against a Contract.

These relationships are automatically extracted during choochoo validate by resolving cross-references between artifacts. For example, when a Product's outputPorts reference a Contract, ChooChoo creates a produces edge. See Validation Rules for details on the cross-reference resolution process.
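
The cross-reference step described above might look roughly like the following sketch. Only the outputPorts field and the produces edge come from this page; the dictionary shape, the contractId key, the contract name customer-profile-v1, and the function name extract_produces_edges are assumptions made for illustration.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    source: str
    kind: str
    target: str


def extract_produces_edges(
    products: list[dict], contract_ids: set[str]
) -> tuple[list[Edge], list[str]]:
    """Return produces edges plus any references that did not resolve."""
    edges: list[Edge] = []
    unresolved: list[str] = []
    for product in products:
        for port in product.get("outputPorts", []):
            ref = port.get("contractId")  # hypothetical key, not from the spec
            if ref is None:
                continue
            if ref in contract_ids:
                edges.append(Edge(product["id"], "produces", ref))
            else:
                # A dangling reference yields no edge; it is reported instead.
                unresolved.append(f"{product['id']} -> {ref}")
    return edges, unresolved


# Hypothetical parsed artifacts
products = [
    {"id": "customer-360", "outputPorts": [{"contractId": "customer-profile-v1"}]},
    {"id": "orders", "outputPorts": [{"contractId": "missing-contract"}]},
]
edges, unresolved = extract_produces_edges(products, {"customer-profile-v1"})
```

Returning unresolved references separately reflects the point that the graph is only as accurate as the cross-references that resolve.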

Impact Analysis

Before a change is applied, ChooChoo traverses the graph downstream to calculate the Impact Radius. This value is one of five factors in the Risk Scoring algorithm.

Example: If Product A changes its output contract, ChooChoo finds that Product B and Dashboard C consume that contract. The Impact Radius increases based on the number and criticality of these dependents. If any of the downstream artifacts contain fields with sensitive compliance tags (e.g., pii, financial), the risk score increases further.
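
The downstream traversal in this example can be sketched as a breadth-first search. This is a minimal illustration and not ChooChoo's scoring formula: the adjacency map, the artifact names (including the extra report-d node), and the tag lookup are all assumptions, and this page does not specify how count, criticality, and compliance tags are weighted.

```python
from collections import deque


def downstream_dependents(graph: dict[str, list[str]], start: str) -> set[str]:
    """Collect every artifact reachable downstream of start."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dependent in graph.get(node, []):
            if dependent not in seen and dependent != start:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical map: artifact -> artifacts directly affected when it changes
graph = {
    "product-a": ["contract-a"],
    "contract-a": ["product-b", "dashboard-c"],
    "product-b": ["report-d"],
}
affected = downstream_dependents(graph, "product-a")

# Hypothetical compliance tags per artifact
tags = {"product-b": {"pii"}}
sensitive = {n for n in affected if tags.get(n, set()) & {"pii", "financial"}}
```

Here affected contains four artifacts and sensitive contains product-b, which is the kind of input the risk score would draw on.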

A high Impact Radius can trigger approval workflows requiring human sign-off before the change proceeds. In CI/CD pipelines, this manifests as exit code 10 (APPROVAL_REQUIRED).
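
A pipeline wrapper might branch on the exit code along these lines. Only exit code 10 (APPROVAL_REQUIRED) comes from this page; treating 0 as success follows the usual process convention, and the function name and returned labels are illustrative assumptions.

```python
APPROVAL_REQUIRED = 10  # exit code named on this page


def next_pipeline_step(exit_code: int) -> str:
    """Map a process exit code to a hypothetical pipeline action."""
    if exit_code == 0:
        return "proceed"
    if exit_code == APPROVAL_REQUIRED:
        # Pause the pipeline until a human signs off.
        return "await-approval"
    return "fail"
```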

Querying the Graph

Use the choochoo lineage command to explore the graph from the command line:

# Show direct dependencies of an artifact
choochoo lineage show customer-360

# Show dependencies up to 3 levels deep
choochoo lineage show customer-360 --depth 3

# Output as JSON for scripting
choochoo lineage show customer-360 --json

Circular dependencies (A → B → A) are detected during validation and produce error E005 (Circular dependency detected).
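
One common way to detect such cycles is a depth-first search that tracks which nodes are on the current path. The sketch below illustrates that general technique under assumed names and an assumed graph shape; this page does not say how the validator will implement the check.

```python
from typing import Optional

WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, fully explored


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle as a node path (first node repeated last), or None."""
    state: dict[str, int] = {}
    path: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        state[node] = GREY
        path.append(node)
        for nxt in graph.get(node, []):
            nxt_state = state.get(nxt, WHITE)
            if nxt_state == GREY:
                # nxt is already on the current path, so the path loops back.
                return path[path.index(nxt):] + [nxt]
            if nxt_state == WHITE:
                cycle = visit(nxt)
                if cycle is not None:
                    return cycle
        path.pop()
        state[node] = BLACK
        return None

    for node in list(graph):
        if state.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle is not None:
                return cycle
    return None
```

For the A → B → A case, find_cycle({"a": ["b"], "b": ["a"]}) returns ["a", "b", "a"], which is enough to name the offending artifacts in an error message.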

Graph in The Station

The Station provides an interactive visualization of the Lineage Graph, allowing GRC teams to:

  • Explore entity relationships visually
  • Click through to Audit Trail entries for any node
  • View Risk Heatmaps overlaid on the graph
  • Identify high-impact artifacts that affect many downstream consumers
  • Filter by compliance tags to focus on regulated data flows

Building the Graph

The graph is built incrementally as artifacts are validated and Decision Traces are recorded. Each trace links an Agent or human actor to the artifacts they modified, creating actor-to-artifact edges alongside the artifact-to-artifact relationships defined in the specs.
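
As a rough illustration of how traces could contribute actor-to-artifact edges, the sketch below folds trace records into an existing edge set one at a time. The trace fields (actor_type, actor, artifacts), the modified edge label, and the actor names are hypothetical; the Decision Trace schema is not defined on this page.

```python
def add_trace_edges(edge_set: set[tuple[str, str, str]], trace: dict) -> None:
    """Add one actor-to-artifact edge per artifact touched in the trace."""
    actor = f"{trace['actor_type']}:{trace['actor']}"
    for artifact in trace["artifacts"]:
        edge_set.add((actor, "modified", artifact))


# Start from an artifact-to-artifact edge derived from the specs (hypothetical).
graph_edges = {("customer-360", "produces", "customer-profile-v1")}

add_trace_edges(
    graph_edges,
    {"actor_type": "agent", "actor": "schema-bot", "artifacts": ["customer-360"]},
)
add_trace_edges(
    graph_edges,
    {
        "actor_type": "human",
        "actor": "alice",
        "artifacts": ["customer-360", "customer-profile-v1"],
    },
)
```

Using a set means that replaying the same trace does not duplicate edges, which suits incremental construction.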

For the graph to be accurate, your project structure must follow the standard layout and all cross-references between artifacts must resolve correctly. See File Structure for the expected directory layout and naming conventions.
