Cost-cascade

Description

A conditional cascade in which a cheap default path handles the common case and an expensive fallback path handles the remainder. The switch between them is conditioned on the outcome of the cheap path (failure, low confidence, threshold not met). The cascade’s value comes from the cost differential: if the expensive path were always used, it would be rate-limited, budget-exhausted, or simply too slow; the cheap path acts as a filter that directs only the hard cases to the expensive path.
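
A minimal sketch of this shape, assuming the cheap path returns an answer together with a confidence score. The names `Result`, `cascade`, `toy_cheap`, `toy_expensive` and the 0.8 threshold are hypothetical and chosen only for illustration; they do not come from any system described here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Result:
    answer: str
    confidence: float  # assumed to lie in [0, 1]; how it is computed is up to the path
    path: str          # which stage produced the answer ("cheap" or "expensive")


def cascade(
    query: str,
    cheap_path: Callable[[str], Result],
    expensive_path: Callable[[str], Result],
    threshold: float = 0.8,  # hypothetical escalation threshold
) -> Result:
    """Run the cheap path first; escalate only if it falls short."""
    try:
        cheap = cheap_path(query)
    except Exception:
        # Outright failure of the cheap path is one escalation condition.
        return expensive_path(query)
    if cheap.confidence >= threshold:
        return cheap
    # Low confidence is the other: the expensive path is a fallback, not a peer.
    return expensive_path(query)


# Toy stand-ins so the sketch runs end to end.
def toy_cheap(query: str) -> Result:
    # Pretend short queries are easy and long ones are hard.
    conf = 0.95 if len(query.split()) <= 3 else 0.4
    return Result(answer=f"cheap:{query}", confidence=conf, path="cheap")


def toy_expensive(query: str) -> Result:
    return Result(answer=f"expensive:{query}", confidence=0.99, path="expensive")


easy = cascade("capital of France", toy_cheap, toy_expensive)
hard = cascade("compare these two contracts clause by clause", toy_cheap, toy_expensive)
```

In this toy setup the short query stays on the cheap path and the long one escalates. A real cheap path would need some defensible way to report its own confidence, which is usually the hard part of calibrating the threshold.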

Cost-cascade differs from rivals-into-router in that the routing condition is defensive rather than proactive: the cheap path runs first and the expensive path is invoked because the cheap path fell short, not because a routing signal predicted the expensive path was appropriate. The cascade is sequential (cheap, then conditionally expensive); rivals-into-router is parallel dispatch based on predicted fit.
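
The contrast can be sketched in a few lines, under the assumption that a router consults a routing signal before any branch runs while a cascade always runs the cheap path first. All function names and the toy "hard" check below are hypothetical.

```python
from typing import Callable

calls: list[str] = []  # records which paths actually ran


def cheap(query: str) -> tuple[str, bool]:
    calls.append("cheap")
    ok = "hard" not in query  # toy sufficiency check on the cheap result
    return f"cheap({query})", ok


def expensive(query: str) -> str:
    calls.append("expensive")
    return f"expensive({query})"


def router(query: str, predict_hard: Callable[[str], bool]) -> str:
    # Proactive: a routing signal picks the branch before any branch runs.
    if predict_hard(query):
        return expensive(query)
    return cheap(query)[0]


def cascade(query: str) -> str:
    # Defensive: the cheap path always runs; its outcome decides escalation.
    answer, ok = cheap(query)
    return answer if ok else expensive(query)


calls.clear()
router("a hard query", predict_hard=lambda q: "hard" in q)
router_calls = list(calls)   # only the expensive path ran

calls.clear()
cascade("a hard query")
cascade_calls = list(calls)  # cheap ran first, then expensive
```

The observable difference is in what a hard query costs: the router pays only for the expensive branch but depends on the predictor being right, while the cascade pays for both stages but needs no predictor.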

The form also differs from plain asymmetric-gate: asymmetric-gate is a boundary with differential cost in each direction; cost-cascade is a pipeline where stages have increasing cost and each stage gates entry to the next.

Composition

= gradient (the cost/quality tradeoff dimension) + asymmetric-gate (the threshold that determines whether to escalate) + stack-layer (the cheap layer sitting below the expensive layer).

The gradient identifies the quality-vs-cost dimension along which the cascade operates. The asymmetric-gate sets the escalation condition (cheap path below threshold → escalate). Stack-layer captures the layered structure: cheap layer handles the common case; expensive layer is invoked only when the cheap layer’s output is insufficient.

Encounters

MAC→FAC in the engine — MAC (embedding retrieval, cheap) → FAC structural alignment (subagent dispatch, expensive). MAC retrieves top-k candidates; FAC aligns only those. The cascade is: all queries → MAC; MAC top-k → FAC. This is cost-cascade rather than rivals-into-router because the escalation condition is “MAC returned candidates” (not a predicted-fit signal).
Gemini Flash → Pro — Flash handles common-path queries (fast, cheap); Pro handles escalated queries (slower, expensive). The cascade condition: Flash confidence below threshold or query tagged as complex → escalate to Pro.
KCC routing: T2 methodology — cheap LLM call identifies query type; expensive LLM call handles only the identified-hard subset. Two-stage cascade conditioned on the cheap classifier’s output.
Embedding search → lexical search — dense retrieval for semantic queries; BM25 fallback for exact-match or high-precision queries where dense underperforms. The cascade condition is retrieval confidence.
Rate-limited APIs: the cheap API tier handles most requests; the premium tier handles overflow or high-priority traffic. Here the cascade is conditioned on a rate-limit signal rather than a quality signal.
Test suite tiering: fast unit tests run on every commit; slow integration tests run only when an escalation condition is met (e.g., changes to core modules).
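
The first encounter has a slightly different shape from a confidence threshold: the cheap stage runs over everything and narrows the field, and the expensive stage runs only over the survivors. A rough sketch of that top-k variant follows. It is not the engine's implementation; `two_stage_retrieve`, `toy_alignment`, the two-dimensional vectors, and the scores are all hypothetical stand-ins.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def two_stage_retrieve(
    query_vec: Vector,
    corpus: dict[str, Vector],
    expensive_score: Callable[[str], float],
    k: int = 2,
) -> list[tuple[str, float]]:
    # Stage 1 (cheap, runs over everything): rank all items by vector similarity.
    ranked = sorted(corpus, key=lambda name: cosine(query_vec, corpus[name]), reverse=True)
    shortlist = ranked[:k]
    # Stage 2 (expensive, runs over the shortlist only): rescore the survivors.
    rescored = [(name, expensive_score(name)) for name in shortlist]
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


expensive_calls: list[str] = []


def toy_alignment(name: str) -> float:
    # Hypothetical stand-in for an expensive structural-alignment step.
    expensive_calls.append(name)
    return {"a": 0.2, "b": 0.9, "c": 0.5, "d": 0.7}[name]


corpus = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [-1.0, 0.0]}
results = two_stage_retrieve([1.0, 0.0], corpus, toy_alignment, k=2)
```

With `k=2` the expensive scorer is called for only two of the four items, and it can reorder them relative to the cheap ranking. Item "d" would have scored well on the expensive stage but never reaches it, which is the recall cost this shape accepts in exchange for the savings.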

When it applies / triggers on

User-initiated: User is designing a multi-tier system or expressing a cost/budget concern: “this is too slow/expensive to run every time,” “we want a fast path for the common case,” “can we avoid the expensive call unless necessary?”

Agent-initiated: Engine detects a decision context where a single expensive operation is the proposed solution to a problem that could be solved by a cheap operation in most cases. Candidate inference: “this is a cost-cascade candidate — can we put a cheap filter in front of the expensive operation and escalate only when the cheap path falls short?”

Vocabulary cues: “fast path,” “fallback,” “escalate,” “try X first,” “if cheap fails,” “tiered,” “cascade,” “common case,” “edge case,” “rate limit,” “budget,” “expensive only when necessary,” “two-stage.”

Situation-shape signals: A proposed operation with high per-call cost that handles a mix of easy and hard cases. The easy cases could be handled cheaply; the hard cases require the expensive call. The form is indicated when the proportion of easy cases is meaningfully large (otherwise the cascade overhead exceeds the savings).
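
"Meaningfully large" can be made concrete with a back-of-envelope break-even check. The sketch below assumes every query pays the cheap path, only the hard fraction also pays the expensive path, and quality differences are ignored; the function names and cost figures are hypothetical.

```python
def cascade_cost(p_easy: float, cost_cheap: float, cost_expensive: float) -> float:
    """Expected per-query cost: every query pays the cheap path;
    the hard fraction (1 - p_easy) also pays the expensive path."""
    return cost_cheap + (1.0 - p_easy) * cost_expensive


def break_even_fraction(cost_cheap: float, cost_expensive: float) -> float:
    """Fraction of easy cases above which the cascade beats always-expensive.

    Derived from cost_cheap + (1 - p) * cost_expensive < cost_expensive,
    which simplifies to p > cost_cheap / cost_expensive.
    """
    return cost_cheap / cost_expensive


# Hypothetical numbers: a cheap call costs 1 unit, an expensive call costs 20.
always_expensive = 20.0
mostly_easy = cascade_cost(p_easy=0.7, cost_cheap=1.0, cost_expensive=20.0)   # 1 + 0.3 * 20
mostly_hard = cascade_cost(p_easy=0.02, cost_cheap=1.0, cost_expensive=20.0)  # 1 + 0.98 * 20
threshold = break_even_fraction(1.0, 20.0)
```

Under these assumptions the cascade pays off whenever the easy fraction exceeds the ratio of cheap cost to expensive cost. When nearly every case is hard, the expected cost ends up above that of calling the expensive path directly, because the cheap call becomes pure overhead.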

Composes with

rivals-into-router (specialization relationship) — cost-cascade is a specialization of rivals-into-router where the routing condition is defensive (cheap path failed/insufficient) rather than proactive (signal predicts which branch to use).
asymmetric-gate (composition relationship): the escalation condition is an asymmetric gate. When the cheap path’s output meets the threshold, stay on the cheap path; when it falls below the threshold, pay the expensive path’s cost.
gradient (composition relationship) — the quality-vs-cost tradeoff is a gradient; the cascade’s threshold is a choice of where to draw the line on that gradient.
stack-layer (composition relationship) — the cascade forms a stack: cheap layer below, expensive layer above. The cheap layer handles what it can; the expensive layer handles what the cheap layer can’t.
multi-channel-ingest (composition relationship) — when multiple input channels feed the same store, cost-cascade determines which channel’s data gets the expensive processing treatment (high-volume cheap channel vs. low-volume expensive channel).
uniformity-dividend (composition relationship) — a cost-cascade that applies uniformly across all query types earns less dividend than one calibrated to the actual proportion of easy vs. hard cases.

When it doesn’t apply

When the cheap path’s overhead exceeds its savings — if the cheap path is expensive to run relative to what it saves (e.g., the classifier is nearly as slow as the expensive call), the cascade doesn’t pay off. This is a calibration check, not a failure of the form.
When there’s no meaningful cost differential — if the cheap and expensive paths have similar latency and cost, cost-cascade adds complexity without benefit.
When the common case requires the expensive path — if most cases are “hard” (the cheap path rarely handles them), the cascade reduces to “always use the expensive path with an extra round-trip overhead.”
When the quality differential matters more than the cost — sometimes you want the expensive path’s quality even on easy cases (e.g., for consistency or auditability). Cost-cascade trades quality uniformity for cost savings.

Sources

MAC/FAC model (Forbus, Gentner, and Law): the two-stage retrieval architecture is the canonical cog-sci instance.
Ensemble methods in ML, in particular classifier cascades: a cheap weak classifier acts as a filter in front of an expensive strong classifier.
API tiering patterns in cloud services: free tier → paid tier escalation on quota.
Named “cost-cascade” in analogy-project design work; the pattern was identified as distinct from rivals-into-router during the conversation that coined it.

Canonical exemplars from corpus (T2 2026-05-17)

Caveat — very low corpus support. Only 3 backfill-only matches at score ≥ 2; the name cost-cascade was coined late (2026-05-17 analogy-project design work) and hasn’t yet propagated into the corpus prose at scale. The form’s shape — cheap-first then expensive-fallback — is widespread in the corpus (LLM-extractor pipelines, ranking systems, source-routing) but expressed via domain vocabulary rather than the bundle’s name. Exemplars below illustrate the shape; richer instances will accumulate as the name is used.

Shell A (Gemini Flash) already doing the cheap extraction step (cwd: campconnect, 2026-04-25): “The existing pipeline/src/extractors/generic-llm/ (Shell A) already calls Gemini Flash via the pipeline/src/llm/ wrapper — so ‘outsource extraction to Gemini’ is what we’re already doing for the actual data work. EA is a one-shot Q&A pattern… poorly suited for high-throughput crawling.” — cost-cascade shape in production: Flash for the common-path extraction; EA / Pro reserved for harder cases.
Defensive/offensive symmetry — content density as bidirectional ranking signal (cwd: campconnect, 2026-05-03): “This is a textbook case of defensive/offensive symmetry in a ranking system. Content density was added as an offensive lever (boost weak-but-content-rich candidates) but never wired up as a defensive lever (penalize keyword-rich-but-content-empty candidates). The fix is making the same signal work in both directions.” — cost-cascade-adjacent: a ranking pipeline where cheap signals filter what reaches the expensive ranker; the insight is that the cheap signal needs to fire both directions.
Two-PR strategy was wrong — cherry-pick combined (cwd: campconnect, 2026-05-10, task-notification): “Two-PR strategy was wrong for this work… The design doc and the code that fulfills (part of) it are coupled enough that splitting them creates merge-order tension.” — not a literal cost-cascade but the same family-of-thinking: when separating stages creates downstream cost, combine them.

Trigger pattern (T2): Cost-cascade surfaces when the user proposes a multi-tier system or expresses a cost/budget concern, OR when the agent notices a single expensive operation handling work a cheap filter could handle most of. Caveat: with n=3 and a recently-coined name, the trigger pattern is mostly anticipated from the catalog’s MAC/FAC + Gemini Flash → Pro + ensemble ML lineage rather than empirically validated in corpus prose. Expect this exemplar set to thicken as the name propagates.
