Cost-cascade

Description

A conditional cascade in which a cheap default path handles the common case and an expensive fallback path handles the remainder. The switch between them is conditioned on the outcome of the cheap path (failure, low confidence, threshold not met). The cascade’s value comes from the cost differential: if the expensive path were always used, it would be rate-limited, budget-exhausted, or simply too slow. The cheap path acts as a filter that directs only the hard cases to the expensive path.
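
A minimal sketch of the cheap-first, conditionally-expensive shape described above. Everything here is hypothetical and for illustration only: the `Result` type, the `cost_cascade` function, the `min_confidence` default of 0.8, and the two toy paths are assumptions, not part of any real system named in this entry.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Result:
    answer: Optional[str]  # None signals outright failure of the path
    confidence: float      # 0.0 to 1.0, as reported by the path itself


def cost_cascade(
    query: str,
    cheap: Callable[[str], Result],
    expensive: Callable[[str], Result],
    min_confidence: float = 0.8,  # hypothetical threshold; tune per workload
) -> Tuple[Result, str]:
    """Run the cheap path first; escalate only if it falls short."""
    first = cheap(query)
    if first.answer is not None and first.confidence >= min_confidence:
        return first, "cheap"
    # The cheap path failed or was not confident enough: pay for the expensive path.
    return expensive(query), "expensive"


# Toy stand-ins for the two paths (purely illustrative).
def toy_cheap(query: str) -> Result:
    # Pretend queries of up to three words are easy and longer ones are hard.
    if len(query.split()) <= 3:
        return Result(answer=query.upper(), confidence=0.95)
    return Result(answer=None, confidence=0.0)


def toy_expensive(query: str) -> Result:
    return Result(answer=f"careful answer to: {query}", confidence=0.99)
```

Note that the expensive path is never consulted for a query the cheap path handles well, which is the defensive ordering that the rest of this entry relies on.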

Cost-cascade differs from rivals-into-router in that the routing condition is defensive rather than proactive: the cheap path runs first and the expensive path is invoked because the cheap path fell short, not because a routing signal predicted the expensive path was appropriate. The cascade is sequential (cheap, then conditionally expensive); rivals-into-router is parallel dispatch based on predicted fit.

The form also differs from plain asymmetric-gate: asymmetric-gate is a boundary with differential cost in each direction; cost-cascade is a pipeline where stages have increasing cost and each stage gates entry to the next.
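
The "pipeline where stages have increasing cost and each stage gates entry to the next" can be sketched for any number of stages. This is an illustrative sketch under assumptions: the `Stage` tuple layout, the `run_cascade` name, and the convention that a handler returns `None` to mean "not sufficient, pass it on" are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Each stage is (name, cost, handler). A handler returns an answer, or None
# to signal "not good enough, pass to the next stage".
Stage = Tuple[str, float, Callable[[str], Optional[str]]]


def run_cascade(item: str, stages: List[Stage]) -> Tuple[Optional[str], str, float]:
    """Try stages in order; return (answer, stage_name, total_cost_paid)."""
    total_cost = 0.0
    for name, cost, handler in stages:
        total_cost += cost      # every stage that runs is paid for
        answer = handler(item)
        if answer is not None:  # gate: stop at the first sufficient stage
            return answer, name, total_cost
    return None, "unhandled", total_cost


# Toy stages ordered from cheapest to most expensive (costs are made up).
stages: List[Stage] = [
    ("lookup", 1.0, lambda s: {"ping": "pong"}.get(s)),
    ("rules", 5.0, lambda s: s[::-1] if s.isalpha() else None),
    ("model", 50.0, lambda s: f"model({s})"),
]
```

With these toy stages, an item that reaches the last stage has paid for all three, which is why the ordering by cost matters: the sketch accumulates the cost of every stage that ran, not just the one that answered.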

Composition

= gradient (the cost/quality tradeoff dimension) + asymmetric-gate (the threshold that determines whether to escalate) + stack-layer (the cheap layer sitting below the expensive layer).

The gradient identifies the quality-vs-cost dimension along which the cascade operates. The asymmetric-gate sets the escalation condition (cheap path below threshold → escalate). Stack-layer captures the layered structure: cheap layer handles the common case; expensive layer is invoked only when the cheap layer’s output is insufficient.

Encounters

  • MAC→FAC in the engine — MAC (embedding retrieval, cheap) → FAC structural alignment (subagent dispatch, expensive). MAC retrieves top-k candidates; FAC aligns only those. The cascade is: all queries → MAC; MAC top-k → FAC. This is cost-cascade rather than rivals-into-router because the escalation condition is “MAC returned candidates” (not a predicted-fit signal).
  • Gemini Flash → Pro — Flash handles common-path queries (fast, cheap); Pro handles escalated queries (slower, expensive). The cascade condition: Flash confidence below threshold or query tagged as complex → escalate to Pro.
  • KCC routing: T2 methodology — cheap LLM call identifies query type; expensive LLM call handles only the identified-hard subset. Two-stage cascade conditioned on the cheap classifier’s output.
  • Embedding search → lexical search — dense retrieval for semantic queries; BM25 fallback for exact-match or high-precision queries where dense underperforms. The cascade condition is retrieval confidence.
  • Rate-limited APIs — cheap API tier handles most requests; premium tier handles overflow or high-priority traffic. The cascade is on rate-limit signal rather than quality signal.
  • Test suite tiering — fast unit tests run on every commit; slow integration tests run on escalation condition (e.g., changes to core modules).
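
The MAC→FAC encounter above has a shortlist shape: a cheap score runs over every candidate, and the expensive score runs only on the top-k survivors. The sketch below illustrates that shape only. The function name and the two toy scorers (word overlap as the cheap stand-in, Jaccard similarity as the expensive stand-in) are assumptions chosen for illustration; they are not the engine's actual MAC or FAC implementations.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

Scorer = Callable[[str, str], float]


def cheap_then_expensive(
    query: str,
    corpus: Sequence[str],
    cheap_score: Scorer,
    expensive_score: Scorer,
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Score everything cheaply, then score only the top-k expensively."""
    # Stage 1: the cheap scorer sees every candidate.
    shortlist = heapq.nlargest(k, corpus, key=lambda doc: cheap_score(query, doc))
    # Stage 2: the expensive scorer sees at most k candidates.
    scored = [(expensive_score(query, doc), doc) for doc in shortlist]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def word_overlap(query: str, doc: str) -> float:
    # Cheap stand-in: number of shared words.
    return float(len(set(query.split()) & set(doc.split())))


def jaccard(query: str, doc: str) -> float:
    # "Expensive" stand-in: shared words divided by all distinct words.
    q, d = set(query.split()), set(doc.split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0
```

The number of expensive calls is bounded by `k` regardless of corpus size, so `k` plays the role of the escalation budget in this variant.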

When it applies / triggers on

User-initiated: User is designing a multi-tier system or expressing a cost/budget concern: “this is too slow/expensive to run every time,” “we want a fast path for the common case,” “can we avoid the expensive call unless necessary?”

Agent-initiated: Engine detects a decision context where a single expensive operation is the proposed solution to a problem that could be solved by a cheap operation in most cases. Candidate inference: “this is a cost-cascade candidate — can we put a cheap filter in front of the expensive operation and escalate only when the cheap path falls short?”

Vocabulary cues: “fast path,” “fallback,” “escalate,” “try X first,” “if cheap fails,” “tiered,” “cascade,” “common case,” “edge case,” “rate limit,” “budget,” “expensive only when necessary,” “two-stage.”

Situation-shape signals: A proposed operation with high per-call cost that handles a mix of easy and hard cases. The easy cases could be handled cheaply; the hard cases require the expensive call. The form is indicated when the proportion of easy cases is meaningfully large (otherwise the cascade overhead exceeds the savings).
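
The "meaningfully large" condition can be made concrete with a back-of-envelope model. The sketch below assumes a simplified cost model that is not stated in this entry: the cheap path always runs, every item it cannot handle also pays the full expensive cost, and quality differences are ignored. The function names are hypothetical. Under those assumptions the expected per-item cost is cheap_cost + (1 - easy_fraction) * expensive_cost, and the cascade beats always-expensive exactly when easy_fraction exceeds cheap_cost / expensive_cost.

```python
def expected_cascade_cost(cheap_cost: float, expensive_cost: float,
                          easy_fraction: float) -> float:
    """Expected per-item cost when the cheap path always runs first."""
    if not 0.0 <= easy_fraction <= 1.0:
        raise ValueError("easy_fraction must be between 0 and 1")
    # Every item pays the cheap cost; only the hard remainder pays the expensive cost.
    return cheap_cost + (1.0 - easy_fraction) * expensive_cost


def break_even_easy_fraction(cheap_cost: float, expensive_cost: float) -> float:
    """Smallest easy fraction at which the cascade matches always-expensive."""
    if expensive_cost <= 0:
        raise ValueError("expensive_cost must be positive")
    return cheap_cost / expensive_cost


def cascade_pays_off(cheap_cost: float, expensive_cost: float,
                     easy_fraction: float) -> bool:
    """True if the cascade is strictly cheaper than always using the expensive path."""
    return expected_cascade_cost(cheap_cost, expensive_cost, easy_fraction) < expensive_cost
```

For example, with a cheap cost of 1, an expensive cost of 20, and three quarters of cases easy, the expected cost is 1 + 0.25 * 20 = 6 per item, against 20 for always-expensive. A cheap path that costs 18 against the same expensive cost would need more than 90% of cases to be easy before it paid for itself.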

Composes with

  • rivals-into-router (specialization relationship) — cost-cascade is a specialization of rivals-into-router where the routing condition is defensive (cheap path failed/insufficient) rather than proactive (signal predicts which branch to use).
  • asymmetric-gate (composition relationship): the escalation condition is an asymmetric gate. When the cheap path’s output meets the threshold, stay on the cheap path; when it falls below the threshold, pay the expensive path’s cost.
  • gradient (composition relationship) — the quality-vs-cost tradeoff is a gradient; the cascade’s threshold is a choice of where to draw the line on that gradient.
  • stack-layer (composition relationship) — the cascade forms a stack: cheap layer below, expensive layer above. The cheap layer handles what it can; the expensive layer handles what the cheap layer can’t.
  • multi-channel-ingest (composition relationship) — when multiple input channels feed the same store, cost-cascade determines which channel’s data gets the expensive processing treatment (high-volume cheap channel vs. low-volume expensive channel).
  • uniformity-dividend (composition relationship) — a cost-cascade that applies uniformly across all query types earns less dividend than one calibrated to the actual proportion of easy vs. hard cases.
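
The gradient and asymmetric-gate relationships above amount to choosing a threshold on a cost-vs-quality curve. One way to see that curve is to sweep candidate thresholds over labelled samples of the cheap path's behavior. This is a sketch under strong assumptions: the sample format (cheap-path confidence, whether the cheap answer was correct), the made-up costs, and especially the simplification that the expensive path is always correct are all hypothetical.

```python
from typing import List, Sequence, Tuple

# One labelled observation of the cheap path: (confidence, cheap_answer_was_correct).
Sample = Tuple[float, bool]


def sweep_thresholds(
    samples: Sequence[Sample],
    thresholds: Sequence[float],
    cheap_cost: float = 1.0,       # made-up unit costs
    expensive_cost: float = 20.0,
) -> List[Tuple[float, float, float, float]]:
    """Return (threshold, escalation_rate, avg_cost, accuracy) per threshold."""
    if not samples:
        raise ValueError("need at least one sample")
    rows = []
    for t in thresholds:
        kept = [s for s in samples if s[0] >= t]  # stay on the cheap path
        n_escalated = len(samples) - len(kept)    # confidence below threshold
        escalation_rate = n_escalated / len(samples)
        avg_cost = cheap_cost + escalation_rate * expensive_cost
        # Assumption: every escalated item is answered correctly.
        n_correct = n_escalated + sum(1 for _, ok in kept if ok)
        rows.append((t, escalation_rate, avg_cost, n_correct / len(samples)))
    return rows
```

Raising the threshold moves along the gradient toward higher cost and, under the stated assumption, higher accuracy; the rows make the price of each step explicit so the gate can be placed deliberately rather than by default.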

When it doesn’t apply

  • When the cheap path’s overhead exceeds its savings — if the cheap path is expensive to run relative to what it saves (e.g., the classifier is nearly as slow as the expensive call), the cascade doesn’t pay off. This is a calibration check, not a failure of the form.
  • When there’s no meaningful cost differential — if the cheap and expensive paths have similar latency and cost, cost-cascade adds complexity without benefit.
  • When the common case requires the expensive path — if most cases are “hard” (the cheap path rarely handles them), the cascade reduces to “always use the expensive path with an extra round-trip overhead.”
  • When the quality differential matters more than the cost — sometimes you want the expensive path’s quality even on easy cases (e.g., for consistency or auditability). Cost-cascade trades quality uniformity for cost savings.

Sources

  • MAC/FAC model (Forbus, Gentner & Law, 1995): the two-stage retrieval architecture is the canonical cog-sci instance.
  • Ensemble methods in ML: cheap weak classifier + expensive strong classifier, with the weak classifier as a filter.
  • API tiering patterns in cloud services: free tier → paid tier escalation on quota.
  • Named “cost-cascade” in analogy-project design work; the pattern was identified as distinct from rivals-into-router during the conversation that coined it.

Canonical exemplars from corpus (T2 2026-05-17)

Caveat — very low corpus support. Only 3 backfill-only matches at score ≥ 2; the name cost-cascade was coined late (2026-05-17 analogy-project design work) and hasn’t yet propagated into the corpus prose at scale. The form’s shape — cheap-first then expensive-fallback — is widespread in the corpus (LLM-extractor pipelines, ranking systems, source-routing) but expressed via domain vocabulary rather than the bundle’s name. Exemplars below illustrate the shape; richer instances will accumulate as the name is used.

  • Shell A (Gemini Flash) already doing the cheap extraction step (cwd: campconnect, 2026-04-25): “The existing pipeline/src/extractors/generic-llm/ (Shell A) already calls Gemini Flash via the pipeline/src/llm/ wrapper — so ‘outsource extraction to Gemini’ is what we’re already doing for the actual data work. EA is a one-shot Q&A pattern… poorly suited for high-throughput crawling.” — cost-cascade shape in production: Flash for the common-path extraction; EA / Pro reserved for harder cases.
  • Defensive/offensive symmetry — content density as bidirectional ranking signal (cwd: campconnect, 2026-05-03): “This is a textbook case of defensive/offensive symmetry in a ranking system. Content density was added as an offensive lever (boost weak-but-content-rich candidates) but never wired up as a defensive lever (penalize keyword-rich-but-content-empty candidates). The fix is making the same signal work in both directions.” — cost-cascade-adjacent: a ranking pipeline where cheap signals filter what reaches the expensive ranker; the insight is that the cheap signal needs to fire both directions.
  • Two-PR strategy was wrong — cherry-pick combined (cwd: campconnect, 2026-05-10, task-notification): “Two-PR strategy was wrong for this work… The design doc and the code that fulfills (part of) it are coupled enough that splitting them creates merge-order tension.” — not a literal cost-cascade but the same family-of-thinking: when separating stages creates downstream cost, combine them.

Trigger pattern (T2): Cost-cascade surfaces when the user proposes a multi-tier system or expresses a cost/budget concern, OR when the agent notices a single expensive operation handling work a cheap filter could handle most of. Caveat: with n=3 and a recently-coined name, the trigger pattern is mostly anticipated from the catalog’s MAC/FAC + Gemini Flash → Pro + ensemble ML lineage rather than empirically validated in corpus prose. Expect this exemplar set to thicken as the name propagates.