What Is Sunk Project Cost?
Unrecoverable spend already incurred on an AI or ML project — including labelled data, training compute, vendor fees, and engineering time — that should not drive forward-looking go/no-go decisions.
Sunk project cost in an AI/ML context is the spend already committed to a project that cannot be recovered by continuing — labelled data, GPU training cycles, custom evaluator development, fine-tune compute, vendor licence fees, and the engineering hours that built bespoke infrastructure. The economic principle is simple: rational go/no-go decisions should weigh only forward cost against forward expected value. The reality is that AI projects produce unusually tangible-feeling artefacts — a trained checkpoint, a labelled dataset, a curated rubric — and humans treat tangible artefacts as assets even when their forward value is negative.
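A minimal sketch of that decision rule in code, with purely illustrative numbers; note that the sunk figure appears nowhere in the comparison:

# Go/no-go by net forward value only. All figures are placeholder assumptions.
sunk_spend = 400_000  # already spent; intentionally unused below
options = {
    "continue_finetune": {"forward_cost": 120_000, "expected_value": 150_000},
    "switch_to_foundation": {"forward_cost": 40_000, "expected_value": 160_000},
}
best = max(options, key=lambda k: options[k]["expected_value"] - options[k]["forward_cost"])
print(best)  # -> switch_to_foundation (net +120K vs. +30K for continuing)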
Why It Matters in Production LLM and Agent Systems
The sunk-cost trap shows up across the AI delivery cycle. A team spends six months and $400K labelling a custom dataset; the resulting fine-tune underperforms an off-the-shelf foundation model on the same task by 4 percentage points. The economically rational move is to switch to the foundation model. The sunk-cost-driven move is to keep refining the dataset because “we’ve already invested too much to walk away”.
The pain shows up across roles. A platform engineer maintains a homegrown evaluation framework two engineers wrote in 2024; the framework now lags every open-source alternative on metric coverage, but rebuilding feels like throwing away the work. A product manager keeps a fine-tuned legacy model live alongside a new foundation model "until it pays back", never running the math showing that the inference savings would amortise the legacy model's maintenance cost within eight months. A founder commits to a fine-tuning roadmap because the team has already labelled 80K examples.
In 2026, with foundation-model capability shifting every 6-8 weeks, the sunk-cost trap is more expensive than ever. A reasoning-model release can obsolete six months of prompt engineering. A new vision-language model can replace a hand-tuned image-captioning fine-tune. The teams that move fastest treat past spend as information (“this dataset taught us what edge cases matter”) rather than as an entitlement to continue.
How FutureAGI Helps Avoid the Sunk-Cost Trap
FutureAGI does not refund GPU compute, but it produces the forward-looking signals that turn the go/no-go decision into a data question rather than a political one. The platform surfaces two categories of signal:
Forward quality signals: AnswerRelevancy, Faithfulness, TaskCompletion, TrajectoryScore, and per-cohort eval scores from Dataset.add_evaluation show how each candidate stack performs against the current test set. A foundation model's score against the same Dataset v18 the fine-tune was tested on is a direct apples-to-apples comparison.
Forward cost signals: traceAI integrations capture llm.token_count.prompt and llm.token_count.completion per call, the Agent Command Center records inference spend per route, and dashboards roll up cost-per-trace so the next month’s forward cost is projectable. A cost-optimized-routing policy can A/B test a new model variant in production without committing to a full migration.
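Because the token counts land on every span, projecting forward unit economics is plain arithmetic. A minimal sketch, assuming illustrative per-token prices and hand-built span dictionaries rather than real traceAI output:

PRICE_PER_1K_PROMPT = 0.003      # USD per 1K prompt tokens (hypothetical)
PRICE_PER_1K_COMPLETION = 0.015  # USD per 1K completion tokens (hypothetical)

def trace_cost(spans):
    # Sum per-call cost from the llm.token_count.* attributes on each span.
    return sum(
        s["llm.token_count.prompt"] / 1000 * PRICE_PER_1K_PROMPT
        + s["llm.token_count.completion"] / 1000 * PRICE_PER_1K_COMPLETION
        for s in spans
    )

spans = [{"llm.token_count.prompt": 1200, "llm.token_count.completion": 300},
         {"llm.token_count.prompt": 800, "llm.token_count.completion": 650}]
print(f"forward cost for this trace: ${trace_cost(spans):.4f}")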
Concretely: a team weighing whether to keep a custom 7B fine-tune or switch to a hosted 70B reasoning model loads the same eval set into FutureAGI, runs Faithfulness and TrajectoryScore evals, and measures per-call cost. The 70B is 18% better on quality and 2.4× more expensive per call. With trace-level cost attribution, they discover that 35% of traffic does not need the 70B's reasoning depth; those traces route to the cheaper fine-tune via a cost-optimized routing rule. The sunk-cost question dissolves into a routing decision.
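The routing arithmetic behind that decision is worth making explicit. A sketch with the numbers above, using a normalised per-call cost (the 7B's cost set to 1.0):

# Blended per-call cost after routing vs. migrating all traffic to the 70B.
cost_7b, cost_70b = 1.0, 2.4  # the 70B costs 2.4x as much per call
simple_share = 0.35           # traffic that does not need the 70B's depth
blended = simple_share * cost_7b + (1 - simple_share) * cost_70b
print(f"full 70B: {cost_70b:.2f}, routed blend: {blended:.2f}")
# blend = 1.91, roughly 20% cheaper than a full migration, while the other
# 65% of traffic keeps the 70B's quality gain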
How to Measure or Detect It
- Forward expected eval delta: difference in AnswerRelevancy or TaskCompletion between the current and candidate stack on the same Dataset.
- Per-trace inference cost: sum of llm.token_count.* × per-token price; gives forward unit economics.
- Migration break-even (dashboard signal): months until the candidate stack's lower forward cost recoups the migration engineering cost (see the sketch after this list).
- Eval drift: rate at which the current stack’s quality metrics degrade over time on refreshed traffic; a degrading stack has lower forward value than a static one.
- Engineering-hours-to-maintain: tracked outside FutureAGI but a critical input to the forward-cost calculation.
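The break-even metric reduces to a one-line calculation. A minimal sketch with illustrative figures (every number below is a placeholder assumption):

# Months until the candidate stack's lower forward cost recoups migration.
migration_engineering_cost = 60_000      # one-off cost to switch stacks
current_monthly_forward_cost = 25_000    # inference + maintenance, legacy stack
candidate_monthly_forward_cost = 17_500  # inference + maintenance, candidate

monthly_savings = current_monthly_forward_cost - candidate_monthly_forward_cost
print(f"break-even in {migration_engineering_cost / monthly_savings:.1f} months")  # 8.0

And the forward quality signals can be pulled in a few lines; the snippet below fleshes out the fi.evals calls named earlier with illustrative inputs: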
from fi.evals import AnswerRelevancy, TaskCompletion

relevancy = AnswerRelevancy()
completion = TaskCompletion()

# Illustrative placeholder inputs; production calls would pass real traces.
result_a = relevancy.evaluate(input="What is our refund policy?",
                              output="Refunds are issued within 14 days of purchase.")
# `run` stands in for a prior agent-run record exposing its steps and goal.
result_b = completion.evaluate(trajectory=run.steps, goal=run.goal)
print(result_a.score, result_b.score)
Common Mistakes
- Quoting past spend in the go/no-go meeting. “We’ve spent $1M on this” is irrelevant to whether the next $100K is well spent.
- Treating labelled data as an asset rather than information. A dataset's value lies in what it teaches you about your task; if the task definition is sound, the examples can be re-derived.
- Avoiding A/B tests because they “waste” the legacy stack. Running both stacks for two weeks costs less than one wrong commitment.
- Ignoring the maintenance tax. Custom infrastructure has a long forward cost that is invisible until you sum the engineering hours.
- Confusing momentum with value. A project that is hard to stop is not the same as a project that is worth continuing.
Frequently Asked Questions
What is sunk project cost in an AI project?
It is the unrecoverable spend already incurred on a model, dataset, or platform — labelled data hours, GPU compute, vendor licences, engineering time — that cannot be retrieved by continuing the project.
How is sunk cost different from forward cost?
Sunk cost is what you have already spent. Forward cost is what you will spend from this point onward. Rational go/no-go decisions weigh forward cost against forward expected value; sunk cost is irrelevant to the decision.
How can FutureAGI help avoid the sunk-cost trap in AI projects?
Run forward-looking eval signals — AnswerRelevancy, Faithfulness, TaskCompletion, plus inference cost per trace — through Dataset.add_evaluation. Decisions to continue, pivot, or kill a workstream get anchored to expected forward value rather than past spend.