Spend less,
reason better.
AI agents waste most of their compute chasing dead ends and repeating themselves. BAVT was built to address this: plan first, grade every tool call, focus harder as the budget runs low.
Wasteful vs. smart
Standard agents run tool calls with no strategy, burning budget on dead ends and repeated steps. BAVT builds a reasoning tree, scores every individual step as it happens, and prunes bad paths before they drain the budget.
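The tree-building and pruning idea can be sketched in a few lines. Everything here is illustrative: the node structure, the scores, and the pruning threshold are assumptions, not BAVT's published implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    step: str
    score: float                 # critic's score for this individual step
    children: list = field(default_factory=list)

def prune(node: Node, threshold: float = 0.3) -> Node:
    """Drop branches whose step score falls below the threshold,
    so they stop consuming tool-call budget."""
    node.children = [prune(c, threshold) for c in node.children
                     if c.score >= threshold]
    return node

root = Node("plan", 1.0, [
    Node("query A", 0.8, [Node("follow-up A1", 0.7)]),
    Node("query B", 0.1),        # dead end: scored low, will be pruned
])
prune(root)
print([c.step for c in root.children])  # only the promising branch survives
```

The key difference from a standard agent loop is that the low-scoring branch is cut *before* any further tool calls are spent on it.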
Most AI systems only check whether the final answer was good; BAVT checks after every single step. And it asks a different question: not "how good do things look overall?" but "how much did we actually gain from that last step?" This matters because LLMs are well documented to over-rate the absolute quality of their own reasoning, saying "things look great!" even when stuck. The marginal gain (the delta) is much harder for the model to inflate, which makes the critic genuinely reliable.
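A toy contrast makes the point concrete. The two "critic" functions below are invented for illustration, not BAVT's actual critic: one inflates absolute quality the way an overconfident model does, the other scores only the change between steps.

```python
def absolute_score(state_quality: float) -> float:
    # "How good does this look overall?" An overconfident self-assessment
    # saturates near the top even when the agent is stuck.
    return min(1.0, state_quality + 0.4)   # systematic inflation

def delta_score(quality_before: float, quality_after: float) -> float:
    # "How much did we actually gain from that last step?"
    # A stuck agent has before ≈ after, so the delta stays near zero
    # even when both absolute assessments are inflated.
    return quality_after - quality_before

# A stuck agent: two consecutive steps with no real progress.
before, after = 0.5, 0.52
print(absolute_score(after))        # still looks great (inflated)
print(delta_score(before, after))   # near zero: the stall is visible
```

The inflation cancels out of the delta, which is why scoring the change is harder to game than scoring the state.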
BAVT is designed for any tool-augmented agent, not just web search. Each tool call (database query, API request, document retrieval, code execution) costs resources. BAVT manages that budget intelligently regardless of what the tools actually do.
Explore when rich,
commit when scarce.
As the budget depletes, BAVT doesn't just gradually prefer the best path. It becomes exponentially more decisive. Move the slider to see how the math shifts from open exploration to winner-takes-all commitment.
This is the mechanism behind the slider above. BAVT uses a power function, where the exponent is inversely proportional to the remaining budget ratio, to decide which branch to explore next. With plenty of budget, the exponent is close to 1, so all promising paths get a fair shot. As budget runs low, the exponent grows large, which exponentially amplifies the difference between the best-scoring branch and the rest, until it becomes essentially winner-takes-all. No manual tuning. No human deciding when to switch. The math handles it automatically as a natural consequence of the budget running out.
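The mechanism can be sketched as follows. The exact schedule is an assumption here (exponent = 1 / remaining budget ratio, read off the description above); only the qualitative behavior matters.

```python
def branch_weights(scores, remaining_ratio):
    """Raise each branch score to a power that grows as the budget
    shrinks, then normalize into selection probabilities."""
    alpha = 1.0 / max(remaining_ratio, 1e-6)   # full budget -> alpha ~ 1
    powered = [s ** alpha for s in scores]
    total = sum(powered)
    return [p / total for p in powered]

scores = [0.9, 0.7, 0.5]

# Plenty of budget: exponent near 1, weights roughly proportional,
# so every promising branch gets a fair shot.
print(branch_weights(scores, remaining_ratio=1.0))

# Budget almost gone: exponent of 10 amplifies the gap between the
# best branch and the rest toward winner-takes-all.
print(branch_weights(scores, remaining_ratio=0.1))
```

No threshold or mode switch is hard-coded: the same formula produces both behaviors, and the transition happens automatically as the budget drains.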
A quarter of the cost.
Better answers.
Tested across four multi-hop question-answering benchmarks using two different AI models. BAVT at 5 tool calls consistently beat the standard approach at 20 tool calls. And the researchers proved mathematically that, given enough budget, it is guaranteed to reach an answer.
The researchers didn't just observe that BAVT performs well empirically. They proved it mathematically. Given a sufficient budget, BAVT is guaranteed to reach a final answer with probability at least 1 − ε (where ε can be made arbitrarily small). Most AI systems just show you benchmark numbers and hope. BAVT comes with a mathematical guarantee.
Six principles for
frugal AI development.
What BAVT teaches about resource-constrained reasoning, for AI systems and the humans building them.