Architecture
How the system is shaped: agent topology, planning, step-level evaluation, and the structural choices that determine whether compute is spent well.
-
When: You’re choosing an architecture for a multi-hop reasoning task and the input context is well-curated and not adversarial.
Use: A single agent with a generous thinking-token budget as the default. Reach for multi-agent orchestration only when the input is noisy, adversarial, or too long for the model to use effectively. The crossover where multi-agent earns its overhead happens at heavy context degradation, not at moderate task complexity (see the routing sketch below).
Evidence: Across three model families (Qwen3, DeepSeek, Gemini 2.5), five multi-agent architectures, and two multi-hop reasoning benchmarks (FRAMES and MuSiQue 4-hop), single-agent systems matched or beat every multi-agent variant under matched thinking-token budgets. Multi-agent variants pulled ahead only under heavy context corruption, such as 70% token substitution.
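A minimal routing sketch of this decision, assuming hypothetical `single_agent`, `multi_agent`, and `estimate_degradation` callables you would supply yourself; the 0.5 threshold is illustrative, not a value taken from the cited studies.

```python
def route(task, context, single_agent, multi_agent,
          estimate_degradation, threshold=0.5):
    """Default to one agent with a generous thinking budget; escalate to
    multi-agent orchestration only when the context is heavily degraded
    (noisy, adversarial, or too long for the model to use effectively)."""
    if estimate_degradation(context) >= threshold:
        # Heavy corruption is where multi-agent earns its coordination overhead.
        return multi_agent(task, context)
    # Moderate task complexity alone is not a reason to split the work.
    return single_agent(task, context, thinking_budget="high")
```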
-
When: You’re building an agent that runs multiple tool calls (search, retrieval, code execution, API requests) per task and want to avoid wasting budget on dead-end paths.
Use: A critic that grades every step as it happens, not just the final answer. Score the marginal gain from each step rather than the absolute quality of the trajectory. Catching a wrong turn after one tool call is far cheaper than discovering it after ten (see the critic sketch below).
Evidence: Budget-Aware Value Trees outperformed standard tool-augmented agents on four multi-hop QA benchmarks across two model families. At 5 tool calls, the technique reached 33.8% accuracy, beating standard agents given 20 tool calls (33.4%). Step-level scoring of marginal gain was more reliable than absolute self-assessment, which LLMs are known to inflate.
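A minimal sketch of the step-level critic idea, not the Budget-Aware Value Trees implementation itself: `propose_action`, `run_tool`, and `llm_complete` are hypothetical callables, and the 0.1 gain threshold is illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    action: str           # e.g. a search query or code snippet to execute
    observation: str      # tool output
    marginal_gain: float  # critic's estimate of new progress from this step

@dataclass
class Trajectory:
    goal: str
    steps: list[Step] = field(default_factory=list)

def critic_marginal_gain(trajectory, candidate, llm_complete):
    """Grade the delta: how much closer does this single step move us toward
    the goal, given everything already observed? Not the whole trajectory."""
    history = "\n".join(f"{s.action} -> {s.observation}" for s in trajectory.steps)
    prompt = (
        f"Goal: {trajectory.goal}\n"
        f"Steps so far:\n{history or '(none)'}\n"
        f"New step: {candidate.action} -> {candidate.observation}\n"
        "From 0 to 1, how much NEW progress toward the goal does this single "
        "step add? Answer with a number only."
    )
    return float(llm_complete(prompt))

def run_agent(goal, propose_action, run_tool, llm_complete, budget=5, min_gain=0.1):
    """Spend at most `budget` tool calls, dropping low-gain steps as soon as
    the critic flags them instead of grading only the final answer."""
    traj = Trajectory(goal=goal)
    for _ in range(budget):
        action = propose_action(traj)
        step = Step(action=action, observation=run_tool(action), marginal_gain=0.0)
        step.marginal_gain = critic_marginal_gain(traj, step, llm_complete)
        if step.marginal_gain < min_gain:
            # Wrong turn caught after one call; a fuller system would branch
            # here or feed the rejection back to the proposer.
            continue
        traj.steps.append(step)
    return traj
```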
-
When: You’re starting any task that requires multiple tool calls, retrieval steps, or LLM iterations, whether that’s an autonomous agent or a vibe-coding session.
Use: A short up-front planning pass that sketches the logical shape of the problem before spending any compute. The plan does not need facts; it just needs the abstract steps, the expected number of calls, and the dependencies. Then execute against that plan and re-plan only when reality contradicts it (see the sketch below).
Evidence: Two independent studies converged on this. Budget-Aware Value Trees use an explicit plan node before any tool call and outperform unplanned standard agents given 4x the compute. Separately, the vibe-coding qualitative study found “plan before you vibe” was one of the two most universal community-derived best practices, addressing the highest-severity failure modes around runaway code changes and structural breakdown.
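A minimal plan-then-execute sketch under the same caveats: `llm_plan`, `propose_action`, `run_tool`, and `contradicts_plan` are hypothetical callables, and a real system would track dependencies between plan steps rather than a flat list.

```python
from dataclasses import dataclass

@dataclass
class Plan:
    steps: list[str]     # abstract steps only; no facts required up front
    expected_calls: int  # rough budget implied by the plan

def plan_then_execute(goal, llm_plan, propose_action, run_tool,
                      contradicts_plan, max_replans=2):
    """Sketch the logical shape of the task first, execute against it, and
    re-plan only when an observation contradicts the current plan."""
    plan = llm_plan(goal, history=[])   # cheap pass: steps, call count, dependencies
    observations, replans, i = [], 0, 0
    while i < len(plan.steps):
        step = plan.steps[i]
        action = propose_action(goal, plan, step, observations)
        obs = run_tool(action)
        observations.append((step, obs))
        if contradicts_plan(plan, obs) and replans < max_replans:
            # Reality disagrees with the plan: revise it (the new plan is
            # assumed to cover only the remaining work) instead of improvising.
            plan = llm_plan(goal, history=observations)
            replans, i = replans + 1, 0
        else:
            i += 1
    return observations
```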