Confidence, Gaps & Iteration Limits

The agent is built to be honest about uncertainty and to stop when more iteration won't help. This page covers the three mechanisms that keep it that way: per-claim confidence scores, gap classification, and the hard 3-loop cap on refinement.

Confidence scores

Every claim the Researcher produces gets a confidence score from 0 to 100, attached to the claim's node on the mindmap. The score reflects:

  • Source strength. How authoritative are the documents the claim was drawn from?
  • Consensus. Do multiple sources agree, or is this one paper's view?
  • Specificity. Is the claim narrowly tied to the cited passages, or is it a broader generalization the passages only partially support?

Reading the scores

Rough heuristic for interpreting confidence:

  • 85–100 — well-supported, multiple sources in agreement, claim narrowly matches what's cited.
  • 65–84 — solid, but worth a glance. Often a single strong source or two weaker corroborating ones.
  • 40–64 — the agent had to reach. Inspect the citations and decide whether the claim is defensible as written.
  • Below 40 — flagged for your review. Often these are the most interesting claims (contested ground), but they shouldn't go into your writing without a manual pass.
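To make the bands concrete, here is a minimal Python sketch that maps a score to the band it falls in and flags below-40 claims for a manual pass. The function names `confidence_band` and `needs_manual_pass` and the band labels are hypothetical, made up for illustration. Only the thresholds come from the list above, and nothing here implies the product exposes such an API.

```python
# Hypothetical helpers for reading confidence scores; not a product API.
# Thresholds follow the rough heuristic above; labels are illustrative.
def confidence_band(score: int) -> str:
    """Return the reading band for a 0-100 confidence score."""
    if not 0 <= score <= 100:
        raise ValueError(f"confidence must be in 0-100, got {score}")
    if score >= 85:
        return "well-supported"
    if score >= 65:
        return "solid"
    if score >= 40:
        return "reach"
    return "review"


def needs_manual_pass(score: int) -> bool:
    """Claims below 40 shouldn't go into your writing without review."""
    return confidence_band(score) == "review"


# Usage: pick out the claims that need a manual pass.
claims = {"claim A": 91, "claim B": 58, "claim C": 33}
flagged = [text for text, score in claims.items() if needs_manual_pass(score)]
# flagged == ["claim C"]
```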

What to do with low-confidence claims

Don't reflexively delete them. Three productive responses:

  • Soften. Rewrite the claim to match what the evidence actually shows. "X is the dominant view" → "X is one of several competing views."
  • Add sources. If the claim is right but undersupported, find more evidence in your KB (or add it to the KB).
  • Counter-claim. If the low confidence reflects genuine controversy, create a sibling node for the opposing view. The contradiction is the right representation.

Confidence scores are visible on each node and surfaced in the workflow summary so you can find them without hunting.

Gap classification

When the Researcher comes up short — either too few results, or results that all repeat the same thin material — it doesn't just give up. The RePlanner classifies the shortfall into one of two categories:

Retrievable gaps

The information is probably in the KB, but the queries didn't surface it. Maybe the terminology was off, or the question needed to be sliced differently.

The RePlanner's response: generate a revised, more targeted set of queries and send the Researcher back in for another loop. You'll see this happen as the workflow continues without intervention.

KB limitations

The information isn't in your KB. No amount of query refinement is going to find it.

The RePlanner's response: stop searching. Include the gap in the user-facing summary with a note about what's missing. Typical suggestions:

  • Add specific papers to the KB.
  • Relax the question.
  • Accept the gap and proceed with what's available.

Distinguishing these two cases automatically is one of the more important things the system does — it stops you from burning credits looking for things that aren't there.
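The two responses can be sketched as a small dispatch in Python. This page doesn't describe how the RePlanner decides which category a shortfall belongs to, so the sketch takes the category as an input and only shows what happens next. `GapKind`, `GapResponse`, and `respond_to_gap` are hypothetical names for illustration, not part of the product.

```python
from dataclasses import dataclass, field
from enum import Enum


# Hypothetical model of the two gap categories; not the product's code.
class GapKind(Enum):
    RETRIEVABLE = "retrievable"      # probably in the KB; queries missed it
    KB_LIMITATION = "kb_limitation"  # not in the KB; refinement won't help


@dataclass
class GapResponse:
    continue_search: bool
    revised_queries: list[str] = field(default_factory=list)
    summary_note: str = ""
    suggestions: list[str] = field(default_factory=list)


def respond_to_gap(kind: GapKind, gap: str, revised_queries: list[str]) -> GapResponse:
    """Map an already-classified gap to the RePlanner's response."""
    if kind is GapKind.RETRIEVABLE:
        # Send the Researcher back in with a more targeted set of queries.
        return GapResponse(continue_search=True, revised_queries=revised_queries)
    # KB limitation: stop searching and report the gap in the summary.
    return GapResponse(
        continue_search=False,
        summary_note=f"Not found in the KB: {gap}",
        suggestions=[
            "Add specific papers to the KB.",
            "Relax the question.",
            "Accept the gap and proceed with what's available.",
        ],
    )
```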

The 3-loop cap

The RePlanner can decide to refine the search at most three times per instruction. The loop structure:

  • Initial. FrontDesk hands off to the Researcher for broad discovery.
  • Loop 1. RePlanner narrows the scope to specific identified gaps.
  • Loop 2. Further narrowing — focusing on what loop 1 still missed.
  • Loop 3. Final targeted retrieval for edge cases.

After the third loop, the system hardcodes a stop: the Editor applies the best results gathered so far, the RePlanner writes a final summary, and the workflow ends. This is non-negotiable; there's no "let it keep trying" override.
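A minimal Python sketch of the loop structure, assuming the Researcher and RePlanner can be modeled as plain functions: `research` takes queries and returns results, and `replan` returns narrower queries (or an empty list when nothing is worth refining). All names here are hypothetical. Applying results to the mindmap and writing the summary are left out; the point is that the number of refinement loops stays bounded no matter what `replan` returns.

```python
from typing import Callable

# Hypothetical model of the refinement loop; not the product's implementation.
MAX_REFINEMENT_LOOPS = 3  # the hard cap described above


def run_instruction(
    research: Callable[[list[str]], list[str]],
    replan: Callable[[list[str]], list[str]],
    initial_queries: list[str],
) -> dict:
    """Initial broad pass, then at most MAX_REFINEMENT_LOOPS refinements."""
    results = research(initial_queries)  # Initial: broad discovery
    loops_run = 0
    while loops_run < MAX_REFINEMENT_LOOPS:
        queries = replan(results)  # narrower queries for remaining gaps
        if not queries:  # nothing left worth refining: stop early
            break
        results = results + research(queries)
        loops_run += 1
    # No override: after the third loop the function always returns.
    return {
        "results": results,
        "loops_run": loops_run,
        "cap_reached": loops_run == MAX_REFINEMENT_LOOPS,
    }
```

Even a `replan` that always asks for more stops after three refinement loops, with `cap_reached` set so the summary can say so.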

Why three?

In practice, retrieval-augmented systems hit sharply diminishing returns past loop 2. The 3-loop cap exists because:

  • Most genuine gaps are either resolved by loop 2 or are KB limitations.
  • Past loop 3, the system would mostly be paraphrasing earlier results back at itself.
  • Unbounded loops would let a single instruction burn arbitrary credits.

What happens at the cap

When the cap is reached, the workflow doesn't fail — it terminates cleanly:

  • Best available results are applied to the mindmap.
  • The summary explains explicitly that the cap was reached.
  • Specific limitations or unresolved gaps are listed with follow-up suggestions.

If you want to push further on the same topic, the right move is another instruction with a narrower scope, not a way to extend the original loop.

Reading the summary

Every workflow ends with a textual summary that includes:

  • What changed on the mindmap.
  • Completion score — the RePlanner's 0–100 assessment of how well the instruction was addressed.
  • Limitations encountered — KB gaps, low-confidence areas, anything the agent flagged.
  • Follow-up suggestions — usually 1–3 specific next instructions.

The summary is short by design (under ~300 words). Read it. The follow-up suggestions in particular often save a turn: they're written with the workflow's full context, which you may not have in view.
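One way to picture the summary's shape is as a small record holding the four parts listed above. This Python sketch is illustrative only: `WorkflowSummary`, its field names, and the rendering format are assumptions, and the word count is just a quick way to check the ~300-word target.

```python
from dataclasses import dataclass, field


# Hypothetical record of a workflow summary; field names are illustrative.
@dataclass
class WorkflowSummary:
    changes: list[str]  # what changed on the mindmap
    completion_score: int  # RePlanner's 0-100 assessment
    limitations: list[str] = field(default_factory=list)  # KB gaps, low confidence
    follow_ups: list[str] = field(default_factory=list)  # usually 1-3 instructions

    def __post_init__(self) -> None:
        if not 0 <= self.completion_score <= 100:
            raise ValueError("completion_score must be in 0-100")

    def render(self) -> str:
        """Render the summary as plain text, skipping empty sections."""
        lines = ["Changes:"] + [f"- {item}" for item in self.changes]
        lines.append(f"Completion score: {self.completion_score}/100")
        if self.limitations:
            lines += ["Limitations:"] + [f"- {item}" for item in self.limitations]
        if self.follow_ups:
            lines += ["Follow-up suggestions:"] + [f"- {item}" for item in self.follow_ups]
        return "\n".join(lines)

    def word_count(self) -> int:
        """Whitespace-delimited word count, for checking the ~300-word target."""
        return len(self.render().split())
```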

Token and budget enforcement

Independent of the 3-loop cap, there are configurable per-operation and total token limits. If a workflow would exceed your budget, it's stopped early and the partial result is reported transparently. See Credits & Billing for the credit-side view.
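The early stop can be sketched as a budget check before each operation. This Python sketch assumes each operation has an estimated token cost known up front, which may not match how the real system meters usage; `TokenBudget`, `run_operations`, and their parameters are hypothetical. Operations that already ran are kept, mirroring the partial result that gets reported.

```python
from dataclasses import dataclass


# Hypothetical budget model; assumes token costs are estimated before each op.
@dataclass
class TokenBudget:
    per_operation_limit: int
    total_limit: int
    used: int = 0

    def allows(self, estimated_tokens: int) -> bool:
        """True if the op fits both the per-operation and the total limit."""
        return (
            estimated_tokens <= self.per_operation_limit
            and self.used + estimated_tokens <= self.total_limit
        )

    def charge(self, tokens: int) -> None:
        self.used += tokens


def run_operations(ops: list[tuple[str, int]], budget: TokenBudget) -> dict:
    """Run (name, estimated_tokens) ops in order; stop before exceeding a limit."""
    completed: list[str] = []
    for name, tokens in ops:
        if not budget.allows(tokens):
            # Stop early and report what was completed (the partial result).
            return {"completed": completed, "stopped_early": True, "stopped_at": name}
        budget.charge(tokens)
        completed.append(name)
    return {"completed": completed, "stopped_early": False, "stopped_at": None}
```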

What's next