How ggui works

A walk-through for agent developers. You’ll come out of this with a working mental model of what happens between the moment your agent calls ggui_handshake and the moment the user submits the form.

Five minutes. No setup required — this is conceptual.

Every ggui exchange is the same four moments, in order:

1. HANDSHAKE: Post a draft contract; the server routes a suggestion
2. RENDER: Accept or override; the server mints an MCP-Apps resource
3. INTERACT: The host mounts it; the user fills the UI and submits
4. CONSUME: Drain the user's gestures off a render-scoped pipe

The rest of this page expands those four moments into a story.

1. Handshake — the wire surface is negotiated

Your agent’s first call is ggui_handshake — the server runs blueprint-search + contract-validation in parallel and returns a routed suggestion. (These are MCP tool calls the LLM emits; there is no client SDK — the shapes below are the tool input → output.)

// ggui_handshake tool — input:
ggui_handshake({
  intent: "collect feedback after a support chat",
  blueprintDraft: {
    contract: {
      /* propsSpec, actionSpec, ... */
    },
  },
});
// → { handshakeId, action, suggestion }

The returned suggestion.origin is one of cache (an existing blueprint matched), agent (the UI will be generated against the draft), or synth (the UI will be generated against an amended draft). No UI is generated yet; the agent commits next, on render.
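
To make the three origins concrete, here is a minimal TypeScript sketch. The type names (SuggestionOrigin, HandshakeResult), the helper describeOrigin, and the sample values are hypothetical, inferred from the prose above rather than taken from the wire reference.

```typescript
// Hypothetical shapes inferred from the prose; the real wire types are
// documented on the ggui_handshake reference page.
type SuggestionOrigin = "cache" | "agent" | "synth";

interface HandshakeResult {
  handshakeId: string;
  action: string;
  suggestion: { origin: SuggestionOrigin };
}

// Describe what the later ggui_render call will do for a given origin.
function describeOrigin(origin: SuggestionOrigin): string {
  switch (origin) {
    case "cache":
      return "serve an existing blueprint that matched the draft";
    case "agent":
      return "generate a UI against the draft as posted";
    case "synth":
      return "generate a UI against an amended draft";
  }
}

// Illustrative values only; this is not a captured server response.
const sample: HandshakeResult = {
  handshakeId: "hs-example",
  action: "example",
  suggestion: { origin: "cache" },
};

console.log(describeOrigin(sample.suggestion.origin));
```

Modelling the origin as a string-literal union lets the compiler flag any branch that forgets one of the three cases.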

Each render is independent. Every handshake → render pair mints a fresh GguiSession (the protocol’s unit for one rendered UI), keyed by sessionId. There is no conversation-level session object; conversation-scoped grouping (sibling renders inside the same chat) flows through the _meta["ai.ggui/host-session"] slice, which is captured once, at creation.

→ See ggui_handshake for the wire shape.

2. Render — the UI gets generated (or matched)

Now the agent commits against the prior handshake’s suggestion — props is required; omit override to accept the suggestion as-is:

// ggui_render tool — props required; omitting `override` accepts
// the handshake suggestion:
ggui_render({
  handshakeId,
  props: { question: "How did the session go?" },
  // or re-aim: override: { contract: {...} } / { variance: {...} }
});
// → { sessionId, resourceUri, action, ... }

Server-side, materialisation runs one of two paths. The path was already chosen at handshake time; render just executes it:

  1. Cache delivery (suggestion.origin === 'cache'). A matching blueprint was found during handshake; render serves the cached component. ~100ms.
  2. Fresh generation (origin === 'agent' or 'synth'). The server runs the LLM-driven UI generator (@ggui-ai/ui-gen) — plan → impl → check → derive. The output is a TSX component compiled to JS, plus a typed contract describing the actions the user can take and the data they can submit. ~3s.

Either way, any gadgets the component imports (Leaflet, Stripe, Calendar, …) resolve from the app’s declared gadget set (stdlib floor + ggui.json#app.gadgets) and load SRI-verified at iframe boot.

The agent gets back a sessionId (globally unique UUID for the delivered render) and a resourceUri (ui://ggui/render/<id>). The render is an MCP-Apps resource — there is no clickable URL the agent forwards; a host mounts the resource.
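
If an agent or host needs to tell a ggui render resource apart from other MCP resources, a small parser is enough. This sketch assumes the URI always follows the ui://ggui/render/<id> form quoted above. The helper name renderIdFromUri is hypothetical, and nothing here assumes that <id> equals the sessionId.

```typescript
// Assumption: render resources always use this prefix, as stated in the prose.
const RENDER_URI_PREFIX = "ui://ggui/render/";

// Return the <id> segment of a render resource URI, or null when the
// string is not a ggui render URI.
function renderIdFromUri(resourceUri: string): string | null {
  if (resourceUri.indexOf(RENDER_URI_PREFIX) !== 0) return null;
  const id = resourceUri.slice(RENDER_URI_PREFIX.length);
  // Reject empty ids and anything with extra path segments.
  return id.length > 0 && id.indexOf("/") === -1 ? id : null;
}

console.log(renderIdFromUri("ui://ggui/render/abc123"));
```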

→ See ggui_render for the wire shape.

3. Interact

A host mounts the render — your app via <AppRenderer>, or an MCP-Apps host like claude.ai inline. The renderer:

  1. Hits the bootstrap channel — fetches the compiled component bundle (SRI-verified)
  2. Mounts the component in an iframe with the props the agent rendered
  3. Connects the live channel — a WebSocket subscription scoped to this render
  4. When the user submits, the component dispatches an ActionEnvelope like { type: "data:submit", payload: {...} }. The server validates the payload against the contract’s actionSpec.

The renderer is stateless between page loads — props come from the server, state comes from the user, and the server is the source of truth for the render’s state.
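
The envelope check in step 4 can be pictured with a small type guard. This is a sketch under stated assumptions: only the type and payload fields come from the example above, and missingKeys is a toy stand-in for the server’s real validation against actionSpec (here a "schema" is just a list of required payload keys).

```typescript
// Only `type` and `payload` are taken from the prose; anything else an
// ActionEnvelope may carry is not modelled here.
interface ActionEnvelope {
  type: string;
  payload: Record<string, unknown>;
}

// Narrow an unknown value to the minimal envelope shape.
function isActionEnvelope(value: unknown): value is ActionEnvelope {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const payload = v["payload"];
  return (
    typeof v["type"] === "string" &&
    typeof payload === "object" &&
    payload !== null &&
    !Array.isArray(payload)
  );
}

// Toy stand-in for schema validation: list the required keys that the
// payload does not carry.
function missingKeys(envelope: ActionEnvelope, requiredKeys: string[]): string[] {
  return requiredKeys.filter(
    (key) => !Object.prototype.hasOwnProperty.call(envelope.payload, key),
  );
}

const incoming: unknown = { type: "data:submit", payload: { rating: 5 } };
if (isActionEnvelope(incoming)) {
  console.log(missingKeys(incoming, ["rating", "comment"]));
}
```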

→ See Envelopes for the live-channel wire reference.

4. Consume — the action lands back with the agent

Actions are agent-routed. The server queues every gesture on a render-scoped pipe; the agent drains it by calling ggui_consume (long-poll, keyed by sessionId):

// ggui_consume tool (long-poll) — returns { events, status }:
const { events, status } = ggui_consume({ sessionId, timeout: 25 });
for (const event of events) {
  if (event.intent === "submit_feedback") {
    await processFeedback(event.actionData);
  }
}

Each row is a ConsumeEventEntry: { type: 'action', sessionId, intent, actionData, uiContext, actionId, firedAt }. intent is the action key from the contract’s actionSpec; actionData is the typed payload (validated against actionSpec[intent].schema). status is 'active' until the render’s TTL elapses ('expired') — exit the loop once you have the events you need, or when status is 'expired'.
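
The exit condition described above can be sketched as a loop. The row fields are copied from the prose, but their TypeScript types are assumptions, and the Consume adapter and waitForIntent helper are hypothetical: they stand in for however your MCP client issues the ggui_consume tool call.

```typescript
// Field names come from the prose; the field types are assumptions.
interface ConsumeEventEntry {
  type: "action";
  sessionId: string;
  intent: string;
  actionData: unknown;
  uiContext: unknown;
  actionId: string;
  firedAt: string;
}

interface ConsumeResult {
  events: ConsumeEventEntry[];
  status: "active" | "expired";
}

// Hypothetical adapter around the ggui_consume tool call.
type Consume = (args: { sessionId: string; timeout: number }) => Promise<ConsumeResult>;

// Long-poll until an event with the wanted intent arrives, or until the
// render expires. Events with other intents are ignored in this sketch.
async function waitForIntent(
  consume: Consume,
  sessionId: string,
  intent: string,
): Promise<ConsumeEventEntry | null> {
  for (;;) {
    const { events, status } = await consume({ sessionId, timeout: 25 });
    const match = events.find((event) => event.intent === intent);
    if (match) return match;
    if (status === "expired") return null;
  }
}
```

Checking for a match before checking status means an event delivered in the same batch as the 'expired' status is still returned.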

An actionSpec entry may carry a nextStep: '<toolName>' hint naming one of the contract’s agentCapabilities.tools. The hint is input for the agent’s planner, and implementations MUST treat it as advisory; the agent owns the call decision. Agent-less ggui serve deployments take the same path: events queue on the consume buffer until an agent attaches and drains them, and the server never invokes a tool on the user’s behalf. There is no second routing model.
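
One way an agent might read the hint while keeping the decision for itself is sketched below. The Contract and ActionSpecEntry shapes model only the fields named above, and the helper nextStepHint and the tool name file_ticket are hypothetical.

```typescript
// Assumed contract fragments; only fields named in the prose are modelled.
interface ActionSpecEntry {
  nextStep?: string;
}

interface Contract {
  actionSpec: Record<string, ActionSpecEntry>;
  agentCapabilities: { tools: string[] };
}

// Surface the hinted tool only when it names a declared tool. The result
// is a suggestion for the planner; the caller is free to ignore it.
function nextStepHint(contract: Contract, intent: string): string | undefined {
  const entry = contract.actionSpec[intent];
  const hint = entry === undefined ? undefined : entry.nextStep;
  if (hint === undefined) return undefined;
  return contract.agentCapabilities.tools.indexOf(hint) !== -1 ? hint : undefined;
}

// Illustrative contract; the intent and tool names are made up.
const exampleContract: Contract = {
  actionSpec: {
    submit_feedback: { nextStep: "file_ticket" },
    dismiss: {},
  },
  agentCapabilities: { tools: ["file_ticket"] },
};

console.log(nextStepHint(exampleContract, "submit_feedback"));
```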

When the agent wants to refresh the visible card in response to an event (e.g. show a confirmation, splice in new data), it calls ggui_update (keyed by sessionId, kind: 'replace' | 'merge') — the iframe receives the new props on the live channel without a fresh ggui_render. Then loop back to ggui_consume. Rule of thumb: if your reaction ran a domain tool that changed what the card displays, call ggui_update before re-calling ggui_consume — skipping it is the most common wire-compliance bug.
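
The ordering in that rule of thumb can be sketched as follows. CallTool is a hypothetical adapter for however your MCP client invokes a tool by name, and the props argument name passed to ggui_update is an assumption; only sessionId and kind are named in the text above.

```typescript
// Hypothetical adapter: invoke an MCP tool by name with JSON arguments.
type CallTool = (name: string, args: Record<string, unknown>) => Promise<unknown>;

interface Reaction {
  // True when a domain tool changed what the card displays.
  changedDisplay: boolean;
  // Props to send with ggui_update when changedDisplay is true.
  nextProps: Record<string, unknown>;
}

// After reacting to an event: refresh the card first if its content
// changed, then go back to draining the pipe.
async function afterReaction(
  callTool: CallTool,
  sessionId: string,
  reaction: Reaction,
): Promise<unknown> {
  if (reaction.changedDisplay) {
    // `props` is an assumed argument name; see the ggui_update reference.
    await callTool("ggui_update", { sessionId, kind: "merge", props: reaction.nextProps });
  }
  return callTool("ggui_consume", { sessionId, timeout: 25 });
}
```

Awaiting the update before issuing the next consume keeps the visible card in step with the agent’s state before more gestures are read.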

→ See ggui_consume and the ConsumeEventEntry row shape on the same page.

Notice what your agent code did not have to handle:

  • No UI authoring. The component code was generated or matched from cache.
  • No WebSocket plumbing. The renderer connects to the live channel on its own; you didn’t open a socket.
  • No state management. The server holds render state. You called ggui_consume and got events.
  • No SDK lock-in. Everything above is plain MCP tool calls — works from any MCP client.

That’s the protocol. The OSS ggui serve running locally (ws://127.0.0.1:6781/ws) is the reference implementation; a hosted endpoint at mcp.ggui.ai (wss://mcp.ggui.ai/ws) is coming soon — both speak the same wire.