How ggui works
A walk-through for agent developers. You’ll come out of this with a working mental model of what happens between the moment your agent calls ggui_handshake and the moment the user submits the form.
Five minutes. No setup required — this is conceptual.
The four moments
Every ggui exchange is the same four moments, in order:
1. HANDSHAKE: Post a draft contract; the server routes a suggestion
2. RENDER: Accept or override; the server mints an MCP-Apps resource
3. INTERACT: The host mounts it; the user fills the UI and submits
4. CONSUME: Drain the user's gestures off a render-scoped pipe

The rest of this page expands those four moments into a story.
1. Handshake — the wire surface is negotiated
Your agent’s first call is ggui_handshake. The server runs blueprint-search + contract-validation in parallel and returns a routed suggestion. (These are MCP tool calls the LLM emits; there is no client SDK, so the shapes below are the tool input → output.)
```typescript
// ggui_handshake tool, input:
ggui_handshake({
  intent: "collect feedback after a support chat",
  blueprintDraft: {
    contract: { /* propsSpec, actionSpec, ... */ },
  },
});
// → { handshakeId, action, suggestion }
```

The returned suggestion.origin is cache (an existing blueprint matched), agent (generated against the draft), or synth (generated against an amended draft). No UI is generated yet; the agent commits next, on render.
Each render is independent: each handshake → render pair mints a fresh GguiSession — the protocol’s unit for one rendered UI — keyed by sessionId. There is no conversation-level session object; conversation-scoped grouping (sibling renders inside the same chat) flows through the _meta["ai.ggui/host-session"] slice — captured ONCE at creation.
→ See ggui_handshake for the wire shape.
2. Render — the UI gets generated (or matched)
Now the agent commits against the prior handshake’s suggestion. props is required; omit override to accept the suggestion as-is:
```typescript
// ggui_render tool: props required; omitting `override` accepts
// the handshake suggestion:
ggui_render({
  handshakeId,
  props: { question: "How did the session go?" },
  // or re-aim: override: { contract: {...} } / { variance: {...} }
});
// → { sessionId, resourceUri, action, ... }
```

Server-side, materialisation runs one of two paths. The path was already chosen at handshake time; render just executes it:
- Cache delivery (suggestion.origin === 'cache'). A matching blueprint was found during handshake; render serves the cached component. ~100ms.
- Fresh generation (origin === 'agent' or 'synth'). The server runs the LLM-driven UI generator (@ggui-ai/ui-gen): plan → impl → check → derive. The output is a TSX component compiled to JS, plus a typed contract describing the actions the user can take and the data they can submit. ~3s.
Either way, any gadgets the component imports (Leaflet, Stripe, Calendar, …) resolve from the app’s declared gadget set (stdlib floor + ggui.json#app.gadgets) and load SRI-verified at iframe boot.
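One way to picture the two materialisation paths is as a switch on suggestion.origin. The sketch below is illustrative only: the names `SuggestionOrigin`, `RenderPath`, and `renderPathFor` are hypothetical, and the latency figures are just the rough numbers quoted above, not guarantees.

```typescript
// Hypothetical names for illustration. Only the three origin values and the
// two paths come from the text above.
type SuggestionOrigin = "cache" | "agent" | "synth";

interface RenderPath {
  kind: "cache-delivery" | "fresh-generation";
  roughLatencyMs: number; // ballpark figure quoted on this page
}

function renderPathFor(origin: SuggestionOrigin): RenderPath {
  if (origin === "cache") {
    // A blueprint matched during handshake; render serves the cached component.
    return { kind: "cache-delivery", roughLatencyMs: 100 };
  }
  // "agent" and "synth" both run the UI generator, against the draft or the
  // amended draft respectively.
  return { kind: "fresh-generation", roughLatencyMs: 3000 };
}
```

Your agent never makes this choice at render time; the point of the sketch is that the origin returned by the handshake already tells you which path render will execute.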
The agent gets back a sessionId (globally unique UUID for the delivered render) and a resourceUri (ui://ggui/render/<id>). The render is an MCP-Apps resource — there is no clickable URL the agent forwards; a host mounts the resource.
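If your host code needs to recognise a render resource, a small parser over the ui://ggui/render/<id> shape is enough. This is a minimal sketch: `parseRenderUri` is a hypothetical helper, and it treats the id as an opaque string rather than assuming anything about its format.

```typescript
// Hypothetical helper. The only thing taken from the text is the
// `ui://ggui/render/<id>` shape; the id is treated as opaque.
const RENDER_URI_PREFIX = "ui://ggui/render/";

function parseRenderUri(resourceUri: string): string | null {
  // Anything that does not start with the render prefix is not a render.
  if (resourceUri.indexOf(RENDER_URI_PREFIX) !== 0) return null;
  const id = resourceUri.slice(RENDER_URI_PREFIX.length);
  // Reject empty ids and anything with further path segments.
  if (id.length === 0 || id.indexOf("/") !== -1) return null;
  return id;
}
```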
→ See ggui_render for the wire shape.
3. Interact — the user fills the UI
A host mounts the render: your app via <AppRenderer>, or an MCP-Apps host like claude.ai inline. The renderer:
- Hits the bootstrap channel — fetches the compiled component bundle (SRI-verified)
- Mounts the component in an iframe with the props the agent rendered
- Connects the live channel — a WebSocket subscription scoped to this render
- When the user submits, the component dispatches an ActionEnvelope like { type: "data:submit", payload: {...} }. The server validates the payload against the contract’s actionSpec.
The renderer is stateless between page loads — props come from the server, state comes from the user, and the server is the source of truth for the render’s state.
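To make the validation step concrete, here is a toy version of checking a submitted payload against an actionSpec entry. It is a sketch under heavy assumptions: this page does not show the real schema format, so the field-name-to-primitive-type schema, the `validatePayload` function, and passing the action key as a separate `intent` argument are all invented for illustration.

```typescript
// Illustrative only: the real actionSpec schema format is not shown on this
// page. Here each action maps field names to a primitive type.
type FieldType = "string" | "number" | "boolean";
type ActionSpec = Record<string, { schema: Record<string, FieldType> }>;

interface ActionEnvelope {
  type: string; // e.g. "data:submit"
  payload: Record<string, unknown>;
}

// Returns a list of problems; an empty list means the payload passed.
function validatePayload(
  spec: ActionSpec,
  intent: string, // assumed to be the actionSpec key for this submission
  envelope: ActionEnvelope,
): string[] {
  const entry = spec[intent];
  if (!entry) return [`unknown action "${intent}"`];
  const problems: string[] = [];
  for (const field of Object.keys(entry.schema)) {
    const expected = entry.schema[field];
    const actual = typeof envelope.payload[field];
    if (actual !== expected) {
      problems.push(`${field}: expected ${expected}, got ${actual}`);
    }
  }
  return problems;
}
```

The design point carries over regardless of the schema language: the check runs on the server, so the agent only ever sees payloads that already match the contract.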
→ See Envelopes for the live-channel wire reference.
4. Consume — the action lands back with the agent
Actions are agent-routed. The server queues every gesture on a render-scoped pipe; the agent drains it by calling ggui_consume (long-poll, keyed by sessionId):
```typescript
// ggui_consume tool (long-poll), returns { events, status }:
const { events, status } = ggui_consume({ sessionId, timeout: 25 });
for (const event of events) {
  if (event.intent === "submit_feedback") {
    await processFeedback(event.actionData);
  }
}
```

Each row is a ConsumeEventEntry: { type: 'action', sessionId, intent, actionData, uiContext, actionId, firedAt }. intent is the action key from the contract’s actionSpec; actionData is the typed payload (validated against actionSpec[intent].schema). status is 'active' until the render’s TTL elapses, at which point it becomes 'expired'. Exit the loop once you have the events you need, or when status is 'expired'.
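The exit rule (stop once you have the events you need, or when status is 'expired') can be sketched as a polling loop. This is a minimal sketch, not a client library: `Consume` is a hypothetical stand-in for however your MCP client issues the ggui_consume tool call, `drainUntil` is an invented name, and the event type is trimmed to the fields the loop touches.

```typescript
// Field list trimmed to what this sketch uses; the full ConsumeEventEntry
// row also carries uiContext, actionId, and firedAt.
interface ConsumeEventEntry {
  type: "action";
  sessionId: string;
  intent: string;
  actionData: unknown;
}

interface ConsumeResult {
  events: ConsumeEventEntry[];
  status: "active" | "expired";
}

// Hypothetical stand-in for your MCP client's way of calling ggui_consume.
type Consume = (args: { sessionId: string; timeout: number }) => Promise<ConsumeResult>;

async function drainUntil(
  consume: Consume,
  sessionId: string,
  wanted: (event: ConsumeEventEntry) => boolean,
): Promise<ConsumeEventEntry[]> {
  const collected: ConsumeEventEntry[] = [];
  for (;;) {
    const { events, status } = await consume({ sessionId, timeout: 25 });
    collected.push(...events);
    // Stop once an event we were waiting for has arrived...
    if (collected.some(wanted)) return collected;
    // ...or once the render has expired.
    if (status === "expired") return collected;
  }
}
```

A caller waiting for feedback might pass `(e) => e.intent === "submit_feedback"` as the `wanted` predicate and then process the returned rows.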
An actionSpec entry may carry a nextStep: '<toolName>' hint naming one of the contract’s agentCapabilities.tools — an advisory hint for the agent’s planner. Implementations MUST treat it as advisory; the agent owns the call decision. Agent-less ggui serve deployments take the same path: events queue on the consume buffer until an agent attaches and drains them — the server never invokes a tool on the user’s behalf. There is no second routing model.
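Treating nextStep as advisory could look like the sketch below, where the hint only ever produces a candidate and the agent's own planner makes the call. The shapes are assumptions: `agentCapabilities.tools` is modelled as a plain list of tool names, and `hintedTool` is a hypothetical helper.

```typescript
// Hypothetical planner helper. `nextStep` and `agentCapabilities.tools` are
// named in the text; the concrete shapes here are assumptions.
interface ActionSpecEntry {
  nextStep?: string;
}

interface Contract {
  actionSpec: Record<string, ActionSpecEntry>;
  agentCapabilities: { tools: string[] }; // assumed: a list of tool names
}

// Returns the hinted tool as a candidate, or null. The caller still decides
// whether to invoke it; nothing here calls a tool.
function hintedTool(contract: Contract, intent: string): string | null {
  const entry = contract.actionSpec[intent];
  const hint = entry ? entry.nextStep : undefined;
  if (hint === undefined) return null;
  // Ignore hints that do not name a declared tool.
  return contract.agentCapabilities.tools.indexOf(hint) !== -1 ? hint : null;
}
```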
When the agent wants to refresh the visible card in response to an event (e.g. show a confirmation, splice in new data), it calls ggui_update (keyed by sessionId, kind: 'replace' | 'merge') — the iframe receives the new props on the live channel without a fresh ggui_render. Then loop back to ggui_consume. Rule of thumb: if your reaction ran a domain tool that changed what the card displays, call ggui_update before re-calling ggui_consume — skipping it is the most common wire-compliance bug.
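The difference between the two ggui_update kinds can be illustrated with plain objects. This is a sketch of the idea, not the server's implementation: `applyUpdate` is a hypothetical name, and it assumes 'merge' is a shallow merge of top-level keys, which this page does not specify.

```typescript
type Props = Record<string, unknown>;
type UpdateKind = "replace" | "merge";

// Assumption: 'merge' is a shallow merge of top-level keys. This page does
// not say whether the real merge is shallow or deep.
function applyUpdate(current: Props, kind: UpdateKind, next: Props): Props {
  if (kind === "replace") {
    // The new props fully take over; nothing from `current` survives.
    return { ...next };
  }
  // Keys in `next` win; untouched keys in `current` are kept.
  return { ...current, ...next };
}
```

Under that assumption, merging `{ submitted: true }` into a feedback card's props keeps the original question and adds the flag, while replacing with `{ message: "Thanks!" }` leaves only the message.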
→ See ggui_consume and the ConsumeEventEntry row shape on the same page.
What you didn’t have to do
Notice what your agent code did not have to handle:
- No UI authoring. The component code was generated or matched from cache.
- No WebSocket plumbing. The renderer connects to the live channel on its own; you didn’t open a socket.
- No state management. The server holds render state. You called ggui_consume and got events.
- No SDK lock-in. Everything above is plain MCP tool calls, so it works from any MCP client.
That’s the protocol. The OSS ggui serve running locally (ws://127.0.0.1:6781/ws) is the reference implementation; a hosted endpoint at mcp.ggui.ai (wss://mcp.ggui.ai/ws) is coming soon — both speak the same wire.
- Build something: OSS Quick Start (local); the Hosted Quick Start with mcp.ggui.ai is coming soon
- See the wire: MCP Protocol, Envelopes, WebSocket
- Look up a term: Glossary
- Look at example agents: Claude, OpenAI, Gemini, raw MCP
- Already shipped a SaaS? Agentic App Builders covers the (in-design) path to make an existing app agent-drivable.