The agent backend
A ggui-aware agent needs an HTTP backend the user’s chat client can talk to. You do not hand-roll that backend. The OSS reference implementation is @ggui-ai/agent-server, a brand-agnostic Hono server that owns every ggui-coupled host concern, plus a thin AgentAdapter that maps your LLM SDK’s event stream to a normalized message envelope.
The split is the point: the agent-server has zero LLM-SDK knowledge and the adapter has zero ggui awareness. The protocol lives entirely between the two.
Three parties
```text
┌─────────────────────────┐
│ host / chat client      │   MCP-Apps host: claude.ai, ChatGPT,
│ (owns the chat UI,      │   or the sample chat app. Forwards
│ forwards ui/message)    │   ui/message text to the model.
└───────────┬─────────────┘
       chat │ (HTTP POST /agent → SSE)
            ▼
┌─────────────────────────┐
│ agent backend           │   @ggui-ai/agent-server (Hono)
│ agent-server + a thin   │   + your AgentAdapter (per-SDK glue).
│ AgentAdapter            │
└───────────┬─────────────┘
        MCP │ (Streamable HTTP)
            ▼
┌─────────────────────────┐
│ ggui MCP server         │   ggui_handshake / ggui_render /
│ (GguiSessions + state   │   ggui_update / ggui_consume / ggui_emit,
│ + the iframe runtime)   │   plus the iframe runtime served per session.
└─────────────────────────┘
```

The iframe (running @ggui-ai/iframe-runtime) is the rendered surface that lives inside the host. It is not a fourth party: it is the ggui MCP server’s UI, mounted in the host.
Three channels
| Channel | Between | Transport | Status | Carries |
|---|---|---|---|---|
| chat | host ↔ agent backend | HTTP POST /agent → SSE | Mandatory | {kind: 'chat', prompt, chatId?} in; a stream of NormalizedMessages out |
| MCP | agent backend ↔ ggui MCP server | MCP Streamable HTTP | Mandatory | ggui_handshake / ggui_render / ggui_update / ggui_consume / ggui_emit calls |
| live | ggui MCP server ↔ iframe | WebSocket (+ fallback) | Optional | declared streamSpec deliveries (ggui_emit fan-out) and props_update frames |
chat is how a turn starts. POST /agent is one kind-discriminated endpoint: {kind: 'chat', prompt, chatId?} opens the SSE stream — the first event is always chat-allocated, carrying the server-allocated chatId; subsequent frames are message events. The same endpoint also accepts {kind: 'tool-call', name, arguments} — the iframe-issued tools/call relay, answered as plain JSON rather than SSE. GET /agent?chatId=X replays the server-authoritative snapshot through the same handler for rehydration (each recorded tool result is re-inlined fresh so the snapshot reflects current server state), and GET / serves a small public manifest the frontend reads sandboxProxyUrl from.
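To make the framing concrete, here is a minimal sketch of how a client could split the POST /agent response. It assumes standard text/event-stream framing (event: and data: lines, frames separated by a blank line), JSON data payloads, and a chat-allocated payload of the form {chatId}. The names parseAgentStream, splitChatStream, and ChatFrame are hypothetical; in a React app, useMcpAppsChat handles this for you.

```typescript
// Hypothetical frame shape; the real NormalizedMessage type lives in @ggui-ai/agent-server.
type ChatFrame =
  | { event: "chat-allocated"; data: { chatId: string } }
  | { event: "message"; data: unknown };

// Split a complete SSE body into frames. Assumes `event:` + `data:` lines with
// JSON data and frames separated by a blank line (text/event-stream framing).
function parseAgentStream(body: string): ChatFrame[] {
  const frames: ChatFrame[] = [];
  for (const block of body.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default event name when no `event:` line is present
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
    }
    if (dataLines.length === 0) continue; // comments and keep-alives carry no data
    const data: unknown = JSON.parse(dataLines.join("\n"));
    if (event === "chat-allocated" || event === "message") {
      frames.push({ event, data } as ChatFrame);
    }
  }
  return frames;
}

// The first frame must be chat-allocated; everything after it is a message.
function splitChatStream(body: string): { chatId: string; messages: unknown[] } {
  const [first, ...rest] = parseAgentStream(body);
  if (!first || first.event !== "chat-allocated") {
    throw new Error("expected chat-allocated as the first SSE event");
  }
  return { chatId: first.data.chatId, messages: rest.map((f) => f.data) };
}
```

A production client reads the body incrementally and parses each frame as it arrives; buffering the whole body here only keeps the example short.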
MCP is the agent loop — the adapter’s LLM calls ggui_render / ggui_consume / etc. against the ggui MCP server using the URL + bearer the library threads on every call.
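For a sense of what the adapter's SDK sends on this channel, the sketch below builds (but does not send) one MCP Streamable HTTP request for a tools/call, attaching the bearer from an input.mcpServers entry. The JSON-RPC envelope, the Accept header listing both application/json and text/event-stream, and the Mcp-Session-Id header follow the MCP transport spec; the helper name buildToolCall is hypothetical, and in practice your LLM SDK's MCP client does this for you.

```typescript
// Shape of one entry in input.mcpServers, as described in the AgentAdapter comments.
interface McpServerEntry {
  url: string;
  bearer: string;
}

interface JsonRpcToolCall {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: assemble the HTTP POST for one MCP tools/call.
// mcpSessionId is the Mcp-Session-Id returned by the server on initialize, if any.
function buildToolCall(
  server: McpServerEntry,
  id: number,
  name: string,
  args: Record<string, unknown>,
  mcpSessionId?: string,
): { url: string; headers: Record<string, string>; body: JsonRpcToolCall } {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // Streamable HTTP clients must accept either a JSON body or an SSE stream.
    Accept: "application/json, text/event-stream",
    // The bearer the library threads on every call.
    Authorization: `Bearer ${server.bearer}`,
  };
  if (mcpSessionId) headers["Mcp-Session-Id"] = mcpSessionId;
  return {
    url: server.url,
    headers,
    body: { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } },
  };
}
```

Mcp-Session-Id identifies the MCP transport session, which is distinct from the ggui sessionId passed in the tool arguments.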
live is the first-party fast path for streaming UI updates into the iframe over WebSocket (gated by a wsToken). The spec-compliant cross-host fallback is tool-result inlining: agent-server’s interceptor reads _meta.ui.resourceUri on each tool result, issues a resources/read, and inlines the iframe HTML under _meta.ui.resource — so the iframe mounts on the first SSE frame with no extra round-trip.
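A minimal sketch of the inlining step, assuming a readResource callback that performs resources/read and the standard MCP ReadResourceResult shape ({contents: [{uri, mimeType?, text?}]}). The function name inlineUiResource is hypothetical and is not agent-server's interceptToolResult; choosing the content item whose uri matches (falling back to the first) is also an assumption.

```typescript
// Minimal slices of the MCP shapes involved.
interface ResourceContents {
  uri: string;
  mimeType?: string;
  text?: string;
}
interface ReadResourceResult {
  contents: ResourceContents[];
}
interface CallToolResult {
  content: unknown[];
  _meta?: {
    ui?: { resourceUri?: string; resource?: ResourceContents };
    [key: string]: unknown;
  };
}

// Hypothetical interceptor: if the result points at a UI resource, read it once
// and inline it under _meta.ui.resource so the iframe can mount without a round-trip.
async function inlineUiResource(
  result: CallToolResult,
  readResource: (uri: string) => Promise<ReadResourceResult>,
): Promise<CallToolResult> {
  const uri = result._meta?.ui?.resourceUri;
  if (!uri) return result; // not a UI-bearing tool result: pass through untouched
  const { contents } = await readResource(uri);
  const resource = contents.find((c) => c.uri === uri) ?? contents[0];
  if (!resource) return result;
  return {
    ...result,
    _meta: { ...result._meta, ui: { ...result._meta?.ui, resourceUri: uri, resource } },
  };
}
```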
What agent-server owns
The library owns every ggui-coupled host concern, and nothing else:
- HTTP routing (Hono) + SSE streaming
- MCP discovery / routing + bearer threading (bearer defaults to GGUI_MCP_BEARER, then to dev, pairing with ggui serve --dev-allow-all)
- Tool-result resource inlining (interceptToolResult): mounting iframes from _meta.ui.resourceUri
- Server-allocated chat ids (mintChatId): the frontend never mints ids client-side
- Guest + bearer auth with chat-ownership gating
- The second-origin sandbox proxy boot (per the MCP Apps spec; defaults to port + 1000)
- The snapshot / rehydration path
- Cross-framework tool identity: with crossFramework on (the startAgentServer default), the library declares each tool’s canonical serverInfo to ggui once per process via ggui_runtime_declare_tool_catalog, so blueprint reuse stays identity-stable across agent frameworks
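The two documented defaults in the list above can be expressed as a small resolution function. This is a hypothetical sketch (resolveDefaults and ServerOptions are illustrative names), assuming an explicit bearer option takes precedence over the environment variable.

```typescript
// Hypothetical subset of startAgentServer options relevant to the defaults.
interface ServerOptions {
  port: number;
  bearer?: string;
  sandboxProxyPort?: number;
}

// Bearer: explicit option, else GGUI_MCP_BEARER, else "dev".
// Sandbox proxy: explicit port, else port + 1000 (a second origin).
function resolveDefaults(
  opts: ServerOptions,
  env: Record<string, string | undefined>,
): Required<ServerOptions> {
  return {
    port: opts.port,
    bearer: opts.bearer ?? env["GGUI_MCP_BEARER"] ?? "dev",
    sandboxProxyPort: opts.sandboxProxyPort ?? opts.port + 1000,
  };
}
```

The dev fallback pairs with ggui serve --dev-allow-all, so outside local development you would set GGUI_MCP_BEARER or pass bearer explicitly. The sandbox proxy gets its own port because a different port is a different origin, which is what the second-origin requirement calls for.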
Crucially, it is a pure prompt forwarder: the prompt is fed to the adapter verbatim, and the server synthesizes no directive and special-cases no key. The user-gesture directive that tells the model to call ggui_consume is authored in the iframe’s ui/message text and passes straight through (see the user-action flow below).
What the AgentAdapter implements
A thin per-SDK adapter implements one async-iterable method:
```typescript
import { startAgentServer, type AgentAdapter } from "@ggui-ai/agent-server";

const adapter: AgentAdapter = {
  name: "my-sdk",
  async *run(input) {
    // input.prompt: the string the LLM should see (verbatim)
    // input.chatId: server-allocated stable id for this conversation
    // input.mcpServers: { name → { url, bearer } } map (e.g. { ggui: {…} })
    // input.systemPrompt: three-way; undefined = adapter default,
    //   null = explicitly none, string = override
    // input.abortSignal: fires on client disconnect; stop the LLM call
    // input.agentCapabilities: canonical tool catalog (from live MCP
    //   initialize + tools/list) to stamp into the handshake's
    //   blueprintDraft contract
    //
    // Drive your SDK's tool loop and yield NormalizedMessage values:
    // assistant text · tool_use · tool_result (carrying the full MCP
    // CallToolResult as `tool_use_result`) · result
  },
};

await startAgentServer({
  port: 6790,
  mcpServers: { ggui: { url: "http://localhost:6781/mcp" } }, // a `ggui` entry is required
  adapter,
  // optional: auth (default createGuestTokenAuth()), sandboxProxyPort
  // (default port + 1000), systemPrompt, bearer, chatStore, crossFramework
});
```

Adapters must stay brand-agnostic: no imports of @ggui-ai/protocol/integrations/mcp-apps, and no awareness of sessionId, host-session, or _meta.ui keys. The adapter maps its native SDK event stream onto the NormalizedMessage envelope; ggui mechanics stay in agent-server. Reference adapters ship for the Claude Agent SDK (claude-agent-sdk), the OpenAI Agents SDK (openai-agents-sdk), and Google ADK (google-adk).
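The core of an adapter is usually a pure mapping from native SDK events to envelope messages. The sketch below assumes a hypothetical SDK that emits text deltas, tool calls, tool outputs, and a done event; because this page does not spell out NormalizedMessage's field names, it uses a local SketchMessage type instead. Note what it does not do: it never inspects _meta or any ggui key, and it forwards the MCP CallToolResult whole.

```typescript
// Hypothetical native events from some LLM SDK's streaming tool loop.
type SdkEvent =
  | { type: "text_delta"; text: string }
  | { type: "tool_call"; id: string; name: string; args: unknown }
  | { type: "tool_output"; callId: string; output: unknown }
  | { type: "done"; stopReason: string };

// Local stand-in for the envelope; the real NormalizedMessage may use other field names.
type SketchMessage =
  | { type: "assistant"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; toolUseId: string; tool_use_result: unknown }
  | { type: "result"; stopReason: string };

// Pure mapping: SDK events in, envelope messages out. No ggui awareness.
async function* mapSdkEvents(
  events: AsyncIterable<SdkEvent>,
  signal?: AbortSignal,
): AsyncGenerator<SketchMessage> {
  let text = "";
  for await (const ev of events) {
    if (signal?.aborted) return; // client disconnected: stop pulling from the SDK
    if (ev.type !== "text_delta" && text) {
      yield { type: "assistant", text }; // flush buffered text before any other event
      text = "";
    }
    switch (ev.type) {
      case "text_delta":
        text += ev.text;
        break;
      case "tool_call":
        yield { type: "tool_use", id: ev.id, name: ev.name, input: ev.args };
        break;
      case "tool_output":
        // The full MCP CallToolResult travels as-is; agent-server inspects it, not us.
        yield { type: "tool_result", toolUseId: ev.callId, tool_use_result: ev.output };
        break;
      case "done":
        yield { type: "result", stopReason: ev.stopReason };
        break;
    }
  }
  if (text) yield { type: "assistant", text };
}
```

Inside run(input), an adapter built this way would delegate to mapSdkEvents(sdkStream, input.abortSignal) with yield*, where sdkStream is whatever your SDK returns. The sketch only checks the signal between events; a real adapter would also pass it to the SDK call so an in-flight request is cancelled.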
Frontend pairing — @ggui-ai/react/chat-helpers
On the browser side, agent-server pairs with the useMcpAppsChat hook from @ggui-ai/react/chat-helpers. It:
- opens the SSE stream to POST /agent and parses chat-allocated then message frames into one wire;
- walks each tool result’s tool_use_result for _meta.ui.resourceUri (and any inlined _meta.ui.resource) and surfaces the result as sessions entries your app mounts with `<AppRenderer>` (imported directly from @mcp-ui/client; ggui doesn’t wrap or re-export it);
- replays GET /agent?chatId through the same pipeline for rehydration;
- runs the guest-token client flow (POST /auth/guest → store → Bearer on every request → retry once on 401);
- forwards an iframe ui/message’s text as the next prompt via handleAppMessage and carries its _meta opaquely as data.meta; it never reads a key inside.
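The guest-token flow in the fourth bullet is easy to get subtly wrong, for example by retrying forever on a rejected token. Here is a minimal sketch of the retry-once logic, assuming /auth/guest answers with JSON of the form {token}; GuestClient and FetchLike are hypothetical names, and a fetch-like function is injected so the logic can be exercised without a server.

```typescript
// Minimal fetch-like signature so the sketch runs without a network.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body?: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

// Hypothetical client: POST /auth/guest -> store -> Bearer on every request -> retry once on 401.
class GuestClient {
  private token: string | undefined = undefined;
  private readonly baseUrl: string;
  private readonly fetchImpl: FetchLike;

  constructor(baseUrl: string, fetchImpl: FetchLike) {
    this.baseUrl = baseUrl;
    this.fetchImpl = fetchImpl;
  }

  // Assumes the guest endpoint returns { token: string }.
  private async mintGuestToken(): Promise<string> {
    const res = await this.fetchImpl(`${this.baseUrl}/auth/guest`, { method: "POST", headers: {} });
    const { token } = (await res.json()) as { token: string };
    this.token = token;
    return token;
  }

  async request(path: string, method = "GET", body?: unknown) {
    const send = (token: string) =>
      this.fetchImpl(`${this.baseUrl}${path}`, {
        method,
        headers: { Authorization: `Bearer ${token}`, "Content-Type": "application/json" },
        body: body === undefined ? undefined : JSON.stringify(body),
      });
    let res = await send(this.token ?? (await this.mintGuestToken()));
    if (res.status === 401) {
      // Stored token was rejected: mint a fresh one and retry exactly once.
      res = await send(await this.mintGuestToken());
    }
    return res;
  }
}
```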
The user-action flow
When a user interacts with a rendered UI, the gesture travels back to the agent through the GguiSession’s pending-event pipe, which is the single source of truth:
- Gesture in the iframe → the iframe runtime calls ggui_runtime_submit_action via the host’s tools/call relay (postMessage per the MCP Apps spec; in the sample stack, the host relays it as POST /agent {kind: 'tool-call'}).
- The ggui server appends the gesture to the GguiSession’s pending-event pipe and returns {ok, consumerPresent}. consumerPresent is computed by an active-consumer registry: ggui_consume registers itself while long-polling, so submit_action knows whether a consume loop is currently listening.
- If a ggui_consume long-poll is listening, it unblocks in-turn and returns the event {intent, actionData, uiContext, actionId, firedAt} to the agent.
- If nobody is listening (consumerPresent: false, e.g. the user reloaded the page after the agent’s turn ended), the iframe emits a userAction doorbell on a ui/message. The host forwards the message text to the model, which wakes a fresh turn and calls ggui_consume({sessionId}) to drain the already-enqueued gesture.
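To make the consumerPresent handshake concrete, here is a hypothetical in-memory model of a single session's pending-event pipe (PendingEventPipe is an illustrative name, not ggui's implementation). Waiting ggui_consume long-polls double as the active-consumer registry, and every event goes either to exactly one waiter or into the queue, never both.

```typescript
// Event shape returned by ggui_consume, per the flow above.
interface GestureEvent {
  intent: string;
  actionData: unknown;
  uiContext: unknown;
  actionId: string;
  firedAt: string;
}

class PendingEventPipe {
  private queue: GestureEvent[] = [];
  // Registered long-polls: the active-consumer registry.
  private waiters: Array<(ev: GestureEvent) => void> = [];

  // submit_action side: hand the event to a listening consumer, else enqueue it.
  submit(ev: GestureEvent): { ok: true; consumerPresent: boolean } {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(ev); // unblocks the in-turn ggui_consume
      return { ok: true, consumerPresent: true };
    }
    this.queue.push(ev); // nobody listening: the iframe should ring the doorbell
    return { ok: true, consumerPresent: false };
  }

  // ggui_consume side: drain an already-enqueued event, else long-poll until one
  // arrives or timeoutMs elapses (resolving null).
  consume(timeoutMs: number): Promise<GestureEvent | null> {
    const queued = this.queue.shift();
    if (queued) return Promise.resolve(queued);
    return new Promise((resolve) => {
      const waiter = (ev: GestureEvent) => {
        clearTimeout(timer);
        resolve(ev);
      };
      const timer = setTimeout(() => {
        this.waiters = this.waiters.filter((w) => w !== waiter); // deregister on timeout
        resolve(null);
      }, timeoutMs);
      this.waiters.push(waiter);
    });
  }
}
```

A consume that times out deregisters itself, so a later submit correctly reports consumerPresent: false and the iframe falls back to the doorbell.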
The agent retrieves the gesture exclusively via ggui_consume, so it fires exactly once. The doorbell is a pure pointer: _meta["ai.ggui/userAction"] (GguiUserActionMeta) carries only {kind: 'user-action', description, sessionId, actionId, submittedAt, intent, nextStep: {tool: 'ggui_consume', args: {sessionId}}}, never the action payload. If the doorbell also carried the payload, the model could act on it directly and then receive the same gesture again from ggui_consume, triggering it twice.
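The pointer-only rule can be sketched as a builder that simply has no field for the payload. UserActionDoorbell below is a local mirror of the fields listed above rather than the exported GguiUserActionMeta, buildDoorbellMeta is hypothetical, and the ISO-8601 submittedAt format is an assumption.

```typescript
// Local mirror of the doorbell fields; the real type is GguiUserActionMeta.
interface UserActionDoorbell {
  kind: "user-action";
  description: string;
  sessionId: string;
  actionId: string;
  submittedAt: string;
  intent: string;
  nextStep: { tool: "ggui_consume"; args: { sessionId: string } };
}

// Hypothetical helper: build the ui/message _meta for a gesture that found no consumer.
// actionData is accepted but deliberately not copied; the agent gets it only via ggui_consume.
function buildDoorbellMeta(
  sessionId: string,
  gesture: { actionId: string; intent: string; actionData: unknown },
  description: string,
  submittedAt: string,
): { "ai.ggui/userAction": UserActionDoorbell } {
  return {
    "ai.ggui/userAction": {
      kind: "user-action",
      description,
      sessionId,
      actionId: gesture.actionId,
      submittedAt,
      intent: gesture.intent,
      nextStep: { tool: "ggui_consume", args: { sessionId } },
    },
  };
}
```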
“Zero Agent Code”, redefined
Zero Agent Code now means an agent builder writes only:
- MCP server config naming the ggui MCP endpoint,
- a system prompt (start from GGUI_AGENT_SYSTEM_PROMPT, exported by @ggui-ai/protocol), and
- a few lines wiring a thin AgentAdapter into startAgentServer().
No polling loops, no event handlers, no protocol parsing, no sessionId / host-session awareness. All of that lives inside @ggui-ai/agent-server. The adapter is intentionally brand-agnostic SDK-mapping glue — not ggui logic.
Auth posture (Preview)
agent-server ships two AuthAdapter implementations:
- createGuestTokenAuth (default): stateless signed bearer tokens that work across browser / React Native / CLI. The signing secret comes from GUEST_TOKEN_SECRET; if it is omitted, an ephemeral secret is used and a warning is logged. Mounts POST /auth/guest, GET /auth/me, and POST /auth/logout.
- createBearerTokenAuth: static operator-configured tokens for sample apps, CI, and small self-hosts. Mounts GET /auth/me only.
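Stateless here means the server keeps no token table: it can verify any presented token from its signature alone. One common way to do that is an HMAC over the token payload, sketched below with Node's crypto module. This illustrates the technique under assumed names (signGuestToken, verifyGuestToken, loadSecret) and an assumed {sub, exp} payload; it is not agent-server's actual token format.

```typescript
import { createHmac, randomBytes, timingSafeEqual } from "node:crypto";

// Illustrative token: base64url(JSON payload) + "." + base64url(HMAC-SHA256(payload)).
function signGuestToken(payload: { sub: string; exp: number }, secret: Buffer): string {
  const body = Buffer.from(JSON.stringify(payload)).toString("base64url");
  const sig = createHmac("sha256", secret).update(body).digest("base64url");
  return `${body}.${sig}`;
}

// Returns the subject if the signature checks out and the token is unexpired, else null.
function verifyGuestToken(token: string, secret: Buffer, nowSec: number): { sub: string } | null {
  const parts = token.split(".");
  if (parts.length !== 2) return null;
  const [body, sig] = parts;
  if (!body || !sig) return null;
  const expected = createHmac("sha256", secret).update(body).digest();
  const given = Buffer.from(sig, "base64url");
  // Constant-time comparison; timingSafeEqual throws on length mismatch, so check first.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) return null;
  const payload = JSON.parse(Buffer.from(body, "base64url").toString()) as { sub: string; exp: number };
  return payload.exp > nowSec ? { sub: payload.sub } : null;
}

// Mirrors the documented fallback: without a configured secret, use a random
// per-process secret and warn.
function loadSecret(env: Record<string, string | undefined>): Buffer {
  const configured = env["GUEST_TOKEN_SECRET"];
  if (configured) return Buffer.from(configured);
  console.warn("GUEST_TOKEN_SECRET not set; using an ephemeral secret");
  return randomBytes(32);
}
```

The ephemeral-secret fallback has a visible consequence: every previously issued guest token stops verifying when the process restarts.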
Every chat row is stamped with an ownerId; reads and appends are ownership-gated (200 owner / 403 other / 404 unknown), overridable via authorizeChat for team / org semantics. Richer JWT / JWKS / OAuth + PKCE flows are deferred to a future @ggui-ai/agent-server-auth-extras (same AuthAdapter contract, no handler rewrites). For the Preview, the bundled guest-token + static-bearer paths are the supported surface.
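The default gate can be sketched as a three-way check. ChatRow, gateChat, and the boolean AuthorizeChat hook are hypothetical; authorizeChat's real signature is not shown on this page, so treat the override shape as an assumption.

```typescript
interface ChatRow {
  chatId: string;
  ownerId: string;
}

// Hypothetical override hook standing in for authorizeChat.
type AuthorizeChat = (chat: ChatRow, requesterId: string) => boolean;

// Documented default: 200 for the owner, 403 for anyone else, 404 for an unknown chat.
function gateChat(
  chat: ChatRow | undefined,
  requesterId: string,
  authorizeChat: AuthorizeChat = (c, who) => c.ownerId === who,
): 200 | 403 | 404 {
  if (!chat) return 404; // unknown chat id: nothing to authorize against
  return authorizeChat(chat, requesterId) ? 200 : 403;
}
```

An override such as team membership only widens who gets 200; an unknown chat id still yields 404 because the lookup fails before the hook runs.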
See also
- How ggui works: the handshake → render → interact → consume loop
- Architecture overview: the wire pipeline at a glance
- Event System: the pending-event pipe + the consume model
- React SDK: useMcpAppsChat and `<AppRenderer>` on the frontend
- MCP Protocol reference: ggui_render / ggui_consume / ggui_update / ggui_emit