The agent backend

A ggui-aware agent needs an HTTP backend the user’s chat client can talk to. You do not hand-roll that backend. The OSS reference implementation is @ggui-ai/agent-server — a brand-agnostic Hono server that owns every ggui-coupled host concern — plus a thin AgentAdapter that maps your LLM SDK’s event stream to a normalized message envelope.

The split is the point: the agent-server has zero LLM-SDK knowledge and the adapter has zero ggui awareness. The protocol lives entirely between the two.

┌─────────────────────────┐
│  host / chat client     │   MCP-Apps host: claude.ai, ChatGPT,
│  (owns the chat UI,     │   or the sample chat app. Forwards
│   forwards ui/message)  │   ui/message text to the model.
└───────────┬─────────────┘
       chat │ (HTTP POST /agent → SSE)
┌─────────────────────────┐
│  agent backend          │   @ggui-ai/agent-server (Hono)
│  agent-server + a thin  │   + your AgentAdapter (per-SDK glue).
│  AgentAdapter           │
└───────────┬─────────────┘
        MCP │ (Streamable HTTP)
┌─────────────────────────┐
│  ggui MCP server        │   ggui_handshake / ggui_render /
│  (GguiSessions + state  │   ggui_update / ggui_consume / ggui_emit,
│   + the iframe runtime) │   plus the iframe runtime served per session.
└─────────────────────────┘

The iframe (running @ggui-ai/iframe-runtime) is the rendered surface that lives inside the host. It is not a fourth party — it is the ggui MCP server’s UI, mounted in the host.

| Channel | Between | Transport | Status | Carries |
| --- | --- | --- | --- | --- |
| chat | host ↔ agent backend | HTTP POST /agent → SSE | Mandatory | {kind: 'chat', prompt, chatId?} in; a stream of NormalizedMessages out |
| MCP | agent backend ↔ ggui MCP server | MCP Streamable HTTP | Mandatory | ggui_handshake / ggui_render / ggui_update / ggui_consume / ggui_emit calls |
| live | ggui MCP server ↔ iframe | WebSocket (+ fallback) | Optional | declared streamSpec deliveries (ggui_emit fan-out) and props_update frames |

chat is how a turn starts. POST /agent is one kind-discriminated endpoint: {kind: 'chat', prompt, chatId?} opens the SSE stream — the first event is always chat-allocated, carrying the server-allocated chatId; subsequent frames are message events. The same endpoint also accepts {kind: 'tool-call', name, arguments} — the iframe-issued tools/call relay, answered as plain JSON rather than SSE. GET /agent?chatId=X replays the server-authoritative snapshot through the same handler for rehydration (each recorded tool result is re-inlined fresh so the snapshot reflects current server state), and GET / serves a small public manifest the frontend reads sandboxProxyUrl from.
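The kind-discriminated body described above can be modeled as a TypeScript union. The following is a minimal sketch, not the library's own validation code: the type names, `parseAgentRequest`, and `responseModeFor` are hypothetical helpers introduced for illustration, and only the field names (`kind`, `prompt`, `chatId`, `name`, `arguments`) come from the text.

```typescript
// Hypothetical types mirroring the two request kinds POST /agent accepts.
type ChatRequest = { kind: "chat"; prompt: string; chatId?: string };
type ToolCallRequest = { kind: "tool-call"; name: string; arguments: Record<string, unknown> };
type AgentRequest = ChatRequest | ToolCallRequest;

// chat is answered as an SSE stream, tool-call as plain JSON.
type ResponseMode = "sse" | "json";

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Validate an untrusted body and narrow it to one of the two kinds.
// Returns null when the body matches neither shape.
function parseAgentRequest(body: unknown): AgentRequest | null {
  if (!isRecord(body)) return null;
  const kind = body["kind"];
  if (kind === "chat") {
    const prompt = body["prompt"];
    const chatId = body["chatId"];
    if (typeof prompt !== "string") return null;
    return typeof chatId === "string"
      ? { kind: "chat", prompt, chatId }
      : { kind: "chat", prompt };
  }
  if (kind === "tool-call") {
    const toolName = body["name"];
    const args = body["arguments"];
    if (typeof toolName !== "string" || !isRecord(args)) return null;
    return { kind: "tool-call", name: toolName, arguments: args };
  }
  return null;
}

function responseModeFor(request: AgentRequest): ResponseMode {
  return request.kind === "chat" ? "sse" : "json";
}
```

Discriminating on `kind` keeps a single endpoint while letting the handler pick the response mode from the parsed value alone.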

MCP is the agent loop — the adapter’s LLM calls ggui_render / ggui_consume / etc. against the ggui MCP server using the URL + bearer the library threads on every call.

live is the first-party fast path for streaming UI updates into the iframe over WebSocket (gated by a wsToken). The spec-compliant cross-host fallback is tool-result inlining: agent-server’s interceptor reads _meta.ui.resourceUri on each tool result, issues a resources/read, and inlines the iframe HTML under _meta.ui.resource — so the iframe mounts on the first SSE frame with no extra round-trip.
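The inlining fallback can be sketched as a small pure function. This is an illustration under stated assumptions rather than the library's interceptor: `inlineUiResource`, `ToolResult`, and the injected `ReadResource` callback (standing in for an MCP resources/read call) are hypothetical, and only the `_meta.ui.resourceUri` and `_meta.ui.resource` keys come from the text.

```typescript
// Hypothetical shapes: only the _meta.ui keys named in the text are modeled.
type UiMeta = { resourceUri?: string; resource?: unknown };
type ToolResult = {
  content?: unknown[];
  _meta?: { ui?: UiMeta; [key: string]: unknown };
};

// Stand-in for an MCP resources/read call, injected so the sketch stays transport-free.
type ReadResource = (uri: string) => Promise<unknown>;

// If the result points at a UI resource, fetch it and attach it under
// _meta.ui.resource. Results without a resourceUri pass through untouched.
async function inlineUiResource(
  result: ToolResult,
  readResource: ReadResource,
): Promise<ToolResult> {
  const uri = result._meta?.ui?.resourceUri;
  if (typeof uri !== "string") return result;
  const resource = await readResource(uri);
  return {
    ...result,
    _meta: { ...result._meta, ui: { ...result._meta?.ui, resource } },
  };
}
```

The function returns a new object instead of mutating its input, so a recorded tool result can be re-inlined later against fresh server state.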

The library owns every ggui-coupled host concern, and nothing else:

  • HTTP routing (Hono) + SSE streaming
  • MCP discovery / routing + bearer threading (the bearer defaults to GGUI_MCP_BEARER, then to dev, which pairs with ggui serve --dev-allow-all)
  • Tool-result resource inlining (interceptToolResult) — mounting iframes from _meta.ui.resourceUri
  • Server-allocated chat ids (mintChatId) — the frontend never mints ids client-side
  • Guest + bearer auth with chat-ownership gating
  • The second-origin sandbox proxy boot (per the MCP Apps spec; defaults to port + 1000)
  • The snapshot / rehydration path
  • Cross-framework tool identity: with crossFramework on (the startAgentServer default), the library declares each tool’s canonical serverInfo to ggui once per process via ggui_runtime_declare_tool_catalog, so blueprint reuse stays identity-stable across agent frameworks
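Two of the defaults listed above are simple fallback chains. The sketch below shows one way to express them; `resolveDefaults` and its types are hypothetical, and reading "then dev" as the literal bearer value `"dev"` is an assumption about the text, not confirmed library behavior.

```typescript
// Hypothetical option and result shapes for illustration only.
type DefaultsInput = { port: number; bearer?: string; sandboxProxyPort?: number };
type ResolvedDefaults = { bearer: string; sandboxProxyPort: number };

// env is passed in (rather than read from process.env) to keep the sketch testable.
function resolveDefaults(
  input: DefaultsInput,
  env: Record<string, string | undefined>,
): ResolvedDefaults {
  return {
    // Explicit option, then GGUI_MCP_BEARER, then "dev" (assumed literal).
    bearer: input.bearer ?? env["GGUI_MCP_BEARER"] ?? "dev",
    // The sandbox proxy listens on a second origin, port + 1000 by default.
    sandboxProxyPort: input.sandboxProxyPort ?? input.port + 1000,
  };
}
```

With `port: 6790` and no overrides, this resolves the sandbox proxy to port 7790.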

Crucially, it is a pure prompt forwarder: the prompt is fed to the adapter verbatim, and the server synthesizes no directive and special-cases no key. The user-gesture directive that tells the model to call ggui_consume is authored in the iframe’s ui/message text and passes straight through (see the user-action flow below).

A thin per-SDK adapter implements one async-iterable method:

import { startAgentServer, type AgentAdapter } from "@ggui-ai/agent-server";

const adapter: AgentAdapter = {
  name: "my-sdk",
  async *run(input) {
    // input.prompt: the string the LLM should see (verbatim)
    // input.chatId: server-allocated stable id for this conversation
    // input.mcpServers: { name → { url, bearer } } map (e.g. { ggui: {…} })
    // input.systemPrompt: three-way: undefined = adapter default,
    //   null = explicitly none, string = override
    // input.abortSignal: fires on client disconnect; stop the LLM call
    // input.agentCapabilities: canonical tool catalog (from live MCP
    //   initialize + tools/list) to stamp into the
    //   handshake's blueprintDraft contract
    //
    // Drive your SDK's tool loop and yield NormalizedMessage values:
    // assistant text · tool_use · tool_result (carrying the full MCP
    // CallToolResult as `tool_use_result`) · result
  },
};

await startAgentServer({
  port: 6790,
  mcpServers: { ggui: { url: "http://localhost:6781/mcp" } }, // a `ggui` entry is required
  adapter,
  // optional: auth (default createGuestTokenAuth()), sandboxProxyPort
  // (default port + 1000), systemPrompt, bearer, chatStore, crossFramework
});

Adapters must stay brand-agnostic: no imports of @ggui-ai/protocol/integrations/mcp-apps, no awareness of sessionId / host-session / _meta.ui keys. The adapter maps its native SDK event stream onto the NormalizedMessage envelope; ggui mechanics stay in agent-server. Reference adapters ship for the Claude Agent SDK (claude-agent-sdk), the OpenAI Agents SDK (openai-agents-sdk), and Google ADK (google-adk).
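To make the mapping step concrete, here is a self-contained sketch of what the body of an adapter's `run` might do. Everything in it is an assumption made for illustration: the `SdkEvent` stream belongs to an imaginary SDK, and the `NormalizedMessage` field names are guesses at the envelope's shape (the text names only the four message kinds and the `tool_use_result` key). Check the package's exported types before relying on any of it.

```typescript
// Assumed envelope: the four kinds named above, with hypothetical field names.
type NormalizedMessage =
  | { type: "assistant"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; tool_use_result: unknown }
  | { type: "result"; text: string };

// Native events from an imaginary LLM SDK.
type SdkEvent =
  | { kind: "text"; delta: string }
  | { kind: "call"; callId: string; tool: string; args: unknown }
  | { kind: "call_done"; callId: string; mcpResult: unknown }
  | { kind: "done"; finalText: string };

// Minimal stand-in for AbortSignal so the sketch needs no DOM or Node types.
type AbortLike = { readonly aborted: boolean };

// Three-way system prompt: undefined = adapter default, null = none, string = override.
function resolveSystemPrompt(
  requested: string | null | undefined,
  adapterDefault: string,
): string | undefined {
  if (requested === undefined) return adapterDefault;
  if (requested === null) return undefined;
  return requested;
}

// Map the SDK's event stream onto the envelope, stopping once the client is gone.
async function* mapSdkEvents(
  events: AsyncIterable<SdkEvent>,
  abortSignal?: AbortLike,
): AsyncGenerator<NormalizedMessage> {
  for await (const ev of events) {
    if (abortSignal?.aborted) return;
    switch (ev.kind) {
      case "text":
        yield { type: "assistant", text: ev.delta };
        break;
      case "call":
        yield { type: "tool_use", id: ev.callId, name: ev.tool, input: ev.args };
        break;
      case "call_done":
        // Pass the MCP CallToolResult through whole; the adapter never inspects it.
        yield { type: "tool_result", tool_use_id: ev.callId, tool_use_result: ev.mcpResult };
        break;
      case "done":
        yield { type: "result", text: ev.finalText };
        break;
    }
  }
}
```

The mapper treats the tool result as opaque, which is what keeps the adapter free of `_meta.ui` awareness.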

Frontend pairing — @ggui-ai/react/chat-helpers

On the browser side, agent-server pairs with the useMcpAppsChat hook from @ggui-ai/react/chat-helpers. It:

  • opens the SSE stream to POST /agent and parses chat-allocated then message frames into one wire;
  • walks each tool result’s tool_use_result for _meta.ui.resourceUri (and any inlined _meta.ui.resource) and surfaces the result as sessions entries your app mounts with <AppRenderer> (imported directly from @mcp-ui/client — ggui doesn’t wrap or re-export it);
  • replays GET /agent?chatId through the same pipeline for rehydration;
  • runs the guest-token client flow (POST /auth/guest → store → Bearer on every request → retry once on 401);
  • forwards an iframe ui/message’s text as the next prompt via handleAppMessage and carries its _meta opaquely as data.meta — it never reads a key inside.
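The first bullet, parsing chat-allocated then message frames into one wire, can be sketched without any framework. This is not the hook's implementation: `parseSseFrames`, `applyFrames`, and the `ChatWire` shape are hypothetical, and the assumption that the chat-allocated payload is JSON with a `chatId` field is inferred from the text rather than confirmed.

```typescript
type SseFrame = { event: string; data: string };

// Split a text buffer into complete SSE frames. Frames end at a blank line;
// any trailing partial frame is returned as `rest` for the next chunk.
function parseSseFrames(buffer: string): { frames: SseFrame[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? "";
  const frames: SseFrame[] = [];
  for (const block of blocks) {
    let eventName = "message"; // the SSE default event type
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line.startsWith(":")) continue; // comment or keep-alive
      if (line.startsWith("event:")) eventName = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).replace(/^ /, ""));
    }
    if (dataLines.length > 0) frames.push({ event: eventName, data: dataLines.join("\n") });
  }
  return { frames, rest };
}

// Hypothetical client-side state: the allocated id plus the messages seen so far.
type ChatWire = { chatId: string | null; messages: unknown[] };

function applyFrames(wire: ChatWire, frames: SseFrame[]): ChatWire {
  let chatId = wire.chatId;
  const messages = [...wire.messages];
  for (const frame of frames) {
    const payload: unknown = JSON.parse(frame.data);
    if (frame.event === "chat-allocated") {
      // Assumed payload shape: { chatId: string }.
      const candidate = (payload as { chatId?: unknown } | null)?.chatId;
      if (typeof candidate === "string") chatId = candidate;
    } else if (frame.event === "message") {
      messages.push(payload);
    }
  }
  return { chatId, messages };
}
```

Because `applyFrames` only consumes parsed frames, the same reducer could in principle serve both the live stream and a replayed snapshot.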

When a user interacts with a rendered UI, the gesture travels back to the agent through the GguiSession’s pending-event pipe, which is the single source of truth:

  1. Gesture in the iframe → the iframe runtime calls ggui_runtime_submit_action via the host’s tools/call relay (postMessage per the MCP Apps spec; in the sample stack, the host relays it as POST /agent {kind: 'tool-call'}).
  2. The ggui server appends the gesture to the GguiSession’s pending-event pipe and returns {ok, consumerPresent}.
  3. consumerPresent is computed by an active-consumer registry: ggui_consume registers itself while long-polling, so submit_action knows whether a consume loop is currently listening.
  4. If a ggui_consume long-poll is listening, it unblocks in-turn and returns the event {intent, actionData, uiContext, actionId, firedAt} to the agent.
  5. If nobody is listening (consumerPresent: false — e.g. the user reloaded the page after the agent’s turn ended), the iframe emits a userAction doorbell on a ui/message. The host forwards the message text to the model, which wakes a fresh turn and calls ggui_consume({sessionId}) to drain the already-enqueued gesture.
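The five steps above amount to a queue with an active-consumer registry. The in-memory sketch below illustrates that idea only: `PendingEventPipe` and its method names are hypothetical, and it hands an event straight to a waiting consumer instead of appending first, which is a simplification of step 2 that preserves the same exactly-once delivery.

```typescript
// Event fields as listed in step 4; the field types are assumptions.
type UserActionEvent = {
  intent: string;
  actionData: unknown;
  uiContext: unknown;
  actionId: string;
  firedAt: string;
};

class PendingEventPipe {
  private queue: UserActionEvent[] = [];
  // Active-consumer registry: one resolver per in-flight long-poll.
  private waiters: Array<(ev: UserActionEvent) => void> = [];

  // Steps 1 to 3: record the gesture and report whether a consumer is listening.
  submitAction(ev: UserActionEvent): { ok: true; consumerPresent: boolean } {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(ev); // step 4: unblock the long-poll in-turn
      return { ok: true, consumerPresent: true };
    }
    this.queue.push(ev); // step 5: held until a doorbell-triggered consume drains it
    return { ok: true, consumerPresent: false };
  }

  // Long-poll: resolves with the next event, or null once timeoutMs elapses.
  consume(timeoutMs: number): Promise<UserActionEvent | null> {
    const queued = this.queue.shift();
    if (queued) return Promise.resolve(queued);
    return new Promise((resolve) => {
      const waiter = (ev: UserActionEvent): void => {
        clearTimeout(timer);
        resolve(ev);
      };
      const timer = setTimeout(() => {
        // Deregister on timeout so submitAction stops reporting a consumer.
        this.waiters = this.waiters.filter((w) => w !== waiter);
        resolve(null);
      }, timeoutMs);
      this.waiters.push(waiter);
    });
  }
}
```

Each event leaves the pipe through exactly one `consume` call, whether the consumer was already waiting or arrived later.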

The agent retrieves the gesture exclusively via ggui_consume, so it fires exactly once. The doorbell is a pure pointer: _meta["ai.ggui/userAction"] (GguiUserActionMeta) carries only {kind: 'user-action', description, sessionId, actionId, submittedAt, intent, nextStep: {tool: 'ggui_consume', args: {sessionId}}}, never the action payload. Carrying the payload in the doorbell would risk a double-trigger.
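The pointer-only rule can be illustrated by a builder that receives the whole gesture but copies only the listed fields. This is a sketch, not the runtime's code: the local `GguiUserActionMeta` type is an approximation with assumed string field types, and `buildDoorbellMeta` and `GestureInput` are hypothetical names.

```typescript
// Local approximation of the doorbell shape; field types are assumptions.
type GguiUserActionMeta = {
  kind: "user-action";
  description: string;
  sessionId: string;
  actionId: string;
  submittedAt: string;
  intent: string;
  nextStep: { tool: "ggui_consume"; args: { sessionId: string } };
};

const USER_ACTION_META_KEY = "ai.ggui/userAction";

// The gesture as the iframe knows it, including the payload.
type GestureInput = {
  description: string;
  sessionId: string;
  actionId: string;
  submittedAt: string;
  intent: string;
  actionData: unknown;
};

// Build the _meta entry for the doorbell. actionData is deliberately not
// copied: the payload travels only through ggui_consume.
function buildDoorbellMeta(gesture: GestureInput): Record<string, GguiUserActionMeta> {
  return {
    [USER_ACTION_META_KEY]: {
      kind: "user-action",
      description: gesture.description,
      sessionId: gesture.sessionId,
      actionId: gesture.actionId,
      submittedAt: gesture.submittedAt,
      intent: gesture.intent,
      nextStep: { tool: "ggui_consume", args: { sessionId: gesture.sessionId } },
    },
  };
}
```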

Zero Agent Code now means an agent builder writes only:

  1. MCP server config naming the ggui MCP endpoint,
  2. a system prompt (start from GGUI_AGENT_SYSTEM_PROMPT, exported by @ggui-ai/protocol), and
  3. a few lines wiring a thin AgentAdapter into startAgentServer().

No polling loops, no event handlers, no protocol parsing, no sessionId / host-session awareness. All of that lives inside @ggui-ai/agent-server. The adapter is intentionally brand-agnostic SDK-mapping glue — not ggui logic.

agent-server ships two AuthAdapter implementations:

  • createGuestTokenAuth (default) — stateless signed bearer tokens (signing secret from GUEST_TOKEN_SECRET, ephemeral with a warning if omitted) that work across browser / React Native / CLI. Mounts POST /auth/guest, GET /auth/me, POST /auth/logout.
  • createBearerTokenAuth — static operator-configured tokens for sample apps, CI, and small self-hosts. Mounts GET /auth/me only.

Every chat row is stamped with an ownerId; reads and appends are ownership-gated (200 owner / 403 other / 404 unknown), overridable via authorizeChat for team / org semantics. Richer JWT / JWKS / OAuth + PKCE flows are deferred to a future @ggui-ai/agent-server-auth-extras (same AuthAdapter contract, no handler rewrites). For the Preview, the bundled guest-token + static-bearer paths are the supported surface.
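The ownership gate reduces to a three-way status decision. The sketch below is illustrative only: `chatAccessStatus`, `ChatRow`, and in particular the `AuthorizeChat` callback signature are assumptions, since the text names the `authorizeChat` override but does not give its shape.

```typescript
type ChatRow = { chatId: string; ownerId: string };
type AccessStatus = 200 | 403 | 404;

// Assumed override signature; the real authorizeChat contract may differ.
type AuthorizeChat = (chat: ChatRow, userId: string) => boolean;

// Default policy: only the owner may read or append.
const ownerOnly: AuthorizeChat = (chat, userId) => chat.ownerId === userId;

function chatAccessStatus(
  chats: Map<string, ChatRow>,
  chatId: string,
  userId: string,
  authorizeChat: AuthorizeChat = ownerOnly,
): AccessStatus {
  const chat = chats.get(chatId);
  if (!chat) return 404; // unknown chat
  return authorizeChat(chat, userId) ? 200 : 403;
}
```

A team-scoped policy would pass its own callback, for example one that checks whether the owner and the caller share an organization.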