---
title: The agent backend
description: How a ggui-aware agent is hosted — @ggui-ai/agent-server (a brand-agnostic Hono backend) plus a thin per-SDK AgentAdapter. The three channels, the user-action doorbell, and what "Zero Agent Code" means.
---

A ggui-aware agent needs an HTTP backend the user's chat client can talk to. You do **not** hand-roll that backend. The OSS reference implementation is **[`@ggui-ai/agent-server`](https://www.npmjs.com/package/@ggui-ai/agent-server)** — a brand-agnostic [Hono](https://hono.dev) server that owns every ggui-coupled host concern — plus a thin **`AgentAdapter`** that maps your LLM SDK's event stream to a normalized message envelope.

The split is the point: agent-server has **zero LLM-SDK knowledge** and the adapter has **zero ggui awareness**. The protocol lives entirely between the two.

## Three parties

```
        ┌─────────────────────────┐
        │   host / chat client    │   MCP-Apps host: claude.ai, ChatGPT,
        │  (owns the chat UI,     │   or the sample chat app. Forwards
        │   forwards ui/message)  │   ui/message text to the model.
        └───────────┬─────────────┘
              chat   │  (HTTP POST /agent → SSE)
                     ▼
        ┌─────────────────────────┐
        │     agent backend       │   @ggui-ai/agent-server (Hono)
        │  agent-server + a thin  │   + your AgentAdapter (per-SDK glue).
        │     AgentAdapter        │
        └───────────┬─────────────┘
              MCP    │  (Streamable HTTP)
                     ▼
        ┌─────────────────────────┐
        │     ggui MCP server     │   ggui_handshake / ggui_render /
        │  (GguiSessions + state  │   ggui_update / ggui_consume / ggui_emit,
        │   + the iframe runtime) │   plus the iframe runtime served per session.
        └─────────────────────────┘
```

The **iframe** (running [`@ggui-ai/iframe-runtime`](https://www.npmjs.com/package/@ggui-ai/iframe-runtime)) is the rendered surface that lives inside the host. It is not a fourth party — it is the ggui MCP server's UI, mounted in the host.

## Three channels

| Channel  | Between                         | Transport                | Status    | Carries                                                                              |
| -------- | ------------------------------- | ------------------------ | --------- | ------------------------------------------------------------------------------------ |
| **chat** | host ↔ agent backend            | HTTP `POST /agent` → SSE | Mandatory | `{kind: 'chat', prompt, chatId?}` in; a stream of `NormalizedMessage`s out           |
| **MCP**  | agent backend ↔ ggui MCP server | MCP Streamable HTTP      | Mandatory | `ggui_handshake` / `ggui_render` / `ggui_update` / `ggui_consume` / `ggui_emit` calls |
| **live** | ggui MCP server ↔ iframe        | WebSocket (+ fallback)   | Optional  | declared `streamSpec` deliveries (`ggui_emit` fan-out) and `props_update` frames     |

**chat** is how a turn starts. `POST /agent` is a single endpoint discriminated by `kind`:

- `{kind: 'chat', prompt, chatId?}` opens the SSE stream. The first event is always `chat-allocated`, carrying the server-allocated `chatId`; subsequent frames are `message` events.
- `{kind: 'tool-call', name, arguments}` is the iframe-issued `tools/call` relay, answered as plain JSON rather than SSE.

`GET /agent?chatId=X` replays the server-authoritative snapshot through the same handler for rehydration (each recorded tool result is re-inlined fresh so the snapshot reflects current server state), and `GET /` serves a small public manifest from which the frontend reads `sandboxProxyUrl`.
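To make the framing concrete, here is a minimal sketch of a parser for a buffered chat response. It assumes the stream uses standard SSE `event:` / `data:` lines with JSON payloads. The helper name `parseSseFrames` and the payload shapes in `sample` are illustrative assumptions, not part of the library's API.

```ts
// Hypothetical helper: splits a buffered SSE body into { event, data } frames.
interface SseFrame {
  event: string;
  data: unknown;
}

function parseSseFrames(body: string): SseFrame[] {
  const frames: SseFrame[] = [];
  // SSE frames are separated by a blank line.
  for (const block of body.split(/\r?\n\r?\n/)) {
    let event = "message"; // SSE default when no `event:` line is present
    const dataLines: string[] = [];
    for (const line of block.split(/\r?\n/)) {
      if (line.startsWith("event:")) event = line.slice(6).trim();
      else if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
    }
    if (dataLines.length === 0) continue; // skip comments and keep-alives
    frames.push({ event, data: JSON.parse(dataLines.join("\n")) });
  }
  return frames;
}

// Assumed payload shapes, for illustration only.
const sample =
  'event: chat-allocated\ndata: {"chatId":"c-1"}\n\n' +
  'event: message\ndata: {"type":"assistant","text":"hi"}\n\n';

const frames = parseSseFrames(sample);
// The first frame carries the server-allocated chat id.
const chatId = (frames[0].data as { chatId: string }).chatId;
```

A real client would feed the parser incrementally from the response body rather than buffering the whole stream.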

**MCP** is the agent loop — the adapter's LLM calls `ggui_render` / `ggui_consume` / etc. against the ggui MCP server using the URL + bearer the library threads on every call.

**live** is the first-party fast path for streaming UI updates into the iframe over WebSocket (gated by a `wsToken`). The spec-compliant cross-host fallback is **tool-result inlining**: agent-server's interceptor reads `_meta.ui.resourceUri` on each tool result, issues a `resources/read`, and inlines the iframe HTML under `_meta.ui.resource` — so the iframe mounts on the first SSE frame with no extra round-trip.
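The inlining step can be sketched roughly as follows. This is not the library's `interceptToolResult`; the local types, the `ReadResource` callback (a stand-in for an MCP `resources/read` call), and the `{uri, text}` shape stored under `_meta.ui.resource` are all assumptions made for illustration.

```ts
// Minimal local types; a real MCP CallToolResult carries more fields.
interface UiMeta {
  resourceUri?: string;
  resource?: { uri: string; text: string }; // assumed shape
}
interface ToolResult {
  content: unknown[];
  _meta?: { ui?: UiMeta };
}

// Hypothetical stand-in for an MCP `resources/read` round-trip.
type ReadResource = (uri: string) => Promise<string>;

async function inlineUiResource(
  result: ToolResult,
  read: ReadResource,
): Promise<ToolResult> {
  const uri = result._meta?.ui?.resourceUri;
  if (!uri) return result; // no UI attached: pass the result through untouched

  const html = await read(uri);
  // Keep resourceUri and add the fetched HTML alongside it.
  return {
    ...result,
    _meta: {
      ...result._meta,
      ui: { ...result._meta?.ui, resource: { uri, text: html } },
    },
  };
}
```

Because the HTML travels inside the tool result, a host that never opens the WebSocket can still mount the iframe.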

## What agent-server owns

The library owns **every ggui-coupled host concern**, and nothing else:

- HTTP routing (Hono) + SSE streaming
- MCP discovery / routing + bearer threading (`bearer` defaults to `GGUI_MCP_BEARER`, then `dev` — pairing with `ggui serve --dev-allow-all`)
- **Tool-result resource inlining** (`interceptToolResult`) — mounting iframes from `_meta.ui.resourceUri`
- Server-allocated chat ids (`mintChatId`) — the frontend never mints ids client-side
- Guest + bearer **auth** with chat-ownership gating
- The second-origin **sandbox proxy** boot (per the MCP Apps spec; defaults to `port + 1000`)
- The snapshot / rehydration path
- **Cross-framework tool identity**: with `crossFramework` on (the `startAgentServer` default), the library declares each tool's canonical `serverInfo` to ggui once per process via `ggui_runtime_declare_tool_catalog`, so blueprint reuse stays identity-stable across agent frameworks
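The two defaults named in the list above can be read as a simple fallback chain. The sketch below only illustrates that chain; `resolveDefaults` and its option type are hypothetical names, not library exports.

```ts
// Hypothetical option subset, mirroring the names used in this page.
interface SketchOptions {
  port: number;
  bearer?: string;
  sandboxProxyPort?: number;
}

function resolveDefaults(
  opts: SketchOptions,
  env: Record<string, string | undefined>,
): { bearer: string; sandboxProxyPort: number } {
  return {
    // Explicit option, then GGUI_MCP_BEARER, then the `dev` fallback.
    bearer: opts.bearer ?? env["GGUI_MCP_BEARER"] ?? "dev",
    // The sandbox proxy sits on a second origin, by default port + 1000.
    sandboxProxyPort: opts.sandboxProxyPort ?? opts.port + 1000,
  };
}

const resolved = resolveDefaults({ port: 6790 }, {});
// resolved.bearer === "dev", resolved.sandboxProxyPort === 7790
```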

Crucially, it is a **pure prompt forwarder**: the prompt is fed to the adapter verbatim, and the server synthesizes no directive and special-cases no key. The user-gesture directive that tells the model to call `ggui_consume` is authored in the iframe's `ui/message` text and passes straight through (see [the user-action flow](#the-user-action-flow) below).

## What the AgentAdapter implements

A thin per-SDK adapter implements **one** method, `run`, which returns an async iterable:

```ts
import { startAgentServer, type AgentAdapter } from "@ggui-ai/agent-server";

const adapter: AgentAdapter = {
  name: "my-sdk",
  async *run(input) {
    // input.prompt        — the string the LLM should see (verbatim)
    // input.chatId        — server-allocated stable id for this conversation
    // input.mcpServers    — { name → { url, bearer } } map (e.g. { ggui: {…} })
    // input.systemPrompt  — three-way: undefined = adapter default,
    //                       null = explicitly none, string = override
    // input.abortSignal   — fires on client disconnect; stop the LLM call
    // input.agentCapabilities — canonical tool catalog (from live MCP
    //                       initialize + tools/list) to stamp into the
    //                       handshake's blueprintDraft contract
    //
    // Drive your SDK's tool loop and yield NormalizedMessage values:
    //   assistant text · tool_use · tool_result (carrying the full MCP
    //   CallToolResult as `tool_use_result`) · result
  },
};

await startAgentServer({
  port: 6790,
  mcpServers: { ggui: { url: "http://localhost:6781/mcp" } }, // a `ggui` entry is required
  adapter,
  // optional: auth (default createGuestTokenAuth()), sandboxProxyPort
  // (default port + 1000), systemPrompt, bearer, chatStore, crossFramework
});
```

Adapters **must stay brand-agnostic**: no imports of `@ggui-ai/protocol/integrations/mcp-apps`, no awareness of `sessionId` / `host-session` / `_meta.ui` keys. The adapter maps its native SDK event stream onto the `NormalizedMessage` envelope; ggui mechanics stay in agent-server. Reference adapters ship for the Claude Agent SDK (`claude-agent-sdk`), the OpenAI Agents SDK (`openai-agents-sdk`), and Google ADK (`google-adk`).
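As a rough illustration of the mapping an adapter performs, the sketch below converts a made-up SDK event union into the four message kinds named above. Both type definitions are assumptions: `SdkEvent` is hypothetical, and `SketchMessage` only approximates `NormalizedMessage`, whose exact fields are defined by the library. Only the `tool_use_result` field name comes from this page.

```ts
// Hypothetical SDK event union; a real SDK's events will differ.
type SdkEvent =
  | { kind: "text"; text: string }
  | { kind: "tool-start"; id: string; name: string; input: unknown }
  | { kind: "tool-end"; id: string; output: unknown }
  | { kind: "done" };

// Assumed envelope shape, for illustration only.
type SketchMessage =
  | { type: "assistant"; text: string }
  | { type: "tool_use"; id: string; name: string; input: unknown }
  | { type: "tool_result"; tool_use_id: string; tool_use_result: unknown }
  | { type: "result" };

function toSketchMessage(ev: SdkEvent): SketchMessage {
  switch (ev.kind) {
    case "text":
      return { type: "assistant", text: ev.text };
    case "tool-start":
      return { type: "tool_use", id: ev.id, name: ev.name, input: ev.input };
    case "tool-end":
      // Pass the tool output through opaquely; the adapter never
      // inspects `_meta.ui` or any other ggui-specific key.
      return { type: "tool_result", tool_use_id: ev.id, tool_use_result: ev.output };
    case "done":
      return { type: "result" };
  }
}

const mapped = toSketchMessage({
  kind: "tool-end",
  id: "t-1",
  output: { content: [], _meta: { ui: { resourceUri: "ui://example/1" } } },
});
```

Inside `run`, an adapter would iterate its SDK's stream and `yield` one mapped message per event.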

## Frontend pairing — `@ggui-ai/react/chat-helpers`

On the browser side, agent-server pairs with the **`useMcpAppsChat`** hook from [`@ggui-ai/react/chat-helpers`](/sdk/react/). It:

- opens the SSE stream to `POST /agent` and parses `chat-allocated` then `message` frames into one wire;
- walks each tool result's `tool_use_result` for `_meta.ui.resourceUri` (and any inlined `_meta.ui.resource`) and surfaces the result as `sessions` entries your app mounts with `<AppRenderer>` (imported directly from `@mcp-ui/client` — ggui doesn't wrap or re-export it);
- replays `GET /agent?chatId` through the same pipeline for rehydration;
- runs the guest-token client flow (`POST /auth/guest` → store → `Bearer` on every request → retry once on `401`);
- forwards an iframe `ui/message`'s text as the next prompt via `handleAppMessage` and carries its `_meta` **opaquely** as `data.meta` — it never reads a key inside.
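The guest-token flow in the list above (mint, store, send as `Bearer`, retry once on `401`) might look roughly like this. It is a sketch, not the hook's implementation: `FetchLike`, `TokenStore`, and both function names are hypothetical, and the `{token}` response body of `POST /auth/guest` is an assumption.

```ts
interface SketchResponse {
  status: number;
  json(): Promise<unknown>;
}
// Narrow fetch-like signature so the flow can run against a fake.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string> },
) => Promise<SketchResponse>;

interface TokenStore {
  token?: string;
}

async function mintGuestToken(
  fetchFn: FetchLike,
  baseUrl: string,
  store: TokenStore,
): Promise<string> {
  const res = await fetchFn(`${baseUrl}/auth/guest`, { method: "POST", headers: {} });
  // Assumption: the response body looks like { token: string }.
  const body = (await res.json()) as { token: string };
  store.token = body.token;
  return body.token;
}

async function authedRequest(
  fetchFn: FetchLike,
  baseUrl: string,
  path: string,
  method: string,
  store: TokenStore,
): Promise<SketchResponse> {
  const token = store.token ?? (await mintGuestToken(fetchFn, baseUrl, store));
  const first = await fetchFn(`${baseUrl}${path}`, {
    method,
    headers: { Authorization: `Bearer ${token}` },
  });
  if (first.status !== 401) return first;

  // Token rejected: mint a fresh one and retry exactly once.
  const fresh = await mintGuestToken(fetchFn, baseUrl, store);
  return fetchFn(`${baseUrl}${path}`, {
    method,
    headers: { Authorization: `Bearer ${fresh}` },
  });
}
```

Retrying only once keeps a misconfigured server from causing a mint loop.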

## The user-action flow

When a user interacts with a rendered UI, the gesture travels back to the agent through the **GguiSession's pending-event pipe**, which is the single source of truth:

1. **Gesture in the iframe** → the iframe runtime calls `ggui_runtime_submit_action` via the host's `tools/call` relay (postMessage per the MCP Apps spec; in the sample stack, the host relays it as `POST /agent {kind: 'tool-call'}`).
2. The ggui server **appends the gesture to the GguiSession's pending-event pipe** and returns `{ok, consumerPresent}`.
3. `consumerPresent` is computed by an active-consumer registry: `ggui_consume` registers itself while long-polling, so `submit_action` knows whether a consume loop is currently listening.
4. **If a `ggui_consume` long-poll is listening**, it unblocks in-turn and returns the event `{intent, actionData, uiContext, actionId, firedAt}` to the agent.
5. **If nobody is listening** (`consumerPresent: false` — e.g. the user reloaded the page after the agent's turn ended), the iframe emits a **userAction doorbell** on a `ui/message`. The host forwards the message text to the model, which wakes a fresh turn and calls `ggui_consume({sessionId})` to drain the already-enqueued gesture.
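The steps above can be modelled with a small in-memory pipe. This is a simplified sketch for a single session with no timeouts or persistence; the class and method names are hypothetical, and only the event fields and the `{ok, consumerPresent}` return shape come from the steps above.

```ts
interface PendingEvent {
  intent: string;
  actionData: unknown;
  uiContext: unknown;
  actionId: string;
  firedAt: number;
}

class PendingEventPipe {
  private queue: PendingEvent[] = [];
  // Active-consumer registry: one resolver per in-flight long-poll.
  private waiters: Array<(ev: PendingEvent) => void> = [];

  // Models ggui_runtime_submit_action.
  submitAction(ev: PendingEvent): { ok: true; consumerPresent: boolean } {
    const waiter = this.waiters.shift();
    if (waiter) {
      waiter(ev); // a long-poll is listening: unblock it in-turn
      return { ok: true, consumerPresent: true };
    }
    this.queue.push(ev); // nobody listening: hold it for a later consume
    return { ok: true, consumerPresent: false };
  }

  // Models ggui_consume: drains a queued gesture, else registers and waits.
  consume(): Promise<PendingEvent> {
    const queued = this.queue.shift();
    if (queued) return Promise.resolve(queued);
    return new Promise((resolve) => this.waiters.push(resolve));
  }
}

const pipe = new PendingEventPipe();
const gesture: PendingEvent = {
  intent: "submit",
  actionData: { qty: 2 },
  uiContext: {},
  actionId: "a-1",
  firedAt: 0,
};

const cold = pipe.submitAction(gesture); // no consumer: the doorbell case
const drained = pipe.consume(); // a fresh turn drains the queued gesture
const listening = pipe.consume(); // a long-poll registers itself
const warm = pipe.submitAction({ ...gesture, actionId: "a-2" });
```

Each gesture is handed to exactly one `consume` call, which is the property that lets the doorbell stay a pointer.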

The agent retrieves the gesture **exclusively** via `ggui_consume`, so it fires **exactly once**. The doorbell is a **pure pointer** — `_meta["ai.ggui/userAction"]` (`GguiUserActionMeta`) carries only `{kind: 'user-action', description, sessionId, actionId, submittedAt, intent, nextStep: {tool: 'ggui_consume', args: {sessionId}}}`, never the action payload. Carrying the payload in the doorbell would risk a double-trigger.
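A sketch of the doorbell's shape shows why it cannot double-trigger: the builder below accepts the gesture payload but never copies it. The interface mirrors the fields listed above. The field types (in particular `submittedAt` as a string), the `description` wording, and the `buildDoorbellMeta` helper are assumptions, not the real `GguiUserActionMeta` definition.

```ts
// Approximation of the doorbell slice; field types are assumed.
interface UserActionDoorbell {
  kind: "user-action";
  description: string;
  sessionId: string;
  actionId: string;
  submittedAt: string; // assumed ISO-8601 string; the type is not stated here
  intent: string;
  nextStep: { tool: "ggui_consume"; args: { sessionId: string } };
}

interface Gesture {
  sessionId: string;
  actionId: string;
  intent: string;
  actionData: unknown; // deliberately never copied into the doorbell
}

function buildDoorbellMeta(
  gesture: Gesture,
  submittedAt: string,
): Record<"ai.ggui/userAction", UserActionDoorbell> {
  return {
    "ai.ggui/userAction": {
      kind: "user-action",
      description: `User action "${gesture.intent}" is waiting`, // illustrative wording
      sessionId: gesture.sessionId,
      actionId: gesture.actionId,
      submittedAt,
      intent: gesture.intent,
      // Pointer only: tells the agent where to fetch the payload.
      nextStep: { tool: "ggui_consume", args: { sessionId: gesture.sessionId } },
    },
  };
}

const doorbell = buildDoorbellMeta(
  { sessionId: "s-1", actionId: "a-1", intent: "submit", actionData: { card: "4242" } },
  "2025-01-01T00:00:00.000Z",
);
```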

:::note[The directive lives in the iframe's text]
The actionable "call `ggui_consume` now" directive is authored entirely by the iframe runtime as the human-readable `ui/message` **text** (XML-tagged, imperative-first — load-bearing wording every host forwards verbatim). The `_meta["ai.ggui/userAction"]` slice is an **optional structured mirror** for ggui-aware programmatic consumers. No part of the loop depends on a server-side parse of it — which is exactly why agent-server can stay a pure forwarder.
:::

## "Zero Agent Code", redefined

[Zero Agent Code](/protocol/overview/) now means an agent builder writes only:

1. **MCP server config** naming the ggui MCP endpoint,
2. a **system prompt** (start from `GGUI_AGENT_SYSTEM_PROMPT`, exported by `@ggui-ai/protocol`), and
3. a few lines wiring a thin **`AgentAdapter`** into `startAgentServer()`.

No polling loops, no event handlers, no protocol parsing, no `sessionId` / `host-session` awareness. All of that lives inside `@ggui-ai/agent-server`. The adapter is intentionally brand-agnostic SDK-mapping glue — not ggui logic.

## Auth posture (Preview)

agent-server ships two `AuthAdapter` implementations:

- **`createGuestTokenAuth` (default)** — stateless signed bearer tokens (signing secret from `GUEST_TOKEN_SECRET`, ephemeral with a warning if omitted) that work across browser / React Native / CLI. Mounts `POST /auth/guest`, `GET /auth/me`, `POST /auth/logout`.
- **`createBearerTokenAuth`** — static operator-configured tokens for sample apps, CI, and small self-hosts. Mounts `GET /auth/me` only.

Every chat row is stamped with an `ownerId`; reads and appends are ownership-gated (`200` owner / `403` other / `404` unknown), overridable via `authorizeChat` for team / org semantics. Richer JWT / JWKS / OAuth + PKCE flows are deferred to a future `@ggui-ai/agent-server-auth-extras` (same `AuthAdapter` contract, no handler rewrites). For the Preview, the bundled guest-token + static-bearer paths are the supported surface.
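The ownership gate reduces to a three-way decision. The sketch below is illustrative: `gateChat` and its `isAllowed` parameter are hypothetical names, and `isAllowed` only gestures at the kind of override `authorizeChat` enables, without claiming its real signature.

```ts
interface ChatRow {
  ownerId: string;
}
type ChatStatus = 200 | 403 | 404;

function gateChat(
  chats: Map<string, ChatRow>,
  chatId: string,
  callerId: string,
  // Hypothetical override hook for team / org semantics.
  isAllowed: (row: ChatRow, callerId: string) => boolean = (row, id) =>
    row.ownerId === id,
): ChatStatus {
  const row = chats.get(chatId);
  if (!row) return 404; // unknown chat
  return isAllowed(row, callerId) ? 200 : 403; // owner vs. anyone else
}

const chats = new Map<string, ChatRow>([["c-1", { ownerId: "alice" }]]);
const asOwner = gateChat(chats, "c-1", "alice");
const asOther = gateChat(chats, "c-1", "bob");
const unknownChat = gateChat(chats, "c-9", "alice");
```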

## See also

- [How ggui works](/how-it-works/) — the handshake → render → interact → consume loop
- [Architecture overview](/architecture/overview/) — the wire pipeline at a glance
- [Event System](/architecture/event-system/) — the pending-event pipe + the consume model
- [React SDK](/sdk/react/) — `useMcpAppsChat` and `<AppRenderer>` on the frontend
- [MCP Protocol reference](/api/mcp-protocol/) — `ggui_render` / `ggui_consume` / `ggui_update` / `ggui_emit`