Rate limits

read as .md

ggui rate limiting is operator-configured: a RateLimiter seam the deployment wires (or doesn’t). This page covers the self-hosted defaults, the two enforcement layers and their wire shapes, and how to layer retry on top of whichever MCP host SDK you’re using.

Self-hosted defaults

Default (strict) ggui serve wires no generation limiter — ggui_render is unlimited for paired callers.
ggui serve --public-demo binds a per-remote-IP fixed-window limiter to ggui_render: 30 generations / 10 minutes / IP (operator-pays posture for public demos).
Library users wire their own RateLimiter into the render handler deps — the seam and the typed RateLimitedError live in @ggui-ai/mcp-server-core.

Two enforcement layers

Tool-level: `ggui_render` rejects in-band

When a rate limiter is wired into the render handler, a limited ggui_render call rejects with an MCP tool error (an isError tool result), not an HTTP 429. The error carries the code rate_limited and the retry decision (retryAfterMs). Catch it in your agent loop like any other tool error and back off before re-calling.

HTTP-level: 429 on auth/pairing endpoints

The pairing/login routes enforce limits at the HTTP transport layer. Every limited request returns:

Field	Value
HTTP status	`429`
`Retry-After` header	Seconds before the next attempt is permitted. Optional — absent means use exponential backoff.
Body	JSON `{ "error": { "code": "rate_limited", "message": "...", "retryAfter": <seconds> } }`.

Retry-After is the authoritative signal. When present, honor it verbatim — the server has already computed the appropriate wait. The retryAfter field in the body mirrors the header for convenience when only the body is observable (e.g. some transport wrappers).

Retry is the host SDK’s job

ggui has no first-party client SDK to wrap retries — your MCP host owns that loop. The pattern is the same regardless of host: catch the 429, read Retry-After, sleep, retry, cap attempts. For ggui_render, additionally check the tool result’s isError flag for the in-band rate_limited error and back off the same way.

Claude Agent SDK

The Claude Agent SDK’s query() already retries transient transport errors (including 429) using the standard Anthropic SDK retry config. You generally don’t need to do anything — bursts within the retry window never surface to your code. To tune, pass maxRetries through the SDK’s options. See Examples → Claude Agent for a runnable scaffold.

`@modelcontextprotocol/sdk` (generic MCP)

The official MCP SDK throws on HTTP errors without retrying. Wrap callTool (or whichever method you invoke) yourself:

import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const client = new Client({ name: "my-agent", version: "1.0.0" });
await client.connect(
  new StreamableHTTPClientTransport(new URL("http://127.0.0.1:6781/mcp"), {
    requestInit: { headers: { Authorization: "Bearer dev" } },
  })
);

async function callWithRetry<T>(
  fn: () => Promise<T>,
  { maxRetries = 3, baseDelayMs = 1000, maxDelayMs = 30000 } = {}
): Promise<T> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      // The MCP SDK surfaces HTTP errors with status + headers attached.
      const status = (err as { status?: number }).status;
      if (status !== 429 || attempt === maxRetries) throw err;

      const retryAfter =
        Number((err as { headers?: Record<string, string> }).headers?.["retry-after"]) || undefined;
      const waitMs =
        retryAfter != null ? retryAfter * 1000 : Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);

      await new Promise((r) => setTimeout(r, waitMs));
    }
  }
  throw new Error("unreachable");
}

const result = await callWithRetry(() =>
  client.callTool({
    name: "ggui_handshake",
    arguments: {
      /* ... */
    },
  })
);

Tune maxRetries per workload: lower on interactive (user-blocking) paths so failures bubble up fast; raise on background batch paths where backoff is cheaper than re-queuing. Note that a rate-limited ggui_render on a --public-demo server does NOT throw an HTTP error — it resolves with isError: true and a rate_limited message; check the result before treating the call as a success.

Raw HTTP

If you’re hitting the server directly without an MCP SDK, implement the same loop against fetch:

Read the Retry-After header on every 429.
If present, sleep that many seconds, then retry.
If absent, sleep min(baseDelay * 2^attempt, maxDelay), then retry.
Cap retries (3–5 is reasonable for interactive workloads, more for batch).
Stop retrying on non-429 4xx (those won’t resolve with backoff).

The generic MCP example walks through raw-HTTP usage end-to-end.