Rate limits
read as.md ggui rate limiting is operator-configured: a RateLimiter seam the deployment wires (or doesn’t). This page covers the self-hosted defaults, the two enforcement layers and their wire shapes, and how to layer retry on top of whichever MCP host SDK you’re using.
Self-hosted defaults
Section titled “Self-hosted defaults”- Default (strict)
ggui servewires no generation limiter —ggui_renderis unlimited for paired callers. ggui serve --public-demobinds a per-remote-IP fixed-window limiter toggui_render: 30 generations / 10 minutes / IP (operator-pays posture for public demos).- Library users wire their own
RateLimiterinto the render handler deps — the seam and the typedRateLimitedErrorlive in@ggui-ai/mcp-server-core.
Two enforcement layers
Section titled “Two enforcement layers”Tool-level: ggui_render rejects in-band
Section titled “Tool-level: ggui_render rejects in-band”When a rate limiter is wired into the render handler, a limited ggui_render call rejects with an MCP tool error (an isError tool result), not an HTTP 429. The error carries the code rate_limited and the retry decision (retryAfterMs). Catch it in your agent loop like any other tool error and back off before re-calling.
HTTP-level: 429 on auth/pairing endpoints
Section titled “HTTP-level: 429 on auth/pairing endpoints”The pairing/login routes enforce limits at the HTTP transport layer. Every limited request returns:
| Field | Value |
|---|---|
| HTTP status | 429 |
Retry-After header | Seconds before the next attempt is permitted. Optional — absent means use exponential backoff. |
| Body | JSON { "error": { "code": "rate_limited", "message": "...", "retryAfter": <seconds> } }. |
Retry-After is the authoritative signal. When present, honor it verbatim — the server has already computed the appropriate wait. The retryAfter field in the body mirrors the header for convenience when only the body is observable (e.g. some transport wrappers).
Retry is the host SDK’s job
Section titled “Retry is the host SDK’s job”ggui has no first-party client SDK to wrap retries — your MCP host owns that loop. The pattern is the same regardless of host: catch the 429, read Retry-After, sleep, retry, cap attempts. For ggui_render, additionally check the tool result’s isError flag for the in-band rate_limited error and back off the same way.
Claude Agent SDK
Section titled “Claude Agent SDK”The Claude Agent SDK’s query() already retries transient transport errors (including 429) using the standard Anthropic SDK retry config. You generally don’t need to do anything — bursts within the retry window never surface to your code. To tune, pass maxRetries through the SDK’s options. See Examples → Claude Agent for a runnable scaffold.
@modelcontextprotocol/sdk (generic MCP)
Section titled “@modelcontextprotocol/sdk (generic MCP)”The official MCP SDK throws on HTTP errors without retrying. Wrap callTool (or whichever method you invoke) yourself:
import { Client } from "@modelcontextprotocol/sdk/client/index.js";import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";
const client = new Client({ name: "my-agent", version: "1.0.0" });await client.connect( new StreamableHTTPClientTransport(new URL("http://127.0.0.1:6781/mcp"), { requestInit: { headers: { Authorization: "Bearer dev" } }, }));
async function callWithRetry<T>( fn: () => Promise<T>, { maxRetries = 3, baseDelayMs = 1000, maxDelayMs = 30000 } = {}): Promise<T> { for (let attempt = 0; attempt <= maxRetries; attempt++) { try { return await fn(); } catch (err) { // The MCP SDK surfaces HTTP errors with status + headers attached. const status = (err as { status?: number }).status; if (status !== 429 || attempt === maxRetries) throw err;
const retryAfter = Number((err as { headers?: Record<string, string> }).headers?.["retry-after"]) || undefined; const waitMs = retryAfter != null ? retryAfter * 1000 : Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
await new Promise((r) => setTimeout(r, waitMs)); } } throw new Error("unreachable");}
const result = await callWithRetry(() => client.callTool({ name: "ggui_handshake", arguments: { /* ... */ }, }));Tune maxRetries per workload: lower on interactive (user-blocking) paths so failures bubble up fast; raise on background batch paths where backoff is cheaper than re-queuing. Note that a rate-limited ggui_render on a --public-demo server does NOT throw an HTTP error — it resolves with isError: true and a rate_limited message; check the result before treating the call as a success.
Raw HTTP
Section titled “Raw HTTP”If you’re hitting the server directly without an MCP SDK, implement the same loop against fetch:
- Read the
Retry-Afterheader on every 429. - If present, sleep that many seconds, then retry.
- If absent, sleep
min(baseDelay * 2^attempt, maxDelay), then retry. - Cap retries (3–5 is reasonable for interactive workloads, more for batch).
- Stop retrying on non-429 4xx (those won’t resolve with backoff).
The generic MCP example walks through raw-HTTP usage end-to-end.
See also
Section titled “See also”- Examples → Claude Agent — runnable scaffold with the host SDK’s native retry.
- Cookbook → Error handling — retry, surfacing, and dead-letter patterns.
- Troubleshooting — common error patterns.
- MCP Protocol — full JSON-RPC method reference.