Gemini Agent
Wire ggui to Gemini’s function-calling API so the model can mint a UI mid-conversation, wait for the user to submit, and continue with the structured data. The pattern is a thin bridge: connect to ggui’s MCP server with the official `@modelcontextprotocol/sdk` client, then surface every MCP tool as a Gemini `FunctionDeclaration`.
```bash
npm install @google/genai @modelcontextprotocol/sdk
export GEMINI_API_KEY="AIza..."
```

```ts
import { GoogleGenAI } from "@google/genai";
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

// 1. Connect to ggui over MCP (Streamable HTTP transport). `Bearer dev`
// authenticates because `ggui serve --dev-allow-all` accepts any bearer.
// Local dev only.
const mcpClient = new Client({ name: "gemini-ggui-agent", version: "0.1.0" }, {});

await mcpClient.connect(
  new StreamableHTTPClientTransport(new URL("http://127.0.0.1:6781/mcp"), {
    requestInit: {
      headers: { Authorization: "Bearer dev" },
    },
  })
);

// 2. Bridge MCP tools → Gemini function declarations.
const { tools: mcpTools } = await mcpClient.listTools();
const geminiTools = [
  {
    functionDeclarations: mcpTools.map((t) => ({
      name: t.name,
      description: t.description,
      // Gemini takes JSON Schema as-is via parametersJsonSchema (NOT `parameters`).
      parametersJsonSchema: t.inputSchema,
    })),
  },
];

// 3. Open a chat. `chats.create` keeps multi-turn state, so reuse this handle.
const chat = ai.chats.create({
  model: "gemini-3.5-flash",
  config: {
    tools: geminiTools,
    systemInstruction:
      "You drive ggui MCP tools to render interactive UIs. Call the appropriate tool when you need to collect structured data from the user, then continue with their response.",
  },
});

async function run(userPrompt: string) {
  console.log(`\nUser: ${userPrompt}`);

  let response = await chat.sendMessage({ message: userPrompt });

  // 4. Drain function calls until the model returns plain text.
  while (response.functionCalls?.length) {
    const functionResponses = await Promise.all(
      response.functionCalls.map(async (fc) => {
        const result = await mcpClient.callTool({
          name: fc.name,
          arguments: fc.args,
        });
        return { name: fc.name, response: { content: result.content } };
      })
    );

    response = await chat.sendMessage({
      message: functionResponses.map((fr) => ({ functionResponse: fr })),
    });
  }

  console.log(`\nAssistant: ${response.text ?? "(no response)"}`);
}

await run("I need to schedule a meeting with my team for next week");

await mcpClient.close();
```

```bash
npx tsx gemini-agent.ts
```

What Happens
- You ask Gemini to schedule a team meeting.
- Gemini calls `ggui_handshake({intent, blueprintDraft})` and gets a `handshakeId` + a `suggestion` (the server matches a blueprint or synthesizes one).
- Gemini calls `ggui_render({handshakeId, props})`; the result carries `{sessionId, resourceUri}`, and your host mounts the UI from that MCP-Apps resource.
- Gemini calls `ggui_consume({sessionId, timeout})`; when the user submits, the `events` array delivers `{intent, actionData, uiContext}`.
- Gemini replies with plain text.
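
To make the data threading in the steps above concrete, here is a minimal sketch of the same handshake → render → consume sequence driven by hand instead of by the model. The tool names and field names come from the walkthrough; everything else is an assumption for illustration: `CallTool` is a hypothetical wrapper that returns an already-parsed result object (how ggui encodes results inside MCP content is not covered here), `mount` is a hypothetical host hook, and the field types in `GguiEvent` and `FlowInput` are guesses.

```typescript
// Hypothetical stand-in for an MCP tool call that resolves to an already-parsed
// result object. The real wire encoding of ggui results is not shown on this page.
type CallTool = (
  name: string,
  args: Record<string, unknown>,
) => Promise<Record<string, unknown>>;

// Event fields as named in the walkthrough; the field types are assumptions.
interface GguiEvent {
  intent: string;
  actionData: unknown;
  uiContext: unknown;
}

interface FlowInput {
  intent: string;
  blueprintDraft: unknown; // shape not specified on this page
  props: Record<string, unknown>;
  timeout: number; // passed through unchanged; units are not specified here
}

async function collectFromUser(
  callTool: CallTool,
  mount: (resourceUri: string) => void, // hypothetical host hook
  input: FlowInput,
): Promise<GguiEvent[]> {
  // Step 1: the handshake yields the id that render needs.
  const handshake = await callTool("ggui_handshake", {
    intent: input.intent,
    blueprintDraft: input.blueprintDraft,
  });

  // Step 2: render yields the session id and the resource the host mounts.
  const rendered = await callTool("ggui_render", {
    handshakeId: handshake.handshakeId,
    props: input.props,
  });
  mount(String(rendered.resourceUri));

  // Step 3: consume is keyed by the session id from render and returns events.
  const consumed = await callTool("ggui_consume", {
    sessionId: rendered.sessionId,
    timeout: input.timeout,
  });
  return (consumed.events ?? []) as GguiEvent[];
}
```

Each call depends only on an id returned by the previous one, which is why the model can run the whole sequence through ordinary function calls without any extra state on the host side.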
Gemini-specific notes
- Function schemas use `parametersJsonSchema` (JSON Schema passthrough), not the older `parameters` (subset-of-OpenAPI) field. MCP tools expose `inputSchema` as JSON Schema; pass it through unchanged.
- Tools are grouped under `functionDeclarations`; tool replies are `functionResponse` parts inside the `message: PartListUnion`.
- Multi-turn state lives on the `ai.chats.create()` handle; reuse `chat` across turns. Don’t call `models.generateContent` for chat flows or you lose history.
- Tool-loop latency tip: set `generateContentConfig.thinkingConfig.thinkingLevel` to `MINIMAL`; high thinking levels add tens of seconds per tool turn.
- The bridge is generic: every tool ggui exposes over MCP becomes a Gemini function declaration automatically. No per-tool wrapper code.
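
The notes above can be pulled together into two small pure helpers. This is a sketch under stated assumptions, not SDK code: `McpToolLike` and `ToolResult` are hypothetical local types standing in for the SDK types so the block compiles on its own, `buildChatConfig` and `toFunctionResponseParts` are illustrative names, and writing the thinking level as the plain string `"MINIMAL"` is an assumption (the SDK may expect its own enum value, and support can vary by model).

```typescript
// Hypothetical structural types; they mirror only the fields this page mentions.
interface McpToolLike {
  name: string;
  description?: string;
  inputSchema: Record<string, unknown>;
}

interface ToolResult {
  name: string;
  content: unknown;
}

// Build a chat config object: tools grouped under functionDeclarations,
// schemas passed through as parametersJsonSchema, thinking level lowered.
function buildChatConfig(mcpTools: McpToolLike[], systemInstruction: string) {
  return {
    systemInstruction,
    tools: [
      {
        functionDeclarations: mcpTools.map((t) => ({
          name: t.name,
          description: t.description,
          // JSON Schema goes through unchanged; `parameters` is not set.
          parametersJsonSchema: t.inputSchema,
        })),
      },
    ],
    // Assumption: the level is accepted in this string form.
    thinkingConfig: { thinkingLevel: "MINIMAL" },
  };
}

// Wrap tool results as functionResponse parts, one part per tool call.
function toFunctionResponseParts(results: ToolResult[]) {
  return results.map((r) => ({
    functionResponse: { name: r.name, response: { content: r.content } },
  }));
}
```

In the example on this page, the object returned by `buildChatConfig` would take the place of the inline `config` passed to `ai.chats.create`, and the parts from `toFunctionResponseParts` would be the `message` sent back after each round of tool calls. When typed against the real SDK, the `thinkingLevel` value may need a cast or the SDK's own constant.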
- Claude Agent example — same MCP bridge pattern with Anthropic’s SDK (native MCP server support — no manual bridge needed).
- OpenAI Agent example — same flow with OpenAI function calling.
- How it works — the three channels (bootstrap, MCP, WebSocket) and envelope shapes.