Gemini Agent

Wire ggui to Gemini’s function-calling API so the model can mint a UI mid-conversation, wait for the user to submit, and continue with the structured data. The pattern is a thin bridge: connect to ggui’s MCP server with the official @modelcontextprotocol/sdk client, then surface every MCP tool as a Gemini FunctionDeclaration.

Terminal window
npm install @google/genai @modelcontextprotocol/sdk
Terminal window
export GEMINI_API_KEY="AIza..."
gemini-agent.ts
import { GoogleGenAI } from "@google/genai";
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StreamableHTTPClientTransport } from "@modelcontextprotocol/sdk/client/streamableHttp.js";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

// 1. Connect to ggui over MCP (Streamable HTTP transport). `Bearer dev`
//    authenticates because `ggui serve --dev-allow-all` accepts any bearer
//    token. Local dev only.
const mcpClient = new Client({ name: "gemini-ggui-agent", version: "0.1.0" }, {});
await mcpClient.connect(
  new StreamableHTTPClientTransport(new URL("http://127.0.0.1:6781/mcp"), {
    requestInit: {
      headers: { Authorization: "Bearer dev" },
    },
  })
);

// 2. Bridge MCP tools → Gemini function declarations.
const { tools: mcpTools } = await mcpClient.listTools();
const geminiTools = [
  {
    functionDeclarations: mcpTools.map((t) => ({
      name: t.name,
      description: t.description,
      // Gemini takes JSON Schema as-is via parametersJsonSchema (NOT `parameters`).
      parametersJsonSchema: t.inputSchema,
    })),
  },
];

// 3. Open a chat. `chats.create` keeps multi-turn state, so reuse this handle.
const chat = ai.chats.create({
  model: "gemini-3.5-flash",
  config: {
    tools: geminiTools,
    systemInstruction:
      "You drive ggui MCP tools to render interactive UIs. Call the appropriate tool when you need to collect structured data from the user, then continue with their response.",
  },
});

async function run(userPrompt: string) {
  console.log(`\nUser: ${userPrompt}`);
  let response = await chat.sendMessage({ message: userPrompt });

  // 4. Drain function calls until the model returns plain text.
  while (response.functionCalls?.length) {
    const functionResponses = await Promise.all(
      response.functionCalls.map(async (fc) => {
        // `name` is typed as optional on the SDK's FunctionCall; fail loudly if absent.
        if (!fc.name) throw new Error("Gemini returned a function call without a name");
        const result = await mcpClient.callTool({
          name: fc.name,
          arguments: fc.args,
        });
        return { name: fc.name, response: { content: result.content } };
      })
    );
    // Send every tool result back in one message, one functionResponse part each.
    response = await chat.sendMessage({
      message: functionResponses.map((fr) => ({ functionResponse: fr })),
    });
  }

  console.log(`\nAssistant: ${response.text ?? "(no response)"}`);
}

await run("I need to schedule a meeting with my team for next week");
await mcpClient.close();
Terminal window
npx tsx gemini-agent.ts
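
The drain loop above forwards `result.content` as-is and lets a thrown tool error abort the run. If you would rather report failures back to the model, one option is a small adapter like the sketch below. The helper names `toFunctionResponse` and `safeCall` are hypothetical, the local types are simplified stand-ins for the MCP result shape, and the `output` / `error` keys follow the convention described for Gemini function responses; check the SDK version you use before relying on it.

```typescript
// Simplified local stand-ins for the parts of an MCP tool result used here.
type McpContent = { type: string; text?: string };
type McpToolResult = { content?: McpContent[]; isError?: boolean };

type FunctionResponse = {
  name: string;
  response: { output?: string; error?: string };
};

// Hypothetical helper: collapse the text parts of an MCP result into one
// string and route it to `error` when the server flagged the call as failed.
function toFunctionResponse(name: string, result: McpToolResult): FunctionResponse {
  const text = (result.content ?? [])
    .filter((c) => c.type === "text" && typeof c.text === "string")
    .map((c) => c.text as string)
    .join("\n");
  return result.isError
    ? { name, response: { error: text || "tool call failed" } }
    : { name, response: { output: text } };
}

// Hypothetical guard: a thrown error becomes an error response instead of
// rejecting, so one bad tool call does not abort the whole turn.
async function safeCall(
  name: string,
  call: () => Promise<McpToolResult>
): Promise<FunctionResponse> {
  try {
    return toFunctionResponse(name, await call());
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { name, response: { error: message } };
  }
}
```

Inside the loop this would wrap the existing call, for example `safeCall(fc.name, () => mcpClient.callTool({ name: fc.name, arguments: fc.args }))`, with a cast if the SDK's result type is wider than the stand-in above.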
  1. You ask Gemini to schedule a team meeting.
  2. Gemini calls ggui_handshake({intent, blueprintDraft}) and gets a handshakeId + a suggestion (the server matches a blueprint or synthesizes one).
  3. Gemini calls ggui_render({handshakeId, props}); the result carries {sessionId, resourceUri} — your host mounts the UI from that MCP-Apps resource.
  4. Gemini calls ggui_consume({sessionId, timeout}); when the user submits, the events array delivers {intent, actionData, uiContext}.
  5. Gemini replies with plain text.
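
In the agent, Gemini decides when to make these calls. To see the sequence in isolation, the sketch below drives the same three tools by hand against an in-memory stand-in. Everything beyond the tool names and the field names listed above is an assumption: `collectFromUser` and `makeFakeGgui` are hypothetical, the results are treated as already-parsed objects (a real MCP result arrives as `content` parts), `blueprintDraft` is an empty placeholder, and the `timeout` value and its unit are guesses.

```typescript
// A tool caller reduced to the essentials: a tool name plus JSON arguments in,
// a parsed result object out.
type CallTool = (
  name: string,
  args: Record<string, unknown>
) => Promise<Record<string, unknown>>;

// Walks handshake → render → consume in order and returns the submitted events.
async function collectFromUser(
  callTool: CallTool,
  intent: string,
  props: Record<string, unknown>
): Promise<unknown[]> {
  // blueprintDraft is left empty here; its real shape is not covered on this page.
  const handshake = await callTool("ggui_handshake", { intent, blueprintDraft: {} });
  const render = await callTool("ggui_render", {
    handshakeId: handshake.handshakeId,
    props,
  });
  // Assumed timeout value; the unit is a guess.
  const consumed = await callTool("ggui_consume", {
    sessionId: render.sessionId,
    timeout: 30_000,
  });
  return Array.isArray(consumed.events) ? consumed.events : [];
}

// In-memory stand-in for the ggui server, for illustration only. It records
// the call order and checks that each id is threaded into the next call.
function makeFakeGgui(log: string[]): CallTool {
  return async (name, args) => {
    log.push(name);
    switch (name) {
      case "ggui_handshake":
        return { handshakeId: "hs-1", suggestion: {} };
      case "ggui_render":
        if (args.handshakeId !== "hs-1") throw new Error("render: wrong handshakeId");
        return { sessionId: "sess-1", resourceUri: "ui://example" };
      case "ggui_consume":
        if (args.sessionId !== "sess-1") throw new Error("consume: wrong sessionId");
        return {
          events: [{ intent: "submit", actionData: { day: "Tuesday" }, uiContext: {} }],
        };
      default:
        throw new Error(`unknown tool: ${name}`);
    }
  };
}
```

The point of the sketch is the threading: `handshakeId` from the first call feeds the second, and `sessionId` from the second feeds the third.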
  • Function schemas use parametersJsonSchema (JSON Schema passthrough), not the older parameters (subset-of-OpenAPI) field. MCP tools expose inputSchema as JSON Schema — pass it through unchanged.
  • Tools are grouped under functionDeclarations; tool replies go back as functionResponse parts in the message field of chat.sendMessage, which is typed PartListUnion.
  • Multi-turn state lives on the ai.chats.create() handle — reuse chat across turns. Don’t call models.generateContent for chat flows or you lose history.
  • Tool-loop latency tip: set thinkingConfig.thinkingLevel to MINIMAL in the chat's config object (the SDK's GenerateContentConfig). High thinking levels add tens of seconds per tool turn.
  • The bridge is generic: every tool ggui exposes over MCP becomes a Gemini function declaration automatically. No per-tool wrapper code.
  • Claude Agent example: the same ggui flow with Anthropic's SDK, which supports MCP servers natively, so no manual bridge is needed.
  • OpenAI Agent example — same flow with OpenAI function calling.
  • How it works — the three channels (bootstrap, MCP, WebSocket) and envelope shapes.
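
The schema-passthrough and latency notes above can be combined in one config builder. The sketch below is dependency-free and makes some assumptions: `buildChatConfig` is a hypothetical helper, `McpTool` is a simplified stand-in for the SDK's tool type, and the thinking level is spelled as a plain string. The SDK is assumed to export a typed enum for that value, which is the safer spelling in real code.

```typescript
// Simplified stand-in for an MCP tool listing entry.
type McpTool = { name: string; description?: string; inputSchema: unknown };

// Hypothetical helper: builds the object passed as `config` to ai.chats.create().
function buildChatConfig(
  mcpTools: McpTool[],
  systemInstruction: string,
  thinkingLevel: string = "MINIMAL"
) {
  return {
    systemInstruction,
    tools: [
      {
        // One declaration per MCP tool; the JSON Schema is passed through untouched.
        functionDeclarations: mcpTools.map((t) => ({
          name: t.name,
          description: t.description,
          parametersJsonSchema: t.inputSchema,
        })),
      },
    ],
    // Lower thinking levels trade reasoning depth for faster tool turns.
    thinkingConfig: { thinkingLevel },
  };
}
```

With the agent above, the call would look like `ai.chats.create({ model, config: buildChatConfig(mcpTools, "...") })`, possibly with a cast where the SDK expects its enum type.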