This document specifies the tool-calling interface contract between Eric's supplier bot V2 pipeline and Sourcy's production infrastructure. It covers: what tools fire, what signals they produce, the exact API format, grounding guarantees, and what Sourcy needs to build to consume the output.
The system has two LLM agents that run sequentially. The conversation bot talks to suppliers in real-time and captures data via tools. The summary agent analyzes the completed conversation and emits operational signals.
// 1. Conversation bot runs, produces: { history: [/* bot ↔ supplier message array */], botToolTrace: [/* structured data captures */], status: "completed" | "no_reply" | "rejected" | "wechat_redirect" } // 2. Summary agent reads the outcome, produces: { signals: [/* operational signals for brain/ops */], brainReceipt: {/* confirmation from brain endpoint */} }
These are data-capture tools. They write to an in-memory array (botToolTrace) that is returned as part of the conversation outcome. They do not call external APIs. Sourcy reads botToolTrace from the outcome object.
| Tool | Purpose | Status | Evidence |
|---|---|---|---|
log_data |
Record a sourcing datapoint (MOQ, price, lead time, etc.) | LIVE | Fires in 10/10 benchmark cases. All grounded. |
note_blocker |
Record a goal blocker (MOQ mismatch, spec refusal) | LIVE | Fires in 3/10 applicable cases. All grounded. |
note_file_request |
Record supplier asking for buyer file (logo, design) | LIVE | Fires in 6/10 applicable cases. All grounded. |
acknowledge_media |
Record supplier sending images/files | TEXT TOKENS ONLY | Fires on [图片] tokens. Real multimodal (vision) not wired. |
{
"turn": 2,
"name": "log_data",
"args": {
"goalId": "moq", // matches goal taxonomy
"value": "300", // extracted value
"supplierQuote": "300起" // exact supplier wording
}
}
{
"turn": 4,
"name": "note_blocker",
"args": {
"goalId": "customization",
"blockerType": "quantity_too_low", // moq_mismatch | spec_refusal | authority_boundary | price_dead_end | quantity_too_low
"reason": "50个数量太少,无法做胶标Logo",
"supplierQuote": "这么少,做不了"
}
}
{
"turn": 5,
"name": "note_file_request",
"args": {
"assetType": "logo", // logo | design | spec_sheet | sample_photo | other
"urgency": "blocking_quote", // blocking_quote | nice_to_have
"supplierQuote": "要不你们提供给我们?"
}
}
{
"turn": 3,
"name": "acknowledge_media",
"args": {
"mediaType": "image", // image | file
"description": "产品图和包装图",
"supplierQuote": "[图片] 这是我们的产品图和包装图"
}
}
These tools emit operational signals that need a downstream consumer. Currently wired to a local JSON file (dummy-brain-endpoint.js). Sourcy must replace this with their brain/ops API.
| Tool | Purpose | Output Signal Type | Status |
|---|---|---|---|
emit_signal |
Asset requests, manual review needs | asset_request, manual_review |
DUMMY ENDPOINT |
extract_timing |
Follow-up scheduling with delay estimate | schedule_follow_up |
DUMMY ENDPOINT |
flag_escalation |
Blocked goals needing human/brain attention | human_escalation |
DUMMY ENDPOINT |
receiveSignals(signals) endpoint that accepts the signal array below. The current dummy writes to a local JSON file. The interface is simple but has never been tested against a real backend.
{
"sessionId": "mes2-01-moq-rejection",
"supplierId": "supplier-123",
"supplierName": "常箱伴皮具源头工厂",
"source": "summary-agent-llm",
"timestamp": "2026-04-02T10:04:56.947Z",
"signalType": "human_escalation",
"severity": "medium", // low | medium | high
"reason": "quantity_too_low: Buyer needs to increase order quantity.",
"goalId": "customization",
"blockerType": "quantity_too_low",
"supplierQuote": "这么少,做不了",
"suggestedUnlock": "Buyer increases quantity to 300+"
}
{
"sessionId": "mes2-08-timing-tomorrow",
"signalType": "schedule_follow_up",
"severity": "medium",
"reason": "Supplier indicated timing: \"明天出去找一下五金\" (~24h).",
"followUpTiming": {
"delayHours": 24,
"rawPhrase": "明天出去找一下五金",
"confidence": "high" // high | medium | low
}
}
{
"sessionId": "mes2-06-file-request",
"signalType": "asset_request",
"severity": "high",
"reason": "Supplier needs logo file before quoting on customization.",
"goalId": "customization",
"supplierQuote": "logo发我一下可以吗?"
}
The pipeline calls Gemini's generateContent endpoint with function declarations. Sourcy's chatServer must be able to pass through or adapt this format.
POST /v1beta/models/gemini-3.1-pro-preview:generateContent?key=API_KEY
{
"systemInstruction": { "parts": [{ "text": "<system prompt>" }] },
"contents": [
{ "role": "user", "parts": [{ "text": "supplier message" }] },
{ "role": "model", "parts": [{ "text": "bot reply" }] }
],
"tools": [{
"functionDeclarations": [
{ "name": "log_data", "description": "...", "parameters": {/* JSON Schema */} },
{ "name": "note_blocker", /* ... */ },
{ "name": "note_file_request", /* ... */ },
{ "name": "acknowledge_media", /* ... */ }
]
}],
"generationConfig": {
"maxOutputTokens": 4000,
"temperature": 0.1
}
}
{
"candidates": [{
"content": {
"parts": [
{ "text": "您好!请问起订量是多少?" },
{ "functionCall": {
"name": "log_data",
"args": { "goalId": "moq", "value": "300", "supplierQuote": "300起" }
}
}
]
}
}],
"usageMetadata": { "promptTokenCount": 1840, "candidatesTokenCount": 95 }
}
tool_calls array with function.arguments as JSON string, Gemini uses functionCall with args as object). The pipeline's llm.js normalizes both formats internally, but the chatServer must be compatible with whichever provider it calls.
supplier-comm-mastra (Mastra framework). Eric's pipeline is vanilla Node.js — no framework dependency. Two integration paths: (1) Port Eric's modules into Mastra as tool handlers / middleware, adapting the tool definition format to Mastra's conventions. (2) Run Eric's pipeline as a standalone service that Mastra calls. Either way, verify that Mastra's tool-calling conventions map cleanly to the schemas above. Nelson or Awsaf should confirm which path before starting.
Every tool call includes a supplierQuote field. The eval harness verifies that this quote actually appears in the conversation history. Ungrounded tool calls (where the quote can't be found in the transcript) are stripped in strict mode.
| Mode | Behavior | Recommended For |
|---|---|---|
strict |
Strip ungrounded tool calls before scoring/delivery | Production, handoff validation |
report |
Annotate but keep all tool calls | Debugging, development |
| Case | Bot Trace | Summary | Ungrounded Nature |
|---|---|---|---|
| mes2-01 | 5/5 | 2/2 | - |
| mes2-03 | 4/5 | 1/1 | Quote text diverged in replay |
| mes2-04 | 2/4 | 1/2 | Bot logged data from runtime-generated text |
| mes2-05 | 3/4 | 0/0 | Composite media token format |
| mes2-06 | 5/5 | 2/2 | - |
| mes2-07 | 0/0 | 0/1 | Merged two phrases with slash |
| mes2-08 | 3/3 | 1/1 | - |
| mes2-09 | 3/5 | 1/1 | Runtime-generated turn mismatch |
| mes2-10 | 3/4 | 1/1 | Quote from runtime-generated text |
| mes2-11 | 4/4 | 0/0 | - |
Overall: Bot trace ~82% grounded, Summary ~85% grounded. Ungrounded items are primarily from runtime-replay generating slightly different supplier text than the fixture, not from hallucination. Grounding data shown for original 10 cases (Gemini Pro). Three additional cases (mes2-12 silence, mes2-13 auto-response, mes2-14 wechat_redirect) added Apr 5 — grounding data pending full Gemini Pro rerun.
gemini-3.1-pro-preview and gemini-3-flash-preview. These are preview models that Google can change or deprecate. Gemini 3 Pro Preview was deprecated March 9, 2026 with ~2 weeks notice. Function calling behavior has regressed between preview and production releases for other Gemini models.
| Component | Current Model | Alias in Code |
|---|---|---|
| Conversation bot | gemini-3.1-pro-preview | gemini-pro |
| Supplier sim | gemini-3-flash-preview | gemini-flash |
| Summary agent | gemini-3.1-pro-preview | gemini-pro |
| Judge (eval) | gemini-3.1-pro-preview | gemini-pro |
When Google releases a new model version:
llm.js GEMINI_MODELS objectnode run-eval-v2.js --backend gemini-propipeline/handle-incoming-message.js is the production entry point.
Two stateless functions: handleIncomingMessage() (call once per supplier message) and handleConversationEnd() (call when conversation ends). The eval harness tests the same inner functions.
const { handleIncomingMessage, handleConversationEnd } = require('./pipeline/handle-incoming-message'); // On each incoming supplier message: const history = loadFromDB(conversationId); history.push({ role: 'supplier', content: incomingMessage }); const result = await handleIncomingMessage({ sr, goals, history }); saveToDB(conversationId, result.history, result.toolCalls); sendToSupplier(result.reply); // → { reply, toolCalls, status, history } if (result.status !== 'continue') { const summary = await handleConversationEnd({ sr, goals, history, botToolTrace }); deliverToBrain(summary.signals); // → { summaryText, signals, toolCalls } }
| Responsibility | Detail |
|---|---|
| State persistence | Load/save history + botToolTrace between messages. The eval harness holds these in memory; production must use a DB. |
| "Conversation done" trigger | When to call the summary agent — timeout, detectStatus() result, or manual trigger. The eval harness runs summary immediately after its loop ends. |
| Retry / error recovery | If the Gemini API call fails mid-conversation, how to resume. The eval harness simply breaks and moves to the next case. |
| Brain endpoint | Replace dummy-brain-endpoint.js (local JSON file) with a real signal consumer. |
| Scheduling execution | The bot emits schedule_follow_up signals; Sourcy runs the timers/cron that fire those follow-ups. |
Key point: All functions listed above are stateless and independently callable. callBotWithTools takes a history array and returns a reply — it has no memory between calls. The eval harness tests these same functions, so benchmark scores remain valid after integration.
| Gap | Severity | Description |
|---|---|---|
| Vision pass-through | LOW | Gemini 3.1 Pro is multimodal. Bot already handles image tokens correctly in conversation (acknowledges, doesn't hallucinate). What's missing: passing actual image bytes to Gemini vision for content description. Eval covers the text-token behavior (mes2-05, mes2-09 both PASS); actual vision is a separate user story not yet eval-pinned. |
| Brain endpoint | HIGH | Summary agent signals write to a local JSON file. Sourcy must implement receiveSignals() in their brain/ops system. |
| Supplier silence & scheduling backend | MEDIUM | Eval case added (mes2-12). Bot detects silence correctly and summary agent emits schedule_follow_up. Remaining gap: Sourcy must define the scheduling backend contract — when a run is triggered, what state from previous runs is available, how the next follow-up fires. Awaiting Lokesh/Awsaf specification. |
| WeChat redirect | LOW | Eval case added (mes2-14). Detection logic exists in detectStatus() and is exercised. The bot deflects politely and continues on-platform. Summary agent should emit a low-severity wechat_redirect signal — this is a channel-switch escalation, not a goal-blocker escalation. |
| Voice messages | MEDIUM | Real 1688 voice messages are not handled. The bot may encounter [语音消息] tokens in production. |
| chatServer format | HIGH | Gemini's functionCall format has not been validated against Sourcy's chatServer API expectations. |
What happens when things go wrong:
| Failure Mode | What Happens | Where Handled |
|---|---|---|
| LLM returns malformed tool call (missing required fields) | Tool call logged to trace but with null fields. Grounding check may strip it in strict mode. | conversation-engine.js — callBotWithTools() |
supplierQuote is empty string | Grounding check treats it as unsupported. Stripped in strict mode. | run-eval-v2.js — grounding logic |
| Gemini API returns no candidates (empty response) | Retry with exponential backoff (3 attempts). If all fail, conversation marked error. | llm.js — callGeminiPro() |
Gemini returns text with <think> tags (reasoning leak) | stripThinking() removes thinking tags before the reply reaches the conversation. | conversation-engine.js |
| Summary agent emits signal for a goal not in the SR | Signal still delivered. The brain should validate goalId against the SR's goal list. | Sourcy-side validation |
| Grounding check: partial quote match | Uses substring match — if the quote appears anywhere in any supplier turn, it's supported. Exact character match, not fuzzy. | run-eval-v2.js |
| Component | File | Action |
|---|---|---|
| Bot tool definitions | tool-definitions.js | KEEP |
| Conversation engine | conversation-engine.js | KEEP or ADAPT |
| LLM layer | llm.js | KEEP or ADAPT |
| Summary agent | summary-agent-llm.js | KEEP |
| Brain endpoint | dummy-brain-endpoint.js | REPLACE |
| Eval runner | run-eval-v2.js | KEEP for validation |
| Eval rubric | eval-rubric-dyn-v3.md | KEEP |
| Prompt | dyn-v6.md | KEEP (production prompt) |
Sourcy can validate the full pipeline by running:
# Full 13-case benchmark (strict grounding) node run-eval-v2.js --backend gemini-pro --grounding strict # With brain signal delivery node run-eval-v2.js --backend gemini-pro --send-to-brain # Single case for debugging node run-eval-v2.js --case-ids mes2-01-moq-rejection --backend gemini-pro # Inspect brain signals after run cat pipeline/output/brain-signals-log.json | python3 -m json.tool
Conversation bot tools (4): Production-ready. Pure data capture, no external wiring needed. Sourcy reads botToolTrace from the conversation outcome.
Summary agent tools (3): Structurally complete and firing correctly in eval. The downstream signal delivery path is dummy — Sourcy must build the consumer. The signal schema is the interface contract.
Biggest risk: Not whether tools fire — they do. It's whether Sourcy's backend can consume the signals, and whether the Gemini API format is compatible with chatServer. These two integration points have never been tested end-to-end.