Supplier Bot V2 — Tool-Calling Integration Spec

Overview

This document specifies the tool-calling interface contract between Eric's supplier bot V2 pipeline and Sourcy's production infrastructure. It covers: what tools fire, what signals they produce, the exact API format, grounding guarantees, and what Sourcy needs to build to consume the output.

Bot Tools

data capture

Summary Tools

signal emission

Benchmark Score

90.7%

13 cases, strict grounding

Model

Gemini 3.1

Pro Preview

1. Architecture: Two Agents, Two Tool Sets

The system has two LLM agents that run sequentially. The conversation bot talks to suppliers in real-time and captures data via tools. The summary agent analyzes the completed conversation and emits operational signals.

Key principle The conversation bot NEVER stops for signals. It captures data silently via tools while continuing the conversation. Signal routing (escalation, scheduling, asset requests) is the summary agent's job, running after the conversation ends.

Data Flow

// 1. Conversation bot runs, produces:
{
  history: [/* bot ↔ supplier message array */],
  botToolTrace: [/* structured data captures */],
  status: "completed" | "no_reply" | "rejected" | "wechat_redirect"
}

// 2. Summary agent reads the outcome, produces:
{
  signals: [/* operational signals for brain/ops */],
  brainReceipt: {/* confirmation from brain endpoint */}
}

2. Conversation Bot Tools (4)

These are data-capture tools. They write to an in-memory array (botToolTrace) that is returned as part of the conversation outcome. They do not call external APIs. Sourcy reads botToolTrace from the outcome object.

Tool	Purpose	Status	Evidence
`log_data`	Record a sourcing datapoint (MOQ, price, lead time, etc.)	LIVE	Fires in 10/10 benchmark cases. All grounded.
`note_blocker`	Record a goal blocker (MOQ mismatch, spec refusal)	LIVE	Fires in 3/10 applicable cases. All grounded.
`note_file_request`	Record supplier asking for buyer file (logo, design)	LIVE	Fires in 6/10 applicable cases. All grounded.
`acknowledge_media`	Record supplier sending images/files	TEXT TOKENS ONLY	Fires on [图片] tokens. Real multimodal (vision) not wired.

log_data — Schema

{
  "turn": 2,
  "name": "log_data",
  "args": {
    "goalId": "moq",          // matches goal taxonomy
    "value": "300",           // extracted value
    "supplierQuote": "300起"  // exact supplier wording
  }
}

note_blocker — Schema

{
  "turn": 4,
  "name": "note_blocker",
  "args": {
    "goalId": "customization",
    "blockerType": "quantity_too_low",  // moq_mismatch | spec_refusal | authority_boundary | price_dead_end | quantity_too_low
    "reason": "50个数量太少，无法做胶标Logo",
    "supplierQuote": "这么少，做不了"
  }
}

note_file_request — Schema

{
  "turn": 5,
  "name": "note_file_request",
  "args": {
    "assetType": "logo",       // logo | design | spec_sheet | sample_photo | other
    "urgency": "blocking_quote", // blocking_quote | nice_to_have
    "supplierQuote": "要不你们提供给我们？"
  }
}

acknowledge_media — Schema

{
  "turn": 3,
  "name": "acknowledge_media",
  "args": {
    "mediaType": "image",   // image | file
    "description": "产品图和包装图",
    "supplierQuote": "[图片] 这是我们的产品图和包装图"
  }
}

3. Summary Agent Tools (3)

These tools emit operational signals that need a downstream consumer. Currently wired to a local JSON file (dummy-brain-endpoint.js). Sourcy must replace this with their brain/ops API.

Tool	Purpose	Output Signal Type	Status
`emit_signal`	Asset requests, manual review needs	`asset_request`, `manual_review`	DUMMY ENDPOINT
`extract_timing`	Follow-up scheduling with delay estimate	`schedule_follow_up`	DUMMY ENDPOINT
`flag_escalation`	Blocked goals needing human/brain attention	`human_escalation`	DUMMY ENDPOINT

Integration required Sourcy must implement a receiveSignals(signals) endpoint that accepts the signal array below. The current dummy writes to a local JSON file. The interface is simple but has never been tested against a real backend.

Signal Schema — human_escalation

{
  "sessionId": "mes2-01-moq-rejection",
  "supplierId": "supplier-123",
  "supplierName": "常箱伴皮具源头工厂",
  "source": "summary-agent-llm",
  "timestamp": "2026-04-02T10:04:56.947Z",
  "signalType": "human_escalation",
  "severity": "medium",             // low | medium | high
  "reason": "quantity_too_low: Buyer needs to increase order quantity.",
  "goalId": "customization",
  "blockerType": "quantity_too_low",
  "supplierQuote": "这么少，做不了",
  "suggestedUnlock": "Buyer increases quantity to 300+"
}

Signal Schema — schedule_follow_up

{
  "sessionId": "mes2-08-timing-tomorrow",
  "signalType": "schedule_follow_up",
  "severity": "medium",
  "reason": "Supplier indicated timing: \"明天出去找一下五金\" (~24h).",
  "followUpTiming": {
    "delayHours": 24,
    "rawPhrase": "明天出去找一下五金",
    "confidence": "high"        // high | medium | low
  }
}

Signal Schema — asset_request

{
  "sessionId": "mes2-06-file-request",
  "signalType": "asset_request",
  "severity": "high",
  "reason": "Supplier needs logo file before quoting on customization.",
  "goalId": "customization",
  "supplierQuote": "logo发我一下可以吗？"
}

4. Gemini API Format

The pipeline calls Gemini's generateContent endpoint with function declarations. Sourcy's chatServer must be able to pass through or adapt this format.

Request (tool-calling)

POST /v1beta/models/gemini-3.1-pro-preview:generateContent?key=API_KEY

{
  "systemInstruction": { "parts": [{ "text": "<system prompt>" }] },
  "contents": [
    { "role": "user", "parts": [{ "text": "supplier message" }] },
    { "role": "model", "parts": [{ "text": "bot reply" }] }
  ],
  "tools": [{
    "functionDeclarations": [
      { "name": "log_data", "description": "...", "parameters": {/* JSON Schema */} },
      { "name": "note_blocker", /* ... */ },
      { "name": "note_file_request", /* ... */ },
      { "name": "acknowledge_media", /* ... */ }
    ]
  }],
  "generationConfig": {
    "maxOutputTokens": 4000,
    "temperature": 0.1
  }
}

Response (with tool calls)

{
  "candidates": [{
    "content": {
      "parts": [
        { "text": "您好！请问起订量是多少？" },
        { "functionCall": {
            "name": "log_data",
            "args": { "goalId": "moq", "value": "300", "supplierQuote": "300起" }
          }
        }
      ]
    }
  }],
  "usageMetadata": { "promptTokenCount": 1840, "candidatesTokenCount": 95 }
}

Format compatibility check needed If Sourcy's chatServer uses OpenAI's API format, the tool-calling response structure differs significantly (OpenAI uses tool_calls array with function.arguments as JSON string, Gemini uses functionCall with args as object). The pipeline's llm.js normalizes both formats internally, but the chatServer must be compatible with whichever provider it calls.

Mastra framework compatibility Sourcy's current supplier bot lives in supplier-comm-mastra (Mastra framework). Eric's pipeline is vanilla Node.js — no framework dependency. Two integration paths: (1) Port Eric's modules into Mastra as tool handlers / middleware, adapting the tool definition format to Mastra's conventions. (2) Run Eric's pipeline as a standalone service that Mastra calls. Either way, verify that Mastra's tool-calling conventions map cleanly to the schemas above. Nelson or Awsaf should confirm which path before starting.

5. Grounding System

Every tool call includes a supplierQuote field. The eval harness verifies that this quote actually appears in the conversation history. Ungrounded tool calls (where the quote can't be found in the transcript) are stripped in strict mode.

Mode	Behavior	Recommended For
`strict`	Strip ungrounded tool calls before scoring/delivery	Production, handoff validation
`report`	Annotate but keep all tool calls	Debugging, development

Grounding Results — Latest Benchmark (Apr 2, 2026)

Case	Bot Trace	Summary	Ungrounded Nature
mes2-01	5/5	2/2	-
mes2-03	4/5	1/1	Quote text diverged in replay
mes2-04	2/4	1/2	Bot logged data from runtime-generated text
mes2-05	3/4	0/0	Composite media token format
mes2-06	5/5	2/2	-
mes2-07	0/0	0/1	Merged two phrases with slash
mes2-08	3/3	1/1	-
mes2-09	3/5	1/1	Runtime-generated turn mismatch
mes2-10	3/4	1/1	Quote from runtime-generated text
mes2-11	4/4	0/0	-

Overall: Bot trace ~82% grounded, Summary ~85% grounded. Ungrounded items are primarily from runtime-replay generating slightly different supplier text than the fixture, not from hallucination. Grounding data shown for original 10 cases (Gemini Pro). Three additional cases (mes2-12 silence, mes2-13 auto-response, mes2-14 wechat_redirect) added Apr 5 — grounding data pending full Gemini Pro rerun.

6. Model Pinning & Migration

Preview model risk The pipeline uses gemini-3.1-pro-preview and gemini-3-flash-preview. These are preview models that Google can change or deprecate. Gemini 3 Pro Preview was deprecated March 9, 2026 with ~2 weeks notice. Function calling behavior has regressed between preview and production releases for other Gemini models.

Component	Current Model	Alias in Code
Conversation bot	`gemini-3.1-pro-preview`	`gemini-pro`
Supplier sim	`gemini-3-flash-preview`	`gemini-flash`
Summary agent	`gemini-3.1-pro-preview`	`gemini-pro`
Judge (eval)	`gemini-3.1-pro-preview`	`gemini-pro`

Migration procedure

When Google releases a new model version:

Update the model string in llm.js GEMINI_MODELS object
Run the full 10-case benchmark: node run-eval-v2.js --backend gemini-pro
Compare aggregate score against the 90.7% baseline
If score drops >5%, investigate per-dimension regressions before deploying

7. Production Wiring — What You Need to Build

pipeline/handle-incoming-message.js is the production entry point. Two stateless functions: handleIncomingMessage() (call once per supplier message) and handleConversationEnd() (call when conversation ends). The eval harness tests the same inner functions.

Usage

const { handleIncomingMessage, handleConversationEnd } = require('./pipeline/handle-incoming-message');

// On each incoming supplier message:
const history = loadFromDB(conversationId);
history.push({ role: 'supplier', content: incomingMessage });

const result = await handleIncomingMessage({ sr, goals, history });
saveToDB(conversationId, result.history, result.toolCalls);
sendToSupplier(result.reply);  // → { reply, toolCalls, status, history }

if (result.status !== 'continue') {
  const summary = await handleConversationEnd({ sr, goals, history, botToolTrace });
  deliverToBrain(summary.signals);  // → { summaryText, signals, toolCalls }
}

Sourcy-owned responsibilities

Responsibility	Detail
State persistence	Load/save `history` + `botToolTrace` between messages. The eval harness holds these in memory; production must use a DB.
"Conversation done" trigger	When to call the summary agent — timeout, `detectStatus()` result, or manual trigger. The eval harness runs summary immediately after its loop ends.
Retry / error recovery	If the Gemini API call fails mid-conversation, how to resume. The eval harness simply breaks and moves to the next case.
Brain endpoint	Replace `dummy-brain-endpoint.js` (local JSON file) with a real signal consumer.
Scheduling execution	The bot emits `schedule_follow_up` signals; Sourcy runs the timers/cron that fire those follow-ups.

Key point: All functions listed above are stateless and independently callable. callBotWithTools takes a history array and returns a reply — it has no memory between calls. The eval harness tests these same functions, so benchmark scores remain valid after integration.

8. Known Gaps & Out-of-Scope

Gap	Severity	Description
Vision pass-through	LOW	Gemini 3.1 Pro is multimodal. Bot already handles image tokens correctly in conversation (acknowledges, doesn't hallucinate). What's missing: passing actual image bytes to Gemini vision for content description. Eval covers the text-token behavior (mes2-05, mes2-09 both PASS); actual vision is a separate user story not yet eval-pinned.
Brain endpoint	HIGH	Summary agent signals write to a local JSON file. Sourcy must implement `receiveSignals()` in their brain/ops system.
Supplier silence & scheduling backend	MEDIUM	Eval case added (mes2-12). Bot detects silence correctly and summary agent emits `schedule_follow_up`. Remaining gap: Sourcy must define the scheduling backend contract — when a run is triggered, what state from previous runs is available, how the next follow-up fires. Awaiting Lokesh/Awsaf specification.
WeChat redirect	LOW	Eval case added (mes2-14). Detection logic exists in `detectStatus()` and is exercised. The bot deflects politely and continues on-platform. Summary agent should emit a low-severity `wechat_redirect` signal — this is a channel-switch escalation, not a goal-blocker escalation.
Voice messages	MEDIUM	Real 1688 voice messages are not handled. The bot may encounter [语音消息] tokens in production.
chatServer format	HIGH	Gemini's `functionCall` format has not been validated against Sourcy's chatServer API expectations.

9. Error Handling & Edge Cases

What happens when things go wrong:

Failure Mode	What Happens	Where Handled
LLM returns malformed tool call (missing required fields)	Tool call logged to trace but with `null` fields. Grounding check may strip it in strict mode.	`conversation-engine.js` — `callBotWithTools()`
`supplierQuote` is empty string	Grounding check treats it as unsupported. Stripped in strict mode.	`run-eval-v2.js` — grounding logic
Gemini API returns no candidates (empty response)	Retry with exponential backoff (3 attempts). If all fail, conversation marked `error`.	`llm.js` — `callGeminiPro()`
Gemini returns text with `<think>` tags (reasoning leak)	`stripThinking()` removes thinking tags before the reply reaches the conversation.	`conversation-engine.js`
Summary agent emits signal for a goal not in the SR	Signal still delivered. The brain should validate `goalId` against the SR's goal list.	Sourcy-side validation
Grounding check: partial quote match	Uses substring match — if the quote appears anywhere in any supplier turn, it's supported. Exact character match, not fuzzy.	`run-eval-v2.js`

10. What Sourcy Keeps vs. Replaces

Component	File	Action
Bot tool definitions	`tool-definitions.js`	KEEP
Conversation engine	`conversation-engine.js`	KEEP or ADAPT
LLM layer	`llm.js`	KEEP or ADAPT
Summary agent	`summary-agent-llm.js`	KEEP
Brain endpoint	`dummy-brain-endpoint.js`	REPLACE
Eval runner	`run-eval-v2.js`	KEEP for validation
Eval rubric	`eval-rubric-dyn-v3.md`	KEEP
Prompt	`dyn-v6.md`	KEEP (production prompt)

10. Running the Eval

Sourcy can validate the full pipeline by running:

# Full 13-case benchmark (strict grounding)
node run-eval-v2.js --backend gemini-pro --grounding strict

# With brain signal delivery
node run-eval-v2.js --backend gemini-pro --send-to-brain

# Single case for debugging
node run-eval-v2.js --case-ids mes2-01-moq-rejection --backend gemini-pro

# Inspect brain signals after run
cat pipeline/output/brain-signals-log.json | python3 -m json.tool

Handoff Readiness Assessment

Conversation bot tools (4): Production-ready. Pure data capture, no external wiring needed. Sourcy reads botToolTrace from the conversation outcome.

Summary agent tools (3): Structurally complete and firing correctly in eval. The downstream signal delivery path is dummy — Sourcy must build the consumer. The signal schema is the interface contract.

Biggest risk: Not whether tools fire — they do. It's whether Sourcy's backend can consume the signals, and whether the Gemini API format is compatible with chatServer. These two integration points have never been tested end-to-end.

Supplier Bot V2 — Integration Spec

Overview

1. Architecture: Two Agents, Two Tool Sets

Data Flow

2. Conversation Bot Tools (4)

log_data — Schema

note_blocker — Schema

note_file_request — Schema

acknowledge_media — Schema

3. Summary Agent Tools (3)

Signal Schema — human_escalation

Signal Schema — schedule_follow_up

Signal Schema — asset_request

4. Gemini API Format

Request (tool-calling)

Response (with tool calls)

5. Grounding System

Grounding Results — Latest Benchmark (Apr 2, 2026)

6. Model Pinning & Migration

Migration procedure

7. Production Wiring — What You Need to Build

Usage

Sourcy-owned responsibilities

8. Known Gaps & Out-of-Scope

9. Error Handling & Edge Cases

10. What Sourcy Keeps vs. Replaces

10. Running the Eval

Handoff Readiness Assessment