Supplier Bot V2 — Production Handoff

Intelligence Layer for 1688/Alibaba Supplier Outreach

April 5, 2026 · Eric San — for Sourcy

Benchmark Score: 90.7% (127/140 · Gemini Pro · 9/10 pass)
Eval Dimensions: 14 (E1–E14 + S1)
Eval Cases: 13 (scenario suite)
Signal Types: 3 (escalation · scheduling · asset request)

What We’re Delivering

The V2 pipeline is a tool-calling conversation engine that:

  1. Generates tiered goals from a Sourcing Request.
  2. Conducts multi-turn supplier conversations in Chinese on 1688/Alibaba.
  3. Uses four structured tools to capture pricing, MOQ, blockers, media, and file requests during chat.
  4. Runs a post-conversation summary agent that emits three signal types to a downstream brain.
  5. Supports cross-supplier context injection for multi-supplier campaigns.

Scope: This is the intelligence layer only. Sourcy owns transport (chatServer), scheduling execution, and the brain endpoint.

Architecture at a Glance

Two-agent design:

Agent | Model | Role | Tools
Conversation bot | Gemini 3.1 Pro | Real-time capture during supplier chat | log_data, note_blocker, note_file_request, acknowledge_media
Summary agent | Gemini 3.1 Pro | Post-conversation signal extraction | emit_signal, extract_timing, flag_escalation
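
As an illustration of how the conversation bot's four tools could feed a capture store, here is a minimal sketch. The tool names come from the table above; the store layout, the `{ name, args }` call shape, and every argument field are assumptions made for the example, not the contents of tool-definitions.js.

```javascript
// Hypothetical capture store; the bucket names are illustrative only.
function createCaptureStore() {
  return { data: [], blockers: [], fileRequests: [], media: [] };
}

// Tool names are from the handoff; handler bodies are assumptions.
// A Map avoids accidental hits on inherited object properties.
const toolHandlers = new Map([
  ["log_data", (store, args) => store.data.push(args)],
  ["note_blocker", (store, args) => store.blockers.push(args)],
  ["note_file_request", (store, args) => store.fileRequests.push(args)],
  ["acknowledge_media", (store, args) => store.media.push(args)],
]);

// Route one tool call to its handler. Unknown tool names throw
// instead of being silently dropped.
function dispatchToolCall(store, call) {
  const handler = toolHandlers.get(call.name);
  if (!handler) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  handler(store, call.args);
  return store;
}

// Example usage with made-up arguments.
const exampleStore = createCaptureStore();
dispatchToolCall(exampleStore, { name: "log_data", args: { field: "moq", value: 500 } });
dispatchToolCall(exampleStore, { name: "note_blocker", args: { reason: "MOQ too high" } });
console.log(exampleStore.data.length, exampleStore.blockers.length); // 1 1
```

Rejecting unknown tool names loudly is a design choice for the sketch: it makes a mismatch between the prompt's tool list and the handler table visible during porting.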

Benchmark Results

Scores below reflect the latest hardened benchmark (Gemini Pro on the original ten cases; three additional cases exercised on Claude Sonnet where noted). Pass threshold aligns with the dyn-v3 rubric suite (14 scored dimensions per case).

Case ID | Category | Score | Result
mes2-01 | Escalation: moq_rejection | 10/14 | PASS
mes2-03 | Escalation: moq_escalation_deep | 10/14 | PASS
mes2-04 | Baseline: authority_boundary | 12/14 | PASS
mes2-05 | Media read: image_burst | 14/14 | PASS
mes2-06 | Media send: file_request | 11/14 | PASS
mes2-07 | Scheduler: timing_calculating | 11/14 | PASS
mes2-08 | Scheduler: timing_tomorrow | 14/14 | PASS
mes2-09 | Media read: image_read_then_file_send | 12/14 | PASS
mes2-10 | Media send: logo_mockup_request | 13/14 | PASS
mes2-11 | Baseline: regression_baseline | 14/14 | PASS
mes2-12 | Scheduler: supplier_silence | 6/14 | FAIL (Sonnet)
mes2-13 | Auto-response: auto_response_loop | 13/14 | PASS
mes2-14 | Escalation: wechat_redirect | | PENDING

Grounding (hardened-v2-final, 10-case Gemini Pro run): Conversation bot tool-trace grounding 82.1% (32 supported / 39 trace checks). Summary-agent signal grounding 81.8% (9 supported / 11 tool rows). Figures aggregate strict transcript-alignment checks from the benchmark report.
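
The percentages quoted in this report are plain ratios rounded to one decimal place. The sketch below reproduces the headline score and the two grounding figures from their counts; `toPercent` is a helper name introduced here for illustration.

```javascript
// Convert a supported/total count into a percentage with one decimal.
function toPercent(numerator, denominator) {
  if (denominator <= 0) {
    throw new RangeError("denominator must be positive");
  }
  return Math.round((numerator / denominator) * 1000) / 10;
}

const benchmarkScore = toPercent(127, 140);   // headline benchmark score
const botGrounding = toPercent(32, 39);       // conversation bot trace checks
const summaryGrounding = toPercent(9, 11);    // summary-agent tool rows

console.log(benchmarkScore, botGrounding, summaryGrounding); // 90.7 82.1 81.8
```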

Signal Pipeline — Proven End-to-End

The send-to-brain integration test ran three cases with signal delivery enabled, and five signals landed as expected.

Each signal carries sessionId, supplierId, severity, reason (with the exact supplier quote where applicable), plus followUpTiming or goalId as appropriate.
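
To make the field list concrete, here is a minimal validation sketch. The field names are the ones listed above. Treating the four core fields as non-empty strings, and requiring at least one of followUpTiming or goalId, are assumptions for the example, as are all the sample values.

```javascript
// Core fields named in the handoff. Checking them as non-empty strings
// is an assumption about the payload types.
const REQUIRED_SIGNAL_FIELDS = ["sessionId", "supplierId", "severity", "reason"];

// Return a list of problems; an empty list means the signal looks well-formed.
function validateSignal(signal) {
  const errors = [];
  for (const field of REQUIRED_SIGNAL_FIELDS) {
    if (typeof signal[field] !== "string" || signal[field].length === 0) {
      errors.push(`missing or empty field: ${field}`);
    }
  }
  // Assumption: every signal carries followUpTiming or goalId.
  if (signal.followUpTiming === undefined && signal.goalId === undefined) {
    errors.push("expected followUpTiming or goalId");
  }
  return errors;
}

// Hypothetical sample signal; none of these values come from the benchmark.
const sampleSignal = {
  sessionId: "session-001",
  supplierId: "supplier-042",
  severity: "high",
  reason: "Supplier declined the requested MOQ",
  goalId: "goal-moq",
};

console.log(validateSignal(sampleSignal).length); // 0
console.log(validateSignal({ sessionId: "session-001" }).length); // 4
```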

Integration gap: The brain endpoint is currently a local JSON log sink. Sourcy must implement receiveSignals(signals) as a production API.
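
One possible shape for receiveSignals(signals) is sketched below. Only the function name comes from this handoff. The injected `sink` object with an `append(line)` method, the duplicate filtering, and the returned counts are all assumptions, and the production contract remains Sourcy's to define.

```javascript
// Build a brain stub around an injected sink (hypothetical interface:
// any object exposing append(line)).
function createBrainStub(sink) {
  const seenKeys = new Set();
  return {
    receiveSignals(signals) {
      if (!Array.isArray(signals)) {
        throw new TypeError("signals must be an array");
      }
      let accepted = 0;
      let duplicates = 0;
      for (const signal of signals) {
        // Assumption: byte-identical payloads are retries and can be skipped.
        // JSON.stringify is key-order sensitive, so this is a rough filter.
        const key = JSON.stringify(signal);
        if (seenKeys.has(key)) {
          duplicates += 1;
          continue;
        }
        seenKeys.add(key);
        sink.append(key);
        accepted += 1;
      }
      return { accepted, duplicates };
    },
  };
}

// In-memory sink standing in for the local JSON log.
const memoryLines = [];
const brainStub = createBrainStub({ append: (line) => memoryLines.push(line) });
const deliveryResult = brainStub.receiveSignals([
  { sessionId: "session-001", supplierId: "supplier-042", severity: "high", reason: "example" },
  { sessionId: "session-001", supplierId: "supplier-042", severity: "high", reason: "example" },
]);
console.log(deliveryResult.accepted, deliveryResult.duplicates); // 1 1
```

Injecting the sink keeps the same entry point usable against the current log file and a later production store.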

What Sourcy Needs to Do

  1. Port the Tier 1 files: dyn-v6.md, conversation-engine.js, tool-definitions.js, summary-agent-llm.js, prompt-builder.js, context-builder.js, llm.js, goal-generator.js.
  2. Implement receiveSignals() on the brain API.
  3. Verify chatServer message format compatibility with tool-call output.
  4. Re-run the 13-case eval benchmark after porting to confirm no regression.
  5. Define scheduling backend contract (when runs trigger, prior state, follow-up execution) — awaiting Lokesh/Awsaf spec.
  6. Set up model monitoring — re-run the benchmark if the Gemini model identifier changes.
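
Steps 4 and 6 amount to two small checks, sketched here under stated assumptions. The baseline scores are three of the 14/14 cases from the benchmark table. The re-run scores and both model identifier strings are invented for the example, and the helper names are hypothetical.

```javascript
// Report every baseline case whose re-run score is missing or lower.
function findRegressions(baseline, rerun) {
  const regressions = [];
  for (const [caseId, before] of Object.entries(baseline)) {
    const after = rerun[caseId];
    if (after === undefined || after < before) {
      regressions.push({ caseId, before, after: after ?? null });
    }
  }
  return regressions;
}

// A changed model identifier is the trigger for re-running the benchmark.
function modelChanged(pinnedId, currentId) {
  return pinnedId !== currentId;
}

// Baseline scores (out of 14) taken from the benchmark table.
const baselineScores = { "mes2-05": 14, "mes2-08": 14, "mes2-11": 14 };
// Hypothetical post-port scores, for illustration only.
const portedScores = { "mes2-05": 14, "mes2-08": 12, "mes2-11": 14 };

console.log(findRegressions(baselineScores, portedScores));
// [ { caseId: 'mes2-08', before: 14, after: 12 } ]

// Placeholder identifiers; substitute whatever the deployment pins.
console.log(modelChanged("model-id-pinned", "model-id-current")); // true
```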

Known Gaps & Risk Register

Gap | Severity | Owner | Status
Vision pass-through | Low | Sourcy | Bot handles image tokens correctly (mes2-05, mes2-09 PASS). Actual image-content description via Gemini vision is a separate user story, not yet eval-pinned.
Brain endpoint | High | Sourcy | Dummy log → production API
chatServer format | Medium | Sourcy / Tek | Untested with full production payload
Preview model stability | Low | Sourcy | Document model pinning procedure
1688 connector | Medium | Tek | Alibaba path verified; 1688 probe failed
Scheduling backend contract | Medium | Sourcy (Lokesh/Awsaf) | Bot emits schedule_follow_up signals correctly. Sourcy must define: when runs trigger, what state from prior runs is available, how follow-ups fire.

Links & Resources

Name | URL
Integration spec | report.ericsan.io/sourcy/sourcy_supplier_bot_integration_spec.html
Transcript viewer | report.ericsan.io/sourcy/sourcy_supplier_bot_v2_transcript_viewer.html
Weekly report | report.ericsan.io/sourcy/sourcy_supplier_bot_v2_weekly_2026_04_02.html
Eval methodology | report.ericsan.io/sourcy/sourcy_eval_methodology_v1.html
GitHub repo | github.com/neicras/sourcy-supplier-bot-eval

Verdict

The V2 intelligence layer is eval-proven and handoff-ready. It scores 90.7% on the benchmark across 14 dimensions, and the tool-calling pipeline is validated end-to-end, including signal delivery to the downstream brain (currently a local JSON log sink). Remaining work is integration: port the Tier 1 files, implement the brain API, and validate against chatServer. Eric remains available for eval support and prompt iteration after porting.