Reference
Architecture
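Concierge is a Cloudflare Worker (Rust → WebAssembly). Persistent state lives in Cloudflare D1 (metadata, payments) and KV (configs, sessions); in-flight reply buffers live in a Durable Object. No message content is stored at rest beyond the bounded per-conversation history kept in KV sessions; the D1 message tables carry metadata only.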
Inbound channels
WhatsApp Business
- Transport: Meta Cloud API webhooks at `POST /webhook/whatsapp`.
- Auth: WhatsApp Embedded Signup OAuth → the tenant exchanges the authorization code for a system-user token scoped to one phone number ID.
- Tenant lookup: reverse index `wa_phone:{phone_number_id}` → WhatsApp account id → tenant id.
- Outbound: Meta Graph API `POST /{phone_number_id}/messages` with the system-user token.
- Limits: there is no message-history endpoint, so post-hoc batching can't reconstruct text from older messages; see Reply buffer below.
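The WhatsApp tenant lookup above is a two-hop read. Below is a minimal std-only sketch of it. It assumes the account record stored under `whatsapp:{id}` carries a `tenant_id` field. `Kv`, `WhatsAppAccount`, and `resolve_tenant` are hypothetical names, and a `HashMap` stands in for Cloudflare KV.

```rust
use std::collections::HashMap;

/// Trimmed-down account record. Assumption: the record stored under
/// `whatsapp:{id}` carries the owning tenant's id.
#[derive(Debug, Clone)]
pub struct WhatsAppAccount {
    pub tenant_id: String,
}

/// In-memory stand-in for the KV namespace; the worker itself uses Cloudflare KV.
#[derive(Default)]
pub struct Kv {
    /// Reverse indexes such as `wa_phone:{phone_number_id}` -> account id.
    pub index: HashMap<String, String>,
    /// Account records keyed `whatsapp:{id}`.
    pub accounts: HashMap<String, WhatsAppAccount>,
}

/// Reverse-index key for the phone number ID carried by a webhook delivery.
pub fn wa_phone_key(phone_number_id: &str) -> String {
    format!("wa_phone:{phone_number_id}")
}

/// Two hops: phone_number_id -> WhatsApp account id -> account record -> tenant id.
/// Returns None when either hop misses (e.g. a webhook for an unknown number).
pub fn resolve_tenant(kv: &Kv, phone_number_id: &str) -> Option<String> {
    let account_id = kv.index.get(&wa_phone_key(phone_number_id))?;
    let account = kv.accounts.get(&format!("whatsapp:{account_id}"))?;
    Some(account.tenant_id.clone())
}
```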
Instagram DMs
- Transport: Meta webhook events, handled by the same handler family as WhatsApp.
- Auth: Facebook Login → finds the user's Pages → finds the IG business account on each Page → stores a per-Page access token (AES-256-GCM encrypted with `ENCRYPTION_KEY`).
- Tenant lookup: `ig_page:{page_id}` reverse index → IG account → tenant.
- Outbound: Graph API `POST /me/messages` with the Page token.
Discord
- Install: OAuth2 `scope=bot+applications.commands`, permission bitfield `76928` (`SEND_MESSAGES | VIEW_CHANNEL | READ_MESSAGE_HISTORY | ADD_REACTIONS | MANAGE_MESSAGES`). The callback at `/auth/discord/callback` records `guild_id → tenant_id` in KV.
- Inbound transport: Application Webhook Events at `POST /discord/events`. `MESSAGE_CREATE` events drive AI auto-reply.
- Triggers: per-tenant flags on `DiscordConfig`: `inbound_mentions` (reply when @-mentioned) and `inbound_channel_ids[]` (reply to every message in these channels). DMs are unsupported with the shared bot.
- Interactions: `POST /discord/interactions` handles slash commands (`/status`, `/domains list`, `/rules list`) and buttons (Reply, Approve, Reject, Drop).
- Signature verification: Ed25519 over `timestamp + body` using `DISCORD_PUBLIC_KEY`; the same scheme covers both endpoints.
- Outbound: shared bot token (`DISCORD_BOT_TOKEN` env secret), `POST` to `/channels/{id}/messages` via the `botrelay` crate.
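The install bitfield can be derived from Discord's documented permission bits instead of being hardcoded. Below is a std-only sketch; the constant names are Discord's, the helper is hypothetical.

ORing the five flags listed above yields 76864 (`0x12C40`). The documented 76928 (`0x12C80`) sets bit 7 (`VIEW_AUDIT_LOG`) in place of bit 6 (`ADD_REACTIONS`), so one of the two values should be reconciled.

The `signed_message` helper shows the bytes the Ed25519 signature covers: Discord's `X-Signature-Timestamp` header value followed by the raw body. The verification call itself needs an Ed25519 library and is omitted.

```rust
/// Discord permission bits, as defined in Discord's permissions documentation.
pub const ADD_REACTIONS: u64 = 1 << 6;
pub const VIEW_CHANNEL: u64 = 1 << 10;
pub const SEND_MESSAGES: u64 = 1 << 11;
pub const MANAGE_MESSAGES: u64 = 1 << 13;
pub const READ_MESSAGE_HISTORY: u64 = 1 << 16;

/// The five permissions the install URL is meant to request.
pub const INSTALL_PERMISSIONS: u64 =
    SEND_MESSAGES | VIEW_CHANNEL | READ_MESSAGE_HISTORY | ADD_REACTIONS | MANAGE_MESSAGES;

/// Bytes covered by the Ed25519 signature: the timestamp header value
/// immediately followed by the raw request body, with no separator.
pub fn signed_message(timestamp: &str, body: &[u8]) -> Vec<u8> {
    let mut msg = Vec::with_capacity(timestamp.len() + body.len());
    msg.extend_from_slice(timestamp.as_bytes());
    msg.extend_from_slice(body);
    msg
}
```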
Email
- Transport: Cloudflare Email Routing. Every `*.cncg.email` subdomain gets MX records for Cloudflare Email Routing, which routes its mail to the worker. Inbound mail invokes the worker's `email` event handler with the raw RFC 2822 bytes.
- Tenant lookup: `email_domain:{domain}` KV reverse index.
- Routing rules: a per-domain ordered list in KV at `email_rules:{tenant}:{domain}`. Each rule has a `MatchCriteria` (from, to, subject, and body globs plus `has_attachment`) and an `EmailAction` (`drop`, `spam`, `forward_email`, `forward_discord`, `ai_reply`).
- Outbound: Cloudflare Email Service via the `EMAIL` binding's structured-message API. The sender domain must be onboarded in the Email Service dashboard.
- Reverse aliases: when forwarding, the From header is rewritten to a generated address on the tenant's domain so replies route back through Concierge. The mapping is stored in `email_reverse:*` with a 30-day TTL.
- Loop detection: outbound messages carry `X-EmailProxy-Forwarded`; inbound messages with that header are rejected.
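Below is a sketch of per-domain rule evaluation plus the loop check. It rests on three assumptions, none of them confirmed by the reference:

- Rules are evaluated first-match-wins, as reply rules are.
- An unset criterion matches anything.
- Globs are case-insensitive, with `*` and `?` wildcards.

All type and function names are hypothetical mirrors of `MatchCriteria` / `EmailAction`, and action payloads such as forward targets are omitted.

```rust
/// Actions from the reference; payloads (e.g. forward targets) are omitted.
#[derive(Debug, Clone, PartialEq)]
pub enum EmailAction { Drop, Spam, ForwardEmail, ForwardDiscord, AiReply }

/// `None` means "don't care" for that field (an assumption of this sketch).
#[derive(Default)]
pub struct MatchCriteria {
    pub from: Option<String>,
    pub to: Option<String>,
    pub subject: Option<String>,
    pub body: Option<String>,
    pub has_attachment: Option<bool>,
}

pub struct EmailRule {
    pub criteria: MatchCriteria,
    pub action: EmailAction,
}

#[derive(Clone, Copy)]
pub struct InboundEmail<'a> {
    pub from: &'a str,
    pub to: &'a str,
    pub subject: &'a str,
    pub body: &'a str,
    pub has_attachment: bool,
    pub headers: &'a [(&'a str, &'a str)],
}

#[derive(Debug, PartialEq)]
pub enum Routing { RejectedLoop, Matched(EmailAction), NoRule }

/// Case-insensitive glob: `*` matches any run of characters, `?` exactly one.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.to_lowercase().chars().collect();
    let t: Vec<char> = text.to_lowercase().chars().collect();
    let (mut pi, mut ti) = (0, 0);
    let mut star: Option<usize> = None; // index of the most recent `*`
    let mut mark = 0; // text index where that `*` started absorbing
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some(pi);
            mark = ti;
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some(s) = star {
            // Backtrack: let the last `*` swallow one more character.
            pi = s + 1;
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    // Any trailing `*`s match the empty string.
    p[pi..].iter().all(|&c| c == '*')
}

fn field_matches(glob: &Option<String>, value: &str) -> bool {
    glob.as_deref().map_or(true, |g| glob_match(g, value))
}

impl MatchCriteria {
    pub fn matches(&self, m: &InboundEmail) -> bool {
        field_matches(&self.from, m.from)
            && field_matches(&self.to, m.to)
            && field_matches(&self.subject, m.subject)
            && field_matches(&self.body, m.body)
            && self.has_attachment.map_or(true, |want| want == m.has_attachment)
    }
}

/// Loop check first, then the first rule whose criteria all match.
pub fn route(rules: &[EmailRule], mail: &InboundEmail) -> Routing {
    let looped = mail.headers.iter().any(|(name, _)| name.eq_ignore_ascii_case("X-EmailProxy-Forwarded"));
    if looped {
        return Routing::RejectedLoop;
    }
    match rules.iter().find(|r| r.criteria.matches(mail)) {
        Some(rule) => Routing::Matched(rule.action.clone()),
        None => Routing::NoRule,
    }
}
```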
AI reply pipeline
- Inference binding: Cloudflare Workers AI `AI` binding. Default models:
  - `llama-4-scout-17b-16e-instruct` for replies.
  - `llama-3.1-8b-instruct-fast` for prompt-injection scanning and persona safety classification.
  - `@cf/baai/bge-base-en-v1.5` for embeddings.

  The reply and fast models are configurable via the `AI_MODEL` / `AI_FAST_MODEL` env vars; the embedding model id is centralized in `ai::EMBEDDING_MODEL`.
- Prompt envelope: every system prompt sent to the reply model is wrapped by `prompt::wrap(middle)`. The envelope has three parts:
  - A fixed `PREAMBLE` framing the model as a small-business reply assistant.
  - The editable middle (persona + rule prompt).
  - A fixed `POSTAMBLE` with universal house rules, jailbreak rails, and the handoff sentinel `[[HANDOFF]]`.

  Both bookends are constants in `src/prompt.rs`, so tenant content never reaches the model on its own. Admin templates render them verbatim alongside the editable middle, so what you see is what the model sees.
- Persona prompt: tenant-wide. It lives in `PersonaConfig.source` as one of two variants: `Builder(PersonaBuilder)` or `Custom(String)`.
  - The builder carries voice archetype, business name + type, city, `goal` (free text + optional URL), `handoff_conditions`, catch-phrases, and off-topic boundaries; `personas::generate` renders these into the editable middle.
  - `PersonaConfig::active_prompt()` returns the rendered middle, generated from the builder fields or taken from the raw custom string.
  - Curated archetypes live in the `personas` D1 catalog (see Storage layout below). They are copied into the tenant's builder rather than referenced, so the tenant owns and edits their own snapshot.
- Reply rules: per-channel `ReplyConfig { enabled, rules: Vec<ReplyRule>, default_rule, wait_seconds }`. The pipeline walks `rules` in order; the first match wins, otherwise the mandatory `default_rule` fires. Each rule has a `matcher` and a `response`:
  - `matcher`: `StaticText { keywords }` for case-insensitive substring matching, or `Prompt { description, embedding, threshold }` for cosine-similarity intent matching.
  - `response`: `Canned { text }`, sent verbatim, or `Prompt { text }`, appended to the persona prompt and run through the LLM.
- Embedding step: if any `Prompt` rule exists, the inbound message is embedded once per delivery and compared via `ai::cosine` to each rule's pre-computed embedding. Rule embeddings are computed at rule-save time and stored in the rule alongside the model id. The default threshold is 0.72, tunable per rule.
- Persona safety gate: AI replies (`ReplyResponse::Prompt`) are blocked unless the tenant's persona is `Approved` and its hash hasn't drifted since the last vetting. Canned responses are unaffected. See Persona safety queue below.
- Final prompt: the system prompt is `prompt::wrap(persona.active_prompt() + "\n\n" + rule_prompt)`, i.e. preamble + middle + postamble.
  - The chat history is passed as the prior turns: recent `(role, content)` turns from the conversation session, capped by `ConversationConfig::max_history_messages`.
  - The latest inbound message is the final `user` turn.
  - `ai::generate_chat_reply` forwards the structured request to the Workers AI binding; an empty history degenerates to a single-turn call.
- Handoff sentinel: every model reply is run through `prompt::detect_and_strip_handoff` before it leaves the worker. A case-insensitive search for `[[HANDOFF]]` strips the token, and a present token flips the session into the holding-pattern path. See Conversation sessions & handoff below.
- Injection scan: incoming bodies are truncated to 1000 chars, then a fast classifier checks for instruction-override patterns. Rejected messages skip the entire pipeline (no rule matching, no reply).
- Billing: only the AI reply step deducts a credit. Static `Canned` responses are free, as are embeddings, intent matching, and safety classification.
  - The credit is deducted optimistically before the AI call and restored on any failure path.
  - Operators can grant credits to a specific tenant from the management panel.
- Pricing: a flat per-AI-reply rate, no tiers. The unit price (in milli-units) for each currency is operator-configurable via the singleton `pricing_config` row and the management panel.
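The rule walk, the embedding comparison, and optimistic billing can be sketched together. This is a simplified, hypothetical mirror of the shapes above:

- `description` and the rule `response` are omitted.
- The rule `name` field exists only for illustration.
- A plain `u32` stands in for the JSON credit ledger.
- Treating a similarity equal to the threshold as a match (`>=`) is an assumption.

```rust
/// Default cosine threshold for `Prompt` matchers (tunable per rule).
pub const DEFAULT_THRESHOLD: f32 = 0.72;

/// Trimmed-down mirror of the matcher shapes above (`description` omitted).
pub enum Matcher {
    StaticText { keywords: Vec<String> },
    Prompt { embedding: Vec<f32>, threshold: f32 },
}

/// `name` is a hypothetical label used only to identify the winning rule.
pub struct ReplyRule {
    pub name: String,
    pub matcher: Matcher,
}

/// Cosine similarity; 0.0 for mismatched lengths, empty, or zero vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// Walks rules in order and returns the first match; `None` means the caller
/// falls back to `default_rule`. The message is embedded once, and only when
/// at least one `Prompt` rule exists.
pub fn select_rule<'r>(
    rules: &'r [ReplyRule],
    message: &str,
    embed: impl FnOnce(&str) -> Vec<f32>,
) -> Option<&'r ReplyRule> {
    let needs_embedding = rules.iter().any(|r| matches!(r.matcher, Matcher::Prompt { .. }));
    let query = if needs_embedding { Some(embed(message)) } else { None };
    let lowered = message.to_lowercase();
    rules.iter().find(|rule| match &rule.matcher {
        Matcher::StaticText { keywords } => keywords
            .iter()
            .any(|k| lowered.contains(k.to_lowercase().as_str())),
        Matcher::Prompt { embedding, threshold } => query
            .as_deref()
            .map_or(false, |q| cosine(q, embedding) >= *threshold),
    })
}

/// Optimistic billing: deduct one credit before the AI call and restore it
/// if the call fails. Returns `None` (no call made) when no credit is left.
pub fn charge_and_generate<E>(
    credits: &mut u32,
    generate: impl FnOnce() -> Result<String, E>,
) -> Option<Result<String, E>> {
    if *credits == 0 {
        return None;
    }
    *credits -= 1;
    let result = generate();
    if result.is_err() {
        *credits += 1; // failure path: put the credit back
    }
    Some(result)
}
```

Embedding up front only when a `Prompt` rule exists matches the "once per delivery" behavior without paying for an embedding on keyword-only channels.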
Conversation sessions & handoff
- Session record: `Session` in `src/types.rs`, persisted to KV at `convsession:{tenant}:{channel}:{sender}`. It holds:
  - a stable `conversation_id`;
  - `last_inbound_at`;
  - a bounded `messages: Vec<(role, content)>`;
  - an optional `handoff: Option<HandoffState>` sub-record.

  The `convsession:` key prefix is distinct from the auth `session:` prefix.
- Idle gap: resolved per tenant via `ConversationConfig::idle_gap_mins` (default `prompt::DEFAULT_CONVERSATION_IDLE_GAP_MINS`, 360 minutes = 6 h). On inbound, if `now - last_inbound_at > idle_gap`, the pipeline mints a fresh `conversation_id`, clears `messages`, drops `handoff`, and writes the new session.
- History cap: `ConversationConfig::max_history_messages` (default 20) bounds `messages`. Inbound turns and post-send assistant turns are appended; queued draft replies are not, so a rejected draft can't poison the next AI call.
- Handoff cooldown: `ConversationConfig::handoff_cooldown_mins` (default 60 min). When the model emits `[[HANDOFF]]`, the pipeline:
  1. strips the token;
  2. sets `handoff = Some(HandoffState { since })`;
  3. switches subsequent turns to `prompt::HOLDING_PATTERN_MIDDLE`, which replaces the persona for the cooldown duration;
  4. pages the tenant once.

  Past the cooldown, the worker stays silent until the idle gap fires.
- Tenant page: `src/escalations.rs` dispatches one notification per handoff: a Discord embed via `DiscordBot` and/or an immediate email via `send_outbound` (not the digest cron). The channel choice mirrors the `ApprovalNotificationConfig` the tenant has already set.
- Stamping `messages`: the unified D1 `messages` table gained a nullable `conversation_id` column plus index `idx_messages_conversation`.
  - AI flows stamp it on outbound rows at enqueue time and update it on the inbound row once the conversation is resolved.
  - Canned-only flows leave it NULL, since those threads don't go through the conversation/handoff machinery.
  - `ConversationContext` (the approval-relay record at `conv:{id}`) gained a `conversation_id` field, so the post-approval send paths (Discord button + web admin) stamp it through as well.
- Tunable card: `/dashboard/settings` renders a Conversation Timing card mapping the three knobs to optional `u32` fields. An empty input means "use the in-code default"; placeholders and inline "Default: N" hints surface the runtime fallback.
  - `PUT /dashboard/settings/conversation` validates bounds: 5..=1440 minutes for both the idle gap and the handoff cooldown, and 1..=200 turns for history.
  - It also enforces the cross-field invariant `idle_gap > handoff_cooldown`.
  - Errors render as inline toasts via an HTMX swap.
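The idle-gap rollover, history cap, and settings validation reduce to a few comparisons. Below is a sketch under stated assumptions:

- `conversation_id` is a `u64` here rather than whatever the real type is.
- The oldest turns are dropped first when the cap is exceeded.
- The `idle_gap > handoff_cooldown` invariant is checked against the effective values, with defaults filled in for empty inputs.

All names are hypothetical mirrors of the records above.

```rust
/// In-code defaults (the reference's `prompt::DEFAULT_*` values).
pub const DEFAULT_IDLE_GAP_MINS: u32 = 360; // 6 h
pub const DEFAULT_HANDOFF_COOLDOWN_MINS: u32 = 60;

/// Mirror of the three optional knobs; `None` means "use the default".
#[derive(Default)]
pub struct ConversationConfig {
    pub idle_gap_mins: Option<u32>,
    pub handoff_cooldown_mins: Option<u32>,
    pub max_history_messages: Option<u32>,
}

/// Per-field bounds, then the cross-field invariant on effective values.
pub fn validate(c: &ConversationConfig) -> Result<(), String> {
    for (name, value) in [("idle_gap_mins", c.idle_gap_mins), ("handoff_cooldown_mins", c.handoff_cooldown_mins)] {
        if let Some(v) = value {
            if !(5..=1440).contains(&v) {
                return Err(format!("{name} must be within 5..=1440"));
            }
        }
    }
    if let Some(h) = c.max_history_messages {
        if !(1..=200).contains(&h) {
            return Err("max_history_messages must be within 1..=200".to_string());
        }
    }
    let idle = c.idle_gap_mins.unwrap_or(DEFAULT_IDLE_GAP_MINS);
    let cooldown = c.handoff_cooldown_mins.unwrap_or(DEFAULT_HANDOFF_COOLDOWN_MINS);
    if idle <= cooldown {
        return Err("idle gap must be longer than the handoff cooldown".to_string());
    }
    Ok(())
}

pub struct HandoffState {
    pub since: u64,
}

/// Simplified session record.
pub struct Session {
    pub conversation_id: u64,
    pub last_inbound_at: u64,            // unix seconds
    pub messages: Vec<(String, String)>, // (role, content)
    pub handoff: Option<HandoffState>,
}

/// Idle-gap rollover, then append the inbound turn and keep only the
/// newest `max_history` turns.
pub fn on_inbound(s: &mut Session, now: u64, text: &str, idle_gap_mins: u32, max_history: usize, fresh_id: u64) {
    if now.saturating_sub(s.last_inbound_at) > u64::from(idle_gap_mins) * 60 {
        s.conversation_id = fresh_id;
        s.messages.clear();
        s.handoff = None;
    }
    s.last_inbound_at = now;
    s.messages.push(("user".to_string(), text.to_string()));
    let excess = s.messages.len().saturating_sub(max_history);
    s.messages.drain(..excess);
}
```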
Live demo chat
- Endpoints:
  - `GET /demo/personas` returns the safety-`Approved` archetypes from the `personas` D1 catalog, with sample business fields for the picker.
  - `POST /demo/chat` runs a single multi-turn AI call against the picked persona and the visitor-supplied history.
- Statelessness: the modal keeps history client-side and posts it on every turn. Handoff state round-trips as `{handoff: bool}` on the wire and as an Alpine flag in the modal; switching persona or closing the modal resets it. There is no KV session.
- Same envelope, same gates: the handler runs the picked persona through `prompt::wrap`, gates on the same safety classifier, and respects the same `[[HANDOFF]]` sentinel as production. The "View prompt" panel renders the exact wrapped envelope so visitors can inspect what the model receives.
- Reframing: the copy makes clear that the visitor is roleplaying as a customer of the picked sample business. Real customer messages arrive on WhatsApp / Instagram / Discord / email, never on this chat box.
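Production and the demo share the envelope and the sentinel handling. Below is a minimal sketch of both helpers. It assumes placeholder `PREAMBLE` / `POSTAMBLE` text (the real constants live in `src/prompt.rs`), and it assumes every occurrence of the sentinel is stripped and the result trimmed. The function names follow the reference, but the bodies are illustrative.

```rust
pub const HANDOFF: &str = "[[HANDOFF]]";

// Placeholder bookends for illustration only.
pub const PREAMBLE: &str = "You are a reply assistant for a small business.";
pub const POSTAMBLE: &str = "House rules: ... If a human must take over, end with [[HANDOFF]].";

/// Fixed preamble + editable middle + fixed postamble.
pub fn wrap(middle: &str) -> String {
    format!("{}\n\n{}\n\n{}", PREAMBLE, middle, POSTAMBLE)
}

/// Case-insensitive removal of every sentinel occurrence. Returns the cleaned
/// reply and whether a sentinel was present.
pub fn detect_and_strip_handoff(reply: &str) -> (String, bool) {
    // ASCII lowercasing keeps byte offsets identical, so match positions found
    // in `lower` are valid char boundaries in `reply` as well.
    let lower = reply.to_ascii_lowercase();
    let needle = HANDOFF.to_ascii_lowercase();
    let mut out = String::with_capacity(reply.len());
    let mut last = 0;
    let mut found = false;
    for (idx, _) in lower.match_indices(needle.as_str()) {
        out.push_str(&reply[last..idx]);
        last = idx + needle.len();
        found = true;
    }
    out.push_str(&reply[last..]);
    (out.trim().to_string(), found)
}
```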
Persona safety queue
- Trigger: the admin persona handler (`POST /dashboard/persona`) computes `sha256(active_prompt())` on save. If it differs from `safety.checked_prompt_hash`, the handler sets `safety.status = Pending` and sends a `SafetyJob { tenant_id, prompt_hash }` onto the `SAFETY_QUEUE` producer binding. Saves that don't change the active prompt skip the enqueue.
- Consumer: `#[event(queue)]` in `src/lib.rs` dispatches to `safety_queue::handle_batch`. For each job, the consumer:
  1. re-reads the persona;
  2. drops the job if the prompt hash has drifted (a newer save has already enqueued its own job);
  3. runs `safety::classify_persona` against the fast model;
  4. writes `Approved` or `Rejected { vague_reason }` back to KV with `checked_prompt_hash` and `checked_at`.
- Classifier: the system prompt enumerates Calculon Tech's content policy (no incitement, harassment, discrimination, sexualization of minors, self-harm, illegal-activity promotion, or unconsented impersonation). The model returns strict JSON `{"verdict":"approve"|"reject","category":"..."}`.
  - Categories are logged for abuse review but never echoed to users.
  - The user-facing rejection text comes from a fixed mapping in `safety::vague_reason_for`, so users can't iterate prompts against the classifier.
- Failure mode: classifier or KV failures call `message.retry()`, and the queue's DLQ policy (3 retries, then `concierge-safety-dlq`) takes over. While the persona stays `Pending`, AI replies are blocked but canned default rules still send.
- Bindings: producer + consumer for `concierge-safety`, DLQ `concierge-safety-dlq`. Both queues must exist before deploy; see Deploy.
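The producer/consumer handshake hinges on comparing prompt hashes. Below is a std-only sketch. `DefaultHasher` stands in for SHA-256 only to keep it dependency-free; its output isn't guaranteed stable across Rust releases, so it shouldn't be persisted. The struct shapes, the fixed rejection text, and the function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stand-in for sha256(active_prompt()); see the caveat in the lead-in.
pub fn prompt_hash(prompt: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prompt.hash(&mut h);
    h.finish()
}

#[derive(Debug, Clone, PartialEq)]
pub enum SafetyStatus {
    Pending,
    Approved,
    Rejected { vague_reason: String },
}

pub struct Persona {
    pub active_prompt: String,
    pub status: SafetyStatus,
    pub checked_prompt_hash: Option<u64>,
}

pub struct SafetyJob {
    pub tenant_id: String,
    pub prompt_hash: u64,
}

/// Producer side: enqueue only when the prompt differs from the last vetted one.
pub fn on_save(p: &mut Persona, tenant_id: &str, prompt: String) -> Option<SafetyJob> {
    p.active_prompt = prompt;
    let hash = prompt_hash(&p.active_prompt);
    if p.checked_prompt_hash == Some(hash) {
        return None;
    }
    p.status = SafetyStatus::Pending;
    Some(SafetyJob { tenant_id: tenant_id.to_string(), prompt_hash: hash })
}

/// Consumer side: stale jobs are dropped; a classifier Err is propagated so
/// the caller can retry the queue message.
pub fn handle_job(
    p: &mut Persona,
    job: &SafetyJob,
    classify: impl Fn(&str) -> Result<bool, String>,
) -> Result<(), String> {
    if prompt_hash(&p.active_prompt) != job.prompt_hash {
        return Ok(()); // a newer save already enqueued its own job
    }
    let approved = classify(&p.active_prompt)?;
    p.status = if approved {
        SafetyStatus::Approved
    } else {
        // Hypothetical fixed text standing in for safety::vague_reason_for.
        SafetyStatus::Rejected { vague_reason: "This persona doesn't meet our content policy.".to_string() }
    };
    p.checked_prompt_hash = Some(job.prompt_hash);
    Ok(())
}

/// Gate applied before any AI-generated (`ReplyResponse::Prompt`) reply.
pub fn ai_replies_allowed(p: &Persona) -> bool {
    p.status == SafetyStatus::Approved && p.checked_prompt_hash == Some(prompt_hash(&p.active_prompt))
}
```

The stale-job check is what makes rapid successive saves safe: only the job whose hash matches the current prompt can write a verdict.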
Localization
- Locale model: every tenant carries a BCP-47 tag (`Tenant.locale`, e.g. `en-IN`, `en-US`) and an optional, independent `currency` override. `src/locale.rs::Locale` bundles the two into a single value carried through templates and handlers, replacing the previous tangle of `if currency == "INR"` branches.
- Resolution chain (first hit wins):
  1. tenant-stored locale;
  2. `Accept-Language` header, parsed via the `accept-language` crate and intersected with the supported set;
  3. `cf-ipcountry` mapping (`IN` → `en-IN`, any other country → `en-US`);
  4. hardcoded `en-IN` when no country header is present.

  The locale is set once at signup and can be overridden by the admin from `/dashboard/settings/currency`.
- Number / currency formatting: `helpers::format_count` and `helpers::format_money` use `icu::decimal::FixedDecimalFormatter` (`icu` crate, `compiled_data` feature).
  - `en-IN` renders `1,00,000` (lakh / crore grouping); `en-US` renders `100,000`.
  - INR shows whole rupees with the ₹ symbol; USD shows two decimals with `$`.
- Translation: `fluent-bundle` with FTL files at `assets/locales/{tag}/messages.ftl`, baked in at build time via `include_str!`. `src/i18n.rs` exposes a `OnceLock`-backed `Translator` and `t(locale, key)` sugar. Lookup falls back to the canonical `en-IN` bundle, then to the literal key, so a missed key is loud in the rendered HTML and caught by template tests.
- Adding a locale:
  1. drop a new FTL file under `assets/locales/{tag}/`;
  2. add the tag to `Translator::new` and to `Locale::from_request`'s match arms;
  3. register it in `locale::parse_supported`.

  CLDR data for the new locale ships automatically via the `compiled_data` feature.
- Out of scope: AI-generated reply content stays English. Per-language persona prompts and a classifier model that handles target languages well are deferred; see the Persona safety queue notes above.
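The resolution chain and the lakh/crore grouping rule can be illustrated without ICU. This sketch is not a replacement for `FixedDecimalFormatter`: it has no currency symbols, decimals, or CLDR data. It assumes `Accept-Language` has already been parsed into preference order and that only `en-IN` and `en-US` are supported. `resolve_locale` and `group_digits` are hypothetical names.

```rust
/// Locales this sketch treats as supported.
pub const SUPPORTED: [&str; 2] = ["en-IN", "en-US"];

/// First hit wins: stored locale, Accept-Language (already in preference
/// order), cf-ipcountry, then the hardcoded en-IN fallback.
pub fn resolve_locale(stored: Option<&str>, accept_language: &[&str], cf_ipcountry: Option<&str>) -> String {
    if let Some(tag) = stored {
        return tag.to_string();
    }
    if let Some(tag) = accept_language.iter().find(|t| SUPPORTED.contains(*t)) {
        return tag.to_string();
    }
    match cf_ipcountry {
        Some("IN") => "en-IN".to_string(),
        Some(_) => "en-US".to_string(),
        None => "en-IN".to_string(),
    }
}

/// Digit grouping only. en-IN groups the last three digits, then pairs
/// (1,00,000); other locales group by three (100,000).
pub fn group_digits(n: u64, locale: &str) -> String {
    let digits = n.to_string();
    if digits.len() <= 3 {
        return digits;
    }
    let (head, tail) = digits.split_at(digits.len() - 3);
    let step = if locale == "en-IN" { 2 } else { 3 };
    let mut groups = Vec::new();
    let mut end = head.len();
    while end > 0 {
        let start = end.saturating_sub(step);
        groups.push(&head[start..end]);
        end = start;
    }
    groups.reverse();
    format!("{},{}", groups.join(","), tail)
}
```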
Reply buffer (Durable Object)
- Class: `ReplyBufferDO` in `src/durable_objects/reply_buffer.rs`; binding `REPLY_BUFFER`.
- Keying: one DO instance per `{tenant_id}:{channel}:{sender}` conversation.
- Sliding window: each push appends to a pending list and resets the alarm to `now + wait_seconds`, so a burst collapses into one alarm fire.
- Drop-after-send: the alarm handler clears DO storage before calling the LLM. Bodies live in DO state only until the alarm fires, i.e. at most `wait_seconds` (5 s default) after the last message of a burst, and are then gone.
- Bypass: `wait_seconds = 0` on the channel's `ReplyConfig` skips the buffer for instant replies.
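An in-memory model of one buffer instance shows why a burst produces a single alarm fire. It is a sketch only. A real Durable Object would set its alarm through the runtime and persist `pending` in DO storage; here `fire(now)` simulates the alarm, and all names are hypothetical.

```rust
/// Instance key: one buffer per {tenant_id}:{channel}:{sender}.
pub fn buffer_key(tenant_id: &str, channel: &str, sender: &str) -> String {
    format!("{tenant_id}:{channel}:{sender}")
}

/// In-memory model of one buffer instance; times are unix milliseconds.
#[derive(Default)]
pub struct ReplyBuffer {
    pending: Vec<String>,
    alarm_at: Option<u64>,
}

impl ReplyBuffer {
    /// Append and push the alarm out to now + wait_seconds (sliding window).
    pub fn push(&mut self, now_ms: u64, body: String, wait_seconds: u64) {
        self.pending.push(body);
        self.alarm_at = Some(now_ms + wait_seconds * 1000);
    }

    /// Alarm handler: if due, take the burst out of state before the caller
    /// runs the LLM, leaving the buffer empty.
    pub fn fire(&mut self, now_ms: u64) -> Option<Vec<String>> {
        match self.alarm_at {
            Some(at) if now_ms >= at => {
                self.alarm_at = None;
                Some(std::mem::take(&mut self.pending))
            }
            _ => None,
        }
    }
}
```

In the test timeline below, the first message waits 8 s rather than 5 s because the second push moved the alarm. That is why the retention bound is measured from the last message of a burst.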
Approval relay
- Discord: AI drafts post to the tenant's approval channel as embeds with Approve/Reject buttons. A button click triggers `/discord/interactions` → component handler → outbound send via the originating channel adapter.
- Conversation context: stored in KV at `conv:{id}` with a 7-day TTL. It holds the Discord message id and the origin channel/sender so the reply routes back correctly.
- Email: an approval-by-email digest is sent at the tenant's configured cadence (default 15 min); links contain signed tokens for one-click approve/reject.
Storage layout
D1 tables
- `tenants`: id, email (UNIQUE), facebook_id, plan, currency.
- `messages`: unified inbound/outbound metadata (channel, direction, sender, recipient, action_taken, `conversation_id`). No body content.
  - Indexed on tenant+time, channel+tenant+time, channel_account_id, and `conversation_id`+time, so a conversation thread can be reconstructed for audit.
  - AI flows stamp `conversation_id`; canned-only flows leave it NULL.
- `whatsapp_messages`, `instagram_messages`, `email_messages`, `email_metrics`: channel-specific logs.
- `tenant_billing`: credit ledger as JSON (entries with optional expiry).
- `payments`: Razorpay event log, kept for dispute and tax records.
- `audit_log`: management-action history.
- `personas`: curated archetype catalog. Columns: slug, label, description, `source_json` (serialized `PersonaSource`), greeting, `is_system`, `safety_status`, `safety_checked_at`, `safety_vague_reason`.
  - Edited from `/manage/personas`. Consumed by the public demo persona picker, and snapshotted into a tenant's onboarding state when they pick one.
  - Every edit resets `safety_status` to `draft` and enqueues a `SafetyJob`; only `approved` rows are visible to the demo and onboarding.
  - The `concierge` row is `is_system=1` (undeletable) and powers the homepage demo.
KV keys
- `session:*`, `csrf:*`: auth cookies (TTL 7d).
- `whatsapp:{id}`, `instagram:{id}`: per-resource configs. Channel records embed their own `ReplyConfig` (rules + default rule + `wait_seconds`).
- `tenant:{tenant}:whatsapp:{id}` etc.: per-tenant indexes (empty values; existence is the index).
- `wa_phone:*`, `ig_page:*`, `email_domain:*`: webhook → tenant reverse indexes.
- `email_domains:{tenant}`, `email_rules:{tenant}:{domain}`, `email_reverse:*`: email config + alias mapping.
- `discord_guild:{guild_id}`, `discord_config:{tenant}`: guild ↔ tenant.
- `onboarding:{tenant}`: wizard state. It holds:
  - the `PersonaConfig` (source variant + safety status);
  - `default_wait_seconds`, applied to newly connected channels;
  - the `ConversationConfig` knobs (`idle_gap_mins`, `handoff_cooldown_mins`, `max_history_messages`), each an `Option<u32>` that falls back to `prompt::DEFAULT_*` when unset.
- `convsession:{tenant}:{channel}:{sender}`: per-customer conversation session. Holds `conversation_id`, `last_inbound_at`, a bounded `messages` list, and optional handoff state. Distinct from the auth `session:` prefix.
- `conv:{id}`: approval-relay conversation context (TTL 7d). Holds the originating channel + sender for routing an approved/edited reply back. It also holds the `conversation_id` stamped at enqueue time, so the post-approval send paths can stamp matching D1 rows.
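Several prefixes above are easy to confuse (`session:` vs `convsession:`, `conv:` vs `convsession:`). A hypothetical set of key builders keeps the formats in one place. The TTL constants assume KV's `expirationTtl` is given in seconds.

```rust
/// KV `expirationTtl` values, in seconds.
pub const AUTH_SESSION_TTL_SECS: u64 = 7 * 24 * 60 * 60;
pub const APPROVAL_CONTEXT_TTL_SECS: u64 = 7 * 24 * 60 * 60;
pub const EMAIL_REVERSE_TTL_SECS: u64 = 30 * 24 * 60 * 60;

/// Auth session (cookie-backed), not a customer conversation.
pub fn auth_session_key(id: &str) -> String {
    format!("session:{id}")
}

/// Per-customer conversation session.
pub fn conversation_session_key(tenant: &str, channel: &str, sender: &str) -> String {
    format!("convsession:{tenant}:{channel}:{sender}")
}

/// Approval-relay context.
pub fn approval_context_key(id: &str) -> String {
    format!("conv:{id}")
}

pub fn email_rules_key(tenant: &str, domain: &str) -> String {
    format!("email_rules:{tenant}:{domain}")
}

/// Existence-only per-tenant index entry, e.g. tenant:{tenant}:whatsapp:{id}.
pub fn tenant_index_key(tenant: &str, resource: &str, id: &str) -> String {
    format!("tenant:{tenant}:{resource}:{id}")
}

/// Prefix for listing one tenant's entries of a given resource type.
pub fn tenant_index_prefix(tenant: &str, resource: &str) -> String {
    format!("tenant:{tenant}:{resource}:")
}
```

A prefix listing on `conv:` does not pick up `convsession:` keys, because the fifth character differs (`:` vs `s`).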
Auth
- Login: Google OAuth (`/auth/callback`) and Facebook Login (`/auth/facebook/callback`). A tenant is linked to both providers when the email addresses match.
- Session: 7-day HttpOnly cookie; CSRF protection via a double-submit cookie, checked on every `POST`/`PUT`/`DELETE` under `/dashboard`.
- Management panel: `/manage/*` is protected by Cloudflare Access (the worker verifies the `Cf-Access-Jwt-Assertion` header against the team's JWKS).
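The double-submit CSRF rule can be sketched as a pure function. The reference does not say which header or form field carries the submitted token, so it is passed in as a plain argument. `csrf_ok` and its helpers are hypothetical names.

```rust
/// Only state-changing methods under /dashboard need a token.
fn needs_csrf(method: &str, path: &str) -> bool {
    matches!(method, "POST" | "PUT" | "DELETE")
        && (path == "/dashboard" || path.starts_with("/dashboard/"))
}

/// Compares all bytes without exiting early on the first difference.
fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Double-submit check: the token in the CSRF cookie must equal the token the
/// page submitted. Returns true if the request may proceed.
pub fn csrf_ok(method: &str, path: &str, cookie_token: Option<&str>, submitted_token: Option<&str>) -> bool {
    if !needs_csrf(method, path) {
        return true;
    }
    match (cookie_token, submitted_token) {
        (Some(c), Some(s)) if !c.is_empty() => constant_time_eq(c.as_bytes(), s.as_bytes()),
        _ => false,
    }
}
```

Routes outside `/dashboard`, such as the Discord endpoints, fall through this check; they rely on signature verification instead.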
Outbound APIs Concierge calls
- Meta Graph API for WhatsApp + Instagram + Facebook Login.
- Discord REST API (`discord.com/api/v10`) for messages, channels, and guild lookup.
- Razorpay API for orders, subscriptions, and payment verification.
- Cloudflare Workers AI binding (no HTTP; a direct binding call), used for reply generation, prompt-injection scanning, persona safety classification, and BGE embeddings.
- Cloudflare Queues binding (`SAFETY_QUEUE`) for fanning persona safety jobs out to the queue consumer.
Limits and known constraints
- Discord DM auto-reply is unsupported with the shared bot: incoming DMs hit the events endpoint with no `guild_id`, so we can't attribute them to a tenant.
- WhatsApp has no message-history API; the reply buffer relies on its own DO state to reconstruct bursts.
- Cloudflare Email Service requires sender domains to be onboarded in the dashboard before sends from them succeed; new tenant subdomains may need manual onboarding until that step is automated.
- No per-message body storage. If you need a conversation web-view, that's a future feature requiring a schema change and ToS update.