
Architecture

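Concierge is a Cloudflare Worker (Rust → WebAssembly). Persistent state lives in Cloudflare D1 (metadata, payments) and KV (configs, sessions, in-flight buffers); the reply buffer uses Durable Object storage. Message bodies are never written to D1: the only message content the worker holds is the bounded per-conversation history in KV sessions and bursts waiting in the reply buffer until its alarm fires.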

Inbound channels

WhatsApp Business

  • Transport: Meta Cloud API webhooks at POST /webhook/whatsapp.
  • Auth: WhatsApp Embedded Signup OAuth → tenant exchanges code for system-user token, scoped to one phone number ID.
  • Tenant lookup: reverse index wa_phone:{phone_number_id} → WhatsApp account id → tenant id.
  • Outbound: Meta Graph API POST /{phone_number_id}/messages with the system-user token.
  • Limits: there is no message-history endpoint, so post-hoc batching can't reconstruct text from earlier messages; see "Reply buffer" below.

Instagram DMs

  • Transport: Meta webhook events, same handler family as WhatsApp.
  • Auth: Facebook Login → finds the user's Pages → finds the IG business account on each page → stores per-page access token (AES-256-GCM encrypted with ENCRYPTION_KEY).
  • Tenant lookup: ig_page:{page_id} reverse index → IG account → tenant.
  • Outbound: Graph API POST /me/messages with the page token.

Discord

  • Install: OAuth2 scope=bot+applications.commands, permission bitfield 76864 (SEND_MESSAGES | VIEW_CHANNEL | READ_MESSAGE_HISTORY | ADD_REACTIONS | MANAGE_MESSAGES). Callback at /auth/discord/callback records guild_id → tenant_id in KV.
  • Inbound transport: Application Webhook Events at POST /discord/events. MESSAGE_CREATE events drive AI auto-reply.
  • Triggers: per-tenant flags on DiscordConfig: inbound_mentions (reply when @-mentioned) and inbound_channel_ids[] (reply to every message in these channels). DMs are unsupported with the shared bot.
  • Interactions: POST /discord/interactions handles slash commands (/status, /domains list, /rules list) and buttons (Reply, Approve, Reject, Drop).
  • Signature verification: Ed25519 over timestamp + body using DISCORD_PUBLIC_KEY; same scheme for both endpoints.
  • Outbound: shared bot token (DISCORD_BOT_TOKEN env secret), POST to /channels/{id}/messages via the botrelay crate.
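Both Discord details above are easy to get subtly wrong. The stdlib-only sketch below computes the install permission integer as the bitwise OR of the five listed flags (bit positions from Discord's permissions documentation) and assembles the exact bytes the Ed25519 signature covers: the X-Signature-Timestamp header value followed by the raw body. The function names are hypothetical, and the final verify call against DISCORD_PUBLIC_KEY is left to whichever Ed25519 crate the worker actually uses.

```rust
// Permission bit positions as documented by Discord.
pub const ADD_REACTIONS: u64 = 1 << 6;
pub const VIEW_CHANNEL: u64 = 1 << 10;
pub const SEND_MESSAGES: u64 = 1 << 11;
pub const MANAGE_MESSAGES: u64 = 1 << 13;
pub const READ_MESSAGE_HISTORY: u64 = 1 << 16;

/// Value for the `permissions` query parameter of the bot install URL.
pub const BOT_PERMISSIONS: u64 =
    SEND_MESSAGES | VIEW_CHANNEL | READ_MESSAGE_HISTORY | ADD_REACTIONS | MANAGE_MESSAGES;

/// The bytes Discord signs: the X-Signature-Timestamp header value
/// followed directly by the raw, unparsed request body.
pub fn signed_payload(timestamp: &str, body: &[u8]) -> Vec<u8> {
    let mut msg = Vec::with_capacity(timestamp.len() + body.len());
    msg.extend_from_slice(timestamp.as_bytes());
    msg.extend_from_slice(body);
    msg
}

/// Decodes the hex X-Signature-Ed25519 header into the 64-byte signature.
/// Returns None on a wrong length or any non-hex character.
pub fn decode_signature(hex: &str) -> Option<[u8; 64]> {
    fn nibble(b: u8) -> Option<u8> {
        (b as char).to_digit(16).map(|d| d as u8)
    }
    let bytes = hex.as_bytes();
    if bytes.len() != 128 {
        return None;
    }
    let mut out = [0u8; 64];
    for (i, pair) in bytes.chunks_exact(2).enumerate() {
        out[i] = (nibble(pair[0])? << 4) | nibble(pair[1])?;
    }
    Some(out)
}
```

Verifying against the raw body rather than a re-serialized JSON value matters: any whitespace or key-order change breaks the signature.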

Email

  • Transport: Cloudflare Email Routing. Every *.cncg.email subdomain gets MX records pointing at Cloudflare Email Routing, which invokes the worker's email event handler with the raw RFC 2822 bytes of each inbound message.
  • Tenant lookup: email_domain:{domain} KV reverse index.
  • Routing rules: per-domain ordered list in KV at email_rules:{tenant}:{domain}. Each rule has MatchCriteria (from, to, subject, body globs + has_attachment) and an EmailAction (drop, spam, forward_email, forward_discord, ai_reply).
  • Outbound: Cloudflare Email Service via the EMAIL binding's structured-message API. Sender domain must be onboarded in the Email Service dashboard.
  • Reverse aliases: when forwarding, the From header is rewritten to a generated address on the tenant's domain so replies route back through Concierge. Mapping stored in email_reverse:* with 30-day TTL.
  • Loop detection: outbound messages carry X-EmailProxy-Forwarded; inbound messages with that header are rejected.
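A minimal sketch of the rule walk, assuming first-match-wins over the ordered list, case-insensitive globs where `*` is the only wildcard, and unset criteria fields that match everything. The action variants drop their payloads (forward targets and so on), and route, Decision, and glob_match are hypothetical names. The loop-detection header check runs before any rule is consulted.

```rust
/// Simplified action set: payloads such as forward targets are omitted.
#[derive(Debug, Clone, PartialEq)]
pub enum EmailAction { Drop, Spam, ForwardEmail, ForwardDiscord, AiReply }

/// Each field is optional; an unset field matches every message.
#[derive(Debug, Default)]
pub struct MatchCriteria {
    pub from: Option<String>,
    pub to: Option<String>,
    pub subject: Option<String>,
    pub body: Option<String>,
    pub has_attachment: Option<bool>,
}

pub struct EmailRule { pub criteria: MatchCriteria, pub action: EmailAction }

#[derive(Clone, Copy)]
pub struct InboundEmail<'a> {
    pub from: &'a str,
    pub to: &'a str,
    pub subject: &'a str,
    pub body: &'a str,
    pub has_attachment: bool,
    pub headers: &'a [(&'a str, &'a str)],
}

#[derive(Debug, PartialEq)]
pub enum Decision<'r> { RejectedLoop, Matched(&'r EmailAction), NoMatch }

pub const LOOP_HEADER: &str = "X-EmailProxy-Forwarded";

/// Case-insensitive glob where `*` matches any run of characters
/// (greedy matching with single-star backtracking).
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.to_lowercase().chars().collect();
    let t: Vec<char> = text.to_lowercase().chars().collect();
    let (mut pi, mut ti) = (0, 0);
    let (mut star, mut mark) = (None, 0);
    while ti < t.len() {
        if pi < p.len() && p[pi] != '*' && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if pi < p.len() && p[pi] == '*' {
            star = Some(pi); // remember the star and where it started matching
            mark = ti;
            pi += 1;
        } else if let Some(s) = star {
            pi = s + 1; // let the last star swallow one more character
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

fn field_ok(pattern: &Option<String>, value: &str) -> bool {
    pattern.as_deref().map_or(true, |p| glob_match(p, value))
}

fn matches(c: &MatchCriteria, m: &InboundEmail) -> bool {
    field_ok(&c.from, m.from)
        && field_ok(&c.to, m.to)
        && field_ok(&c.subject, m.subject)
        && field_ok(&c.body, m.body)
        && c.has_attachment.map_or(true, |want| want == m.has_attachment)
}

pub fn route<'r>(rules: &'r [EmailRule], msg: &InboundEmail) -> Decision<'r> {
    // Anything we forwarded ourselves is rejected before rule matching.
    if msg.headers.iter().any(|(name, _)| name.eq_ignore_ascii_case(LOOP_HEADER)) {
        return Decision::RejectedLoop;
    }
    match rules.iter().find(|r| matches(&r.criteria, msg)) {
        Some(rule) => Decision::Matched(&rule.action),
        None => Decision::NoMatch,
    }
}
```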

AI reply pipeline

  • Inference binding: Cloudflare Workers AI, via the AI binding. Default models: llama-4-scout-17b-16e-instruct for replies, llama-3.1-8b-instruct-fast for prompt-injection scanning and persona safety classification, @cf/baai/bge-base-en-v1.5 for embeddings. Reply and fast models are configurable via AI_MODEL / AI_FAST_MODEL env vars; the embedding model id is centralized in ai::EMBEDDING_MODEL.
  • Prompt envelope: every system prompt sent to the reply model is wrapped by prompt::wrap(middle): a fixed PREAMBLE framing the model as a small-business reply assistant, the editable middle (persona + rule prompt), and a fixed POSTAMBLE with universal house rules, jailbreak rails, and the handoff sentinel [[HANDOFF]]. Both bookends are constants in src/prompt.rs; tenant content can never reach the model alone, and admin templates render them verbatim alongside the editable middle so what-you-see is what-the-model-sees.
  • Persona prompt: tenant-wide. Lives in PersonaConfig.source as one of two variants: Builder(PersonaBuilder) or Custom(String). The builder carries voice archetype, business name + type, city, goal (free text + optional URL), handoff_conditions, catch-phrases, and off-topic boundaries; personas::generate renders these into the editable middle. PersonaConfig::active_prompt() returns the rendered middle (generated from builder fields or the raw custom string). Curated archetypes live in the personas D1 catalog (see below) and are copied into the tenant's builder rather than referenced — the tenant owns and edits their own snapshot.
  • Reply rules: per-channel ReplyConfig { enabled, rules: Vec<ReplyRule>, default_rule, wait_seconds }. The pipeline walks rules in order; first match wins; otherwise the mandatory default_rule fires. Each rule has a matcher (StaticText { keywords } for case-insensitive substring or Prompt { description, embedding, threshold } for cosine-similarity intent matching) and a response (Canned { text } sent verbatim, or Prompt { text } appended to the persona prompt and run through the LLM).
  • Embedding step: if any Prompt rule exists, the inbound message is embedded once per delivery and compared via ai::cosine to each rule's pre-computed embedding (computed at rule-save time, stored in the rule alongside the model id). Default threshold is 0.72; tunable per rule.
  • Persona safety gate: AI replies (ReplyResponse::Prompt) are blocked unless the tenant's persona is Approved and its hash hasn't drifted since the last vetting. Canned responses are unaffected. See "Persona safety queue" below.
  • Final prompt: the system prompt is prompt::wrap(persona.active_prompt() + "\n\n" + rule_prompt) — preamble + middle + postamble. The chat history (recent (role, content) turns from the conversation session, capped by ConversationConfig::max_history_messages) is passed as the prior turns. The latest inbound is the final user turn. ai::generate_chat_reply forwards the structured request to the Workers AI binding; an empty history degenerates to a single-turn call.
  • Handoff sentinel: every model reply is run through prompt::detect_and_strip_handoff before it leaves the worker. Case-insensitive search for [[HANDOFF]] strips the token, and a present token flips the session into the holding-pattern path. See "Conversation sessions & handoff" below.
  • Injection scan: incoming bodies are truncated to 1000 chars, then a fast classifier checks for instruction-override patterns. Rejected messages skip the entire pipeline (no rule matching, no reply).
  • Billing: only the AI reply step deducts a credit; Canned responses, embeddings, intent matching, and safety classification are free. The credit is deducted optimistically before the AI call and restored on any failure path. Operators can grant credits to a specific tenant from the management panel.
  • Pricing: flat per-AI-reply rate, no tiers. The unit price (in milli-units) for each currency is operator-configurable via the singleton pricing_config row and the management panel.
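The rule walk and the prompt composition can be sketched as follows. This is an illustration under assumptions, not the worker's code: the types are trimmed to the fields named above, cosine is a local stand-in for ai::cosine, a score equal to the threshold counts as a match, and select_rule and prompt_middle are hypothetical names. The caller passes the inbound embedding only when at least one Prompt rule exists, mirroring the embed-once step.

```rust
/// Default cosine threshold for Prompt matchers (tunable per rule).
pub const DEFAULT_THRESHOLD: f32 = 0.72;

pub enum Matcher {
    /// Case-insensitive substring match on any keyword.
    StaticText { keywords: Vec<String> },
    /// Intent match against an embedding computed when the rule was saved.
    Prompt { description: String, embedding: Vec<f32>, threshold: f32 },
}

pub enum Response {
    Canned { text: String },
    Prompt { text: String },
}

pub struct ReplyRule {
    pub matcher: Matcher,
    pub response: Response,
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() || a.is_empty() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// First matching rule wins; otherwise the mandatory default rule fires.
pub fn select_rule<'a>(
    rules: &'a [ReplyRule],
    default_rule: &'a ReplyRule,
    inbound: &str,
    inbound_embedding: Option<&[f32]>,
) -> &'a ReplyRule {
    let text = inbound.to_lowercase();
    rules
        .iter()
        .find(|rule| match &rule.matcher {
            Matcher::StaticText { keywords } => keywords
                .iter()
                .any(|k| text.contains(k.to_lowercase().as_str())),
            Matcher::Prompt { embedding, threshold, .. } => inbound_embedding
                .map_or(false, |e| cosine(e, embedding) >= *threshold),
        })
        .unwrap_or(default_rule)
}

/// Editable middle for an AI response: persona prompt, blank line, rule prompt.
/// The real pipeline then wraps this in the fixed preamble and postamble.
pub fn prompt_middle(persona: &str, rule_prompt: &str) -> String {
    format!("{persona}\n\n{rule_prompt}")
}
```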

Conversation sessions & handoff

  • Session record: Session in src/types.rs, persisted to KV at convsession:{tenant}:{channel}:{sender}. Holds a stable conversation_id, last_inbound_at, a bounded messages: Vec<(role, content)>, and an optional handoff: Option<HandoffState> sub-record. KV namespace convsession: is distinct from the auth session: prefix.
  • Idle gap: resolved per-tenant via ConversationConfig::idle_gap_mins (default prompt::DEFAULT_CONVERSATION_IDLE_GAP_MINS = 360, i.e. 6 h). On inbound, if now - last_inbound_at > idle_gap, the pipeline mints a fresh conversation_id, clears messages, drops handoff, and writes the new session.
  • History cap: ConversationConfig::max_history_messages (default 20) bounds messages. Inbound and post-send assistant turns are appended; queued draft replies are not — a rejected draft can't poison the next AI call.
  • Handoff cooldown: ConversationConfig::handoff_cooldown_mins (default 60 min). When the model emits [[HANDOFF]], the pipeline strips the token, sets handoff = Some(HandoffState { since }), switches the next turns to prompt::HOLDING_PATTERN_MIDDLE (replacing the persona for the cooldown duration), and pages the tenant once. Past the cooldown, the worker stays silent until the idle gap fires.
  • Tenant page: src/escalations.rs dispatches one notification per handoff — Discord embed via DiscordBot and/or immediate email via send_outbound (not the digest cron). Channel choice mirrors the existing ApprovalNotificationConfig the tenant has already set.
  • Stamping messages: the unified D1 messages table gained a nullable conversation_id column + index idx_messages_conversation. AI flows stamp it on outbound rows at enqueue time and update it on the inbound row once the conversation is resolved. Canned-only flows leave it NULL — those threads don't go through the conversation/handoff machine. ConversationContext (the approval-relay record at conv:{id}) gained a conversation_id field so the post-approval send paths (Discord button + web admin) stamp it back through.
  • Tunable card: /dashboard/settings renders a Conversation Timing card mapping the three knobs to optional u32 fields. Empty input = use the in-code default; placeholders + inline "Default: N" hints surface the runtime fallback. PUT /dashboard/settings/conversation validates bounds (5..=1440 mins for both gaps, 1..=200 turns for history) and the cross-field invariant idle_gap > handoff_cooldown; errors render as inline toasts via HTMX swap.
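A sketch of the session bookkeeping and settings validation described above, with timestamps as Unix seconds. Assumptions: the mint_id closure stands in for however conversation ids are generated; only DEFAULT_CONVERSATION_IDLE_GAP_MINS is named on this page (the other two constant names are hypothetical, with values taken from the stated defaults); and this detect_and_strip_handoff also trims surrounding whitespace, which the real prompt:: function may not do.

```rust
pub const DEFAULT_CONVERSATION_IDLE_GAP_MINS: u32 = 6 * 60;
pub const DEFAULT_HANDOFF_COOLDOWN_MINS: u32 = 60; // hypothetical name
pub const DEFAULT_MAX_HISTORY_MESSAGES: u32 = 20; // hypothetical name
pub const HANDOFF_SENTINEL: &str = "[[HANDOFF]]";

#[derive(Debug)]
pub struct HandoffState { pub since: u64 }

#[derive(Debug)]
pub struct Session {
    pub conversation_id: String,
    pub last_inbound_at: u64,
    pub messages: Vec<(String, String)>, // (role, content)
    pub handoff: Option<HandoffState>,
}

impl Session {
    /// Records an inbound turn; past the idle gap, starts a fresh conversation.
    pub fn on_inbound(&mut self, now: u64, idle_gap_mins: u32, max_history: usize,
                      text: &str, mint_id: impl FnOnce() -> String) {
        if now.saturating_sub(self.last_inbound_at) > u64::from(idle_gap_mins) * 60 {
            self.conversation_id = mint_id();
            self.messages.clear();
            self.handoff = None;
        }
        self.last_inbound_at = now;
        self.push("user", text, max_history);
    }

    /// Appends a turn and drops the oldest turns beyond the history cap.
    pub fn push(&mut self, role: &str, content: &str, max_history: usize) {
        self.messages.push((role.to_owned(), content.to_owned()));
        let excess = self.messages.len().saturating_sub(max_history);
        self.messages.drain(..excess);
    }
}

/// Removes every case-insensitive occurrence of the sentinel and reports
/// whether one was present. ASCII lowercasing keeps byte offsets aligned.
pub fn detect_and_strip_handoff(reply: &str) -> (String, bool) {
    let lower = reply.to_ascii_lowercase();
    let needle = HANDOFF_SENTINEL.to_ascii_lowercase();
    let (mut out, mut pos, mut found) = (String::new(), 0, false);
    while let Some(i) = lower[pos..].find(needle.as_str()) {
        out.push_str(&reply[pos..pos + i]);
        pos += i + needle.len();
        found = true;
    }
    out.push_str(&reply[pos..]);
    (out.trim().to_string(), found)
}

/// Validates resolved values (empty input falls back to the defaults).
pub fn validate_conversation(idle_gap: Option<u32>, cooldown: Option<u32>,
                             history: Option<u32>) -> Result<(), &'static str> {
    let idle = idle_gap.unwrap_or(DEFAULT_CONVERSATION_IDLE_GAP_MINS);
    let cool = cooldown.unwrap_or(DEFAULT_HANDOFF_COOLDOWN_MINS);
    let hist = history.unwrap_or(DEFAULT_MAX_HISTORY_MESSAGES);
    if !(5..=1440).contains(&idle) || !(5..=1440).contains(&cool) {
        return Err("gaps must be 5..=1440 minutes");
    }
    if !(1..=200).contains(&hist) {
        return Err("history must be 1..=200 turns");
    }
    if idle <= cool {
        return Err("idle gap must exceed the handoff cooldown");
    }
    Ok(())
}
```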

Live demo chat

  • Endpoints: GET /demo/personas returns the safety-Approved archetypes from the personas D1 catalog (with sample business fields for the picker), POST /demo/chat runs a single multi-turn AI call against the picked persona and the visitor-supplied history.
  • Statelessness: the modal keeps history client-side and posts it on every turn. Handoff state round-trips as {handoff: bool} on the wire and an Alpine flag in the modal; persona switch and modal close reset it. No KV session.
  • Same envelope, same gates: the handler runs the picked persona through prompt::wrap, gates on the same safety classifier, and respects the same [[HANDOFF]] sentinel as production. The "View prompt" panel renders the exact wrapped envelope so visitors can inspect what the model receives.
  • Reframing: copy makes clear the visitor is roleplaying as a customer of the picked sample business; real customer messages arrive on WhatsApp / Instagram / Discord / email and never on this chat box.

Persona safety queue

  • Trigger: the admin persona handler (POST /dashboard/persona) computes sha256(active_prompt()) on save; if it differs from safety.checked_prompt_hash, it sets safety.status = Pending and sends a SafetyJob { tenant_id, prompt_hash } onto the SAFETY_QUEUE producer binding. Saves that don't change the active prompt skip enqueue.
  • Consumer: #[event(queue)] in src/lib.rs dispatches to safety_queue::handle_batch. Each job re-reads the persona, drops the job if the prompt hash has drifted (a newer save has already enqueued), runs safety::classify_persona against the fast model, and writes Approved or Rejected { vague_reason } back to KV with checked_prompt_hash and checked_at.
  • Classifier: system prompt enumerates Calculon Tech's content policy (no incitement, harassment, discrimination, sexualization of minors, self-harm, illegal-activity promotion, unconsented impersonation). The model returns strict JSON {"verdict":"approve"|"reject","category":"..."}. Categories are logged for abuse review but never echoed; the user-facing rejection text comes from a fixed mapping in safety::vague_reason_for so users can't iterate prompts against the classifier.
  • Failure mode: classifier or KV failures call message.retry(); the queue's DLQ policy (3 retries, then concierge-safety-dlq) takes over. While the persona stays Pending, AI replies are blocked but canned default rules still send.
  • Bindings: producer + consumer for concierge-safety, DLQ concierge-safety-dlq. Both queues must exist before deploy: see Deploy.
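The consumer's per-job decision reduces to a small function. This sketch assumes the prompt hashes are already computed (sha256 in the real handler) and injects the classifier as a closure, so the drift check and verdict mapping can be shown without Workers AI; an Err maps to message.retry(). The category strings and reason texts are hypothetical placeholders, chosen only to show that the user-facing text never names the category.

```rust
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Approve,
    Reject { category: String },
}

#[derive(Debug, PartialEq)]
pub enum JobOutcome {
    /// A newer save already enqueued its own job; ack without classifying.
    DroppedStale,
    Approved,
    Rejected { vague_reason: &'static str },
}

/// Fixed, deliberately coarse user-facing text (placeholder wording).
pub fn vague_reason_for(category: &str) -> &'static str {
    match category {
        "impersonation" => "This persona can't be used as written. Please revise who it presents itself as.",
        _ => "This persona doesn't meet our content policy. Please revise it and save again.",
    }
}

/// Err signals a transient failure; the caller then retries the message.
pub fn process_job(
    job_hash: &str,
    current_hash: &str,
    classify: impl FnOnce() -> Result<Verdict, String>,
) -> Result<JobOutcome, String> {
    if job_hash != current_hash {
        return Ok(JobOutcome::DroppedStale);
    }
    match classify()? {
        Verdict::Approve => Ok(JobOutcome::Approved),
        // The category would be logged for abuse review; only the mapped
        // text is returned toward the user.
        Verdict::Reject { category } => Ok(JobOutcome::Rejected {
            vague_reason: vague_reason_for(&category),
        }),
    }
}
```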

Localization

  • Locale model: every tenant carries a BCP-47 tag (Tenant.locale, e.g. en-IN, en-US) and an optional independent currency override. src/locale.rs::Locale bundles the two into a single value carried through templates and handlers, replacing the previous tangle of if currency == "INR" branches.
  • Resolution chain (first hit wins): tenant-stored locale → Accept-Language header (parsed via the accept-language crate, intersected with the supported set) → cf-ipcountry mapping (IN → en-IN, default en-US) → hardcoded en-IN. Set once at signup; admin-overrideable from /dashboard/settings/currency.
  • Number / currency formatting: helpers::format_count and helpers::format_money use icu::decimal::FixedDecimalFormatter (icu crate, compiled_data feature). en-IN renders 1,00,000 (lakh / crore grouping); en-US renders 100,000. INR shows whole rupees with the ₹ symbol; USD shows two decimals with $.
  • Translation: fluent-bundle with FTL files at assets/locales/{tag}/messages.ftl, baked in at build time via include_str!. src/i18n.rs exposes a OnceLock-backed Translator and t(locale, key) sugar. Lookup falls back to the canonical en-IN bundle, then to the literal key (so a missed key is loud in the rendered HTML and caught by template tests).
  • Adding a locale: drop a new FTL file under assets/locales/{tag}/, add the tag to Translator::new and Locale::from_request's match arms, and register it in locale::parse_supported. CLDR data for the new locale is shipped automatically via the compiled_data feature.
  • Out of scope: AI-generated reply content stays English. Per-language persona prompts and a classifier model that handles target languages well are deferred: see the persona safety queue notes above.
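ICU does the real formatting; the hand-rolled sketch below only makes the two grouping rules concrete, which can be handy as a test oracle for format_count output. It assumes non-negative integers and treats only en-IN as using Indian grouping; group_digits is a hypothetical name, and this format_count is a simplified stand-in for the helpers:: function.

```rust
/// Western grouping: commas every three digits.
/// Indian grouping: keep the last three digits, then group by two.
pub fn group_digits(n: u64, indian: bool) -> String {
    let digits = n.to_string();
    if digits.len() <= 3 {
        return digits;
    }
    let (head, last3) = digits.split_at(digits.len() - 3);
    let group = if indian { 2 } else { 3 };
    let mut parts: Vec<&str> = Vec::new();
    let mut end = head.len();
    while end > 0 {
        let start = end.saturating_sub(group);
        parts.push(&head[start..end]);
        end = start;
    }
    parts.reverse();
    parts.push(last3);
    parts.join(",")
}

/// Simplified stand-in keyed by BCP-47 tag.
pub fn format_count(n: u64, locale_tag: &str) -> String {
    group_digits(n, locale_tag == "en-IN")
}
```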

Reply buffer (Durable Object)

  • Class: ReplyBufferDO in src/durable_objects/reply_buffer.rs; binding REPLY_BUFFER.
  • Keying: one DO instance per {tenant_id}:{channel}:{sender} conversation.
  • Sliding window: each push appends to a pending list and resets the alarm to now + wait_seconds. Bursts collapse into one alarm fire.
  • Drop-after-send: the alarm handler clears DO storage before calling the LLM. Bodies stay in DO state only until the alarm fires (wait_seconds after the last message of a burst, 5 s by default) and are then gone.
  • Bypass: wait_seconds = 0 on the channel's ReplyConfig skips the buffer for instant replies.
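The sliding-window and drop-after-send behavior can be modeled in memory. This is not the workers-rs Durable Object API: pending stands in for DO storage, alarm_at_ms for the scheduled alarm, and alarm() is called by hand to simulate the alarm firing. Each push moves the deadline, so a burst is released once, wait_seconds after its last message.

```rust
pub struct ReplyBuffer {
    pending: Vec<String>,     // stands in for DO storage
    alarm_at_ms: Option<u64>, // stands in for the scheduled alarm
    wait_ms: u64,
}

impl ReplyBuffer {
    pub fn new(wait_seconds: u64) -> Self {
        Self { pending: Vec::new(), alarm_at_ms: None, wait_ms: wait_seconds * 1000 }
    }

    /// With wait_seconds = 0 the body is returned at once (bypass);
    /// otherwise it is buffered and the alarm slides to now + wait.
    pub fn push(&mut self, now_ms: u64, body: String) -> Option<Vec<String>> {
        if self.wait_ms == 0 {
            return Some(vec![body]);
        }
        self.pending.push(body);
        self.alarm_at_ms = Some(now_ms + self.wait_ms);
        None
    }

    /// Simulates the alarm firing. If it is due, the burst is drained so
    /// storage is already empty before the caller invokes the LLM.
    pub fn alarm(&mut self, now_ms: u64) -> Option<Vec<String>> {
        match self.alarm_at_ms {
            Some(due) if now_ms >= due => {
                self.alarm_at_ms = None;
                Some(std::mem::take(&mut self.pending))
            }
            _ => None,
        }
    }

    pub fn is_empty(&self) -> bool {
        self.pending.is_empty()
    }
}
```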

Approval relay

  • Discord: AI drafts post to the tenant's approval channel as embeds with Approve/Reject buttons. Button click triggers /discord/interactions → component handler → outbound send via the originating channel adapter.
  • Conversation context: stored in KV at conv:{id} with 7-day TTL, holds the Discord message id and origin channel/sender so the reply routes back correctly.
  • Email: approval-by-email digest sent at the tenant's configured cadence (default 15 min); links contain signed tokens for one-click approve/reject.

Storage layout

D1 tables

  • tenants: id, email (UNIQUE), facebook_id, plan, currency.
  • messages: unified inbound/outbound metadata (channel, direction, sender, recipient, action_taken, conversation_id). No body content. Indexed on tenant+time, channel+tenant+time, channel_account_id, and conversation_id+time so a conversation thread can be reconstructed for audit. AI flows stamp conversation_id; canned-only flows leave it NULL.
  • whatsapp_messages, instagram_messages, email_messages, email_metrics: channel-specific logs.
  • tenant_billing: credit ledger as JSON (entries with optional expiry).
  • payments: Razorpay event log, kept for dispute and tax records.
  • audit_log: management-action history.
  • personas: curated archetype catalog (slug, label, description, source_json = serialized PersonaSource, greeting, is_system, safety_status, safety_checked_at, safety_vague_reason). Edited from /manage/personas; consumed by the public demo persona picker and snapshotted into a tenant's onboarding state when they pick one. Every edit resets safety_status to draft and enqueues a SafetyJob; only approved rows are visible to demo and onboarding. The concierge row is is_system=1 (undeletable) and powers the homepage demo.

KV keys

  • session:*, csrf:*: auth session and CSRF records backing the login cookies (TTL 7d).
  • whatsapp:{id}, instagram:{id}: per-resource configs. Channel records embed their own ReplyConfig (rules + default rule + wait_seconds).
  • tenant:{tenant}:whatsapp:{id} etc.: per-tenant indexes (empty values; existence is the index).
  • wa_phone:*, ig_page:*, email_domain:*: webhook → tenant reverse indexes.
  • email_domains:{tenant}, email_rules:{tenant}:{domain}, email_reverse:*: email config + alias mapping.
  • discord_guild:{guild_id}, discord_config:{tenant}: guild ↔ tenant.
  • onboarding:{tenant}: wizard state. Holds the PersonaConfig (source variant + safety status), default_wait_seconds applied to newly connected channels, and the ConversationConfig knobs (idle_gap_mins, handoff_cooldown_mins, max_history_messages — each Option<u32>, falling back to prompt::DEFAULT_* when unset).
  • convsession:{tenant}:{channel}:{sender}: per-customer conversation session. Holds conversation_id, last_inbound_at, bounded messages list, and optional handoff state. Distinct from the auth session: prefix.
  • conv:{id}: approval-relay conversation context (TTL 7d). Holds the originating channel + sender for routing an approved/edited reply back, plus the conversation_id stamped at enqueue time so the post-approval send paths can stamp matching D1 rows.
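Keeping the key formats in one place stops prefixes such as session: and convsession: from drifting apart. The builders below are hypothetical helpers that only encode the formats listed above, plus the 7-day and 30-day TTLs mentioned on this page, expressed in seconds.

```rust
/// TTL for auth records and conv:{id} approval contexts.
pub const SEVEN_DAYS_SECS: u64 = 7 * 24 * 60 * 60;
/// TTL for email_reverse:* alias mappings.
pub const EMAIL_REVERSE_TTL_SECS: u64 = 30 * 24 * 60 * 60;

pub fn wa_phone_key(phone_number_id: &str) -> String {
    format!("wa_phone:{phone_number_id}")
}

pub fn ig_page_key(page_id: &str) -> String {
    format!("ig_page:{page_id}")
}

pub fn email_domain_key(domain: &str) -> String {
    format!("email_domain:{domain}")
}

pub fn email_rules_key(tenant: &str, domain: &str) -> String {
    format!("email_rules:{tenant}:{domain}")
}

pub fn discord_guild_key(guild_id: &str) -> String {
    format!("discord_guild:{guild_id}")
}

/// Per-tenant index entry; the value is empty, existence is the index.
pub fn tenant_index_key(tenant: &str, kind: &str, id: &str) -> String {
    format!("tenant:{tenant}:{kind}:{id}")
}

pub fn convsession_key(tenant: &str, channel: &str, sender: &str) -> String {
    format!("convsession:{tenant}:{channel}:{sender}")
}

pub fn conv_key(id: &str) -> String {
    format!("conv:{id}")
}
```

Because KV prefix listing matches from the start of the key, a list over session: never returns convsession: entries.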

Auth

  • Login: Google OAuth (/auth/callback) and Facebook Login (/auth/facebook/callback). Same tenant gets linked to both providers if their email matches.
  • Session: 7-day HttpOnly cookie; CSRF via double-submit cookie checked on every POST/PUT/DELETE under /dashboard.
  • Management panel: /manage/* protected by Cloudflare Access (verifies the Cf-Access-Jwt-Assertion header against the team's JWKS).
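The double-submit check amounts to comparing the token in the CSRF cookie with the copy sent alongside the request. This sketch assumes the caller has already extracted both values (whether the submitted copy comes from a form field or a header isn't specified here), uses hypothetical function names, and compares in constant time as a best-effort guard against timing leaks.

```rust
/// Compares equal-length byte strings without early exit.
/// Length is not hidden; this is best-effort constant time.
pub fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// True when the request may proceed: unguarded methods and paths pass,
/// guarded ones need a non-empty cookie token equal to the submitted token.
pub fn csrf_ok(method: &str, path: &str, cookie_token: Option<&str>, submitted: Option<&str>) -> bool {
    let guarded = matches!(method, "POST" | "PUT" | "DELETE")
        && (path == "/dashboard" || path.starts_with("/dashboard/"));
    if !guarded {
        return true;
    }
    match (cookie_token, submitted) {
        (Some(c), Some(s)) if !c.is_empty() => ct_eq(c.as_bytes(), s.as_bytes()),
        _ => false,
    }
}
```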

Outbound APIs Concierge calls

  • Meta Graph API for WhatsApp + Instagram + Facebook Login.
  • Discord REST API (discord.com/api/v10) for messages, channels, guild lookup.
  • Razorpay API for orders, subscriptions, payment verification.
  • Cloudflare Workers AI binding (no HTTP: direct binding call). Used for reply generation, prompt-injection scan, persona safety classification, and BGE embeddings.
  • Cloudflare Queues binding (SAFETY_QUEUE) for fanning persona safety jobs to the queue consumer.

Limits and known constraints

  • Discord DM auto-reply is unsupported with the shared bot: incoming DMs hit the events endpoint with no guild_id, so we can't attribute them to a tenant.
  • WhatsApp has no message-history API; the reply buffer relies on its own DO state to reconstruct bursts.
  • Cloudflare Email Service requires sender domains to be onboarded in the dashboard before sends from them succeed; new tenant subdomains may need manual onboarding until that step is automated.
  • No per-message body storage. If you need a conversation web-view, that's a future feature requiring a schema change and ToS update.