Architecture

Concierge is a Cloudflare Worker (Rust → WebAssembly). All persistent state lives in Cloudflare D1 (metadata, payments) and KV (configs, sessions, in-flight buffers). No message content is stored at rest.

Inbound channels

WhatsApp Business

Transport: Meta Cloud API webhooks at POST /webhook/whatsapp.
Auth: WhatsApp Embedded Signup OAuth → tenant exchanges code for system-user token, scoped to one phone number ID.
Tenant lookup: reverse index wa_phone:{phone_number_id} → WhatsApp account id → tenant id.
Outbound: Meta Graph API POST /{phone_number_id}/messages with the system-user token.
Limits: Meta exposes no message-history endpoint, so post-hoc batching can't reconstruct text from older messages (see Reply buffer below).
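
The two-hop tenant lookup can be sketched as follows. This is a minimal illustration, not the worker's code: the HashMaps stand in for KV reads, and the WhatsAppAccount struct, its tenant_id field, and resolve_tenant are hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

/// Hypothetical slice of the per-account record stored at whatsapp:{id}.
struct WhatsAppAccount {
    tenant_id: String,
}

/// Two-hop lookup: phone_number_id -> account id -> tenant id.
/// Both maps are in-memory stand-ins for KV reads (an assumption of this sketch).
fn resolve_tenant(
    index: &HashMap<String, String>,
    accounts: &HashMap<String, WhatsAppAccount>,
    phone_number_id: &str,
) -> Option<String> {
    // Hop 1: the reverse index maps the webhook's phone number ID to an account id.
    let account_id = index.get(&format!("wa_phone:{phone_number_id}"))?;
    // Hop 2: the account record names its owning tenant.
    let account = accounts.get(&format!("whatsapp:{account_id}"))?;
    Some(account.tenant_id.clone())
}
```

A webhook whose phone number ID has no index entry resolves to None, which a handler would treat as an unknown tenant.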

Instagram DMs

Transport: Meta webhook events, same handler family as WhatsApp.
Auth: Facebook Login → finds the user's Pages → finds the IG business account on each page → stores per-page access token (AES-256-GCM encrypted with ENCRYPTION_KEY).
Tenant lookup: ig_page:{page_id} reverse index → IG account → tenant.
Outbound: Graph API POST /me/messages with the page token.

Discord

Install: OAuth2 scope=bot+applications.commands, permission bitfield 76864 (SEND_MESSAGES | VIEW_CHANNEL | READ_MESSAGE_HISTORY | ADD_REACTIONS | MANAGE_MESSAGES = 2048 + 1024 + 65536 + 64 + 8192). Callback at /auth/discord/callback records guild_id → tenant_id in KV.
Inbound transport: Application Webhook Events at POST /discord/events. MESSAGE_CREATE events drive AI auto-reply.
Triggers: two per-tenant flags on DiscordConfig, inbound_mentions (reply when @-mentioned) and inbound_channel_ids[] (reply to every message in these channels). DMs are unsupported with the shared bot.
Interactions: POST /discord/interactions handles slash commands (/status, /domains list, /rules list) and buttons (Reply, Approve, Reject, Drop).
Signature verification: Ed25519 over timestamp + body using DISCORD_PUBLIC_KEY; same scheme for both endpoints.
Outbound: shared bot token (DISCORD_BOT_TOKEN env secret), POST to /channels/{id}/messages via the botrelay crate.
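
The trigger rules above reduce to a small predicate. The sketch below is an assumption-laden illustration: the struct shapes, the mentions_bot field, and should_auto_reply are hypothetical, and only the two flags and the guild_id constraint come from this document.

```rust
/// Hypothetical mirror of the trigger flags on DiscordConfig.
struct DiscordConfig {
    inbound_mentions: bool,
    inbound_channel_ids: Vec<String>,
}

/// Minimal view of a MESSAGE_CREATE event; field names are assumptions.
struct MessageCreate {
    guild_id: Option<String>,
    channel_id: String,
    mentions_bot: bool,
}

/// Decides whether an inbound Discord message should enter the AI reply pipeline.
fn should_auto_reply(cfg: &DiscordConfig, ev: &MessageCreate) -> bool {
    // DMs arrive without a guild_id and cannot be attributed to a tenant.
    if ev.guild_id.is_none() {
        return false;
    }
    // Every message in a listed channel triggers a reply.
    let in_listed_channel = cfg.inbound_channel_ids.iter().any(|id| *id == ev.channel_id);
    // Mentions trigger a reply only when the tenant enabled the flag.
    let mentioned = cfg.inbound_mentions && ev.mentions_bot;
    in_listed_channel || mentioned
}
```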

Email

Transport: Cloudflare Email Routing. Every *.cncg.email subdomain gets MX records pointed at the worker. Inbound mail invokes the worker's email event handler with the raw RFC 2822 bytes.
Tenant lookup: email_domain:{domain} KV reverse index.
Routing rules: per-domain ordered list in KV at email_rules:{tenant}:{domain}. Each rule has MatchCriteria (from, to, subject, body globs + has_attachment) and an EmailAction (drop, spam, forward_email, forward_discord, ai_reply).
Outbound: Cloudflare Email Service via the EMAIL binding's structured-message API. Sender domain must be onboarded in the Email Service dashboard.
Reverse aliases: when forwarding, the From header is rewritten to a generated address on the tenant's domain so replies route back through Concierge. Mapping stored in email_reverse:* with 30-day TTL.
Loop detection: outbound messages carry X-EmailProxy-Forwarded; inbound messages with that header are rejected.
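
Evaluating the ordered routing rules might look like the sketch below. It rests on several assumptions that this document does not confirm: globs support only the * wildcard, all criteria set on a rule must match, and the first matching rule wins. The to and body criteria are omitted for brevity, and glob_match, criteria_match, and first_action are hypothetical helper names.

```rust
#[derive(Debug, Clone, PartialEq)]
enum EmailAction {
    Drop,
    Spam,
    ForwardEmail(String),
    ForwardDiscord(String),
    AiReply,
}

/// Subset of MatchCriteria; None means "don't care".
#[derive(Default)]
struct MatchCriteria {
    from: Option<String>,
    subject: Option<String>,
    has_attachment: Option<bool>,
}

struct Rule {
    criteria: MatchCriteria,
    action: EmailAction,
}

struct Inbound<'a> {
    from: &'a str,
    subject: &'a str,
    has_attachment: bool,
}

/// Glob match where '*' matches any run of characters (assumed syntax).
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    let mut star: Option<usize> = None; // position of the last '*' seen
    let mut mark = 0usize; // text position that '*' currently extends to
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some(pi);
            mark = ti;
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some(s) = star {
            // Backtrack: let the last '*' absorb one more character.
            pi = s + 1;
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

fn criteria_match(c: &MatchCriteria, m: &Inbound<'_>) -> bool {
    c.from.as_deref().map_or(true, |g| glob_match(g, m.from))
        && c.subject.as_deref().map_or(true, |g| glob_match(g, m.subject))
        && c.has_attachment.map_or(true, |want| want == m.has_attachment)
}

/// Returns the action of the first rule whose criteria all match.
fn first_action<'r>(rules: &'r [Rule], m: &Inbound<'_>) -> Option<&'r EmailAction> {
    rules.iter().find(|r| criteria_match(&r.criteria, m)).map(|r| &r.action)
}
```

A rule with empty criteria matches everything, so placing one last gives the list a catch-all.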

AI reply pipeline

Inference binding: Cloudflare Workers AI through the AI binding. Default models: llama-4-scout-17b-16e-instruct for replies, llama-3.1-8b-instruct-fast for prompt-injection scanning and persona safety classification, @cf/baai/bge-base-en-v1.5 for embeddings. Reply and fast models are configurable via AI_MODEL / AI_FAST_MODEL env vars; the embedding model id is centralized in ai::EMBEDDING_MODEL.
Persona prompt: tenant-wide. Lives in PersonaConfig.source as exactly one of three variants, never a mix: Preset(PersonaPreset), Builder(PersonaBuilder), or Custom(String). PersonaConfig::active_prompt() resolves the chosen variant on demand (preset constant, generated from builder fields, or the raw custom string).
Reply rules: per-channel ReplyConfig { enabled, rules: Vec<ReplyRule>, default_rule, wait_seconds }. The pipeline walks rules in order; first match wins; otherwise the mandatory default_rule fires. Each rule has a matcher (StaticText { keywords } for case-insensitive substring or Prompt { description, embedding, threshold } for cosine-similarity intent matching) and a response (Canned { text } sent verbatim, or Prompt { text } appended to the persona prompt and run through the LLM).
Embedding step: if any Prompt rule exists, the inbound message is embedded once per delivery and compared via ai::cosine to each rule's pre-computed embedding (computed at rule-save time, stored in the rule alongside the model id). Default threshold is 0.72; tunable per rule.
Persona safety gate: AI replies (ReplyResponse::Prompt) are blocked unless the tenant's persona is Approved and its hash hasn't drifted since the last vetting. Canned responses are unaffected. See "Persona safety queue" below.
Final prompt: the system prompt sent to the reply model is persona.active_prompt() + "\n\n" + rule_prompt. The user message wraps the inbound text and sender name as a "Context: ... Generate an appropriate response." block.
Injection scan: incoming bodies are truncated to 1000 chars, then a fast classifier checks for instruction-override patterns. Rejected messages skip the entire pipeline (no rule matching, no reply).
Billing: only the AI reply step deducts a credit; Canned responses, embeddings, intent matching, and safety classification are free. The credit is deducted before the AI call (optimistic) and restored on any failure path. Each tenant gets a free monthly grant of 100 credits.
Pricing: flat $0.02 / ₹2 per AI reply, no tiers. UNIT_PRICE_PAISE = 200, UNIT_PRICE_CENTS = 2 in src/billing/mod.rs.
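
The rule walk and the embedding comparison can be sketched together. This is a minimal illustration under stated assumptions: default_rule is modeled as a bare Response, a similarity equal to the threshold counts as a match, the description field of Prompt matchers is omitted, and pick_response is a hypothetical name. The cosine function here is a local stand-in, not the source of ai::cosine.

```rust
enum Matcher {
    StaticText { keywords: Vec<String> },
    Prompt { embedding: Vec<f32>, threshold: f32 },
}

#[derive(Debug, PartialEq)]
enum Response {
    Canned { text: String },
    Prompt { text: String },
}

struct ReplyRule {
    matcher: Matcher,
    response: Response,
}

/// Cosine similarity; returns 0.0 when either vector has zero length.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Walks rules in order; the first match wins, otherwise the default fires.
/// `query` is the inbound message's embedding, computed once by the caller.
fn pick_response<'a>(
    rules: &'a [ReplyRule],
    default: &'a Response,
    text: &str,
    query: &[f32],
) -> &'a Response {
    let lower = text.to_lowercase();
    rules
        .iter()
        .find(|r| match &r.matcher {
            // Case-insensitive substring match on any keyword.
            Matcher::StaticText { keywords } => {
                keywords.iter().any(|k| lower.contains(&k.to_lowercase()))
            }
            // Intent match against the rule's pre-computed embedding.
            Matcher::Prompt { embedding, threshold } => cosine(query, embedding) >= *threshold,
        })
        .map(|r| &r.response)
        .unwrap_or(default)
}
```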

Persona safety queue

Trigger: the admin persona handler (POST /admin/persona) computes sha256(active_prompt()) on save; if it differs from safety.checked_prompt_hash, it sets safety.status = Pending and sends a SafetyJob { tenant_id, prompt_hash } onto the SAFETY_QUEUE producer binding. Saves that don't change the active prompt skip enqueue.
Consumer: #[event(queue)] in src/lib.rs dispatches to safety_queue::handle_batch. Each job re-reads the persona, drops the job if the prompt hash has drifted (a newer save has already enqueued its own job), runs safety::classify_persona against the fast model, and writes Approved or Rejected { vague_reason } back to KV with checked_prompt_hash and checked_at.
Classifier: system prompt enumerates Calculon Tech's content policy (no incitement, harassment, discrimination, sexualization of minors, self-harm, illegal-activity promotion, unconsented impersonation). The model returns strict JSON {"verdict":"approve"|"reject","category":"..."}. Categories are logged for abuse review but never echoed; the user-facing rejection text comes from a fixed mapping in safety::vague_reason_for so users can't iterate prompts against the classifier.
Failure mode: classifier or KV failures call message.retry(); the queue's DLQ policy (3 retries, then concierge-safety-dlq) takes over. While the persona stays Pending, AI replies are blocked but canned default rules still send.
Bindings: producer + consumer for concierge-safety, DLQ concierge-safety-dlq. Both queues must exist before deploy (see Deploy).
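
The hash-drift logic shared by the save handler, the consumer, and the reply gate can be sketched as below. Hashes are treated as precomputed hex strings (the SHA-256 step is out of scope here), and PersonaSafety, on_persona_save, triage_job, and ai_reply_allowed are hypothetical names for illustration, not the actual types in the source.

```rust
#[derive(Debug, Clone, PartialEq)]
enum SafetyStatus {
    Pending,
    Approved,
    Rejected { vague_reason: String },
}

struct PersonaSafety {
    status: SafetyStatus,
    checked_prompt_hash: Option<String>,
}

#[derive(Debug, PartialEq)]
enum JobOutcome {
    DropStale,
    Classify,
}

/// Save path: returns true when a SafetyJob should be enqueued.
fn on_persona_save(safety: &mut PersonaSafety, new_hash: &str) -> bool {
    if safety.checked_prompt_hash.as_deref() == Some(new_hash) {
        return false; // active prompt unchanged, skip enqueue
    }
    safety.status = SafetyStatus::Pending;
    true
}

/// Consumer path: a job whose hash no longer matches the persona is stale.
fn triage_job(job_hash: &str, current_hash: &str) -> JobOutcome {
    if job_hash == current_hash {
        JobOutcome::Classify
    } else {
        JobOutcome::DropStale
    }
}

/// Reply gate: AI replies need an Approved status vetted against the current prompt.
fn ai_reply_allowed(safety: &PersonaSafety, current_hash: &str) -> bool {
    safety.status == SafetyStatus::Approved
        && safety.checked_prompt_hash.as_deref() == Some(current_hash)
}
```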

Localization

Locale model: every tenant carries a BCP-47 tag (Tenant.locale, e.g. en-IN, en-US) and an optional independent currency override. src/locale.rs::Locale bundles the two into a single value carried through templates and handlers, replacing the previous tangle of if currency == "INR" branches.
Resolution chain (first hit wins): tenant-stored locale → Accept-Language header (parsed via the accept-language crate, intersected with the supported set) → cf-ipcountry mapping (IN → en-IN, any other country → en-US) → hardcoded en-IN when no earlier step yields a tag. The result is set once at signup and can be overridden by an admin at /admin/settings/currency.
Number / currency formatting: helpers::format_count and helpers::format_money use icu::decimal::FixedDecimalFormatter (icu crate, compiled_data feature). en-IN renders 1,00,000 (lakh / crore grouping); en-US renders 100,000. INR shows whole rupees with the ₹ symbol; USD shows two decimals with $.
Translation: fluent-bundle with FTL files at assets/locales/{tag}/messages.ftl, baked in at build time via include_str!. src/i18n.rs exposes a OnceLock-backed Translator and t(locale, key) sugar. Lookup falls back to the canonical en-IN bundle, then to the literal key (so a missed key is loud in the rendered HTML and caught by template tests).
Adding a locale: drop a new FTL file under assets/locales/{tag}/, add the tag to Translator::new and Locale::from_request's match arms, and register it in locale::parse_supported. CLDR data for the new locale is shipped automatically via the compiled_data feature.
Out of scope: AI-generated reply content stays English. Per-language persona prompts and a classifier model that handles target languages well are deferred (see the persona safety queue notes above).
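
To make the grouping difference concrete, the sketch below reproduces the two digit-grouping patterns by hand. It is an illustration only and not the ICU code path used by helpers::format_count; group_digits and its indian flag are hypothetical.

```rust
/// Groups the digits of `n` with commas.
/// indian = true: last three digits, then groups of two (lakh / crore).
/// indian = false: groups of three throughout.
fn group_digits(n: u64, indian: bool) -> String {
    let digits = n.to_string();
    let len = digits.len();
    let mut out = String::with_capacity(len + len / 2);
    for (i, ch) in digits.chars().enumerate() {
        // Number of digits still to emit, including this one.
        let remaining = len - i;
        if i > 0 {
            let boundary = if indian {
                remaining == 3 || (remaining > 3 && (remaining - 3) % 2 == 0)
            } else {
                remaining % 3 == 0
            };
            if boundary {
                out.push(',');
            }
        }
        out.push(ch);
    }
    out
}
```

With this sketch, group_digits(100000, true) yields 1,00,000 and group_digits(100000, false) yields 100,000, matching the en-IN and en-US renderings described above.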

Reply buffer (Durable Object)

Class: ReplyBufferDO in src/durable_objects/reply_buffer.rs; binding REPLY_BUFFER.
Keying: one DO instance per {tenant_id}:{channel}:{sender} conversation.
Sliding window: each push appends to a pending list and resets the alarm to now + wait_seconds. Bursts collapse into one alarm fire.
Drop-after-send: the alarm handler clears DO storage before calling the LLM. Bodies live in DO state for at most wait_seconds (5s default) after the last message in a burst, then they are gone.
Bypass: wait_seconds = 0 on the channel's ReplyConfig skips the buffer for instant replies.
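
The sliding window and drop-after-send behavior can be modeled without the Durable Object runtime. The sketch below is an assumption-based simulation: time is passed in as plain seconds, and ReplyBuffer, push, and fire_if_due are hypothetical names rather than the methods of ReplyBufferDO.

```rust
/// In-memory model of one conversation's buffer.
struct ReplyBuffer {
    wait_seconds: u64,
    pending: Vec<String>,
    alarm_at: Option<u64>,
}

impl ReplyBuffer {
    fn new(wait_seconds: u64) -> Self {
        ReplyBuffer { wait_seconds, pending: Vec::new(), alarm_at: None }
    }

    /// Appends a message and slides the alarm to now + wait_seconds.
    fn push(&mut self, now: u64, body: &str) {
        self.pending.push(body.to_string());
        self.alarm_at = Some(now + self.wait_seconds);
    }

    /// If the alarm is due, empties the buffer and hands the batch to the caller.
    /// Storage is cleared here, before any LLM call the caller makes with the batch.
    fn fire_if_due(&mut self, now: u64) -> Option<Vec<String>> {
        match self.alarm_at {
            Some(deadline) if now >= deadline => {
                self.alarm_at = None;
                Some(std::mem::take(&mut self.pending))
            }
            _ => None,
        }
    }
}
```

Two pushes at t=0 and t=3 with a 5-second window produce a single batch at t=8, which is the burst-collapsing behavior described above.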

Approval relay

Discord: AI drafts post to the tenant's approval channel as embeds with Approve/Reject buttons. Button click triggers /discord/interactions → component handler → outbound send via the originating channel adapter.
Conversation context: stored in KV at conv:{id} with 7-day TTL, holds the Discord message id and origin channel/sender so the reply routes back correctly.
Email: approval-by-email digest sent at the tenant's configured cadence (default 15 min); links contain signed tokens for one-click approve/reject.

Lead capture forms

Storage: LeadCaptureForm in KV at lead_form:{id}, indexed by tenant.
Rendering: GET /lead/{id}/{slug} serves an iframe-friendly HTML form; CSP and allowed_origins restrict where it embeds.
Submission: POST to the same path validates the phone number and triggers a WhatsApp message via the configured account, then logs to lead_form_submissions in D1.
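
This document does not say which phone format the submission handler accepts. Assuming E.164 (a leading + followed by up to 15 digits, not starting with 0), the validation step could be sketched like this; is_e164 is a hypothetical helper.

```rust
/// Returns true when `phone` looks like an E.164 number, e.g. +14155550123.
/// Assumption: the form requires E.164; the real handler may accept other shapes.
fn is_e164(phone: &str) -> bool {
    let digits = match phone.strip_prefix('+') {
        Some(rest) => rest,
        None => return false,
    };
    !digits.is_empty()
        && digits.len() <= 15
        && !digits.starts_with('0')
        && digits.bytes().all(|b| b.is_ascii_digit())
}
```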

Storage layout

D1 tables

tenants: id, email (UNIQUE), facebook_id, plan, currency.
messages: unified inbound/outbound metadata (channel, direction, sender, recipient, action_taken). No body content.
whatsapp_messages, instagram_messages, email_messages, email_metrics, lead_form_submissions: channel-specific logs.
tenant_billing: credit ledger as JSON (entries with optional expiry).
payments: Razorpay event log for compliance.
audit_log: management-action history.
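
The credit ledger backs the optimistic billing described under the AI reply pipeline: deduct first, restore on failure. The sketch below collapses the ledger to a single balance and ignores entry expiry; Ledger, ReplyError, and billed_reply are hypothetical names, and the closure stands in for the AI call.

```rust
/// Simplified ledger: one balance instead of JSON entries with optional expiry.
struct Ledger {
    credits: u32,
}

#[derive(Debug, PartialEq)]
enum ReplyError {
    NoCredits,
    Inference(String),
}

/// Deducts one credit before calling `generate`, and restores it if the call fails.
fn billed_reply<F>(ledger: &mut Ledger, generate: F) -> Result<String, ReplyError>
where
    F: FnOnce() -> Result<String, String>,
{
    if ledger.credits == 0 {
        return Err(ReplyError::NoCredits);
    }
    // Optimistic deduction happens before the AI call.
    ledger.credits -= 1;
    match generate() {
        Ok(reply) => Ok(reply),
        Err(reason) => {
            // Any failure path gives the credit back.
            ledger.credits += 1;
            Err(ReplyError::Inference(reason))
        }
    }
}
```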

KV keys

session:*, csrf:*: auth cookies (TTL 7d).
whatsapp:{id}, instagram:{id}, lead_form:{id}: per-resource configs. Channel records embed their own ReplyConfig (rules + default rule + wait_seconds).
tenant:{tenant}:whatsapp:{id} etc.: per-tenant indexes (empty values; existence is the index).
wa_phone:*, ig_page:*, email_domain:*: webhook → tenant reverse indexes.
email_domains:{tenant}, email_rules:{tenant}:{domain}, email_reverse:*: email config + alias mapping.
discord_guild:{guild_id}, discord_config:{tenant}: guild ↔ tenant.
onboarding:{tenant}: wizard state. Holds the PersonaConfig (source variant + safety status) and default_wait_seconds applied to newly connected channels.
conv:{id}: approval-relay conversation context (TTL 7d).

Auth

Login: Google OAuth (/auth/callback) and Facebook Login (/auth/facebook/callback). A tenant is linked to both providers when the email addresses match.
Session: 7-day HttpOnly cookie; CSRF via double-submit cookie checked on every POST/PUT/DELETE under /admin.
Management panel: /manage/* protected by Cloudflare Access (verifies the Cf-Access-Jwt-Assertion header against the team's JWKS).
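
A double-submit check compares the token in the cookie with the token the client submits alongside the request. The sketch below is illustrative: how the submitted token travels (header or form field) is not specified here, so both values arrive as plain arguments, and ct_eq and csrf_ok are hypothetical names.

```rust
/// Constant-time byte comparison, so timing does not reveal how many leading bytes match.
fn ct_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    a.iter().zip(b).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// Returns true when the request may proceed.
/// Only POST/PUT/DELETE under /admin are guarded, per the session rules above.
fn csrf_ok(
    method: &str,
    path: &str,
    cookie_token: Option<&str>,
    submitted_token: Option<&str>,
) -> bool {
    let under_admin = path == "/admin" || path.starts_with("/admin/");
    let guarded = under_admin && matches!(method, "POST" | "PUT" | "DELETE");
    if !guarded {
        return true;
    }
    match (cookie_token, submitted_token) {
        // Both tokens must be present, non-empty, and equal.
        (Some(c), Some(s)) if !c.is_empty() => ct_eq(c.as_bytes(), s.as_bytes()),
        _ => false,
    }
}
```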

Outbound APIs Concierge calls

Meta Graph API for WhatsApp + Instagram + Facebook Login.
Discord REST API (discord.com/api/v10) for messages, channels, guild lookup.
Razorpay API for orders, subscriptions, payment verification.
Cloudflare Workers AI binding (a direct binding call, no HTTP). Used for reply generation, prompt-injection scan, persona safety classification, and BGE embeddings.
Cloudflare Queues binding (SAFETY_QUEUE) for fanning persona safety jobs to the queue consumer.

Limits and known constraints

Discord DM auto-reply is unsupported with the shared bot: incoming DMs hit the events endpoint with no guild_id, so we can't attribute them to a tenant.
WhatsApp has no message-history API; the reply buffer relies on its own DO state to reconstruct bursts.
Cloudflare Email Service requires sender domains to be onboarded in the dashboard before sends from them succeed; new tenant subdomains may need manual onboarding until that step is automated.
No per-message body storage. If you need a conversation web-view, that's a future feature requiring a schema change and ToS update.