Concierge Documentation
Architecture

How it works.

Concierge runs as a single Cloudflare Worker that handles four messaging channels through one unified pipeline. Every inbound event (a WhatsApp message, an Instagram DM, an email arriving at your catch‑all, or a Discord interaction) is normalized into the same shape and processed by the same steps.

The pipeline · runs identically for every channel

Inbound → Normalize → Common pipeline → Dispatch → Actions

Inbound
  • WhatsApp: webhook
  • Instagram: webhook
  • Email: email() handler

Common pipeline
  1. Normalize into InboundMessage
  2. Tenant + credit check
  3. Reply rules · keyword + embedding
  4. Envelope + persona + history
  5. Action dispatch · handoff check
  6. Log metadata · D1

Actions
  • Canned reply: static text · no AI · free
  • AI reply: Workers AI · llama‑4‑scout · persona + rule prompt
  • Forward → Discord: embed + Reply / Approve / Drop
  • Forward email: reverse‑alias for replies
  • Drop / spam reject: silent or NDR

The unified pipeline

Regardless of which channel a message arrives on, it’s normalized into an InboundMessage struct (channel, sender, recipient, tenant, metadata) and processed through the same steps. This is the spine of the codebase. Channel handlers exist only to translate raw events (webhooks or inbound email) into InboundMessage and to dispatch outbound replies.

  1. Channel handler receives the raw event

    WhatsApp and Instagram POST signed webhook payloads to /webhook/*. Email arrives via Cloudflare’s send_to_worker action, invoking the Worker’s email() entrypoint.

  2. Normalize into InboundMessage

    The handler fills in the channel, sender, recipient, tenant ID, and any channel‑specific metadata. From here on, every code path is identical.

  3. Log metadata to messages

    An append‑only row in D1: channel, direction, sender ID, recipient ID, tenant, timestamp. Message content is never written to this table: only the fact that something happened, and with whom.

  4. Reply rules evaluate in order

    Each channel carries an ordered ReplyConfig. Rules pair a matcher (case‑insensitive keyword substring or BGE‑embedding cosine similarity over a user‑written intent description) with a response (canned text or an AI prompt). First match wins; the mandatory default rule fires if nothing else does.

  5. Resolve the conversation session

    Per‑customer threads are tracked in KV at convsession:{tenant}:{channel}:{sender}. If the customer has been silent for longer than idle_gap_mins (default 6 h), a fresh conversation begins: a new conversation_id is minted, message history clears, any handoff state is wiped. Otherwise the existing session is loaded and the inbound is appended to its messages list.

  6. Build the prompt envelope

    The system prompt sent to the reply model is always a sandwich: a fixed preamble framing the model as a small‑business reply assistant, the editable middle (the tenant’s persona prompt + the rule prompt + the persona’s goal & handoff conditions), and a fixed postamble with universal house rules, jailbreak rails, and handoff triggers. The preamble and postamble are constants in src/prompt.rs; tenant content can never reach the model alone.

  7. Dispatch the action · canned text or multi‑turn LLM call

    Canned responses send verbatim, no credit charge. Prompt responses run the reply model with the envelope above as the system message and the session’s recent message history (capped by max_history_messages, default 20) as the prior turns. One credit is deducted before the call (optimistic) and restored if generation or send fails. AI replies are blocked unless the persona’s asynchronous safety check has approved the current prompt.

  8. Handoff scan

    Every model reply is scanned for the [[HANDOFF]] sentinel before it leaves the worker. If present, the token is stripped, the session is flipped into the holding‑pattern path for handoff_cooldown_mins (default 60 min), and the tenant is paged once via their configured approval channels (Discord embed and/or immediate email). Past the cooldown the worker stays silent until the idle gap resets the conversation.
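Steps 6 and 8 reduce to two small string operations: wrapping the tenant's text in a fixed sandwich, and stripping the sentinel from the model's output. The sketch below is illustrative only. The preamble and postamble strings are placeholders (the real constants in src/prompt.rs are not reproduced on this page), and PersonaMiddle, build_envelope, and scan_handoff are hypothetical names, not Concierge's API.

```rust
/// Sentinel the model emits to request a human; the token comes from these docs.
pub const HANDOFF_SENTINEL: &str = "[[HANDOFF]]";

// Placeholder text only: the real constants live in src/prompt.rs.
pub const PREAMBLE: &str = "You are a reply assistant for a small business.";
pub const POSTAMBLE: &str =
    "Follow the house rules. To escalate, end your reply with [[HANDOFF]] on its own line.";

/// The tenant-editable middle of the sandwich (hypothetical field names).
pub struct PersonaMiddle<'a> {
    pub persona_prompt: &'a str,
    pub rule_prompt: &'a str,
    pub goal_and_handoff: &'a str,
}

/// Wrap tenant text between the fixed preamble and postamble, so tenant
/// content never reaches the model on its own. Empty parts are skipped.
pub fn build_envelope(middle: &PersonaMiddle) -> String {
    [
        PREAMBLE,
        middle.persona_prompt,
        middle.rule_prompt,
        middle.goal_and_handoff,
        POSTAMBLE,
    ]
    .iter()
    .copied()
    .filter(|part| !part.trim().is_empty())
    .collect::<Vec<&str>>()
    .join("\n\n")
}

/// Strip the sentinel before the customer sees the reply and report
/// whether a handoff was requested.
pub fn scan_handoff(reply: &str) -> (String, bool) {
    if !reply.contains(HANDOFF_SENTINEL) {
        return (reply.to_string(), false);
    }
    let cleaned = reply.replace(HANDOFF_SENTINEL, "");
    (cleaned.trim().to_string(), true)
}
```

Because the preamble and postamble are compile-time constants, no tenant edit can remove them; the only variable part is the middle.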

Storage note

The unified messages table stores only metadata: channel, direction, sender ID, recipient ID, tenant, timestamp. No subjects, no bodies, no attachments. AI replies are generated synchronously from in-memory data and discarded.

WhatsApp / Instagram auto‑reply

Both Meta channels run through the same reply pipeline:

  1. Meta delivers the inbound message to POST /webhook/whatsapp or POST /webhook/instagram.
  2. Concierge looks up the channel account (phone number ID for WhatsApp, page ID for Instagram) and its ReplyConfig.
  3. The body is truncated to 1000 chars and run past a fast prompt‑injection scanner; injection attempts are dropped.
  4. If any rule has a Prompt matcher, the inbound message is embedded once and compared via cosine similarity to each rule’s precomputed embedding.
  5. Rules are walked in order; the first match wins. Otherwise the mandatory default rule fires.
  6. Canned responses send verbatim with no credit charge. Prompt responses combine persona + rule prompt + a context block, deduct one credit, and run the main LLM. AI replies require the tenant’s persona to be safety‑Approved.
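Steps 3 to 5 amount to a first‑match walk over an ordered rule list. This is a minimal sketch, assuming hypothetical Matcher, Response, and Rule types that borrow this page's Keyword/Prompt terminology, plus a per‑rule similarity threshold (the docs don't say how a cosine score becomes a match). The embedding call is passed in as a closure so it runs at most once, and only when some rule needs it; the prompt‑injection scanner is left out.

```rust
/// How a rule decides it applies (illustrative shapes, not Concierge's types).
pub enum Matcher {
    /// Case-insensitive substring match.
    Keyword(String),
    /// Cosine similarity against a precomputed intent embedding.
    /// The threshold is an assumption of this sketch.
    Prompt { embedding: Vec<f32>, threshold: f32 },
}

pub enum Response {
    Canned(String), // sent verbatim, no credit
    Prompt(String), // rule prompt for the reply model
}

pub struct Rule {
    pub matcher: Matcher,
    pub response: Response,
}

pub const MAX_BODY_CHARS: usize = 1000;

pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

/// Walk rules in order; the first match wins, otherwise `default` fires.
pub fn select_response<'a>(
    body: &str,
    rules: &'a [Rule],
    default: &'a Response,
    embed: impl FnOnce(&str) -> Vec<f32>,
) -> &'a Response {
    // Truncate by characters, not bytes, so multi-byte text is never split.
    let body: String = body.chars().take(MAX_BODY_CHARS).collect();
    let lowered = body.to_lowercase();

    // Embed once, and only if some rule actually uses embeddings.
    let needs_embedding = rules.iter().any(|r| matches!(r.matcher, Matcher::Prompt { .. }));
    let query = if needs_embedding { Some(embed(&body)) } else { None };

    for rule in rules {
        let hit = match &rule.matcher {
            Matcher::Keyword(k) => lowered.contains(k.to_lowercase().as_str()),
            Matcher::Prompt { embedding, threshold } => query
                .as_ref()
                .map_or(false, |q| cosine(q, embedding) >= *threshold),
        };
        if hit {
            return &rule.response;
        }
    }
    default
}
```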

Persona & safety check

  1. The tenant fills in the persona builder at /dashboard/persona: a voice archetype (Friendly / Professional / Playful / Formal), business name and type, a goal (what to drive customers toward, free text + optional URL), catch‑phrases, off‑topic boundaries, and handoff conditions (what to escalate to a human). Alternatively the tenant copies a curated archetype from the platform’s D1 catalog (/manage/personas) or writes a raw custom prompt.
  2. On save, the rendered active prompt is hashed; if the hash differs from the last‑vetted hash, the status flips to Pending and a SafetyJob is enqueued onto the Cloudflare Queue concierge-safety.
  3. The queue consumer reads the job, re‑checks the hash (drops stale jobs), and runs the safety classifier with Calculon Tech’s content policy.
  4. The result lands back in KV as Approved or Rejected with a vague user‑facing reason.
  5. While Pending or Rejected, AI replies are blocked tenant‑wide; canned default replies still send.
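The hash is what keeps a stale verdict from landing after the tenant has edited the prompt again. A minimal sketch, assuming an in‑memory PersonaSafety record in place of KV and a boolean closure in place of the classifier. It also assumes that re‑saving the last‑vetted prompt restores its earlier verdict rather than re‑queuing. DefaultHasher stands in for whatever hash Concierge actually uses; its output is not guaranteed stable across Rust releases, so a persisted hash would want something like SHA‑256.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SafetyStatus {
    Pending,
    Approved,
    Rejected,
}

/// Per-tenant safety state (hypothetical fields, not Concierge's KV schema).
pub struct PersonaSafety {
    pub status: SafetyStatus,
    pub current_hash: u64,                        // hash of the active rendered prompt
    pub last_vetted: Option<(u64, SafetyStatus)>, // last hash the classifier judged
}

pub struct SafetyJob {
    pub prompt_hash: u64,
}

pub fn prompt_hash(rendered: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    rendered.hash(&mut hasher);
    hasher.finish()
}

/// On save: enqueue a job only if the prompt differs from the last-vetted one.
pub fn on_save(state: &mut PersonaSafety, rendered: &str) -> Option<SafetyJob> {
    let hash = prompt_hash(rendered);
    state.current_hash = hash;
    match state.last_vetted {
        // Unchanged since the last verdict: reuse it, nothing to enqueue.
        Some((vetted, verdict)) if vetted == hash => {
            state.status = verdict;
            None
        }
        _ => {
            state.status = SafetyStatus::Pending;
            Some(SafetyJob { prompt_hash: hash })
        }
    }
}

/// Queue consumer: drop stale jobs, otherwise record the verdict.
/// Returns false when the job was dropped.
pub fn on_job(state: &mut PersonaSafety, job: &SafetyJob, classify: impl FnOnce() -> bool) -> bool {
    if job.prompt_hash != state.current_hash {
        return false; // the tenant saved a newer prompt after this job was queued
    }
    let verdict = if classify() { SafetyStatus::Approved } else { SafetyStatus::Rejected };
    state.status = verdict;
    state.last_vetted = Some((job.prompt_hash, verdict));
    true
}

/// AI replies are blocked tenant-wide unless the current prompt is Approved.
pub fn ai_replies_allowed(state: &PersonaSafety) -> bool {
    state.status == SafetyStatus::Approved
}
```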

Conversations & handoff

Inbound messages don’t live in a vacuum. Each customer×channel pair has a conversation session in KV (convsession:{tenant}:{channel}:{sender}) holding a stable conversation_id, a bounded list of recent (role, content) turns, the timestamp of the last inbound, and an optional handoff sub‑state. The session is also stamped onto matching rows in the unified messages D1 table via a conversation_id column so a thread can be reconstructed for audit.

  1. Idle gap. If the customer has been silent for longer than idle_gap_mins (default 6 h), the next inbound starts a fresh conversation: history clears, handoff is wiped, the persona replies normally. Six hours is long enough for a customer mulling a quote over lunch, short enough that next morning’s message is genuinely fresh.
  2. Multi‑turn context. Up to max_history_messages recent turns (default 20, roughly ten back‑and‑forths) are passed to the reply model as the prior chat history on every AI call. Queued draft replies are not appended to history — a draft that gets rejected can’t poison the next AI call.
  3. Handoff signal. The postamble defines the universal triggers (the model doesn’t understand the request; the customer asks for a person; the message touches medical / legal / financial / safety territory) and the persona’s handoff conditions add tenant‑specific ones. To escalate, the model writes one polite holding sentence and ends the reply with [[HANDOFF]] on its own line.
  4. Pipeline reaction. The token is stripped before the customer sees the reply. The session flips into the holding pattern, replacing the persona middle with a calm "a person has been notified" voice for any follow‑up turns within handoff_cooldown_mins (default 60 min). Past the cooldown the worker stays silent until the idle gap resets the conversation.
  5. Tenant page. The tenant is paged once per handoff via their configured approval‑notification channels — Discord embed (via the bot) and/or immediate email through send_outbound. This is not the digest cron: the page goes out the moment the handoff fires.
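The idle gap, history cap, and handoff cooldown can be expressed as one function over the stored session. This is a sketch under stated assumptions: timestamps are unix seconds, the (role, content) history is capped in storage rather than only when building the model call, and Session, Timing, Mode, and on_inbound are hypothetical names. Only the KV key format and the three defaults come from this page.

```rust
use std::collections::VecDeque;

/// Hypothetical session record stored under the convsession key.
pub struct Session {
    pub conversation_id: String,
    pub history: VecDeque<(String, String)>, // (role, content), oldest first
    pub last_inbound_at: u64,                // unix seconds
    pub handoff_started_at: Option<u64>,     // set when [[HANDOFF]] fires
}

pub struct Timing {
    pub idle_gap_mins: u64,
    pub handoff_cooldown_mins: u64,
    pub max_history_messages: usize,
}

impl Default for Timing {
    fn default() -> Self {
        // Documented in-code defaults: 6 h, 60 min, 20 turns.
        Timing { idle_gap_mins: 360, handoff_cooldown_mins: 60, max_history_messages: 20 }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum Mode {
    Normal,         // persona replies as usual
    HoldingPattern, // "a person has been notified" voice
    Silent,         // cooldown over; wait for the idle gap
}

pub fn session_key(tenant: &str, channel: &str, sender: &str) -> String {
    format!("convsession:{tenant}:{channel}:{sender}")
}

/// Apply one inbound message to the stored session (if any) and pick a reply mode.
/// `new_id` mints a conversation_id; the real ID scheme isn't documented.
pub fn on_inbound(
    stored: Option<Session>,
    now: u64,
    body: &str,
    timing: &Timing,
    new_id: impl FnOnce() -> String,
) -> (Session, Mode) {
    let idle_gap_secs = timing.idle_gap_mins * 60;
    let mut session = match stored {
        Some(s) if now.saturating_sub(s.last_inbound_at) <= idle_gap_secs => s,
        // No session, or silent too long: fresh conversation, history and handoff wiped.
        _ => Session {
            conversation_id: new_id(),
            history: VecDeque::new(),
            last_inbound_at: now,
            handoff_started_at: None,
        },
    };
    session.last_inbound_at = now;
    session.history.push_back(("user".to_string(), body.to_string()));
    while session.history.len() > timing.max_history_messages {
        session.history.pop_front();
    }
    let mode = match session.handoff_started_at {
        None => Mode::Normal,
        Some(t) if now.saturating_sub(t) < timing.handoff_cooldown_mins * 60 => Mode::HoldingPattern,
        Some(_) => Mode::Silent,
    };
    (session, mode)
}
```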

Operators tune all three knobs (idle_gap_mins, handoff_cooldown_mins, max_history_messages) per‑tenant from the Conversation Timing card on /dashboard/settings. The form enforces sensible bounds (5–1440 min, 1–200 turns) and a cross‑field invariant: idle gap must be longer than handoff cooldown, otherwise an active handoff could be wiped before its cooldown ended. Empty fields fall back to the in‑code defaults.
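The form's checks fit in a few lines. This sketch assumes both minute fields share the 5 to 1440 range and that empty form fields arrive as None; resolve_timing and TimingError are illustrative names. In this version defaults are filled in before validation, so the cross‑field invariant still holds when only one of the two minute fields is set.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum TimingError {
    OutOfRange(&'static str),
    IdleGapNotLongerThanCooldown,
}

pub const DEFAULT_IDLE_GAP_MINS: u32 = 360;
pub const DEFAULT_HANDOFF_COOLDOWN_MINS: u32 = 60;
pub const DEFAULT_MAX_HISTORY_MESSAGES: u32 = 20;

/// Returns (idle_gap_mins, handoff_cooldown_mins, max_history_messages).
pub fn resolve_timing(
    idle_gap: Option<u32>,
    cooldown: Option<u32>,
    max_history: Option<u32>,
) -> Result<(u32, u32, u32), TimingError> {
    // Empty fields fall back to the in-code defaults.
    let idle = idle_gap.unwrap_or(DEFAULT_IDLE_GAP_MINS);
    let cool = cooldown.unwrap_or(DEFAULT_HANDOFF_COOLDOWN_MINS);
    let hist = max_history.unwrap_or(DEFAULT_MAX_HISTORY_MESSAGES);

    if !(5..=1440).contains(&idle) {
        return Err(TimingError::OutOfRange("idle_gap_mins"));
    }
    if !(5..=1440).contains(&cool) {
        return Err(TimingError::OutOfRange("handoff_cooldown_mins"));
    }
    if !(1..=200).contains(&hist) {
        return Err(TimingError::OutOfRange("max_history_messages"));
    }
    // Otherwise an idle reset could wipe a handoff before its cooldown ends.
    if idle <= cool {
        return Err(TimingError::IdleGapNotLongerThanCooldown);
    }
    Ok((idle, cool, hist))
}
```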

Live demo chat

The public welcome page hosts a live demo modal so visitors can see the AI in action without signing up. It’s framed explicitly: visitors roleplay as a customer of a sample business (florist, salon, cafe, …) and watch the AI reply in real time. Real customer messages still arrive on WhatsApp, Instagram, email, or Discord — the demo chat box is never a customer surface.

  1. GET /demo/personas returns the safety‑Approved archetypes from the D1 personas catalog, with sample business fields (name, type, city, goal) for the picker.
  2. The chat‑input form sends POST /demo/chat with the picked persona slug, the visitor’s message, the prior turns kept client‑side, and a stateless handoff flag.
  3. The handler runs the message through the same prompt envelope and reply model the production pipeline uses, gates on the same safety classifier, and respects the same handoff sentinel. The "View prompt" panel reveals the exact envelope being sent so visitors can see what the model receives.
  4. Handoff round‑trips statelessly via {handoff: bool} on the wire and an Alpine flag in the modal; persona switch and modal close reset it.
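Because the server keeps nothing between requests, the client carries the handoff state and echoes it back on each turn. A hypothetical handler shape, assuming DemoRequest/DemoResponse fields beyond handoff, and assuming a set flag switches the reply to a holding voice instead of calling the model. The model call and holding reply are stubbed as closures, and the safety gate and envelope are omitted.

```rust
/// Hypothetical wire shapes for POST /demo/chat. Only the `handoff` flag
/// is named in the docs; the other fields are assumptions.
pub struct DemoRequest {
    pub persona_slug: String,
    pub message: String,
    pub history: Vec<(String, String)>, // prior turns, kept client-side
    pub handoff: bool,
}

pub struct DemoResponse {
    pub reply: String,
    pub handoff: bool, // the client sends this back on the next turn
}

const HANDOFF_SENTINEL: &str = "[[HANDOFF]]";

/// One stateless demo turn: all state arrives in the request.
pub fn demo_turn(
    req: &DemoRequest,
    run_model: impl FnOnce(&DemoRequest) -> String,
    holding_reply: impl FnOnce() -> String,
) -> DemoResponse {
    if req.handoff {
        // Already escalated: answer in the holding voice and keep the flag set.
        return DemoResponse { reply: holding_reply(), handoff: true };
    }
    let raw = run_model(req);
    let handoff = raw.contains(HANDOFF_SENTINEL);
    let reply = raw.replace(HANDOFF_SENTINEL, "").trim().to_string();
    DemoResponse { reply, handoff }
}
```

Resetting the flag on persona switch or modal close then needs no server call; the client simply sends false.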

Email routing

  1. An email arrives at your catch‑all domain (configured via Cloudflare Email Routing).
  2. Cloudflare triggers the Worker’s email() handler.
  3. Concierge extracts the domain, looks up the tenant, and parses the MIME message.
  4. Routing rules are evaluated in priority order using glob‑pattern matching on from, to, subject, body, and has_attachment.
  5. The matched rule’s action executes: drop, spam reject, forward email, forward to Discord, or AI reply with approval.
  6. For email forwarding, a reverse‑alias address is generated so replies route back through Concierge to the original sender.

Glob semantics · last match wins
  • * matches any sequence of characters, zero or more
  • ? matches exactly one character
  • Case: matching is case‑insensitive
  • Combine: all non‑None criteria are AND‑ed (from + to + subject + body + has_attachment)
  • Order: rules are sorted by ascending priority, so the last match is the highest‑priority one and wins
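These semantics can be sketched with the standard iterative star‑backtracking glob algorithm, a common technique and not necessarily Concierge's implementation. RoutingRule and Email are hypothetical types, and actions are plain strings for brevity. Sorting by ascending priority and keeping the last match is the same as picking the highest‑priority match; the stable sort keeps input order among equal priorities.

```rust
/// Case-insensitive glob: `*` = any run of characters (incl. empty), `?` = exactly one.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.to_lowercase().chars().collect();
    let t: Vec<char> = text.to_lowercase().chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    let mut star: Option<usize> = None; // index of the most recent '*' in p
    let mut mark = 0usize; // position in t where that '*' started matching
    while ti < t.len() {
        if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if pi < p.len() && p[pi] == '*' {
            star = Some(pi);
            mark = ti;
            pi += 1;
        } else if let Some(s) = star {
            // Let the last '*' swallow one more character and retry.
            pi = s + 1;
            mark += 1;
            ti = mark;
        } else {
            return false;
        }
    }
    // Trailing stars can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

pub struct RoutingRule {
    pub priority: i32,
    pub from: Option<String>,
    pub to: Option<String>,
    pub subject: Option<String>,
    pub body: Option<String>,
    pub has_attachment: Option<bool>,
    pub action: &'static str, // stand-in for drop / spam reject / forward / AI reply
}

pub struct Email<'a> {
    pub from: &'a str,
    pub to: &'a str,
    pub subject: &'a str,
    pub body: &'a str,
    pub has_attachment: bool,
}

impl RoutingRule {
    /// Every criterion that is Some must match (AND); None criteria are ignored.
    fn matches(&self, m: &Email) -> bool {
        let g = |pat: &Option<String>, text: &str| pat.as_deref().map_or(true, |p| glob_match(p, text));
        g(&self.from, m.from)
            && g(&self.to, m.to)
            && g(&self.subject, m.subject)
            && g(&self.body, m.body)
            && self.has_attachment.map_or(true, |want| want == m.has_attachment)
    }
}

/// Sort by ascending priority and keep the last match.
pub fn route<'r>(rules: &'r [RoutingRule], m: &Email) -> Option<&'r RoutingRule> {
    let mut ordered: Vec<&RoutingRule> = rules.iter().collect();
    ordered.sort_by_key(|r| r.priority); // stable sort
    ordered.into_iter().filter(|r| r.matches(m)).last()
}
```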

Discord relay

  1. When a message from any channel is forwarded to Discord (via email routing rules or future direct integrations), it arrives as an embed with Reply, Approve, and Drop buttons.
  2. A ConversationContext is saved in KV, linking the Discord message to the original channel, sender, and reply metadata.
  3. When someone clicks Reply, a modal opens for composing a response.
  4. The reply is sent back through the originating channel using the stored context.
  5. For AI‑generated drafts, Approve sends the draft and Drop discards it.
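A sketch of the relay's button handling, with a HashMap standing in for Workers KV, keyed by the Discord message ID. The ConversationContext fields, the Interaction and Outcome enums, and the choice to delete the context after Approve or Drop (but not after Reply) are all assumptions: the docs name the concept, not the schema.

```rust
use std::collections::HashMap;

/// Hypothetical shape of the stored context.
#[derive(Clone, Debug)]
pub struct ConversationContext {
    pub channel: String,        // originating channel, e.g. "email"
    pub sender: String,         // who the reply goes back to
    pub reply_metadata: String, // opaque here (e.g. reverse-alias or thread info)
    pub draft: Option<String>,  // AI draft awaiting Approve / Drop
}

pub enum Interaction {
    Reply { text: String }, // text submitted from the Reply modal
    Approve,
    Drop,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Send { channel: String, to: String, text: String },
    Discarded,
    NothingToDo,
}

pub fn handle(
    kv: &mut HashMap<String, ConversationContext>,
    discord_msg_id: &str,
    action: Interaction,
) -> Outcome {
    let Some(ctx) = kv.get(discord_msg_id).cloned() else {
        return Outcome::NothingToDo; // unknown or already-resolved message
    };
    match action {
        // Route the human-written reply back through the originating channel.
        Interaction::Reply { text } => Outcome::Send { channel: ctx.channel, to: ctx.sender, text },
        Interaction::Approve => match ctx.draft {
            Some(draft) => {
                kv.remove(discord_msg_id);
                Outcome::Send { channel: ctx.channel, to: ctx.sender, text: draft }
            }
            None => Outcome::NothingToDo,
        },
        Interaction::Drop => {
            kv.remove(discord_msg_id);
            Outcome::Discarded
        }
    }
}
```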

Billing

Each AI‑mode reply (rule with a Prompt response) deducts one credit from the tenant’s balance. Canned replies, embedding lookups, intent classification, and persona safety checks are free. Credits are deducted before the AI call (optimistic deduction) and restored if generation or send fails. When credits reach zero, AI replies stop; canned defaults still send. Credits can be granted by management or purchased via Razorpay.
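The optimistic deduction can be sketched with an in‑memory counter and closures standing in for the Workers AI call and the channel send. Credits, ReplyError, and ai_reply are illustrative names. Concurrency is out of scope: a shared balance would need an atomic update in whatever store actually holds it.

```rust
#[derive(Debug, PartialEq)]
pub enum ReplyError {
    OutOfCredits,
    Generation(String),
    Send(String),
}

pub struct Credits {
    pub balance: u64,
}

impl Credits {
    fn try_deduct(&mut self) -> bool {
        if self.balance == 0 {
            return false;
        }
        self.balance -= 1;
        true
    }

    fn restore(&mut self) {
        self.balance += 1;
    }
}

/// One AI reply: deduct before the call, refund if generation or send fails.
pub fn ai_reply(
    credits: &mut Credits,
    generate: impl FnOnce() -> Result<String, String>,
    send: impl FnOnce(&str) -> Result<(), String>,
) -> Result<String, ReplyError> {
    if !credits.try_deduct() {
        return Err(ReplyError::OutOfCredits);
    }
    let reply = match generate() {
        Ok(r) => r,
        Err(e) => {
            credits.restore();
            return Err(ReplyError::Generation(e));
        }
    };
    if let Err(e) = send(&reply) {
        credits.restore();
        return Err(ReplyError::Send(e));
    }
    Ok(reply)
}
```

On OutOfCredits the caller would fall back to the canned default, which costs nothing.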

Platform model

Per-channel architecture: how each channel attaches to a tenant

| Channel | Model | Token storage |
|---|---|---|
| WhatsApp | Shared WABA: you own one WABA; customers add numbers via Meta Embedded Signup | Single platform token (WHATSAPP_ACCESS_TOKEN) |
| Instagram | Per-account OAuth: Facebook Login, page tokens per customer | Encrypted in KV, rotated daily by cron |
| Email | Per-domain: each tenant registers domains and creates rules | No tokens; Cloudflare Email Routing dispatches |
| Discord | Guild → tenant: each Discord server is linked to one tenant | Shared bot token (DISCORD_BOT_TOKEN env secret) |

Architecture

  • Cloudflare Worker: Rust compiled to WebAssembly. Handles all HTTP routes, webhooks, and email events.
  • Cloudflare KV: tenant configs, account configs, tokens, sessions, routing rules, billing state, conversation contexts, persona.
  • Cloudflare D1: SQLite for message metadata, email metrics, credit packs, payments, audit logs.
  • Cloudflare Workers AI: reply generation, prompt‑injection scanning, persona safety classification, BGE embeddings.
  • Cloudflare Queues: persona safety classifier jobs (concierge-safety, with dead‑letter queue concierge-safety-dlq).
  • Cloudflare Email Routing: triggers the Worker’s email handler for inbound emails.
  • Discord Interactions API: slash commands, button interactions, modal submissions via POST /discord/interactions.
  • Razorpay: payment processing for credit pack purchases.
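src/types.rs

```rust
use serde_json::Value; // assumed: raw_metadata holds the untyped channel payload

/// The channels Concierge handles (variants as listed on `channel` below).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Channel {
    WhatsApp,
    Instagram,
    Email,
    Discord,
}

/// The normalized form every channel produces. Channel handlers
/// exist only to translate webhooks into this struct.
#[derive(Debug, Clone)]
pub struct InboundMessage {
    pub id:                 String,
    pub channel:            Channel,        // WhatsApp | Instagram | Email | Discord
    pub sender:             String,
    pub sender_name:        Option<String>,
    pub recipient:          String,
    pub body:               String,         // never written to the D1 messages table
    pub tenant_id:          String,
    pub channel_account_id: String,
    pub raw_metadata:       Value,
}
```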