How it works.
Concierge runs as a single Cloudflare Worker that handles four messaging channels through one unified pipeline. Every inbound event (a WhatsApp message, an Instagram DM, an email arriving at your catch‑all, or a Discord interaction) is normalized into the same shape and processed by the same steps.
The unified pipeline
Regardless of which channel a message arrives on, it’s normalized into an InboundMessage struct (channel, sender, recipient, tenant, metadata), and processed through the same steps. This is the spine of the codebase. Channel handlers exist only to translate webhooks into InboundMessage and to dispatch outbound replies.
- Channel handler receives the raw event. WhatsApp and Instagram POST signed webhook payloads to /webhook/*. Email arrives via Cloudflare’s send_to_worker action, invoking the Worker’s email() entrypoint.
- Normalize into InboundMessage. Channel, sender, recipient, tenant ID, and any channel‑specific metadata. From here, every code path is identical.
- Log metadata to messages. An append‑only row in D1: channel, direction, sender ID, recipient ID, tenant, timestamp. Message content is never persisted: only the fact that something happened, with whom.
- Reply rules evaluate in order. Each channel carries an ordered ReplyConfig. Rules pair a matcher (case‑insensitive keyword substring or BGE‑embedding cosine similarity over a user‑written intent description) with a response (canned text or an AI prompt). First match wins; the mandatory default rule fires if nothing else does.
- Resolve the conversation session. Per‑customer threads are tracked in KV at convsession:{tenant}:{channel}:{sender}. If the customer has been silent for longer than idle_gap_mins (default 6 h), a fresh conversation begins: a new conversation_id is minted, message history clears, any handoff state is wiped. Otherwise the existing session is loaded and the inbound is appended to its messages list.
- Build the prompt envelope. The system prompt sent to the reply model is always a sandwich: a fixed preamble framing the model as a small‑business reply assistant, the editable middle (the tenant’s persona prompt + the rule prompt + the persona’s goal & handoff conditions), and a fixed postamble with universal house rules, jailbreak rails, and handoff triggers. The preamble and postamble are constants in src/prompt.rs; tenant content can never reach the model alone.
- Action dispatches: canned text or multi‑turn LLM call. Canned responses send verbatim, no credit charge. Prompt responses run the reply model with the envelope above as the system message and the session’s recent message history (capped by max_history_messages, default 20) as the prior turns. One credit is deducted before the call (optimistic) and restored if generation or send fails. AI replies are blocked unless the persona’s asynchronous safety check has approved the current prompt.
- Handoff scan. Every model reply is scanned for the [[HANDOFF]] sentinel before it leaves the worker. If present, the token is stripped, the session is flipped into the holding‑pattern path for handoff_cooldown_mins (default 60 min), and the tenant is paged once via their configured approval channels (Discord embed and/or immediate email). Past the cooldown the worker stays silent until the idle gap resets the conversation.
The unified messages table stores only metadata: channel, direction, sender ID, recipient ID, tenant, timestamp. No subjects, no bodies, no attachments. AI replies are generated synchronously from in-memory data and discarded.
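The prompt envelope step of the pipeline can be sketched as a small pure function. This is a minimal illustration, not the contents of src/prompt.rs: the constant text, the `Middle` struct, its field names, and `build_system_prompt` are all assumptions made for the example.

```rust
/// Fixed framing text. The real constants live in src/prompt.rs;
/// the wording below is placeholder text for illustration only.
pub const PREAMBLE: &str = "You are a reply assistant for a small business.";
pub const POSTAMBLE: &str = "House rules: stay on topic and follow the handoff triggers.";

/// The tenant-editable middle of the envelope (hypothetical field names).
pub struct Middle<'a> {
    pub persona_prompt: &'a str,
    pub rule_prompt: &'a str,
    pub goal_and_handoff: &'a str,
}

/// Builds the system prompt as preamble + middle + postamble, so tenant
/// content is always wrapped by the fixed text on both sides.
pub fn build_system_prompt(middle: &Middle) -> String {
    let parts = [
        PREAMBLE,
        middle.persona_prompt,
        middle.rule_prompt,
        middle.goal_and_handoff,
        POSTAMBLE,
    ];
    parts
        .iter()
        .map(|p| p.trim())
        .filter(|p| !p.is_empty()) // skip empty tenant fields
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

Because the function takes only the middle as input, there is no code path in this sketch that sends tenant text to the model without the fixed preamble and postamble around it.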
WhatsApp / Instagram auto‑reply
Both Meta channels run through the same reply pipeline:
- Meta delivers the inbound message to POST /webhook/whatsapp or POST /webhook/instagram.
- Concierge looks up the channel account (phone number ID for WhatsApp, page ID for Instagram) and its ReplyConfig.
- The body is truncated to 1000 chars and run past a fast prompt‑injection scanner; injection attempts are dropped.
- If any rule has a Prompt matcher, the inbound message is embedded once and compared via cosine similarity to each rule’s precomputed embedding.
- Rules are walked in order; the first match wins. Otherwise the mandatory default rule fires.
- Canned responses send verbatim with no credit charge. Prompt responses combine persona + rule prompt + a context block, deduct one credit, and run the main LLM. AI replies require the tenant’s persona to be safety‑Approved.
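The rule walk described above (ordered rules, first match wins, default as fallback) might look roughly like this. The type and variant names are assumptions, and so is the per-rule similarity `threshold`: the source says intent rules use cosine similarity but does not say how a match is decided.

```rust
/// How a rule decides whether it applies (variant names are assumptions).
pub enum Matcher {
    /// Case-insensitive keyword substring.
    Keyword(String),
    /// Precomputed embedding of the intent description, plus an assumed cutoff.
    Intent { embedding: Vec<f32>, threshold: f32 },
}

pub enum Response {
    Canned(String),
    Prompt(String),
}

pub struct Rule {
    pub matcher: Matcher,
    pub response: Response,
}

pub struct ReplyConfig {
    pub rules: Vec<Rule>,
    pub default: Response, // the mandatory default rule
}

/// Cosine similarity; returns 0.0 for mismatched lengths or zero vectors.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Walks rules in order and returns the first matching response, falling
/// back to the default. `body_embedding` is the inbound message embedded
/// once by the caller (None when no rule needs it).
pub fn pick_response<'a>(
    cfg: &'a ReplyConfig,
    body: &str,
    body_embedding: Option<&[f32]>,
) -> &'a Response {
    let lowered = body.to_lowercase();
    for rule in &cfg.rules {
        let hit = match &rule.matcher {
            Matcher::Keyword(k) => lowered.contains(k.to_lowercase().as_str()),
            Matcher::Intent { embedding, threshold } => body_embedding
                .map(|e| cosine(e, embedding) >= *threshold)
                .unwrap_or(false),
        };
        if hit {
            return &rule.response;
        }
    }
    &cfg.default
}
```

Passing the inbound embedding in as an argument mirrors the "embedded once" behavior: the expensive call happens at most one time per message, however many intent rules exist.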
Persona & safety check
- The tenant fills in the persona builder at /dashboard/persona: a voice archetype (Friendly / Professional / Playful / Formal), business name and type, a goal (what to drive customers toward, free text + optional URL), catch‑phrases, off‑topic boundaries, and handoff conditions (what to escalate to a human). Alternatively the tenant copies a curated archetype from the platform’s D1 catalog (/manage/personas) or writes a raw custom prompt.
- On save, the rendered active prompt is hashed; if the hash differs from the last‑vetted hash, status flips to Pending and a SafetyJob is enqueued onto Cloudflare Queue concierge-safety.
- The queue consumer reads the job, re‑checks the hash (drops stale jobs), and runs the safety classifier with Calculon Tech’s content policy.
- The result lands back in KV as Approved or Rejected with a vague user‑facing reason.
- While Pending or Rejected, AI replies are blocked tenant‑wide; canned default replies still send.
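The hash-and-recheck flow above can be modeled without any queue at all. The sketch below is an assumption-heavy illustration: the `PersonaState` layout, the `SafetyJob` fields, and the function names are invented for the example, and `DefaultHasher` merely stands in for whatever digest the real code uses.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SafetyStatus {
    Pending,
    Approved,
    Rejected,
}

pub struct PersonaState {
    /// Hash of the prompt the tenant currently has saved.
    pub active_hash: u64,
    /// Hash and verdict of the last prompt the classifier finished checking.
    pub last_vetted: Option<(u64, bool)>,
}

/// Hypothetical job payload; the real SafetyJob fields are not shown in the source.
pub struct SafetyJob {
    pub tenant_id: String,
    pub prompt_hash: u64,
}

/// Stand-in hash. DefaultHasher output is not guaranteed stable across Rust
/// releases, so a persisted value would need a fixed digest instead.
pub fn hash_prompt(prompt: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prompt.hash(&mut h);
    h.finish()
}

/// Status is derived: anything not vetted under the current hash is Pending.
pub fn status(state: &PersonaState) -> SafetyStatus {
    match state.last_vetted {
        Some((h, true)) if h == state.active_hash => SafetyStatus::Approved,
        Some((h, false)) if h == state.active_hash => SafetyStatus::Rejected,
        _ => SafetyStatus::Pending,
    }
}

/// On save: returns a job to enqueue only when the prompt differs from the vetted one.
pub fn on_save(state: &mut PersonaState, tenant_id: &str, rendered: &str) -> Option<SafetyJob> {
    let hash = hash_prompt(rendered);
    state.active_hash = hash;
    match state.last_vetted {
        Some((vetted, _)) if vetted == hash => None,
        _ => Some(SafetyJob { tenant_id: tenant_id.to_string(), prompt_hash: hash }),
    }
}

/// In the consumer: drops stale jobs, otherwise records the verdict.
/// Returns false when the job was stale and ignored.
pub fn record_verdict(state: &mut PersonaState, job: &SafetyJob, approved: bool) -> bool {
    if job.prompt_hash != state.active_hash {
        return false; // persona was edited after this job was enqueued
    }
    state.last_vetted = Some((job.prompt_hash, approved));
    true
}
```

Deriving the status from the two hashes, rather than storing it separately, means a verdict for an older prompt can never be mistaken for approval of the current one.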
Conversations & handoff
Inbound messages don’t live in a vacuum. Each customer×channel pair has a conversation session in KV (convsession:{tenant}:{channel}:{sender}) holding a stable conversation_id, a bounded list of recent (role, content) turns, the timestamp of the last inbound, and an optional handoff sub‑state. The session is also stamped onto matching rows in the unified messages D1 table via a conversation_id column so a thread can be reconstructed for audit.
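As a rough illustration of the session described in this paragraph, the sketch below builds the KV key and keeps a bounded list of turns. The key format comes from the text; the `History` type, its methods, and the lowercase channel string are assumptions for the example.

```rust
use std::collections::VecDeque;

/// Builds the KV key in the convsession:{tenant}:{channel}:{sender} format.
pub fn session_key(tenant: &str, channel: &str, sender: &str) -> String {
    format!("convsession:{}:{}:{}", tenant, channel, sender)
}

/// A bounded list of (role, content) turns; the oldest turns fall off first.
pub struct History {
    turns: VecDeque<(String, String)>,
    max: usize, // plays the role of max_history_messages
}

impl History {
    pub fn new(max: usize) -> Self {
        Self { turns: VecDeque::new(), max }
    }

    /// Appends a turn and evicts from the front until the cap is respected.
    pub fn push(&mut self, role: &str, content: &str) {
        self.turns.push_back((role.to_string(), content.to_string()));
        while self.turns.len() > self.max {
            self.turns.pop_front();
        }
    }

    pub fn len(&self) -> usize {
        self.turns.len()
    }

    /// Content of the oldest turn still retained, if any.
    pub fn oldest(&self) -> Option<&str> {
        self.turns.front().map(|(_, content)| content.as_str())
    }
}
```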
- Idle gap. If the customer has been silent for longer than idle_gap_mins (default 6 h), the next inbound starts a fresh conversation: history clears, handoff is wiped, the persona replies normally. Six hours is long enough for a customer mulling a quote over lunch, short enough that next morning’s message is genuinely fresh.
- Multi‑turn context. Up to max_history_messages recent turns (default 20, roughly ten back‑and‑forths) are passed to the reply model as the prior chat history on every AI call. Queued draft replies are not appended to history, so a draft that gets rejected can’t poison the next AI call.
- Handoff signal. The postamble defines the universal triggers (the model doesn’t understand the request; the customer asks for a person; the message touches medical / legal / financial / safety territory) and the persona’s handoff conditions add tenant‑specific ones. To escalate, the model writes one polite holding sentence and ends the reply with [[HANDOFF]] on its own line.
- Pipeline reaction. The token is stripped before the customer sees the reply. The session flips into the holding pattern, replacing the persona middle with a calm "a person has been notified" voice for any follow‑up turns within handoff_cooldown_mins (default 60 min). Past the cooldown the worker stays silent until the idle gap resets the conversation.
- Tenant page. The tenant is paged once per handoff via their configured approval‑notification channels: Discord embed (via the bot) and/or immediate email through send_outbound. This is not the digest cron: the page goes out the moment the handoff fires.
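The idle gap, the holding pattern, and the sentinel scan from the list above fit into a small state machine. This is a sketch under simplifying assumptions: timestamps are plain minute counts, the conversation ID is a counter rather than a minted identifier, and the `Mode`, `Session`, and function names are hypothetical.

```rust
pub const HANDOFF_SENTINEL: &str = "[[HANDOFF]]";

pub struct Timing {
    pub idle_gap_mins: u64,
    pub handoff_cooldown_mins: u64,
}

pub struct Session {
    pub conversation_id: u64,
    pub last_inbound_mins: u64,            // time of the last inbound, in minutes
    pub handoff_started_mins: Option<u64>, // set when a handoff fires
    pub history: Vec<(String, String)>,    // (role, content)
}

#[derive(Debug, PartialEq, Eq)]
pub enum Mode {
    Persona, // normal AI reply
    Holding, // "a person has been notified" voice
    Silent,  // past the cooldown: no reply
}

/// Applies the idle-gap reset, then decides how to answer this inbound.
pub fn on_inbound(s: &mut Session, now_mins: u64, t: &Timing) -> Mode {
    if now_mins.saturating_sub(s.last_inbound_mins) > t.idle_gap_mins {
        s.conversation_id += 1; // stand-in for minting a fresh conversation_id
        s.history.clear();
        s.handoff_started_mins = None;
    }
    s.last_inbound_mins = now_mins;
    match s.handoff_started_mins {
        None => Mode::Persona,
        Some(start) if now_mins.saturating_sub(start) <= t.handoff_cooldown_mins => Mode::Holding,
        Some(_) => Mode::Silent,
    }
}

/// Strips the sentinel from a model reply; the bool says whether a handoff fired.
pub fn scan_handoff(reply: &str) -> (String, bool) {
    if !reply.contains(HANDOFF_SENTINEL) {
        return (reply.trim().to_string(), false);
    }
    let cleaned = reply.replace(HANDOFF_SENTINEL, "");
    (cleaned.trim().to_string(), true)
}
```

When `scan_handoff` reports a handoff, the caller would set `handoff_started_mins` to the current time and page the tenant once; both steps are left out of the sketch.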
Operators tune all three knobs (idle_gap_mins, handoff_cooldown_mins, max_history_messages) per‑tenant from the Conversation Timing card on /dashboard/settings. The form enforces sensible bounds (5–1440 min, 1–200 turns) and a cross‑field invariant: idle gap must be longer than handoff cooldown, otherwise an active handoff could be wiped before its cooldown ended. Empty fields fall back to the in‑code defaults.
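The bounds and the cross-field invariant could be enforced along these lines. The defaults (360, 60, 20) and ranges come from the text; applying the 5–1440 range to both minute fields, along with the type and error names, is an assumption of this sketch.

```rust
pub const DEFAULT_IDLE_GAP_MINS: u32 = 360; // 6 h
pub const DEFAULT_HANDOFF_COOLDOWN_MINS: u32 = 60;
pub const DEFAULT_MAX_HISTORY_MESSAGES: u32 = 20;

#[derive(Debug, PartialEq, Eq)]
pub enum TimingError {
    IdleGapOutOfRange,
    CooldownOutOfRange,
    HistoryOutOfRange,
    IdleGapNotLongerThanCooldown,
}

#[derive(Debug, PartialEq, Eq)]
pub struct ConversationTiming {
    pub idle_gap_mins: u32,
    pub handoff_cooldown_mins: u32,
    pub max_history_messages: u32,
}

/// Validates the three knobs; `None` models an empty form field and
/// falls back to the in-code default.
pub fn validate(
    idle: Option<u32>,
    cooldown: Option<u32>,
    history: Option<u32>,
) -> Result<ConversationTiming, TimingError> {
    let idle_gap_mins = idle.unwrap_or(DEFAULT_IDLE_GAP_MINS);
    let handoff_cooldown_mins = cooldown.unwrap_or(DEFAULT_HANDOFF_COOLDOWN_MINS);
    let max_history_messages = history.unwrap_or(DEFAULT_MAX_HISTORY_MESSAGES);

    if !(5..=1440).contains(&idle_gap_mins) {
        return Err(TimingError::IdleGapOutOfRange);
    }
    if !(5..=1440).contains(&handoff_cooldown_mins) {
        return Err(TimingError::CooldownOutOfRange);
    }
    if !(1..=200).contains(&max_history_messages) {
        return Err(TimingError::HistoryOutOfRange);
    }
    // Cross-field invariant: otherwise an idle reset could wipe an active handoff.
    if idle_gap_mins <= handoff_cooldown_mins {
        return Err(TimingError::IdleGapNotLongerThanCooldown);
    }
    Ok(ConversationTiming { idle_gap_mins, handoff_cooldown_mins, max_history_messages })
}
```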
Live demo chat
The public welcome page hosts a live demo modal so visitors can see the AI in action without signing up. It’s framed explicitly: visitors roleplay as a customer of a sample business (florist, salon, cafe, …) and watch the AI reply in real time. Real customer messages still arrive on WhatsApp, Instagram, email, or Discord — the demo chat box is never a customer surface.
- GET /demo/personas returns the safety‑Approved archetypes from the D1 personas catalog, with sample business fields (name, type, city, goal) for the picker.
- The chat‑input form submits to POST /demo/chat with the picked persona slug, the visitor’s message, the prior turns kept client‑side, and a stateless handoff flag.
- The handler runs the message through the same prompt envelope and reply model the production pipeline uses, gates on the same safety classifier, and respects the same handoff sentinel. The "View prompt" panel reveals the exact envelope being sent so visitors can see what the model receives.
- Handoff round‑trips statelessly via {handoff: bool} on the wire and an Alpine flag in the modal; persona switch and modal close reset it.
Email routing
- An email arrives at your catch‑all domain (configured via Cloudflare Email Routing).
- Cloudflare triggers the Worker’s email() handler.
- Concierge extracts the domain, looks up the tenant, and parses the MIME message.
- Routing rules are evaluated in priority order using glob‑pattern matching on from, to, subject, body, and has_attachment.
- The matched rule’s action executes: drop, spam reject, forward email, forward to Discord, or AI reply with approval.
- For email forwarding, a reverse‑alias address is generated so replies route back through Concierge to the original sender.
Glob patterns in routing rules follow these conventions:
- *: any sequence of characters, zero or more
- ?: exactly one character
- case: matching is case‑insensitive
- combine: all non‑None criteria are AND‑ed (from + to + subject + body + has_attachment)
- order: rules are sorted by ascending priority; the highest priority match wins
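A glob matcher with these semantics fits in a few lines. The sketch below uses the classic backtracking approach for `*` and `?`. The `Criteria` and `Email` types and their field layout are assumptions for illustration, not the project's actual rule types.

```rust
/// Case-insensitive glob match: `*` matches any run of characters
/// (including none) and `?` matches exactly one character.
pub fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.to_lowercase().chars().collect();
    let t: Vec<char> = text.to_lowercase().chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    // Pattern index of the last `*` and the text index it was tried at.
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Only trailing `*`s may remain, since they can match nothing.
    p[pi..].iter().all(|&c| c == '*')
}

/// Match criteria for one rule; `None` means "don't care" (hypothetical layout).
#[derive(Default)]
pub struct Criteria {
    pub from: Option<String>,
    pub to: Option<String>,
    pub subject: Option<String>,
    pub body: Option<String>,
    pub has_attachment: Option<bool>,
}

pub struct Email<'a> {
    pub from: &'a str,
    pub to: &'a str,
    pub subject: &'a str,
    pub body: &'a str,
    pub has_attachment: bool,
}

fn text_ok(pattern: &Option<String>, value: &str) -> bool {
    pattern.as_deref().map_or(true, |p| glob_match(p, value))
}

/// All non-None criteria must hold (logical AND).
pub fn criteria_match(c: &Criteria, e: &Email) -> bool {
    text_ok(&c.from, e.from)
        && text_ok(&c.to, e.to)
        && text_ok(&c.subject, e.subject)
        && text_ok(&c.body, e.body)
        && c.has_attachment.map_or(true, |want| want == e.has_attachment)
}
```

For instance, `glob_match("*@example.com", "Alice@Example.com")` holds because both sides are lowercased before comparison, while `glob_match("inv?ice", "invoices")` does not because the trailing `s` has nothing to match.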
Discord relay
- When a message from any channel is forwarded to Discord (via email routing rules or future direct integrations), it arrives as an embed with Reply, Approve, and Drop buttons.
- A ConversationContext is saved in KV, linking the Discord message to the original channel, sender, and reply metadata.
- When someone clicks Reply, a modal opens for composing a response.
- The reply is sent back through the originating channel using the stored context.
- For AI‑generated drafts, Approve sends the draft and Drop discards it.
Billing
Each AI‑mode reply (rule with a Prompt response) deducts one credit from the tenant’s balance. Canned replies, embedding lookups, intent classification, and persona safety checks are free. Credits are deducted before the AI call (optimistic deduction) and restored if generation or send fails. When credits reach zero, AI replies stop; canned defaults still send. Credits can be granted by management or purchased via Razorpay.
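The optimistic deduction can be illustrated with an in-memory balance. This sketch ignores concurrency and storage entirely, and the `Balance`, `ReplyError`, and `charged_reply` names are assumptions for the example.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum ReplyError {
    NoCredits,
    Failed(String),
}

pub struct Balance {
    pub credits: u32,
}

/// Deducts one credit, runs `generate_and_send`, and restores the credit
/// if generation or sending fails.
pub fn charged_reply<F>(balance: &mut Balance, generate_and_send: F) -> Result<String, ReplyError>
where
    F: FnOnce() -> Result<String, String>,
{
    if balance.credits == 0 {
        return Err(ReplyError::NoCredits); // AI replies stop at zero
    }
    balance.credits -= 1; // optimistic deduction before the model call
    match generate_and_send() {
        Ok(reply) => Ok(reply),
        Err(e) => {
            balance.credits += 1; // restore on failure
            Err(ReplyError::Failed(e))
        }
    }
}
```

Deducting first means a tenant with one credit left cannot start two paid replies from that single credit, at the cost of needing the explicit restore path on failure.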
Platform model
| Channel | Model | Token storage |
|---|---|---|
| WhatsApp | Shared WABA: you own one WABA, customers add numbers via Meta Embedded Signup | Single platform token WHATSAPP_ACCESS_TOKEN |
| Instagram | Per-account OAuth: Facebook Login, page tokens per customer | Encrypted in KV, rotated daily by cron |
| Email | Per-domain: each tenant registers domains and creates rules | No tokens; Cloudflare Email Routing dispatches |
| Discord | Guild → tenant: each Discord server is linked to one tenant | Shared bot token (DISCORD_BOT_TOKEN env secret) |
Architecture
- Cloudflare Worker: Rust compiled to WebAssembly. Handles all HTTP routes, webhooks, and email events.
- Cloudflare KV: tenant configs, account configs, tokens, sessions, routing rules, billing state, conversation contexts, persona.
- Cloudflare D1: SQLite for message metadata, email metrics, credit packs, payments, audit logs.
- Cloudflare Workers AI: reply generation, prompt‑injection scanning, persona safety classification, BGE embeddings.
- Cloudflare Queues: persona safety classifier (concierge-safety + concierge-safety-dlq).
- Cloudflare Email Routing: triggers the Worker’s email handler for inbound emails.
- Discord Interactions API: slash commands, button interactions, modal submissions via POST /discord/interactions.
- Razorpay: payment processing for credit pack purchases.
```rust
/// The normalized form every channel produces. Channel handlers
/// exist only to translate webhooks into this struct.
pub struct InboundMessage {
    pub id: String,
    pub channel: Channel, // WhatsApp | Instagram | Email | Discord
    pub sender: String,
    pub sender_name: Option<String>,
    pub recipient: String,
    pub body: String, // in-memory only, never persisted
    pub tenant_id: String,
    pub channel_account_id: String,
    pub raw_metadata: Value,
}
```