Message Generation
After guideline matching and tool calling complete, the engine has accumulated context: matched guidelines, tool results, glossary terms, and journey state. Now it's time to generate the actual response. This page explains how Parlant's message composer turns accumulated context into guideline-compliant messages.
The Composition Challenge
The Problem: LLMs Forget Instructions
Even with focused context (achieved through guideline matching), LLMs can still fail to follow instructions mid-generation. The model may begin correctly but gradually drift from the specified guidelines:
Instructions: "Always mention the 30-day return policy when discussing returns."
Customer: "Can I return this shirt?"
LLM output: "Yes, you can return it! Just bring it back to any store location
with your receipt. We're happy to process exchanges too if you'd prefer a
different size or color..."
✗ Never mentioned the 30-day policy
Chain-of-Thought reasoning provides some improvement but proves insufficient. The LLM may reason correctly during the "thinking" phase but still fail to apply that reasoning during generation.
The Goal
Generate responses that:
- Follow all matched guidelines reliably
- Sound natural and conversational
- Prioritize what the customer needs to hear first
- Handle blocked tools gracefully
ARQ-Based Enforcement
Parlant uses Attentive Reasoning Queries (ARQs) for reliable guideline adherence. Unlike Chain-of-Thought, ARQs structure the entire output to reinstate critical instructions immediately before decisions.
How ARQs Work
ALGORITHM: ARQ Message Generation
INPUT: matched_guidelines, context, tool_results, tool_insights
OUTPUT: message
1. STRUCTURE the prompt:
- System: Agent identity, communication style
- Context: Variables, glossary terms, capabilities
- History: Conversation so far
- Tool Results: What tools returned
- Tool Insights: What tools couldn't run and why
- Guidelines: Matched guidelines with criticality levels
2. BUILD ARQ completion schema:
FOR each guideline (ordered by criticality):
- Field to restate guideline
- Field to reason about applicability
- Field to note how to address it
- Final field: The actual message
3. GENERATE structured output:
The LLM fills in each field sequentially.
By the time it writes the message, it has just
reasoned about every guideline.
4. EXTRACT message from structured output
5. RETURN message
ARQ Schema Structure
{
"guidelines": [
{
"guideline_id": "g1",
"guideline_content": "Always mention 30-day return policy",
"how_to_address": "Include policy mention when explaining return process",
"addressed_in_response": true
},
{
"guideline_id": "g2",
"guideline_content": "Offer to check order status",
"how_to_address": "Ask if they want me to look up their order",
"addressed_in_response": true
}
],
"message": "Of course! You can return items within 30 days of purchase for a full refund. Would you like me to look up your order to start the return process?"
}
The sequential structure ensures:
- Each guideline is explicitly restated (reinstated in context)
- The LLM reasons about how to address it (primes the generation)
- The message follows immediately after (leverages recency bias)
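As a rough sketch, this completion schema could be declared as a structured-output model along these lines (Pydantic is used for illustration; Parlant's internal types may differ):

```python
from pydantic import BaseModel

class GuidelineARQ(BaseModel):
    guideline_id: str
    guideline_content: str       # restates the guideline in context
    how_to_address: str          # primes generation with a concrete plan
    addressed_in_response: bool  # explicit acknowledgment

class ARQCompletion(BaseModel):
    guidelines: list[GuidelineARQ]
    message: str  # filled in last, immediately after the reasoning
```

Because message is the final field, the model produces it only after it has just reasoned through every guideline.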
Criticality Affects Enforcement
| Criticality | ARQ Treatment |
|---|---|
| HIGH | Each guideline gets explicit acknowledgment field; validation may regenerate if not followed |
| MEDIUM | Grouped reasoning section; included in standard ARQ flow |
| LOW | Listed but not individually reasoned about |
High-criticality guidelines (compliance, safety) get the strongest enforcement.
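A sketch of how that mapping could look in code (the helper and criticality strings follow the table above; none of this is Parlant's confirmed internals):

```python
def arq_fields_for(criticality: str) -> list[str]:
    """Decide which ARQ fields a guideline receives, per the table above."""
    if criticality == "HIGH":
        # Explicit acknowledgment; validation may regenerate on failure.
        return ["guideline_content", "how_to_address", "addressed_in_response"]
    if criticality == "MEDIUM":
        # Standard ARQ flow, grouped reasoning.
        return ["guideline_content", "how_to_address"]
    # LOW: listed for context, no per-guideline reasoning.
    return ["guideline_content"]
```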
Composition Modes
Different use cases need different response generation strategies:
FLUID Mode (Default)
Freeform LLM generation guided by ARQs. The LLM writes the response naturally while adhering to guidelines.
Best for: General customer service, advisory conversations, exploratory interactions.
Customer: "What's the best laptop for video editing?"
Agent: "For video editing, I'd recommend looking at laptops with at least
16GB RAM and a dedicated GPU. Our ProMedia X15 handles 4K footage smoothly
and is currently 15% off. Would you like me to compare a few options in
your price range?"
STRICT Mode
Response must be an exact match to a predefined canned response. No generation, just selection.
Best for: Compliance-critical statements, legal disclaimers, regulated industries.
# Predefined canned responses
CANNED_RESPONSES = [
    "Your account balance is {balance}.",
    "I am not authorized to provide financial advice.",
    "Please hold while I transfer you to a specialist.",
]
# The agent selects the appropriate one based on context
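A sketch of what selection-only behavior implies (select_index stands in for however the engine classifies the context against the canned set, e.g. an LLM call constrained to return an index):

```python
def strict_respond(context: dict, select_index) -> str:
    # Uses CANNED_RESPONSES from the block above. The model may only
    # choose among the canned responses; it never writes free text.
    idx = select_index(context, CANNED_RESPONSES)
    # Template substitution is the only variable part.
    return CANNED_RESPONSES[idx].format(**context)
```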
COMPOSITED Mode
Mix canned response fragments with generated content. Structured parts come from templates; connecting text is generated.
Best for: Consistent messaging with contextual personalization.
[CANNED] "Thank you for contacting support."
[GENERATED] Personalized greeting based on customer name
[CANNED] "How can I help you today?"
CANNED_FLUID Mode
Generate new text that sounds similar to canned responses. Maintains brand voice and style without exact matching.
Best for: Brand consistency when exact matching is too rigid.
# Reference canned response style
"We appreciate your patience as we look into this."
# Generated response (similar style)
"Thanks for bearing with me while I check on that order."
Mode Selection
Pick the mode that matches how much generation freedom the use case allows:
| Mode | Generation | Best for |
|---|---|---|
| FLUID | Freeform, ARQ-guided | General customer service, exploratory conversations |
| STRICT | Selection from canned responses only | Compliance-critical, regulated messaging |
| COMPOSITED | Canned fragments plus generated connective text | Consistent messaging with personalization |
| CANNED_FLUID | Generated text matched to canned style | Brand voice without exact matching |
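As a sketch of where the mode is set: step 5 of the generation flow below reads agent.composition_mode, which suggests it is configured at agent creation. Parameter and enum names here are assumptions, not confirmed API:

```python
import parlant.sdk as p

async def setup(server: p.Server) -> None:
    agent = await server.create_agent(
        name="Support Agent",
        description="Handles returns and order questions",
        # Assumed parameter; the generation flow reads
        # agent.composition_mode at step 5.
        composition_mode=p.CompositionMode.FLUID,
    )
```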
Handling Tool Insights
When tools could not execute, the message composer must handle the situation gracefully:
Missing Data
tool_insights = {
"transfer_money": {
"status": "CANNOT_RUN",
"missing": ["recipient_name"]
}
}
The composer receives the insight: "The customer wanted to transfer money, but the recipient is unknown."
Generated response: "I can help you transfer funds. Who would you like to send the money to?"
Invalid Data
tool_insights = {
"lookup_order": {
"status": "CANNOT_RUN",
"invalid": {
"order_id": "Format does not match expected pattern"
}
}
}
The composer receives the insight: "The order ID provided does not match the expected format."
Generated response: "I couldn't find that order number. Order IDs usually look like 'ORD-12345'. Could you double-check the number?"
Insight Integration
ALGORITHM: Integrate Tool Insights
INPUT: tool_insights, matched_guidelines
OUTPUT: modified_generation_context
FOR each blocked_tool in tool_insights:
IF missing_data:
ADD to context: "Need to ask customer for: {missing_fields}"
IF invalid_data:
ADD to context: "Need to clarify with customer: {invalid_fields}"
OPTIONALLY adjust guideline priority:
- "Ask for order number" becomes more urgent
- "Provide order status" becomes less urgent (blocked)
RETURN modified context for generation
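A direct translation of this algorithm, using the tool_insights shapes from the examples above (the function name is illustrative):

```python
def integrate_tool_insights(tool_insights: dict, context: list) -> list:
    for tool_name, insight in tool_insights.items():
        if insight.get("status") != "CANNOT_RUN":
            continue
        for field in insight.get("missing", []):
            context.append(f"Need to ask customer for: {field}")
        for field, reason in insight.get("invalid", {}).items():
            context.append(f"Need to clarify with customer: {field} ({reason})")
    return context
```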
Uncancellable Section
Message generation runs in an uncancellable section. Once preparation completes, the response will be generated and emitted, even if the customer sends another message.
The rationale is that once the system has:
- Matched guidelines
- Called tools (potentially with side effects)
- Determined the response content
abandoning the response mid-generation would leave the conversation in an inconsistent state. The customer might receive no response, or only a partial one.
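One way to picture the uncancellable section is a shielded task, sketched here assuming an asyncio-based engine (Parlant's actual mechanism may differ):

```python
import asyncio

async def generate_and_emit(state) -> str:
    # Stand-in for ARQ generation and event emission, which must
    # run to completion once started.
    await asyncio.sleep(1)
    return "generated message"

async def respond(state) -> str:
    # Shield the generation task: if this coroutine is cancelled
    # (e.g. the customer sends another message mid-generation),
    # the inner task keeps running instead of being abandoned.
    task = asyncio.ensure_future(generate_and_emit(state))
    try:
        return await asyncio.shield(task)
    except asyncio.CancelledError:
        await task  # let the shielded work finish before propagating
        raise
```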
Generation Flow
ALGORITHM: Full Generation Flow
INPUT: response_state (from preparation loop)
OUTPUT: emitted message events
1. ENTER uncancellable section
2. CALL on_generating_messages hook
- Extensions can modify context
- Extensions can abort (returns early)
3. CALL on_guideline_match handlers
- Notify listeners which guidelines matched
4. BUILD generation context:
- Agent persona and style
- Matched guidelines (by criticality)
- Conversation history
- Tool results and insights
- Context variables
- Glossary terms
5. SELECT composition mode:
- Check agent.composition_mode
- Check if guidelines override mode
6. GENERATE based on mode:
FLUID: ARQ-structured LLM generation
STRICT: Canned response selection
COMPOSITED: Mix canned and generated
CANNED_FLUID: Style-matched generation
7. CREATE message events
8. EMIT message events to session
9. CALL on_guideline_message handlers
- Notify listeners what was generated
10. UPDATE agent state:
- Record applied_guideline_ids
- Update journey_paths
11. EXIT uncancellable section
RETURN emitted events
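A compact sketch of step 6's dispatch (the function names are placeholders for the mode-specific generators described on this page, not real API):

```python
def generate_for_mode(mode: str, context) -> str:
    if mode == "STRICT":
        return select_canned_response(context)    # exact match only
    if mode == "COMPOSITED":
        return compose_fragments(context)         # canned + generated glue
    if mode == "CANNED_FLUID":
        return generate_in_canned_style(context)  # style-matched generation
    return arq_generate(context)                  # FLUID (default)
```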
Why This Design?
Why ARQs Over Chain-of-Thought?
Chain-of-Thought prompts the LLM to reason before generating, but by the time generation occurs, the reasoning is far back in the context window. Due to recency bias, later tokens exert more influence on the output.
ARQs structure the output so that reasoning occurs immediately before the relevant decision. Each guideline is restated and analyzed directly before the message is written, maximizing the influence of that reasoning.
Research results on Parlant's test suite:
- ARQs: 90.2% success rate
- CoT: 86.1% success rate
- Direct: 81.5% success rate
Why Multiple Modes?
Different business requirements demand different approaches:
- Regulated industries: Require STRICT mode to ensure compliance
- Brand-conscious companies: Require CANNED_FLUID for consistent voice
- Flexible support teams: Require FLUID for natural conversation
A single approach cannot accommodate all these diverse requirements.
Tradeoffs
| Choice | Benefit | Cost |
|---|---|---|
| ARQs | Achieves higher guideline adherence | Consumes slightly more tokens |
| Uncancellable | Maintains consistent conversation state | Requires customer to wait for response |
| Multiple modes | Provides flexibility for diverse use cases | Increases system complexity |
| Tool insight integration | Enables better error handling | Increases context size for LLM |
What's Next
- Debugging: Tracing what happened during generation
- Response Lifecycle: How generation fits in the overall flow
- For canned response configuration, see Canned Responses