Message Generation
After guideline matching and tool calling complete, the engine has accumulated context: matched guidelines, tool results, glossary terms, and journey state. Now it's time to generate the actual response. This page explains how Parlant's message composer turns accumulated context into guideline-compliant messages.
The Composition Challenge
The Problem: LLMs Forget Instructions
Even with focused context (achieved through guideline matching), LLMs can still fail to follow instructions mid-generation. The model may begin correctly but gradually drift from the specified guidelines:
Instructions: "Always mention the 30-day return policy when discussing returns."
Customer: "Can I return this shirt?"
LLM output: "Yes, you can return it! Just bring it back to any store location
with your receipt. We're happy to process exchanges too if you'd prefer a
different size or color..."
✗ Never mentioned the 30-day policy
Chain-of-Thought reasoning provides some improvement but proves insufficient. The LLM may reason correctly during the "thinking" phase but still fail to apply that reasoning during generation.
The Goal
Generate responses that:
- Follow all matched guidelines reliably
- Sound natural and conversational
- Prioritize what the customer needs to hear first
- Handle blocked tools gracefully
ARQ-Based Enforcement
Parlant uses Attentive Reasoning Queries (ARQs) for reliable guideline adherence. Unlike Chain-of-Thought, ARQs structure the entire output to reinstate critical instructions immediately before decisions.
How ARQs Work
ALGORITHM: ARQ Message Generation
INPUT: matched_guidelines, context, tool_results, tool_insights
OUTPUT: message
1. STRUCTURE the prompt:
- System: Agent identity, communication style
- Context: Variables, glossary terms, capabilities
- History: Conversation so far
- Tool Results: What tools returned
- Tool Insights: What tools couldn't run and why
- Guidelines: Matched guidelines with criticality levels
2. BUILD ARQ completion schema:
FOR each guideline (ordered by criticality):
- Field to restate guideline
- Field to reason about applicability
- Field to note how to address it
- Final field: The actual message
3. GENERATE structured output:
The LLM fills in each field sequentially.
By the time it writes the message, it has just
reasoned about every guideline.
4. EXTRACT message from structured output
5. RETURN message
ARQ Schema Structure
{
"guidelines": [
{
"guideline_id": "g1",
"guideline_content": "Always mention 30-day return policy",
"how_to_address": "Include policy mention when explaining return process",
"addressed_in_response": true
},
{
"guideline_id": "g2",
"guideline_content": "Offer to check order status",
"how_to_address": "Ask if they want me to look up their order",
"addressed_in_response": true
}
],
"message": "Of course! You can return items within 30 days of purchase for a full refund. Would you like me to look up your order to start the return process?"
}
The sequential structure ensures:
- Each guideline is explicitly restated (reinstated in context)
- The LLM reasons about how to address it (primes the generation)
- The message follows immediately after (leverages recency bias)
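As a rough sketch, this completion schema could be declared as a structured-output model along these lines (Pydantic is used for illustration; Parlant's internal types may differ):

```python
from pydantic import BaseModel

class GuidelineARQ(BaseModel):
    guideline_id: str
    guideline_content: str       # restates the guideline in context
    how_to_address: str          # primes generation with a concrete plan
    addressed_in_response: bool  # explicit acknowledgment

class ARQCompletion(BaseModel):
    guidelines: list[GuidelineARQ]
    message: str  # filled in last, immediately after the reasoning
```

Because message is the final field, the model produces it only after it has just reasoned through every guideline.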
Criticality Affects Enforcement
| Criticality | ARQ Treatment |
|---|---|
| HIGH | Each guideline gets explicit acknowledgment field; validation may regenerate if not followed |
| MEDIUM | Grouped reasoning section; included in standard ARQ flow |
| LOW | Listed but not individually reasoned about |
High-criticality guidelines (compliance, safety) get the strongest enforcement.
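A sketch of how that mapping could look in code (the helper and criticality strings follow the table above; none of this is Parlant's confirmed internals):

```python
def arq_fields_for(criticality: str) -> list[str]:
    """Decide which ARQ fields a guideline receives, per the table above."""
    if criticality == "HIGH":
        # Explicit acknowledgment; validation may regenerate on failure.
        return ["guideline_content", "how_to_address", "addressed_in_response"]
    if criticality == "MEDIUM":
        # Standard ARQ flow, grouped reasoning.
        return ["guideline_content", "how_to_address"]
    # LOW: listed for context, no per-guideline reasoning.
    return ["guideline_content"]
```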
Composition Modes
Different use cases need different response generation strategies:
FLUID Mode (Default)
Freeform LLM generation guided by ARQs. The LLM writes the response naturally while adhering to guidelines.
Best for: General customer service, advisory conversations, exploratory interactions.
Customer: "What's the best laptop for video editing?"
Agent: "For video editing, I'd recommend looking at laptops with at least
16GB RAM and a dedicated GPU. Our ProMedia X15 handles 4K footage smoothly
and is currently 15% off. Would you like me to compare a few options in
your price range?"
STRICT Mode
Response must be an exact match to a predefined canned response. No generation, just selection.
Best for: Compliance-critical statements, legal disclaimers, regulated industries.
# Predefined canned responses
CANNED_RESPONSES = [
    "Your account balance is {balance}.",
    "I am not authorized to provide financial advice.",
    "Please hold while I transfer you to a specialist.",
]
# The agent selects the appropriate one based on context
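A sketch of what selection-only behavior implies (select_index stands in for however the engine classifies the context against the canned set, e.g. an LLM call constrained to return an index):

```python
def strict_respond(context: dict, select_index) -> str:
    # Uses CANNED_RESPONSES from the block above. The model may only
    # choose among the canned responses; it never writes free text.
    idx = select_index(context, CANNED_RESPONSES)
    # Template substitution is the only variable part.
    return CANNED_RESPONSES[idx].format(**context)
```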
COMPOSITED Mode
Mix canned response fragments with generated content. Structured parts come from templates; connecting text is generated.
Best for: Consistent messaging with contextual personalization.
[CANNED] "Thank you for contacting support."
[GENERATED] Personalized greeting based on customer name
[CANNED] "How can I help you today?"
CANNED_FLUID Mode
Generate new text that sounds similar to canned responses. Maintains brand voice and style without exact matching.
Best for: Brand consistency when exact matching is too rigid.
# Reference canned response style
"We appreciate your patience as we look into this."
# Generated response (similar style)
"Thanks for bearing with me while I check on that order."
Mode Selection
Pick the mode that matches how much generation freedom the use case allows:
| Mode | Generation | Best for |
|---|---|---|
| FLUID | Freeform, ARQ-guided | General customer service, exploratory conversations |
| STRICT | Selection from canned responses only | Compliance-critical, regulated messaging |
| COMPOSITED | Canned fragments plus generated connective text | Consistent messaging with personalization |
| CANNED_FLUID | Generated text matched to canned style | Brand voice without exact matching |
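As a sketch of where the mode is set: step 5 of the generation flow below reads agent.composition_mode, which suggests it is configured at agent creation. Parameter and enum names here are assumptions, not confirmed API:

```python
import parlant.sdk as p

async def setup(server: p.Server) -> None:
    agent = await server.create_agent(
        name="Support Agent",
        description="Handles returns and order questions",
        # Assumed parameter; the generation flow reads
        # agent.composition_mode at step 5.
        composition_mode=p.CompositionMode.FLUID,
    )
```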
Handling Tool Insights
When tools could not execute, the message composer must handle the situation gracefully:
Missing Data
tool_insights = {
"transfer_money": {
"status": "CANNOT_RUN",
"missing": ["recipient_name"]
}
}
The composer receives the insight: "The customer wanted to transfer money, but the recipient is unknown."
Generated response: "I can help you transfer funds. Who would you like to send the money to?"
Invalid Data
tool_insights = {
"lookup_order": {
"status": "CANNOT_RUN",
"invalid": {
"order_id": "Format does not match expected pattern"
}
}
}
The composer receives the insight: "The order ID provided does not match the expected format."
Generated response: "I couldn't find that order number. Order IDs usually look like 'ORD-12345'. Could you double-check the number?"
Insight Integration
ALGORITHM: Integrate Tool Insights
INPUT: tool_insights, matched_guidelines
OUTPUT: modified_generation_context
FOR each blocked_tool in tool_insights:
IF missing_data:
ADD to context: "Need to ask customer for: {missing_fields}"
IF invalid_data:
ADD to context: "Need to clarify with customer: {invalid_fields}"
OPTIONALLY adjust guideline priority:
- "Ask for order number" becomes more urgent
- "Provide order status" becomes less urgent (blocked)
RETURN modified context for generation
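A direct translation of this algorithm, using the tool_insights shapes from the examples above (the function name is illustrative):

```python
def integrate_tool_insights(tool_insights: dict, context: list) -> list:
    for tool_name, insight in tool_insights.items():
        if insight.get("status") != "CANNOT_RUN":
            continue
        for field in insight.get("missing", []):
            context.append(f"Need to ask customer for: {field}")
        for field, reason in insight.get("invalid", {}).items():
            context.append(f"Need to clarify with customer: {field} ({reason})")
    return context
```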
Uncancellable Section
Message generation runs in an uncancellable section. Once preparation completes, the response will be generated and emitted, even if the customer sends another message.
The rationale is that once the system has:
- Matched guidelines
- Called tools (potentially with side effects)
- Determined the response content
abandoning the response mid-generation would leave the conversation in an inconsistent state. The customer might receive no response, or only a partial one.
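One way to picture the uncancellable section is a shielded task, sketched here assuming an asyncio-based engine (Parlant's actual mechanism may differ):

```python
import asyncio

async def generate_and_emit(state) -> str:
    # Stand-in for ARQ generation and event emission, which must
    # run to completion once started.
    await asyncio.sleep(1)
    return "generated message"

async def respond(state) -> str:
    # Shield the generation task: if this coroutine is cancelled
    # (e.g. the customer sends another message mid-generation),
    # the inner task keeps running instead of being abandoned.
    task = asyncio.ensure_future(generate_and_emit(state))
    try:
        return await asyncio.shield(task)
    except asyncio.CancelledError:
        await task  # let the shielded work finish before propagating
        raise
```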
Generation Flow
ALGORITHM: Full Generation Flow
INPUT: response_state (from preparation loop)
OUTPUT: emitted message events
1. ENTER uncancellable section
2. CALL on_generating_messages hook
- Extensions can modify context
- Extensions can abort (returns early)
3. CALL on_guideline_match handlers
- Notify listeners which guidelines matched
4. BUILD generation context:
- Agent persona and style
- Matched guidelines (by criticality)
- Conversation history
- Tool results and insights
- Context variables
- Glossary terms
5. SELECT composition mode:
- Check agent.composition_mode
- Check if guidelines override mode
6. GENERATE based on mode:
FLUID: ARQ-structured LLM generation
STRICT: Canned response selection
COMPOSITED: Mix canned and generated
CANNED_FLUID: Style-matched generation
7. CREATE message events
8. EMIT message events to session
9. CALL on_guideline_message handlers
- Notify listeners what was generated
10. UPDATE agent state:
- Record applied_guideline_ids
- Update journey_paths
11. EXIT uncancellable section
RETURN emitted events
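A compact sketch of step 6's dispatch (the function names are placeholders for the mode-specific generators described on this page, not real API):

```python
def generate_for_mode(mode: str, context) -> str:
    if mode == "STRICT":
        return select_canned_response(context)    # exact match only
    if mode == "COMPOSITED":
        return compose_fragments(context)         # canned + generated glue
    if mode == "CANNED_FLUID":
        return generate_in_canned_style(context)  # style-matched generation
    return arq_generate(context)                  # FLUID (default)
```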
Why This Design?
Why ARQs Over Chain-of-Thought?
Chain-of-Thought prompts the LLM to reason before generating, but by the time generation occurs, the reasoning is far back in the context window. Due to recency bias, later tokens exert more influence on the output.
ARQs structure the output so that reasoning occurs immediately before the relevant decision. Each guideline is restated and analyzed directly before the message is written, maximizing the influence of that reasoning.
Research results on Parlant's test suite:
- ARQs: 90.2% success rate
- CoT: 86.1% success rate
- Direct: 81.5% success rate
Why Multiple Modes?
Different business requirements demand different approaches:
- Regulated industries: Require STRICT mode to ensure compliance
- Brand-conscious companies: Require CANNED_FLUID for consistent voice
- Flexible support teams: Require FLUID for natural conversation
A single approach cannot accommodate all these diverse requirements.
Tradeoffs
| Choice | Benefit | Cost |
|---|---|---|
| ARQs | Achieves higher guideline adherence | Consumes slightly more tokens |
| Uncancellable | Maintains consistent conversation state | Requires customer to wait for response |
| Multiple modes | Provides flexibility for diverse use cases | Increases system complexity |
| Tool insight integration | Enables better error handling | Increases context size for LLM |
What's Next
- Debugging: Tracing what happened during generation
- Response Lifecycle: How generation fits in the overall flow
- For canned response configuration, see Canned Responses