Message Generation

After guideline matching and tool calling complete, the engine has accumulated context: matched guidelines, tool results, glossary terms, and journey state. Now it's time to generate the actual response. This page explains how Parlant's message composer turns accumulated context into guideline-compliant messages.

The Composition Challenge

The Problem: LLMs Forget Instructions

Even with focused context (achieved through guideline matching), LLMs can still fail to follow instructions mid-generation. The model may begin correctly but gradually drift from the specified guidelines:

Instructions: "Always mention the 30-day return policy when discussing returns."

Customer: "Can I return this shirt?"

LLM output: "Yes, you can return it! Just bring it back to any store location
with your receipt. We're happy to process exchanges too if you'd prefer a
different size or color..."

❌ Never mentioned the 30-day policy

Chain-of-Thought reasoning provides some improvement but proves insufficient. The LLM may reason correctly during the "thinking" phase but still fail to apply that reasoning during generation.

The Goal

Generate responses that:

  1. Follow all matched guidelines reliably
  2. Sound natural and conversational
  3. Prioritize what the customer needs to hear first
  4. Handle blocked tools gracefully

ARQ-Based Enforcement

Parlant uses Attentive Reasoning Queries (ARQs) for reliable guideline adherence. Unlike Chain-of-Thought, ARQs structure the entire output to reinstate critical instructions immediately before decisions.

How ARQs Work

ALGORITHM: ARQ Message Generation

INPUT: matched_guidelines, context, tool_results, tool_insights
OUTPUT: message

1. STRUCTURE the prompt:
   - System: Agent identity, communication style
   - Context: Variables, glossary terms, capabilities
   - History: Conversation so far
   - Tool Results: What tools returned
   - Tool Insights: What tools couldn't run and why
   - Guidelines: Matched guidelines with criticality levels

2. BUILD ARQ completion schema:
   FOR each guideline (ordered by criticality):
     - Field to restate the guideline
     - Field to reason about applicability
     - Field to note how to address it
   - Final field: The actual message

3. GENERATE structured output:
   The LLM fills in each field sequentially.
   By the time it writes the message, it has just
   reasoned about every guideline.

4. EXTRACT message from structured output

5. RETURN message
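
Step 2 can be sketched in Python with pydantic's create_model, adding one restate/reason/address field group per guideline before the final message field. The field names and the criticality ranking below are illustrative assumptions, not Parlant's actual schema:

from pydantic import BaseModel, create_model

CRITICALITY_RANK = {"HIGH": 0, "MEDIUM": 1, "LOW": 2}

def build_arq_schema(guidelines: list[dict]) -> type[BaseModel]:
    # One restate/reason/address field group per guideline, ordered by
    # criticality, followed by the final message field.
    fields: dict = {}
    ordered = sorted(guidelines, key=lambda g: CRITICALITY_RANK[g["criticality"]])
    for i, g in enumerate(ordered):
        fields[f"guideline_{i}_restated"] = (str, ...)
        fields[f"guideline_{i}_applicability"] = (str, ...)
        fields[f"guideline_{i}_how_to_address"] = (str, ...)
    fields["message"] = (str, ...)  # generated last, right after the reasoning
    return create_model("ARQCompletion", **fields)

Because structured-output APIs fill fields in schema order, the message field being last is what forces the reasoning to happen first.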

ARQ Schema Structure

{
  "guidelines": [
    {
      "guideline_id": "g1",
      "guideline_content": "Always mention 30-day return policy",
      "how_to_address": "Include policy mention when explaining return process",
      "addressed_in_response": true
    },
    {
      "guideline_id": "g2",
      "guideline_content": "Offer to check order status",
      "how_to_address": "Ask if they want me to look up their order",
      "addressed_in_response": true
    }
  ],
  "message": "Of course! You can return items within 30 days of purchase for a full refund. Would you like me to look up your order to start the return process?"
}

The sequential structure ensures:

  1. Each guideline is explicitly restated (reinstated in context)
  2. The LLM reasons about how to address it (primes the generation)
  3. The message follows immediately after (leverages recency bias)
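
Assuming pydantic v2 models that mirror the JSON above (the class names are illustrative), extracting the final message from the structured output is straightforward:

from pydantic import BaseModel

class GuidelineCheck(BaseModel):
    guideline_id: str
    guideline_content: str
    how_to_address: str
    addressed_in_response: bool

class ARQCompletion(BaseModel):
    guidelines: list[GuidelineCheck]
    message: str

def extract_message(raw_llm_output: str) -> str:
    # Validate the full structured completion, then return only the message.
    completion = ARQCompletion.model_validate_json(raw_llm_output)
    return completion.message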

Criticality Affects Enforcement

| Criticality | ARQ Treatment |
| --- | --- |
| HIGH | Each guideline gets explicit acknowledgment field; validation may regenerate if not followed |
| MEDIUM | Grouped reasoning section; included in standard ARQ flow |
| LOW | Listed but not individually reasoned about |

High-criticality guidelines (compliance, safety) get the strongest enforcement.
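
For HIGH criticality, the "validation may regenerate" behavior could look like the sketch below; generate, check_adherence, and with_feedback are hypothetical helpers, and the retry budget is an assumption:

MAX_ATTEMPTS = 3  # assumed retry budget

def generate_validated(context, high_guidelines):
    # Regenerate when a HIGH-criticality guideline was not followed.
    message = None
    for _ in range(MAX_ATTEMPTS):
        completion = generate(context)  # ARQ-structured generation
        message = completion.message
        violated = [g for g in high_guidelines
                    if not check_adherence(message, g)]
        if not violated:
            return message
        context = context.with_feedback(violated)  # tell the model what it missed
    return message  # best effort after exhausting retries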

Composition Modes

Different use cases need different response generation strategies:

FLUID Mode (Default)

Freeform LLM generation guided by ARQs. The LLM writes the response naturally while adhering to guidelines.

Best for: General customer service, advisory conversations, exploratory interactions.

Customer: "What's the best laptop for video editing?"

Agent: "For video editing, I'd recommend looking at laptops with at least
16GB RAM and a dedicated GPU. Our ProMedia X15 handles 4K footage smoothly
and is currently 15% off. Would you like me to compare a few options in
your price range?"

STRICT Mode

The response must exactly match a predefined canned response. No generation, only selection.

Best for: Compliance-critical statements, legal disclaimers, regulated industries.

# Predefined canned responses
"Your account balance is {balance}."
"I am not authorized to provide financial advice."
"Please hold while I transfer you to a specialist."

# Agent selects the appropriate one based on context

COMPOSITED Mode

Mix canned response fragments with generated content. Structured parts come from templates; connecting text is generated.

Best for: Consistent messaging with contextual personalization.

[CANNED] "Thank you for contacting support."
[GENERATED] Personalized greeting based on customer name
[CANNED] "How can I help you today?"

CANNED_FLUID Mode

Generate new text that sounds similar to canned responses. Maintains brand voice and style without exact matching.

Best for: Brand consistency when exact matching is too rigid.

# Reference canned response style
"We appreciate your patience as we look into this."

# Generated response (similar style)
"Thanks for bearing with me while I check on that order."

Mode Selection

The mode is read from the agent's configuration (agent.composition_mode), and matched guidelines can override it for a particular response; see step 5 of the generation flow below.
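
As a minimal sketch of that dispatch, assuming a simple enum for the four modes (select_canned_response, stitch_fragments, generate_in_canned_style, and generate_with_arqs are hypothetical stand-ins, not Parlant's API):

from enum import Enum, auto

class CompositionMode(Enum):
    FLUID = auto()
    STRICT = auto()
    COMPOSITED = auto()
    CANNED_FLUID = auto()

def compose(mode: CompositionMode, context):
    # Dispatch to one strategy per mode, as described above.
    if mode is CompositionMode.STRICT:
        return select_canned_response(context)    # exact-match selection only
    if mode is CompositionMode.COMPOSITED:
        return stitch_fragments(context)          # canned fragments + generated glue
    if mode is CompositionMode.CANNED_FLUID:
        return generate_in_canned_style(context)  # style-matched generation
    return generate_with_arqs(context)            # FLUID (default)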

Handling Tool Insights

When tools could not execute, the message composer must handle the situation gracefully:

Missing Data

tool_insights = {
    "transfer_money": {
        "status": "CANNOT_RUN",
        "missing": ["recipient_name"],
    }
}

The composer receives the insight: "The customer wanted to transfer money, but the recipient is unknown."

Generated response: "I can help you transfer funds. Who would you like to send the money to?"

Invalid Data

tool_insights = {
    "lookup_order": {
        "status": "CANNOT_RUN",
        "invalid": {
            "order_id": "Format does not match expected pattern",
        },
    }
}

The composer receives the insight: "The order ID provided does not match the expected format."

Generated response: "I couldn't find that order number. Order IDs usually look like 'ORD-12345'. Could you double-check the number?"

Insight Integration

ALGORITHM: Integrate Tool Insights

INPUT: tool_insights, matched_guidelines
OUTPUT: modified_generation_context

FOR each blocked_tool in tool_insights:

   IF missing_data:
      ADD to context: "Need to ask customer for: {missing_fields}"

   IF invalid_data:
      ADD to context: "Need to clarify with customer: {invalid_fields}"

   OPTIONALLY adjust guideline priority:
      - "Ask for order number" becomes more urgent
      - "Provide order status" becomes less urgent (blocked)

RETURN modified context for generation
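
A direct Python rendering of this algorithm, assuming insights shaped like the dictionaries above (the function name and the list-of-strings context representation are illustrative):

def integrate_tool_insights(tool_insights: dict, context_lines: list[str]) -> list[str]:
    # Translate blocked-tool insights into plain-language hints that are
    # appended to the generation context.
    lines = list(context_lines)
    for insight in tool_insights.values():
        if insight.get("status") != "CANNOT_RUN":
            continue
        if missing := insight.get("missing"):
            lines.append(f"Need to ask customer for: {', '.join(missing)}")
        if invalid := insight.get("invalid"):
            details = ", ".join(f"{field} ({reason})" for field, reason in invalid.items())
            lines.append(f"Need to clarify with customer: {details}")
    return lines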

Uncancellable Section

Message generation runs in an uncancellable section. Once preparation completes, the response will be generated and emitted, even if the customer sends another message.

The rationale is that once the system has:

  • Matched guidelines
  • Called tools (potentially with side effects)
  • Determined the response content

abandoning the response mid-generation would leave the conversation in an inconsistent state. The customer might receive no response, or only a partial one.
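
One way to approximate an uncancellable section in asyncio is asyncio.shield, which keeps the inner task running even if the awaiting caller is cancelled; generate_and_emit is a hypothetical stand-in for the generation steps:

import asyncio

async def respond_uncancellably(session, response_state):
    # If the caller is cancelled (e.g. the customer sends another message),
    # this await raises CancelledError, but the shielded task keeps running
    # to completion, so the response is still generated and emitted.
    return await asyncio.shield(generate_and_emit(session, response_state))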

Generation Flow

ALGORITHM: Full Generation Flow

INPUT: response_state (from preparation loop)
OUTPUT: emitted message events

1. ENTER uncancellable section

2. CALL on_generating_messages hook
   - Extensions can modify context
   - Extensions can abort (returns early)

3. CALL on_guideline_match handlers
   - Notify listeners which guidelines matched

4. BUILD generation context:
   - Agent persona and style
   - Matched guidelines (by criticality)
   - Conversation history
   - Tool results and insights
   - Context variables
   - Glossary terms

5. SELECT composition mode:
   - Check agent.composition_mode
   - Check if guidelines override mode

6. GENERATE based on mode:
   FLUID: ARQ-structured LLM generation
   STRICT: Canned response selection
   COMPOSITED: Mix canned and generated
   CANNED_FLUID: Style-matched generation

7. CREATE message events

8. EMIT message events to session

9. CALL on_guideline_message handlers
   - Notify listeners what was generated

10. UPDATE agent state:
    - Record applied_guideline_ids
    - Update journey_paths

11. EXIT uncancellable section

RETURN emitted events
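
Condensed into Python, the flow might read roughly as follows; the hooks registry, the uncancellable() context manager, compose(), and make_message_event() are hypothetical names rather than Parlant's actual API:

async def generate_messages(response_state):
    async with uncancellable():                                   # steps 1 and 11
        ctx = await hooks.on_generating_messages(response_state)  # step 2
        if ctx is None:                                           # an extension aborted
            return []
        await hooks.on_guideline_match(ctx.matched_guidelines)    # step 3
        # Step 4: ctx already carries persona, guidelines, history,
        # tool results/insights, variables, and glossary terms.
        mode = ctx.mode_override or ctx.agent.composition_mode    # step 5
        message = compose(mode, ctx)                              # step 6
        events = [make_message_event(message)]                    # step 7
        await ctx.session.emit(events)                            # step 8
        await hooks.on_guideline_message(events)                  # step 9
        ctx.agent_state.record(ctx.matched_guidelines,            # step 10
                               ctx.journey_paths)
        return events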

Why This Design?

Why ARQs Over Chain-of-Thought?

Chain-of-Thought prompts the LLM to reason before generating, but by the time generation occurs, the reasoning is far back in the context window. Due to recency bias, later tokens exert more influence on the output.

ARQs structure the output so that reasoning occurs immediately before the relevant decision. Each guideline is restated and analyzed directly before the message is written, maximizing the influence of that reasoning.

Research results on Parlant's test suite:

  • ARQs: 90.2% success rate
  • CoT: 86.1% success rate
  • Direct: 81.5% success rate

Why Multiple Modes?

Different business requirements demand different approaches:

  • Regulated industries: Require STRICT mode to ensure compliance
  • Brand-conscious companies: Require CANNED_FLUID for consistent voice
  • Flexible support teams: Require FLUID for natural conversation

A single approach cannot accommodate all these diverse requirements.

Tradeoffs

| Choice | Benefit | Cost |
| --- | --- | --- |
| ARQs | Achieves higher guideline adherence | Consumes slightly more tokens |
| Uncancellable section | Maintains consistent conversation state | Requires customer to wait for response |
| Multiple modes | Provides flexibility for diverse use cases | Increases system complexity |
| Tool insight integration | Enables better error handling | Increases context size for LLM |

What's Next