Engine Overview
The Engine is the core of Parlant—the part responsible for transforming customer messages into controlled, guideline-compliant responses. This page explains the engine's architecture, design philosophy, and the reasoning behind key decisions.
Design Philosophy
The Problem: The Curse of Instructions
As you add more instructions to an LLM's context, its ability to follow them drops—not linearly, but dramatically. This is known as The Curse of Instructions, and it's a fundamental architectural limitation of LLMs that applies across all models.
In real-world deployments, agents accumulate hundreds of guidelines: different customer scenarios, compliance requirements, regional variations, product-specific rules. Business experts keep finding edge cases that need handling. Providing all of this context to an LLM at once simply doesn't work.
Failed Approaches
Before arriving at the current architecture, we tried several approaches that seemed promising but failed to meet reliability requirements:
- Semantic similarity: This approach embeds guidelines and conversations as vectors and computes cosine similarity between them. However, it cannot capture temporal conditions ("after the customer confirmed") or logical operators ("AND payment is credit card"), making it unsuitable for reliable guideline matching.
- Cross-encoders: This approach jointly processes guideline-conversation pairs through transformer models. While more sophisticated than simple similarity, cross-encoders still cannot reliably handle nuanced conditions that require reasoning rather than pattern matching.
- Custom neural task heads: This approach trains classification layers on top of pretrained embeddings. However, the embeddings proved too lossy—critical information about chronology and state transitions is compressed away during the embedding process.
Core Principle: Prepare Context, Then Respond
Parlant's solution is dynamic filtering: before each response, identify only the few most relevant guidelines for the current conversational state, keeping the LLM in the "safe zone" where instruction-following remains consistent.
This led to a fundamental architectural decision: separate guideline matching from message generation. Rather than relying on a single LLM call to both identify relevant guidelines and generate a response, Parlant uses distinct components with specialized responsibilities:
- First, determine which guidelines apply (matching)
- Then, generate a response that follows those specific guidelines (composition)
This separation enables each step to be independently optimized, debugged, and improved.
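To make the separation concrete, the sketch below shows the two phases as plain functions. The names and the keyword-based matcher are illustrative stand-ins; in Parlant both phases are LLM-driven components.

```python
# Illustrative sketch of "prepare context, then respond" (hypothetical names, not Parlant's API).
from dataclasses import dataclass


@dataclass
class Guideline:
    condition: str  # e.g. "the customer asks about a refund"
    action: str     # e.g. "explain the 30-day refund policy"


def match_guidelines(guidelines: list[Guideline], conversation: list[str]) -> list[Guideline]:
    """Phase 1 (matching): decide which conditions hold for the current state.
    In Parlant this is an LLM-based evaluation; a keyword check stands in here."""
    text = " ".join(conversation).lower()
    return [g for g in guidelines if any(word in text for word in g.condition.lower().split())]


def compose_message(matched: list[Guideline], conversation: list[str]) -> str:
    """Phase 2 (composition): the message generator only sees the matched guidelines,
    keeping the instruction count small enough to follow reliably."""
    instructions = "\n".join(f"- {g.action}" for g in matched)
    return f"(reply generated under these instructions)\n{instructions}"


history = ["Hi, can I get a refund for my order?"]
matched = match_guidelines(
    [Guideline("customer asks about a refund", "explain the 30-day refund policy")],
    history,
)
print(compose_message(matched, history))
```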
Engine Architecture
Entry Points
The engine has two main entry points:
process(): Handles incoming customer messages with full preparation—guideline matching, tool calling, and message generation
utter(): Generates messages based on specific utterance requests, bypassing guideline matching (used for proactive agent messages)
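Conceptually, the two entry points form an interface along these lines (a hedged sketch; the signatures are illustrative rather than the engine's actual Python API):

```python
# Hypothetical shape of the engine's entry points; signatures are illustrative only.
from abc import ABC, abstractmethod


class Engine(ABC):
    @abstractmethod
    async def process(self, context: "EngineContext") -> bool:
        """Full pipeline for an incoming customer message:
        guideline matching, tool calling, then message generation."""

    @abstractmethod
    async def utter(self, context: "EngineContext", requests: list[str]) -> bool:
        """Generate a message from explicit utterance requests,
        skipping guideline matching (used for proactive messages)."""
```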
Core Components
GuidelineMatcher: Filters the full set of guidelines down to those relevant for the current conversational state. Uses category-batched LLM evaluation with specialized prompts for different guideline types. This is where Parlant "lifts the curse"—ensuring the message composer only sees a focused, manageable set of instructions.
ToolCaller: Executes tools associated with matched guidelines. Unlike vendor-provided tool APIs, Parlant's tool calling is guided—tools are only considered if their associated guidelines matched, and multiple iterations are supported within a single response.
MessageComposer: Generates the actual response message using all accumulated context: matched guidelines, tool results, glossary terms, and conversation history. Uses ARQ-based (Attentive Reasoning Queries) structured reasoning for reliable guideline adherence.
GlossaryStore: Retrieves domain-specific terminology relevant to the current conversation. These terms are made available to all other components, grounding the agent's understanding and responses in your business vocabulary.
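Put together, a single response turn flows through the four components roughly as sketched below (a single pass is shown; in practice matching and tool calling iterate, as described under Key Design Decisions). The names and signatures are hypothetical.

```python
# Illustrative single-turn flow through the core components (names/signatures hypothetical).
async def respond(glossary_store, guideline_matcher, tool_caller, message_composer, context, state):
    # Ground every downstream prompt in the business vocabulary first.
    state.glossary_terms = await glossary_store.find_relevant_terms(context)
    # Filter the full guideline set down to what applies right now.
    state.matched_guidelines = await guideline_matcher.match(context, state)
    # Run only the tools attached to matched guidelines.
    state.tool_results = await tool_caller.call_tools(state.matched_guidelines, context)
    # Compose the reply from the accumulated context.
    return await message_composer.compose(context, state)
```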
Supporting Systems
EngineContext: A container that threads execution state through the entire pipeline. Carries the session, agent, customer, interaction history, and accumulated state. Hooks can access and modify this context at various points.
ResponseState: Mutable state that accumulates across preparation iterations. Tracks matched guidelines, tool results, glossary terms, and journey positions. This is what gets passed to message generation when preparation completes.
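The sketch below illustrates the shape of these two carriers; the field names are illustrative rather than Parlant's actual attributes.

```python
# Rough sketch of the two state carriers (field names are illustrative).
from dataclasses import dataclass, field
from typing import Any


@dataclass
class EngineContext:
    """Threaded through the whole pipeline; hooks can read and modify it."""
    session_id: str
    agent_id: str
    customer_id: str
    interaction_history: list[dict[str, Any]] = field(default_factory=list)


@dataclass
class ResponseState:
    """Accumulates across preparation iterations, then feeds message generation."""
    matched_guidelines: list[str] = field(default_factory=list)
    tool_results: list[dict[str, Any]] = field(default_factory=list)
    glossary_terms: list[str] = field(default_factory=list)
    journey_positions: dict[str, int] = field(default_factory=dict)
```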
Key Design Decisions
Why Multiple Preparation Iterations?
Tool results can change which guidelines apply. Consider a banking agent:
- Customer asks: "How much money do I have?"
- Guideline matches: "When customer asks for account info, call get_balance()"
- Tool returns: balance is $15,000
- New guideline now matches: "When balance exceeds $10,000, recommend investment options"
A single-pass approach would fail to identify this second guideline. To address this, Parlant iterates until reaching a stable state—one in which no new tool calls are required and no new guidelines have matched. The max_engine_iterations parameter on agents controls the maximum iteration depth.
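The loop can be sketched as follows. The callable parameters stand in for the GuidelineMatcher and ToolCaller, and the names and signatures are illustrative rather than Parlant's internals.

```python
# Simplified sketch of the iterate-until-stable preparation loop (illustrative only).
from typing import Awaitable, Callable


async def prepare(
    match: Callable[[list[str]], Awaitable[list[str]]],       # guidelines matched so far -> newly matched
    call_tools: Callable[[list[str]], Awaitable[list[str]]],  # newly matched -> new tool results
    max_engine_iterations: int = 3,
) -> tuple[list[str], list[str]]:
    guidelines: list[str] = []
    tool_results: list[str] = []
    for _ in range(max_engine_iterations):
        new_matches = await match(guidelines)
        new_results = await call_tools(new_matches)
        if not new_matches and not new_results:
            break  # stable state: no new guidelines matched, no new tool calls needed
        guidelines += new_matches
        tool_results += new_results  # fresh tool data may unlock new guidelines next pass
    return guidelines, tool_results
```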
Why Custom Tool Calling?
Parlant implements its own tool-calling mechanism rather than relying on vendor-provided APIs. There are four reasons for this decision:
- Vendor independence: The system supports multiple LLM providers without requiring changes to user configuration.
- Guided calling: Only tools associated with matched guidelines are considered, rather than exposing all available tools to the LLM.
- Iteration support: Tool results can trigger new guidelines within the same response cycle.
- Optimization: Tools whose data is already present in the context can be skipped (via DATA_ALREADY_IN_CONTEXT evaluation).
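The snippet below sketches the "guided" part of this design: a tool only becomes a candidate for calling when a guideline it is attached to has matched. All names are hypothetical.

```python
# Illustrative sketch of guided tool calling (hypothetical names, not Parlant's API).
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Guideline:
    condition: str
    action: str
    tools: list[Callable[[], dict]] = field(default_factory=list)


def candidate_tools(matched: list[Guideline]) -> list[Callable[[], dict]]:
    """Only tools attached to matched guidelines are ever exposed to the tool-calling step."""
    return [tool for guideline in matched for tool in guideline.tools]


def get_balance() -> dict:
    """Stand-in tool: in a real agent this would query the banking backend."""
    return {"balance": 15_000}


account_info = Guideline(
    condition="the customer asks for account information",
    action="fetch and report the current balance",
    tools=[get_balance],
)

# If the guideline matched this turn, its tool becomes a candidate; otherwise it is never exposed.
print(candidate_tools([account_info]))
```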
Why ARQs Instead of Chain-of-Thought?
Chain-of-Thought and other off-the-shelf reasoning techniques do not provide the consistency that Parlant requires. Attentive Reasoning Queries (ARQs) are a prompting technique developed specifically for instruction-following accuracy:
- ARQs use structured output to control the order of generation, ensuring that critical information is processed first.
- Critical instructions are reinstated immediately before the decision point.
- This approach leverages recency bias, where the most recently presented information exerts stronger influence on the output.
Research on Parlant's test suite demonstrated that ARQs achieve a 90.2% success rate, compared to 86.1% for Chain-of-Thought prompting and 81.5% for direct generation.
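As an illustration of the mechanism, an ARQ-style structured output schema might look like the following (a sketch using Pydantic; the field names are not Parlant's actual queries). Because structured output is generated field by field, the adherence checks are produced immediately before the final message, which is where recency bias helps.

```python
# Illustrative ARQ-style schema: ordered fields control generation order,
# so the reasoning queries are answered right before the final message.
from pydantic import BaseModel, Field


class ARQResponse(BaseModel):
    last_customer_message_summary: str = Field(description="What did the customer just ask?")
    applicable_guidelines: list[str] = Field(description="Which matched guidelines apply to this reply?")
    guideline_adherence_check: str = Field(description="How will the reply follow each of them?")
    final_message: str = Field(description="The reply, written only after the checks above.")
```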
Why Batched Matching?
Evaluating guidelines individually is prohibitively slow, while evaluating all guidelines together in a single prompt results in reduced accuracy. Parlant addresses this by categorizing guidelines into six distinct types, each with specialized matching strategies:
| Category | Purpose |
|---|---|
| Observational | Detects customer state without triggering an action |
| Simple Actionable | Handles standard condition-action guidelines |
| Previously Applied | Determines whether already-applied guidelines should be reapplied |
| Customer-Dependent | Manages actions that require customer confirmation |
| Journey Node Selection | Determines position within multi-step workflows |
| Disambiguation | Resolves conflicts between similar guidelines |
Each category is evaluated in parallel batches with optimized prompts, balancing accuracy and latency.
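Schematically, the per-category batches run concurrently, as in this illustrative sketch (the placeholder evaluation stands in for a specialized LLM prompt per category):

```python
# Illustrative category-batched matching; names and matching logic are placeholders.
import asyncio


async def evaluate_batch(category: str, guidelines: list[str]) -> list[str]:
    """One LLM call per batch, with a prompt specialized for this category (placeholder here)."""
    await asyncio.sleep(0)  # stands in for the LLM request
    return [g for g in guidelines if "refund" in g.lower()]  # placeholder matching logic


async def match_all(batches: dict[str, list[str]]) -> list[str]:
    """Evaluate every category batch concurrently and merge the matches."""
    results = await asyncio.gather(
        *(evaluate_batch(category, guidelines) for category, guidelines in batches.items())
    )
    return [guideline for matched in results for guideline in matched]


# Example usage:
# asyncio.run(match_all({
#     "simple_actionable": ["When the customer asks about a refund, explain the policy"],
#     "observational": ["The customer seems frustrated"],
# }))
```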
For extending the engine with custom logic, see Engine Extensions.