Rethinking How We Build Customer-Facing AI Agents
Building customer-facing AI agents seems deceptively simple at first.
Modern LLMs make it easy to create impressive demos that handle basic conversations well. But as many development teams discover, the journey from demo to production reveals deeper challenges that traditional approaches struggle to address.
Let's examine three common methodologies for building AI agents, understanding not just their mechanics, but why they fall short in real-world applications. This understanding will help us recognize what's truly needed for successful production deployments.
The Promise and Pitfalls of Fine-Tuning
At its best, fine-tuning is like teaching someone your company's way of doing things by having them study thousands of past conversations. The idea seems logical: if an AI model learns from your actual customer interactions, shouldn't it naturally adopt your specific approach?
To understand why this fails in practice, let's consider how a customer service department actually works. When a company launches a new product or updates a policy, they don't retrain their entire staff from scratch—they simply communicate the changes. But with fine-tuned models, even small changes to your service approach require retraining the entire model, a process that's both expensive and time-consuming.
Consider a real-world scenario: A financial services company fine-tunes their model on thousands of support conversations. Three months later, they need to adjust how their agent handles sensitive customer information. With a human team, this would be a simple policy update. With a fine-tuned model, they're looking at collecting new training data, running expensive training jobs, and testing everything again. This makes rapid iteration—the kind needed for real customer service improvement—practically impossible.
Model Degradation
There's also a more subtle but technically significant problem with fine-tuning that many developers discover too late. When you fine-tune an LLM for specific conversational behaviors, you often inadvertently degrade its general capabilities—particularly its ability to generate structured outputs like API calls or database queries.
This degradation becomes particularly problematic when you need your agent to interact with other systems, especially for RAG (Retrieval-Augmented Generation) implementations. The original LLM might have been excellent at formatting database queries or generating precise API parameters, but the fine-tuned version often struggles with these structured tasks.
The Knowledge-Behavior Gap in RAG Systems
Retrieval-Augmented Generation (RAG) solves the knowledge update problem in a seemingly elegant way. Instead of baking information into the model itself, RAG systems simply reference relevant documents during conversations, much like how a human agent might consult a knowledge base. RAG also works better when you need access control over your data.
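The retrieval step described above can be sketched in a few lines. This is a deliberately minimal illustration: the knowledge base, the keyword-overlap scoring, and the prompt template are all stand-ins for what a real system would do with embeddings and a vector store.

```python
# Minimal sketch of the RAG retrieval step: score documents against the
# customer's question and inject the best match into the prompt. Naive
# keyword overlap stands in for embedding similarity here.

KNOWLEDGE_BASE = [
    {"id": "refunds", "text": "Refunds are issued within 14 days of purchase."},
    {"id": "shipping", "text": "Standard shipping takes 3-5 business days."},
]

def retrieve(question: str) -> dict:
    """Return the document sharing the most words with the question."""
    q_words = set(question.lower().split())
    return max(
        KNOWLEDGE_BASE,
        key=lambda doc: len(q_words & set(doc["text"].lower().split())),
    )

def build_prompt(question: str) -> str:
    doc = retrieve(question)
    return (
        f"Answer using this policy:\n{doc['text']}\n\n"
        f"Customer question: {question}"
    )
```

Note what this pipeline provides and what it doesn't: the agent gets the right document, but nothing here tells it when or how to use that document in the conversation.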
But real-world deployments of RAG reveal an interesting truth about customer service: having access to information isn't the same as knowing how to use it effectively. Think about training a new customer service representative. You wouldn't just hand them a manual and consider them ready for customer interactions. They need to understand when and how to apply that information, how to handle different customers, when to escalate issues, and how to maintain the company's tone and values. These expectations vary from use case to use case and from company to company, and they change over time.
This is where classic RAG systems fall short. They ensure your agent can access your refund policy, but they don't help it understand whether now is the right time to mention refunds, or how to communicate that policy to an already frustrated customer. These behavioral aspects—the "why" and "how" rather than the "what"—remain unaddressed.
The Traps of Graph-Based Conversations
Graph-based frameworks like LangGraph attempt to solve the behavior problem through structured conversation flows. Imagine creating a detailed flowchart for every possible customer interaction. In theory, this ensures your agent always knows what to do next.
But anyone who's worked in customer service knows that real conversations rarely follow neat, predictable paths. A customer might ask about pricing in the middle of a technical support discussion, or bring up a completely unrelated issue while you're explaining a feature. Graph-based systems force developers to either create increasingly complex graphs to handle every possible scenario, or accept that their agent will feel rigid and unnatural in real conversations.
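The rigidity problem can be seen in a toy state machine. This sketch is not LangGraph's actual API; it only illustrates the underlying idea that each state knows a fixed set of scripted transitions, so an off-script question falls through to a generic fallback.

```python
# A toy graph-based conversation flow: states map recognized intents to
# next states. Anything the graph's author didn't anticipate dead-ends
# in a fallback state.

FLOW = {
    "greet":        {"tech_issue": "troubleshoot", "pricing": "quote"},
    "troubleshoot": {"resolved": "close", "escalate": "human"},
    "quote":        {"accept": "close", "decline": "close"},
}

def next_state(current: str, intent: str) -> str:
    return FLOW.get(current, {}).get(intent, "fallback")

# A pricing question mid-troubleshooting isn't an edge in the graph,
# so the agent can't follow the customer's actual train of thought.
state = next_state("troubleshoot", "pricing")  # "fallback"
```

Handling that pricing detour properly would mean adding a `pricing` edge to every state it could occur in, which is exactly the combinatorial growth described above.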
This complexity doesn't just make development harder—it makes maintenance and improvement nearly impossible. When customer feedback suggests your agent needs to handle a situation differently, changing the behavior means navigating and modifying an intricate web of states and transitions, often with unforeseen consequences.
Understanding What's Really Needed
Taken together, these approaches reveal something important about building effective AI agents: the challenge isn't just technical. It's about understanding how customer service actually works in practice. Real customer service excellence comes from clear guidelines, consistent behavior, and the ability to adapt quickly based on feedback.
This insight points to what's really needed: a way to directly shape agent behavior without the indirection of training data, the limitations of pure information retrieval, or the complexity of predetermined conversation paths. We need systems that let us update behavior as easily as we update documentation, that maintain consistency while allowing natural conversation flow, and that separate the "what" of information from the "why" and "how" of interaction.
A Different Approach with Parlant
When we built Parlant, we started with a fundamental observation: LLMs are like highly knowledgeable strangers who know countless ways to handle any situation. This vast knowledge is both their strength and their challenge—without clear guidance, they make reasonable but arbitrary choices that may not align with what we actually need.
Understanding the Core Challenge
Think about how organizations actually improve their customer service. They don't rewrite their training manuals from scratch every time they want to adjust how representatives handle a situation. Instead, they provide clear guidelines about specific scenarios and how to handle them. These guidelines evolve based on customer feedback, business needs, and learned best practices.
This is where Parlant's approach becomes particularly relevant. Instead of trying to bake behavior into a model through training data or control it through complex conversation graphs, Parlant provides direct and reliable mechanisms for specifying and updating how your agent should behave in different situations.
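The core idea can be sketched generically. This is an illustration of guideline-driven behavior, not Parlant's actual API: each guideline pairs a condition with an action, and the actions whose conditions hold in the current conversation are injected into the agent's instructions for that turn.

```python
# Illustrative sketch of guideline-driven behavior: guidelines are plain
# data, so updating behavior means editing a record, not retraining a
# model or rewiring a graph.

guidelines = [
    {
        "condition": "the customer seems frustrated",
        "action": "acknowledge their frustration before offering solutions",
    },
    {
        "condition": "the customer asks about refunds",
        "action": "explain the 14-day policy and offer to start the process",
    },
]

def active_instructions(matched_conditions: set[str]) -> list[str]:
    """In practice an LLM judges which conditions currently hold;
    here we pass them in directly to keep the sketch self-contained."""
    return [
        g["action"] for g in guidelines if g["condition"] in matched_conditions
    ]
```

Because the guidelines are data rather than weights or graph edges, a policy change is a one-line edit that takes effect on the next conversation turn.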
The Power of Immediate Feedback
Consider a real example from our market research into pre-LLM agents: a team in a large enterprise noticed that a simple change in how their agent greeted customers led to a 10% increase in engagement. With traditional approaches, implementing and testing such a change would have been a significant undertaking. With Parlant, it's as simple as updating a guideline.
This immediacy transforms how teams can develop and refine their AI agents. Product managers can suggest changes, customer service experts can provide insights, and developers can implement these improvements instantly—all while maintaining the consistency and reliability needed for production systems.
Beyond Simple Instructions
Parlant isn't just about giving instructions to an AI. It's built around understanding how real software teams work. Every behavioral modification is stored as JSON files, which means changes can be version-controlled through Git. Teams can branch and merge behavioral changes just like they do with code, review modifications before they go live, and roll back changes if needed.
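A stored guideline might look something like the following. The field names here are hypothetical, chosen only to show why plain JSON files diff, branch, and review cleanly in Git; the actual schema may differ.

```json
{
  "id": "frustrated-customer-greeting",
  "condition": "the customer expresses frustration",
  "action": "acknowledge the frustration and apologize before troubleshooting",
  "enabled": true
}
```

A behavioral change shows up in a pull request as a small, readable diff on a file like this, which is what makes review and rollback practical.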
More importantly, Parlant ensures coherence across all these behavioral specifications. When you add new guidelines, Parlant automatically checks for conflicts with existing ones. This prevents the kind of contradictory behaviors that often emerge in complex prompt-engineering setups.
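The simplest form of such a coherence check can be sketched as follows. This is a naive, exact-match illustration of the idea, not how Parlant actually detects conflicts (which requires semantic judgment, e.g. by an LLM): flag any two guidelines that fire on the same condition but prescribe different actions.

```python
# Naive coherence check: two guidelines with the same trigger but
# different actions would give the agent contradictory instructions,
# so flag the pair for human review.

from itertools import combinations

def find_conflicts(guidelines: list[dict]) -> list[tuple[str, str]]:
    conflicts = []
    for a, b in combinations(guidelines, 2):
        if a["condition"] == b["condition"] and a["action"] != b["action"]:
            conflicts.append((a["id"], b["id"]))
    return conflicts

rules = [
    {"id": "g1", "condition": "customer asks about refunds",
     "action": "offer a refund immediately"},
    {"id": "g2", "condition": "customer asks about refunds",
     "action": "route to a human agent first"},
]
```

A real check must also catch conditions that merely overlap semantically ("customer wants money back" vs. "customer asks about refunds"), which is why exact string matching is only a starting point.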
A Foundation for the Future
While LLM technology continues to advance—with costs dropping and response times improving—the fundamental challenge of aligning AI behavior with human intentions remains. Parlant provides the infrastructure needed for building AI agents that can truly serve as reliable representatives of your organization.
The result is a development process that finally matches how organizations actually work: iterative, collaborative, and responsive to real-world feedback. It's about building AI agents that don't just work in demos, but excel in production, consistently delivering the kind of customer experience your organization aims to provide.