
Production AI Agents: Observability, Evaluation & Deployment

Deploy AI agents to production with confidence. Learn monitoring with LangSmith, evaluation strategies, security best practices, and scalable deployment patterns.

Moshiour Rahman

AI Agents Mastery Series

This is Part 6 of our comprehensive AI Agents series—the final chapter.

| Part | Topic | Level |
|------|-------|-------|
| 1 | Fundamentals - Build from Scratch | Beginner |
| 2 | LangGraph Deep Dive | Intermediate |
| 3 | Local LLMs with Ollama | Intermediate |
| 4 | Tool-Using Agents | Intermediate |
| 5 | Multi-Agent Systems | Advanced |
| 6 | Production Deployment | Advanced |

The Production Challenge

Development agents work on your laptop. Production agents need:

| Development | Production |
|-------------|------------|
| "It works!" | 99.9% uptime |
| print() debugging | Structured logging |
| Manual testing | Automated evaluation |
| Trust the LLM | Verify everything |
| Single user | Thousands concurrent |
| No security | Defense in depth |

This guide covers everything you need to deploy agents responsibly.

Observability with LangSmith

LangSmith is LangChain's observability platform for LLM applications. It captures every step of your agent's execution.

Setup

# Install the LangSmith SDK
pip install langsmith

# .env
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=your-langsmith-key
LANGCHAIN_PROJECT=my-agent-production

Automatic Tracing

Once configured, LangChain/LangGraph automatically traces all operations:

# traced_agent.py
import os
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, START, END
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages

load_dotenv()

# LangSmith automatically traces when LANGCHAIN_TRACING_V2=true

class State(TypedDict):
    messages: Annotated[list, add_messages]

llm = ChatOpenAI(model="gpt-4o-mini")

def agent(state: State) -> State:
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

builder = StateGraph(State)
builder.add_node("agent", agent)
builder.add_edge(START, "agent")
builder.add_edge("agent", END)

graph = builder.compile()

# Every invocation is traced to LangSmith
result = graph.invoke({"messages": [{"role": "user", "content": "Hello"}]})

Custom Trace Metadata

Add context to your traces:

from langsmith import traceable

@traceable(
    name="customer_support_agent",
    tags=["production", "customer-facing"],
    metadata={"version": "1.0.0"}
)
def handle_customer_query(query: str, customer_id: str) -> str:
    """Handle a customer support query with full tracing.

    @traceable records both arguments as trace inputs, so customer_id
    appears in LangSmith even though it isn't used in the function body.
    """
    result = graph.invoke({
        "messages": [{"role": "user", "content": query}]
    })
    return result["messages"][-1].content

# Usage
response = handle_customer_query(
    query="How do I reset my password?",
    customer_id="cust_123"
)

What LangSmith Captures

| Data | Why It Matters |
|------|----------------|
| Full message chain | Debug conversation flow |
| Token counts | Cost tracking |
| Latency per step | Performance optimization |
| Tool calls & results | Debug tool interactions |
| Errors & stack traces | Quick issue resolution |
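
You can also pull this data out programmatically for ad-hoc reviews. A minimal sketch with the LangSmith client, assuming the project name from the .env above (attribute names such as total_tokens can vary slightly between SDK versions):

# inspect_runs.py
from datetime import datetime, timedelta, timezone
from langsmith import Client

client = Client()

# Root runs from the last hour in the production project
runs = client.list_runs(
    project_name="my-agent-production",
    start_time=datetime.now(timezone.utc) - timedelta(hours=1),
    is_root=True,
)

for run in runs:
    latency = None
    if run.start_time and run.end_time:
        latency = (run.end_time - run.start_time).total_seconds()
    print(run.name, run.status, latency, run.total_tokens)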

Evaluation Strategies

Why Evaluate?

LLMs are non-deterministic. The same input can produce different outputs. Evaluation ensures consistent quality.

Types of Evaluation

| Type | What It Tests | Example |
|------|---------------|---------|
| Correctness | Right answer? | Math problems, factual questions |
| Relevance | On topic? | Response addresses the query |
| Faithfulness | Grounded in facts? | Claims supported by context |
| Harmlessness | Safe output? | No harmful content |
| Tool Usage | Correct tool selection? | Uses right tool for task |

Building an Evaluation Pipeline

# evaluation.py
from langchain_openai import ChatOpenAI
from langsmith import Client
from langsmith.evaluation import evaluate

client = Client()
eval_llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Create a dataset of test cases
test_cases = [
    {
        "input": "What is 25 * 4?",
        "expected": "100"
    },
    {
        "input": "What's the capital of France?",
        "expected": "Paris"
    },
    {
        "input": "Summarize: AI agents are autonomous programs.",
        "expected_contains": ["autonomous", "programs"]
    }
]

# Create dataset in LangSmith
dataset_name = "agent-eval-v1"
dataset = client.create_dataset(dataset_name)

for case in test_cases:
    client.create_example(
        inputs={"query": case["input"]},
        outputs={"expected": case.get("expected", case.get("expected_contains"))},
        dataset_id=dataset.id
    )

# Define evaluators
def correctness_evaluator(run, example):
    """Check if the output contains the expected answer."""
    output = run.outputs.get("output", "")
    expected = example.outputs.get("expected")

    if isinstance(expected, list):
        # Check if all expected terms are present
        score = all(term.lower() in output.lower() for term in expected)
    else:
        score = expected.lower() in output.lower()

    return {"score": 1 if score else 0, "key": "correctness"}

def relevance_evaluator(run, example):
    """Use LLM to judge relevance."""
    query = example.inputs.get("query", "")
    output = run.outputs.get("output", "")

    response = eval_llm.invoke([{
        "role": "user",
        "content": f"""Rate the relevance of this response to the query.
        Query: {query}
        Response: {output}

        Score 1 if relevant, 0 if not. Respond with just the number."""
    }])

    try:
        score = int(response.content.strip())
    except ValueError:
        score = 0

    return {"score": score, "key": "relevance"}

# Run evaluation
def run_evaluation(agent_func):
    """Evaluate an agent against the test dataset."""
    results = evaluate(
        agent_func,
        data=dataset_name,
        evaluators=[correctness_evaluator, relevance_evaluator],
        experiment_prefix="agent-v1"
    )

    return results

Continuous Evaluation

Run evaluations on every deployment:

# .github/workflows/evaluate.yml
name: Agent Evaluation

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  evaluate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install -r requirements.txt

      - name: Run evaluations
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          LANGCHAIN_API_KEY: ${{ secrets.LANGCHAIN_API_KEY }}
        run: python -m pytest tests/eval/ -v

      - name: Check threshold
        run: |
          # Fail if correctness < 90%
          python scripts/check_eval_threshold.py --metric correctness --threshold 0.9
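
The workflow calls scripts/check_eval_threshold.py, which isn't shown above. One possible sketch, assuming the eval step writes its aggregate scores to an eval_results.json file (the file name and format are assumptions):

# scripts/check_eval_threshold.py (illustrative sketch)
# Assumes the eval run writes aggregate scores to eval_results.json,
# e.g. {"correctness": 0.93, "relevance": 0.88}
import argparse
import json
import sys

def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--metric", required=True)
    parser.add_argument("--threshold", type=float, required=True)
    parser.add_argument("--results", default="eval_results.json")
    args = parser.parse_args()

    with open(args.results) as f:
        scores = json.load(f)

    score = scores.get(args.metric)
    if score is None:
        print(f"Metric '{args.metric}' not found in {args.results}")
        return 1

    if score < args.threshold:
        print(f"{args.metric} = {score:.2f} is below threshold {args.threshold}")
        return 1

    print(f"{args.metric} = {score:.2f} meets threshold {args.threshold}")
    return 0

if __name__ == "__main__":
    sys.exit(main())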

Security Best Practices

Input Validation

Never trust user input:

# security.py
import re
from typing import Optional

class InputValidator:
    """Validate and sanitize user inputs."""

    MAX_LENGTH = 10000
    BLOCKED_PATTERNS = [
        r'ignore.*previous.*instructions',
        r'system.*prompt',
        r'<script>',
        r'javascript:',
    ]

    @classmethod
    def validate(cls, text: str) -> tuple[bool, Optional[str]]:
        """Validate input text.

        Returns:
            (is_valid, error_message)
        """
        if not text or not text.strip():
            return False, "Empty input"

        if len(text) > cls.MAX_LENGTH:
            return False, f"Input exceeds {cls.MAX_LENGTH} characters"

        text_lower = text.lower()
        for pattern in cls.BLOCKED_PATTERNS:
            if re.search(pattern, text_lower):
                return False, "Input contains blocked content"

        return True, None

    @classmethod
    def sanitize(cls, text: str) -> str:
        """Sanitize input text."""
        # Remove null bytes
        text = text.replace('\x00', '')

        # Limit length
        text = text[:cls.MAX_LENGTH]

        # Remove control characters (except newlines, tabs)
        text = ''.join(char for char in text if char.isprintable() or char in '\n\t')

        return text.strip()

# Usage in agent
def secure_agent(user_input: str) -> str:
    is_valid, error = InputValidator.validate(user_input)

    if not is_valid:
        return f"Invalid input: {error}"

    sanitized = InputValidator.sanitize(user_input)
    return process_with_agent(sanitized)

Tool Execution Safety

# secure_tools.py
from langchain_core.tools import tool
import subprocess
import os

SAFE_DIRECTORIES = ["/tmp/agent_workspace", "/app/data"]
BLOCKED_COMMANDS = ["rm", "sudo", "chmod", "chown", "curl", "wget"]

@tool
def secure_file_read(filepath: str) -> str:
    """Securely read a file with path validation."""
    # Resolve to absolute path
    abs_path = os.path.abspath(filepath)

    # Check against allowed directories
    if not any(abs_path.startswith(safe_dir) for safe_dir in SAFE_DIRECTORIES):
        return f"Access denied: {filepath} is outside allowed directories"

    # Prevent path traversal
    if ".." in filepath:
        return "Access denied: Path traversal detected"

    try:
        with open(abs_path, 'r') as f:
            content = f.read(100000)  # Limit read size
        return content
    except Exception as e:
        return f"Error: {str(e)}"

@tool
def secure_shell(command: str) -> str:
    """Execute shell command with restrictions."""
    cmd_parts = command.split()

    if not cmd_parts:
        return "No command provided"

    # Check for blocked commands
    if cmd_parts[0] in BLOCKED_COMMANDS:
        return f"Command '{cmd_parts[0]}' is not allowed"

    # Check for shell injection patterns
    dangerous = ['|', ';', '&', '`', '$', '>', '<']
    if any(char in command for char in dangerous):
        return "Command contains disallowed characters"

    try:
        result = subprocess.run(
            cmd_parts,
            capture_output=True,
            text=True,
            timeout=30,
            cwd="/tmp/agent_workspace"
        )
        return result.stdout or result.stderr or "Command completed"
    except subprocess.TimeoutExpired:
        return "Command timed out"
    except Exception as e:
        return f"Error: {str(e)}"

Rate Limiting

# rate_limiting.py
import time
from collections import defaultdict
from functools import wraps

class RateLimiter:
    """Simple in-memory rate limiter."""

    def __init__(self, requests_per_minute: int = 60):
        self.requests_per_minute = requests_per_minute
        self.requests = defaultdict(list)

    def is_allowed(self, user_id: str) -> bool:
        """Check if request is allowed."""
        now = time.time()
        minute_ago = now - 60

        # Clean old requests
        self.requests[user_id] = [
            ts for ts in self.requests[user_id]
            if ts > minute_ago
        ]

        # Check limit
        if len(self.requests[user_id]) >= self.requests_per_minute:
            return False

        self.requests[user_id].append(now)
        return True

rate_limiter = RateLimiter(requests_per_minute=30)

def rate_limited(func):
    """Decorator to add rate limiting."""
    @wraps(func)
    def wrapper(user_id: str, *args, **kwargs):
        if not rate_limiter.is_allowed(user_id):
            return {"error": "Rate limit exceeded. Please try again later."}
        return func(user_id, *args, **kwargs)
    return wrapper

@rate_limited
def agent_endpoint(user_id: str, query: str) -> dict:
    """Rate-limited agent endpoint."""
    result = graph.invoke({"messages": [{"role": "user", "content": query}]})
    return {"response": result["messages"][-1].content}

Deployment Patterns

Docker Deployment

# Dockerfile
FROM python:3.11-slim

WORKDIR /app

# Install dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy application
COPY . .

# Create non-root user
RUN useradd -m appuser && chown -R appuser:appuser /app
USER appuser

# Health check (python:3.11-slim ships without curl, so use Python's urllib)
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD python -c "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')" || exit 1

EXPOSE 8000

CMD ["uvicorn", "main:app", "--host", "0.0.0.0", "--port", "8000"]

FastAPI Service

# main.py
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

# Reuse the pieces built earlier in this guide
from security import InputValidator
from rate_limiting import rate_limiter
from traced_agent import graph

app = FastAPI(title="AI Agent API", version="1.0.0")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],
    allow_methods=["POST"],
    allow_headers=["Authorization", "Content-Type"],
)

class AgentRequest(BaseModel):
    query: str
    user_id: str

class AgentResponse(BaseModel):
    response: str
    trace_id: str

@app.get("/health")
def health_check():
    return {"status": "healthy"}

@app.post("/agent", response_model=AgentResponse)
async def run_agent(request: AgentRequest):
    # Validate input
    is_valid, error = InputValidator.validate(request.query)
    if not is_valid:
        raise HTTPException(status_code=400, detail=error)

    # Rate limit
    if not rate_limiter.is_allowed(request.user_id):
        raise HTTPException(status_code=429, detail="Rate limit exceeded")

    try:
        result = graph.invoke({
            "messages": [{"role": "user", "content": request.query}]
        })

        return AgentResponse(
            response=result["messages"][-1].content,
            trace_id="trace_xxx"  # From LangSmith
        )

    except Exception:
        raise HTTPException(status_code=500, detail="Agent error")

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
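
With the service running, a quick smoke test from another terminal (assumes the API is reachable on localhost:8000 and the requests package is installed):

# smoke_test.py
import requests

resp = requests.post(
    "http://localhost:8000/agent",
    json={"query": "How do I reset my password?", "user_id": "cust_123"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["response"])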

Kubernetes Deployment

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ai-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: ai-agent
  template:
    metadata:
      labels:
        app: ai-agent
    spec:
      containers:
      - name: agent
        image: your-registry/ai-agent:latest
        ports:
        - containerPort: 8000
        env:
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: agent-secrets
              key: openai-key
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"
        livenessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 10
          periodSeconds: 30
        readinessProbe:
          httpGet:
            path: /health
            port: 8000
          initialDelaySeconds: 5
          periodSeconds: 10
---
apiVersion: v1
kind: Service
metadata:
  name: ai-agent
spec:
  selector:
    app: ai-agent
  ports:
  - port: 80
    targetPort: 8000
  type: LoadBalancer

Cost Management

Token Tracking

# cost_tracking.py
from langchain.callbacks import get_openai_callback

def tracked_agent_call(query: str) -> dict:
    """Run agent with cost tracking."""
    with get_openai_callback() as cb:
        result = graph.invoke({
            "messages": [{"role": "user", "content": query}]
        })

        return {
            "response": result["messages"][-1].content,
            "cost": {
                "total_tokens": cb.total_tokens,
                "prompt_tokens": cb.prompt_tokens,
                "completion_tokens": cb.completion_tokens,
                "total_cost": cb.total_cost
            }
        }
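
Tracking costs is only useful if something acts on the numbers. A rough in-memory sketch of a per-user daily budget layered on tracked_agent_call (the limit value and the reset mechanism are assumptions; use a shared store such as Redis when running multiple replicas):

# budget_guard.py
from collections import defaultdict

DAILY_BUDGET_USD = 1.00  # illustrative per-user limit

_spend_today = defaultdict(float)  # user_id -> accumulated spend (reset by a daily job)

def budgeted_agent_call(user_id: str, query: str) -> dict:
    """Reject requests once a user's tracked spend exceeds the daily budget."""
    if _spend_today[user_id] >= DAILY_BUDGET_USD:
        return {"error": "Daily budget exceeded. Please try again tomorrow."}

    result = tracked_agent_call(query)
    _spend_today[user_id] += result["cost"]["total_cost"]
    return result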

Cost Optimization Strategies

| Strategy | Savings | Trade-off |
|----------|---------|-----------|
| Use smaller models | 50-90% | Less capability |
| Cache responses | 20-50% | Stale data |
| Limit context | 30-60% | Less history |
| Batch requests | 10-20% | Added latency |
| Rate limiting | Variable | User friction |
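
Of these, response caching is the quickest to prototype. A sketch of an exact-match cache with a TTL (the normalization and TTL value are assumptions, and it only helps when stale answers are acceptable):

# response_cache.py
import time

CACHE_TTL_SECONDS = 300  # how long a cached answer stays valid
_cache: dict[str, tuple[float, str]] = {}  # normalized query -> (timestamp, response)

def cached_agent_call(query: str) -> str:
    """Serve repeated queries from the cache; otherwise call the agent."""
    key = " ".join(query.lower().split())  # cheap normalization
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]

    result = graph.invoke({"messages": [{"role": "user", "content": query}]})
    response = result["messages"][-1].content
    _cache[key] = (time.time(), response)
    return response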

Monitoring Dashboard

Track these metrics:

| Metric | Target | Alert Threshold |
|--------|--------|-----------------|
| Response latency (p95) | < 5s | > 10s |
| Success rate | > 99% | < 95% |
| Cost per request | < $0.05 | > $0.10 |
| Tool call success | > 95% | < 90% |
| User satisfaction | > 4.0/5 | < 3.5/5 |
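
If you don't have a metrics stack yet, the first two rows can be derived from a simple in-process log of request outcomes. A rough sketch (in production you would export these to Prometheus, Datadog, or LangSmith dashboards instead):

# metrics.py
request_log: list[tuple[float, bool]] = []  # (latency_seconds, succeeded)

def record(latency: float, succeeded: bool) -> None:
    request_log.append((latency, succeeded))

def current_metrics() -> dict:
    """Compute p95 latency and success rate over the recorded window."""
    if not request_log:
        return {"p95_latency": None, "success_rate": None}

    latencies = sorted(lat for lat, _ in request_log)
    p95_index = min(len(latencies) - 1, int(len(latencies) * 0.95))
    success_rate = sum(ok for _, ok in request_log) / len(request_log)
    return {"p95_latency": latencies[p95_index], "success_rate": success_rate}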

Production Checklist

Before deploying:

  • Input validation on all user inputs
  • Rate limiting configured
  • Tool execution sandboxed
  • Secrets in environment variables (not code)
  • Logging configured (LangSmith or equivalent)
  • Error handling for all tool failures
  • Cost limits set
  • Evaluation suite passing
  • Health check endpoint working
  • Monitoring alerts configured
  • Rollback plan documented

Summary

| Area | Key Takeaway |
|------|--------------|
| Observability | Use LangSmith for tracing |
| Evaluation | Test before every deploy |
| Security | Never trust user input |
| Deployment | Docker + Kubernetes |
| Cost | Track and limit spending |

Series Complete!

Congratulations! You’ve completed the AI Agents Mastery Series. You now know how to:

  1. Build agents from scratch
  2. Use LangGraph for complex workflows
  3. Run agents locally with Ollama
  4. Integrate real-world tools
  5. Build multi-agent systems
  6. Deploy to production

What’s Next?

Full Code Repository

git clone https://github.com/Moshiour027/ai-agents-mastery.git
cd ai-agents-mastery/06-production
pip install -r requirements.txt
python main.py

This concludes the AI Agents Mastery Series. Go build something amazing!

Moshiour Rahman

Software Architect & AI Engineer

Enterprise software architect with deep expertise in financial systems, distributed architecture, and AI-powered applications. Building large-scale systems at Fortune 500 companies. Specializing in LLM orchestration, multi-agent systems, and cloud-native solutions. I share battle-tested patterns from real enterprise projects.
