A Practitioner's Guide to Production-Grade Agentic Workflows
Enterprise AI has crossed a threshold. RAG systems are valuable — but fundamentally reactive. A user asks; the system retrieves and answers. The control flow is fixed. The system cannot plan, cannot take action, and cannot adapt based on what it finds.
Agentic AI changes this. An agent is an LLM that operates in a loop: it reasons about a goal, chooses and executes actions using tools, observes the results, and decides what to do next — until the task is complete. The control flow is determined at runtime by the model, not hardcoded by the developer. This is not an incremental improvement over RAG. It is a fundamentally different architectural pattern.
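That loop can be sketched in a few lines of Python. Here `plan_next_action` stands in for the LLM call and `tools` for the tool registry; both are hypothetical names used only for illustration:

```python
def run_agent(goal, tools, plan_next_action, max_steps=10):
    """Minimal Reason -> Act -> Observe loop. The model, not the
    developer, decides the next action at each iteration."""
    observations = []
    for _ in range(max_steps):
        # Reason: the model inspects the goal and everything observed so far
        action = plan_next_action(goal, observations)
        if action["tool"] == "finish":
            return action["args"]["answer"]
        # Act: execute the chosen tool with model-supplied arguments
        result = tools[action["tool"]](**action["args"])
        # Observe: feed the result back into the next planning step
        observations.append({"action": action, "result": result})
    raise RuntimeError("max_steps exceeded without finishing")
```

The essential point is the `for` loop itself: the execution path is not known until the model chooses it at runtime.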
The reference example throughout this playbook is the Padala Bank Credit Analyst Agent — a multi-tool, multi-step agent that automates the assembly of commercial credit approval packages. It captures the full complexity of enterprise agentic deployment: sensitive data, compliance requirements, multiple internal systems, and consequential outputs requiring human oversight.
RAG SYSTEM
Fixed control flow — developer defines path at build time
AGENTIC SYSTEM
Dynamic control flow — model decides execution path at runtime
Agents are not always the right choice. The additional complexity is only justified when the task genuinely requires it.
| Dimension | Use RAG | Use an Agent |
|---|---|---|
| Task type | Question answering, summarization | Multi-step workflows, planning, decision-making |
| Data sources | Single vector index | Multiple heterogeneous systems (DB, API, files) |
| Actions required | None — read-only | Write operations, calculations, API calls |
| Control flow | Always the same path | Varies based on what the agent finds |
| Latency tolerance | Sub-second responses | Minutes acceptable for complex tasks |
| Failure modes | Wrong answer, hallucination | Tool failure, blast radius, audit trail |
| Example | Internal knowledge assistant | Credit package assembly, report generation |
The Corporate Credit team at Padala Bank evaluates commercial lending opportunities. For each deal, an analyst must assemble a Credit Approval Package (CAP) — covering the borrower's financials, existing exposure, comparable past deals, risk ratios, and regulatory capital impact. Today, this takes 2–3 days of manual work across Bloomberg terminals, internal SQL databases, a shared drive of past memos, and a Basel III capital calculator spreadsheet.
A RAG chatbot cannot solve this. It cannot pull live Bloomberg data, query the loan book, run calculations, or synthesize a structured memo from 8 different data sources. This is precisely the right problem for an agent.
FUNCTIONAL REQUIREMENTS
NON-FUNCTIONAL REQUIREMENTS
A production enterprise agent is not a single service. It is five interconnected layers, each with distinct responsibilities.
Where the analyst interacts with the agent. Can be a web application, Microsoft Teams bot, or direct API.
Must support bidirectional interaction — the analyst submits a goal, watches agent reasoning in real-time (streaming), and interrogates intermediate results before approving the final output. Unlike a chatbot, this surfaces the agent's step-by-step reasoning as it progresses. Essential for building analyst trust in compliance-sensitive environments.
The brain of the system. Hosts the state graph, manages task state, and coordinates tool calls.
Responsible for the agent loop: Reason → Act → Observe → repeat. The planner node is a Claude call that receives the current state and decides what to do next — genuine LLM reasoning over the task graph, not heuristic logic. The model reads the current state (what has been retrieved so far) and the goal, then outputs a structured action plan.
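A minimal sketch of that structured action plan and its validation step, assuming the planner is asked to emit JSON. The field names are illustrative, not a LangGraph API:

```python
import json
from dataclasses import dataclass

@dataclass
class PlannedAction:
    tool: str          # which tool to call next
    args: dict         # typed arguments for that tool
    rationale: str     # the model's stated reason, kept for the audit trail

def parse_plan(raw: str) -> PlannedAction:
    """Validate the planner's JSON output before executing anything.
    Rejecting malformed plans here keeps bad model output from ever
    reaching a tool."""
    data = json.loads(raw)
    missing = {"tool", "args", "rationale"} - data.keys()
    if missing:
        raise ValueError(f"planner output missing fields: {missing}")
    return PlannedAction(data["tool"], data["args"], data["rationale"])
```

Validating at this seam is cheap insurance: a plan that fails to parse becomes a retry or an escalation, never a tool call.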
Every external capability the agent can use is exposed through an MCP server running inside Padala Bank's VPC.
The agent never calls Bloomberg or the SQL server directly — it calls the MCP server, which handles authentication, data formatting, and error handling. One MCP server per system boundary: Bloomberg MCP, SQL Loan Book MCP, RAG Knowledge MCP, Calculator MCP.
Significantly more complex than in RAG. Four distinct memory types must be managed.
Working Memory (LangGraph state — current task data), Semantic Memory (existing RAG vector index for past memos and policies), Episodic Memory (past agent runs — similar deal patterns), Procedural Memory (system prompt — CAP template, covenant thresholds, Basel rules).
Not optional in enterprise agentic systems — this is the layer where human judgment is inserted before any consequential action.
LangGraph implements this via graph interrupts: execution pauses at a designated node, serializes current state to persistent storage, and waits for a human signal to resume. The analyst reviews the full CAP memo, can ask clarifying questions, request revisions, and only then approves submission to the deal management system.
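A plain-Python sketch of the interrupt pattern at a hypothetical `review_node`. LangGraph's actual mechanism is graph interrupts plus a persistent checkpointer; the names and state shape here are illustrative:

```python
import json

class AwaitingApproval(Exception):
    """Raised when execution reaches the review node: the run stops
    and state is persisted until a human signal arrives."""

def review_node(state, human_decision, store):
    """Pause on first entry; on resume, route based on the decision."""
    if human_decision is None:
        # Interrupt: serialize current state to persistent storage
        # and stop. Nothing is submitted anywhere yet.
        store["pending"] = json.dumps(state)
        raise AwaitingApproval(state["deal_id"])
    if human_decision == "approve":
        return {**state, "approved": True}
    # Any other signal is treated as a revision request with notes.
    return {**state, "revision_requested": human_decision}
```

The invariant worth preserving in any real implementation: the write to the deal management system is unreachable until `approved` is set by a human signal, never by the model.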
MCP Host
The agent process — the client that initiates connections and sends tool call requests. In our case, the LangGraph orchestrator.
MCP Server
A lightweight service that exposes tools. Each server has exactly two responsibilities: declare what tools it exposes, and handle calls to those tools.
Tool
A named, typed function the agent can call. Tools have descriptions (written in plain English for the model to read), typed input schemas, and return typed responses.
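The server and tool concepts above fit together as in this plain-Python sketch. It illustrates the two responsibilities only; it is not the MCP SDK's actual API (a real server speaks the protocol over stdio or HTTP):

```python
class MCPServerSketch:
    """Illustration of an MCP server's two responsibilities:
    declare its tools, and handle calls to those tools."""

    def __init__(self, name):
        self.name = name
        self._tools = {}

    def tool(self, description, schema):
        """Decorator that registers a function as an exposed tool."""
        def register(fn):
            self._tools[fn.__name__] = {
                "description": description,  # plain English, for the model
                "schema": schema,            # typed input schema
                "fn": fn,
            }
            return fn
        return register

    def list_tools(self):
        # Responsibility 1: declare what tools are exposed
        return [{"name": n, "description": t["description"], "schema": t["schema"]}
                for n, t in self._tools.items()]

    def call_tool(self, name, args):
        # Responsibility 2: handle calls. In a real server, auth,
        # data formatting, and error handling all live here.
        return self._tools[name]["fn"](**args)
```

Usage mirrors the calculator-mcp boundary: register `calc_leverage` on a `MCPServerSketch("calculator-mcp")` instance, and the agent only ever sees `list_tools` and `call_tool`.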
One MCP server per system boundary is the correct pattern — reflecting real security and operational requirements:
| MCP Server | System | Tools Exposed | Key Security Control |
|---|---|---|---|
| bloomberg-mcp | Bloomberg Terminal API | get_financials, get_ratings, get_news | API key isolation, VPC-only egress |
| sql-loanbook-mcp | Internal SQL Database | query_exposure, get_deal_history | Read-only access; non-SELECT queries blocked |
| rag-knowledge-mcp | Vector Index (past memos + policies) | search_past_memos, get_credit_policy | Permission-group filtering at query time |
| calculator-mcp | Pure Python Functions | calc_leverage, calc_coverage, calc_basel3 | No external calls — deterministic outputs |
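As one concrete example of a key security control, the sql-loanbook-mcp guard against non-SELECT queries might look like this sketch. A regex check alone is not sufficient in production; a read-only database role should back it up:

```python
import re

READONLY = re.compile(r"^\s*select\b", re.IGNORECASE)

def guard_query(sql: str) -> str:
    """Enforce the sql-loanbook-mcp control: accept only a single
    SELECT statement. Defence in depth: this runs inside the MCP
    server, in front of a read-only database role."""
    # Reject stacked statements (anything with an interior semicolon)
    if ";" in sql.rstrip().rstrip(";"):
        raise PermissionError("multi-statement queries blocked")
    if not READONLY.match(sql):
        raise PermissionError("non-SELECT queries blocked")
    return sql
```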
WHY LANGGRAPH FOR ENTERPRISE
THE PADALA BANK STATE GRAPH
Memory in agentic systems is far richer than in RAG. A RAG system has one memory type: the vector index. A production agent must manage four distinct memory types.
Working Memory
LangGraph State (in-memory + PostgreSQL checkpoint)
The CAP State object shared across all nodes. Every node reads from and writes to it. Accumulates throughout the agent run. Checkpointed after every node — agent can be stopped and resumed without losing any retrieved data.
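A sketch of what that shared state object and per-node checkpoint could look like. Field names are illustrative, not Padala Bank's actual schema, and LangGraph provides its own checkpointer (e.g. backed by PostgreSQL) rather than this hand-rolled one:

```python
import json
from typing import Optional, TypedDict

class CAPState(TypedDict, total=False):
    """Working memory: the one state object every node reads and writes."""
    deal_id: str
    financials: dict        # from bloomberg-mcp
    exposure: dict          # from sql-loanbook-mcp
    past_memos: list        # from rag-knowledge-mcp
    ratios: dict            # from calculator-mcp
    draft_memo: Optional[str]
    approved: bool

def checkpoint(state: CAPState, path: str) -> None:
    """Persist state after every node so a stopped run resumes
    without losing any retrieved data."""
    with open(path, "w") as f:
        json.dump(state, f)
```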
Semantic Memory
Vector Index (your existing RAG system)
Past credit memos, credit policies, covenant templates. Exposed via the RAG MCP server. Your existing vector index becomes the semantic memory of your agents — the RAG investment does not go away, it becomes more valuable.
Episodic Memory
Agent run history (PostgreSQL / LangSmith)
Records of past agent runs. Which tool sequences produced high-quality memos? What failure patterns have occurred? Enables the system to learn from experience over time.
Procedural Memory
System prompt + prompt versioning
How to do the task: the CAP template structure, covenant threshold definitions, Basel III calculation rules, citation format requirements. Injected at the start of every run.
Review & Approve
When: Agent completes full task draft, then interrupts for human decision before any write operation
Padala Bank use: Primary pattern — analyst reviews the full CAP before deal system submission
Exception Escalation
When: Agent escalates to human when it encounters anomalies or hard-stop conditions mid-run
Padala Bank use: If leverage > 7.0x OR Bloomberg data is missing, escalate before drafting
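That hard-stop condition is a one-line check, sketched here with the 7.0x threshold from above. In production the threshold would live in versioned procedural memory (system prompt or config), not in code:

```python
def should_escalate(leverage, bloomberg_ok, threshold=7.0):
    """Mid-run hard-stop check, evaluated before the draft node.
    A missing leverage figure is treated the same as a breach:
    escalate rather than draft on incomplete data."""
    return leverage is None or leverage > threshold or not bloomberg_ok
```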
Streaming Review
When: Human watches agent reasoning in real time and can intervene at any step
Padala Bank use: For complex or novel deals where analyst wants to guide the research direction
Agent evaluation is fundamentally harder than RAG evaluation because agents are non-deterministic and multi-step. The same goal can produce different tool call sequences and different final outputs on different runs. You are not evaluating a single answer — you are evaluating an execution trace.
Step-Level Evaluation
Evaluate each individual tool call and node output in isolation.
Trace-Level Evaluation
Evaluate the full execution trace for a single agent run.
Outcome-Level Evaluation
Evaluate the final output against ground truth and business requirements.
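A sketch of trace-level checks, assuming the orchestrator logs each tool call as a dict. The record shape and check names are illustrative:

```python
def eval_trace(trace, required_tools, max_steps=12):
    """Trace-level evaluation: did this run call every required tool,
    avoid redundant fetches, and stay within the step budget?
    `trace` is the ordered list of tool-call records for one run."""
    called = [step["tool"] for step in trace]
    return {
        "all_required_called": set(required_tools) <= set(called),
        "no_duplicate_calls": len(called) == len(set(called)),
        "within_step_budget": len(called) <= max_steps,
    }
```

Because the same goal can yield different tool sequences, these checks assert properties of the trace (coverage, efficiency, budget) rather than an exact expected sequence.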
THE PROBLEM
The same goal submitted twice may produce different tool call sequences and different final outputs. This is not a bug — it is a property of LLM reasoning. But it creates fundamental challenges for compliance teams expecting consistent, reproducible outputs.
THE SOLUTION
Make the audit trail richer than the output. Every tool call is logged with its exact arguments and return values. The analyst approval is recorded with timestamp and notes. The final memo is stored with its full provenance chain. Even if two runs produce slightly different memos, compliance can trace exactly how each was produced.
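One way to structure such an audit-trail entry, with illustrative fields. Hashing the return value keeps the log compact while still letting compliance verify the full payload stored elsewhere:

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(tool, args, result, run_id):
    """One audit-trail entry per tool call: exact arguments, a
    content hash of the return value, and a UTC timestamp, all
    keyed to the run that produced them."""
    return {
        "run_id": run_id,
        "tool": tool,
        "args": args,
        "result_sha256": hashlib.sha256(
            json.dumps(result, sort_keys=True).encode()).hexdigest(),
        "at": datetime.now(timezone.utc).isoformat(),
    }
```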
THE PROBLEM
When a tool call fails — Bloomberg API is down, SQL query times out, the RAG index returns empty results — the agent must handle it gracefully. Without explicit error handling, the agent will either hallucinate data or produce an incomplete output without surfacing the failure.
THE SOLUTION
Wrap every tool call in a retry-with-fallback handler. If a tool fails after retries, return a structured error state rather than raising an exception. Check status before using result — if the tool failed, set leverage_flagged conservatively and escalate to human review rather than proceeding with incomplete data.
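A sketch of that retry-with-fallback wrapper:

```python
import time

def call_with_retry(fn, *args, retries=3, backoff=0.5, **kwargs):
    """Wrap a tool call: retry transient failures with exponential
    backoff, and on exhaustion return a structured error state
    instead of raising, so the graph can route to human escalation
    rather than crash mid-run."""
    for attempt in range(retries):
        try:
            return {"status": "ok", "data": fn(*args, **kwargs)}
        except Exception as exc:
            if attempt < retries - 1:
                time.sleep(backoff * 2 ** attempt)  # 0.5s, 1s, 2s, ...
                continue
            return {"status": "error", "error": repr(exc), "data": None}
```

Callers then branch on `status`: if the Bloomberg fetch comes back `"error"`, the graph sets `leverage_flagged` conservatively and routes to human review instead of drafting on incomplete data.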
THE PROBLEM
Malicious content in retrieved data (e.g. a Bloomberg news article or past credit memo) could attempt to override the agent's instructions. An MCP server returning crafted responses could attempt to redirect tool calls. These are real attack surfaces in production.
THE SOLUTION
Treat all tool outputs as untrusted data — never inject them directly into the system prompt context without sanitization. Safety checks live inside the MCP server, not in the agent. The server is the last line of defence for its system boundary.
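A sketch of one such sanitization layer inside an MCP server: it flags likely injection phrases and wraps the payload in explicit delimiters so the model treats it as quoted data. The pattern list is illustrative; real defences are layered (allow-lists, output schemas, and the HITL gate itself):

```python
import re

# Illustrative phrases only; a production list would be maintained
# and red-teamed, and would not be the sole defence.
INJECTION_PATTERNS = [
    r"ignore (all|previous|the above) instructions",
    r"you are now",
    r"system prompt",
]

def sanitize_tool_output(text: str) -> str:
    """Mark retrieved content as data, not instructions, before it
    reaches the model's context."""
    flagged = any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)
    banner = "[FLAGGED: possible prompt injection]\n" if flagged else ""
    return f"{banner}<tool_output>\n{text}\n</tool_output>"
```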
THE PROBLEM
Multi-step agents make multiple LLM calls per task. A single Padala Bank deal evaluation makes approximately 8–12 LLM calls. At claude-sonnet rates, that is $0.15–0.40 per deal, or roughly $75–200 per day at 500 deals/day. At that volume, caching, parallelization, and model tiering become critical architectural decisions.
THE SOLUTION
Run independent data source calls in parallel (Bloomberg + SQL + RAG can be fetched simultaneously). Use model tiering: smaller/faster models for routing and planning nodes; full models for draft and synthesis. Implement semantic caching for repeated queries against static data.
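The parallel-fetch step can be sketched with asyncio, with the three callables standing in for MCP tool calls:

```python
import asyncio

async def fetch_all(bloomberg, loanbook, rag, deal_id):
    """Fetch the three independent data sources concurrently:
    end-to-end latency becomes that of the slowest source rather
    than the sum of all three."""
    financials, exposure, memos = await asyncio.gather(
        bloomberg(deal_id), loanbook(deal_id), rag(deal_id))
    return {"financials": financials, "exposure": exposure, "past_memos": memos}
```

The calculator and draft nodes stay sequential, since each depends on the fetched data; only genuinely independent calls belong in the `gather`.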
1. Build and validate the MCP server layer before writing a single line of agent code.
2. Ship the credit analyst agent to a pilot group of 10 analysts. Full HITL. Every run reviewed.
3. Expand from one agent to an agent catalog. Multiple agents share the same MCP server layer.
4. Operate agents at scale with production SLAs, cost controls, and continuous evaluation.
| Role | Responsibilities |
|---|---|
| AI Platform Engineer (2) | LangGraph orchestration, agent architecture, framework ownership |
| MCP Server Engineers (2) | Tool layer: one per high-complexity integration (Bloomberg) |
| ML/Eval Engineer (1) | Evaluation pipeline, golden dataset, tracing infrastructure |
| Product Manager (1) | Analyst workflow, pilot coordination, success metrics |
| Security / Compliance (0.5) | Audit trail design, RBAC, data residency, red team coordination |
Data sensitivity & regulatory requirements
Workflow customization
Engineering team capability
Vendor lock-in tolerance
The shift from RAG to agents is not primarily a model upgrade — it is an architectural and organizational shift. The locus of control moves from your code to the model. The failure modes change from wrong answers to wrong actions. The stakes are higher, the rewards are greater, and the engineering discipline required is different.
Build the tool layer before the agent layer. Design HITL into the architecture from day one, not as an afterthought. Make the audit trail richer than the output. And measure at the trace level, not just the outcome level.
Shashank Padala
Founder, Kirak Labs · Ex-Amazon TPM II · AI Product Leader
AI Product & Transformation Leader with 8+ years of experience building production LLM systems across enterprise platforms and founder-led AI startups. This playbook is a companion to Enterprise GenAI Strategy: From Data to Production and reflects real-world experience shipping agentic systems at scale.