Agentic AI · MCP · LangGraph · Enterprise Orchestration

Enterprise Agentic AI:
From Chatbot to Autonomous Agent

A Practitioner's Guide to Production-Grade Agentic Workflows

Shashank Padala · Founder, Kirak Labs · 22 min read · Mar 17, 2026 · Companion to: Enterprise GenAI Strategy

Audience: Engineering Leaders · CTOs · AI Platform Teams · Senior TPMs
Assumes: Familiarity with RAG systems — this playbook covers what changes when you move beyond them

Executive Summary

Enterprise AI has crossed a threshold. RAG systems are valuable — but fundamentally reactive. A user asks, the system retrieves and answers. The control flow is fixed. The system cannot plan, cannot take action, and cannot adapt based on what it finds.

Agentic AI changes this. An agent is an LLM that operates in a loop: it reasons about a goal, chooses and executes actions using tools, observes the results, and decides what to do next — until the task is complete. The control flow is determined at runtime by the model, not hardcoded by the developer. This is not an incremental improvement over RAG. It is a fundamentally different architectural pattern.

The reference example throughout this playbook is the Padala Bank Credit Analyst Agent — a multi-tool, multi-step agent that automates the assembly of commercial credit approval packages. It captures the full complexity of enterprise agentic deployment: sensitive data, compliance requirements, multiple internal systems, and consequential outputs requiring human oversight.

01

From RAG to Agents: What Actually Changes

RAG SYSTEM

1. User query
2. Retrieve context
3. Augment prompt
4. Generate answer
5. Return response

Fixed control flow — developer defines path at build time

AGENTIC SYSTEM

Receive Goal
Reason → Plan
Execute Tool Call
Observe Result
Adapt & Loop / Complete

Dynamic control flow — model decides execution path at runtime

RAG vs Agent: Decision Matrix

Agents are not always the right choice. The additional complexity is only justified when the task genuinely requires it.

Dimension | Use RAG | Use an Agent
Task type | Question answering, summarization | Multi-step workflows, planning, decision-making
Data sources | Single vector index | Multiple heterogeneous systems (DB, API, files)
Actions required | None — read-only | Write operations, calculations, API calls
Control flow | Always the same path | Varies based on what the agent finds
Latency tolerance | Sub-second responses | Minutes acceptable for complex tasks
Failure modes | Wrong answer, hallucination | Tool failure, blast radius, audit trail
Example | Internal knowledge assistant | Credit package assembly, report generation

02

Reference Example: Padala Bank Credit Analyst Agent

The Problem

The Corporate Credit team at Padala Bank evaluates commercial lending opportunities. For each deal, an analyst must assemble a Credit Approval Package (CAP) — covering the borrower's financials, existing exposure, comparable past deals, risk ratios, and regulatory capital impact. Today, this takes 2–3 days of manual work across Bloomberg terminals, internal SQL databases, a shared drive of past memos, and a Basel III capital calculator spreadsheet.

A RAG chatbot cannot solve this. It cannot pull live Bloomberg data, query the loan book, run calculations, or synthesize a structured memo from 8 different data sources. This is precisely the right problem for an agent.

FUNCTIONAL REQUIREMENTS

  • Accept a company name or ticker → produce a complete Credit Approval Package
  • Pull live financials, credit ratings, and sector news from Bloomberg
  • Query Padala Bank's internal loan book for existing exposure and deal history
  • Retrieve comparable past credit memos from the internal RAG index
  • Execute leverage, coverage, and debt/EBITDA calculations against covenant thresholds
  • Run Basel III regulatory capital analysis when leverage exceeds threshold
  • Draft the memo in Padala Bank's standard CAP template with citations for every claim
  • Require explicit analyst approval before logging to the deal management system

NON-FUNCTIONAL REQUIREMENTS

  • All data must remain within Padala Bank's private VPC — no data egress to public APIs
  • Every data claim must be traceable to a source (full audit trail for compliance)
  • Full tool call log: what was called, arguments passed, what was returned, when
  • Human approval required before any write operation to downstream systems
  • P95 end-to-end latency under 3 minutes for standard deals
  • Support for parallel tool execution where data sources are independent
03

Reference Architecture: Enterprise Agentic System

A production enterprise agent is not a single service. It is five interconnected layers, each with distinct responsibilities.

L1

Interface Layer

Where the analyst interacts with the agent. Can be a web application, Microsoft Teams bot, or direct API.

Must support bidirectional interaction — the analyst submits a goal, watches agent reasoning in real time (streaming), and interrogates intermediate results before approving the final output. Unlike a chatbot, this surfaces the agent's step-by-step reasoning as it progresses. Essential for building analyst trust in compliance-sensitive environments.

L2

Orchestration Layer (LangGraph)

The brain of the system. Hosts the state graph, manages task state, and coordinates tool calls.

Responsible for the agent loop: Reason → Act → Observe → repeat. The planner node is a Claude call that receives the current state and decides what to do next — genuine LLM reasoning over the task graph, not heuristic logic. The model reads the current state (what has been retrieved so far) and the goal, then outputs a structured action plan.

L3

Tool Layer (MCP Servers)

Every external capability the agent can use is exposed through an MCP server running inside Padala Bank's VPC.

The agent never calls Bloomberg or the SQL server directly — it calls the MCP server, which handles authentication, data formatting, and error handling. One MCP server per system boundary: Bloomberg MCP, SQL Loan Book MCP, RAG Knowledge MCP, Calculator MCP.

L4

Memory Layer

Significantly more complex than in RAG. Four distinct memory types must be managed.

Working Memory (LangGraph state — current task data), Semantic Memory (existing RAG vector index for past memos and policies), Episodic Memory (past agent runs — similar deal patterns), Procedural Memory (system prompt — CAP template, covenant thresholds, Basel rules).

L5

Human-in-the-Loop Gate

Not optional in enterprise agentic systems. This is where human judgment is inserted before any consequential action.

LangGraph implements this via graph interrupts: execution pauses at a designated node, serializes current state to persistent storage, and waits for a human signal to resume. The analyst reviews the full CAP memo, can ask clarifying questions, request revisions, and only then approves submission to the deal management system.
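LangGraph provides this natively (graph interrupts plus a checkpointer); the sketch below illustrates the underlying pattern in plain Python — serialize state at the gate, stop, and resume only on an explicit approval signal. All names here are illustrative, and the in-memory store stands in for the persistent storage a real deployment would use.

```python
import json

CHECKPOINT_STORE: dict[str, str] = {}  # stand-in for persistent storage (e.g. PostgreSQL)

def pause_for_review(run_id: str, state: dict) -> None:
    """The interrupt: serialize the current state and stop executing."""
    CHECKPOINT_STORE[run_id] = json.dumps(state)

def resume_after_review(run_id: str, approved: bool, notes: str = "") -> dict:
    """The resume: reload state and record the human decision."""
    state = json.loads(CHECKPOINT_STORE[run_id])
    state["approved"] = approved
    state["review_notes"] = notes
    return state

# Execution pauses after the memo draft is complete...
pause_for_review("deal-42", {"draft_memo": "...", "risk_ratios": {"leverage": 6.2}})
# ...and resumes only when the analyst signs off.
resumed = resume_after_review("deal-42", approved=True, notes="Section 3 tightened")
```

The key property is that nothing downstream of the gate runs until a human signal arrives — the write to the deal management system is structurally unreachable without it.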

04

MCP: The Tool Connectivity Standard

Mental Model: MCP is USB-C for AI Agents
Before USB-C, every device had a proprietary connector. After it, one cable works everywhere. MCP does the same for tool connectivity. Before MCP, connecting Claude to Bloomberg required different logic than connecting the same agent to a SQL database — every pairing was bespoke. MCP solves this with a universal contract based on JSON-RPC 2.0.
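The universal contract is visible on the wire. A minimal sketch of what a tool invocation looks like as a JSON-RPC 2.0 request — the `tools/call` method and the `name`/`arguments` parameter shape follow the MCP specification, while the specific tool and arguments are from this playbook's Bloomberg example:

```python
import json

# A JSON-RPC 2.0 request as the MCP host would send it to a server.
# "tools/call" is the MCP method for invoking a tool; the tool name and
# arguments are illustrative (the bloomberg-mcp server from this playbook).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_financials",
        "arguments": {"ticker": "ACME", "period": "FY2025"},
    },
}

wire_payload = json.dumps(request)
```

Because every tool call — Bloomberg, SQL, calculator — is this same envelope, the host needs exactly one client implementation regardless of how many servers it talks to.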

MCP Host

The agent process — the client that initiates connections and sends tool call requests. In our case, the LangGraph orchestrator.

MCP Server

A lightweight service that exposes tools. Each server has exactly two responsibilities: declare what tools it exposes, and handle calls to those tools.

Tool

A named, typed function the agent can call. Tools have descriptions (written in plain English for the model to read), typed input schemas, and return typed responses.

How Many MCP Servers Should You Build?

One MCP server per system boundary is the correct pattern — reflecting real security and operational requirements:

MCP Server | System | Tools Exposed | Key Security Control
bloomberg-mcp | Bloomberg Terminal API | get_financials, get_ratings, get_news | API key isolation, VPC-only egress
sql-loanbook-mcp | Internal SQL Database | query_exposure, get_deal_history | Read-only; non-SELECT queries blocked
rag-knowledge-mcp | Vector Index (past memos + policies) | search_past_memos, get_credit_policy | permission_group filtering at query time
calculator-mcp | Pure Python Functions | calc_leverage, calc_coverage, calc_basel3 | No external calls — deterministic outputs

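The official MCP SDKs handle the transport and schema plumbing; the sketch below strips that away to show just the two server responsibilities named earlier — declaring typed tools and handling calls — using calculator-mcp-style tools as the example. The function bodies and schema shapes are illustrative, not the SDK's actual API.

```python
# Minimal illustration of an MCP-style server's two responsibilities:
# (1) declare typed tools, (2) handle calls to them. Transport (JSON-RPC
# over stdio or HTTP+SSE) is omitted; a real server would use the MCP SDK.
TOOLS = {
    "calc_leverage": {
        "description": "Total debt divided by EBITDA.",
        "input_schema": {"total_debt": "number", "ebitda": "number"},
        "fn": lambda args: args["total_debt"] / args["ebitda"],
    },
    "calc_coverage": {
        "description": "EBITDA divided by interest expense.",
        "input_schema": {"ebitda": "number", "interest_expense": "number"},
        "fn": lambda args: args["ebitda"] / args["interest_expense"],
    },
}

def list_tools():
    """What the server returns for a tools/list request."""
    return [
        {"name": n, "description": t["description"], "input_schema": t["input_schema"]}
        for n, t in TOOLS.items()
    ]

def call_tool(name: str, arguments: dict):
    """What the server does with a tools/call request."""
    if name not in TOOLS:
        return {"is_error": True, "content": f"unknown tool: {name}"}
    return {"is_error": False, "content": TOOLS[name]["fn"](arguments)}
```

Note that the descriptions are written for the model, not for humans — they are what the planner reads when deciding which tool fits the current step.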
MCP vs CLI: When Each Is Right
The case for plain CLI tools is sound for personal agents on a developer's laptop, where tools are shell commands. It is the wrong choice when you need auth isolation, structured data contracts, remote tool access inside a VPC, compliance audit trails, and reusable tool servers across multiple agents. For enterprise deployments like Padala Bank, MCP is not optional — it is the architecture.
05

Orchestration: LangGraph In Depth

WHY LANGGRAPH FOR ENTERPRISE

  • Persistent state: later nodes know what earlier nodes found
  • Cycles: the graph can loop — essential for retry logic and iterative refinement
  • Checkpointing: pause execution, store state, resume after human review
  • Streaming: surface agent reasoning to the analyst UI in real time
  • Subgraphs: compose complex workflows from reusable sub-agents

THE PADALA BANK STATE GRAPH

START → receive_goal
→ fetch_bloomberg
→ fetch_loanbook_exposure
→ search_past_memos (RAG)
→ calculate_risk_ratios
→ run_basel3_analysis (conditional: only if leverage > 5.5x)
→ draft_cap_memo
→ HUMAN REVIEW (interrupt)
→ log_to_deal_system (on approval)

The Line That Makes This an Agent
After calculating risk ratios, the graph examines the result and decides what to do next. If leverage exceeds 5.5x, it routes to the Basel node. If not, it skips it entirely. The developer did not hardcode this decision — the model's observation of retrieved data drives the execution path. This is the agent loop in practice.
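In LangGraph this decision lives in a conditional edge: a plain routing function reads the state and returns the name of the next node. Here is that function standalone — the node names come from the graph above, while the state shape is an assumption for illustration (in a real graph it would be wired via `add_conditional_edges`):

```python
# The routing function behind the conditional edge after calculate_risk_ratios.
# The 5.5x threshold is the one used throughout this playbook's example.
BASEL_LEVERAGE_THRESHOLD = 5.5

def route_after_risk_ratios(state: dict) -> str:
    """Return the name of the next node based on what the agent found."""
    if state["risk_ratios"]["leverage"] > BASEL_LEVERAGE_THRESHOLD:
        return "run_basel3_analysis"
    return "draft_cap_memo"
```

A 6.2x-levered deal routes to the Basel node; a 3.1x deal skips straight to drafting. The function is trivial — the point is that its *input* was produced by the model's tool calls, so the execution path is data-driven rather than hardcoded.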
06

Memory Architecture for Enterprise Agents

Memory in agentic systems is far richer than in RAG. A RAG system has one memory type: the vector index. A production agent must manage four distinct memory types.

Working Memory

LangGraph State (in-memory + PostgreSQL checkpoint)

The CAP State object shared across all nodes. Every node reads from and writes to it. Accumulates throughout the agent run. Checkpointed after every node — agent can be stopped and resumed without losing any retrieved data.

bloomberg_data, exposure_summary, risk_ratios, draft_memo...
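A working-memory object like this is naturally expressed as a typed dictionary that every node reads from and writes to. A sketch of what the CAP state might look like — field names beyond the four listed above are assumptions:

```python
from typing import Optional, TypedDict

class CAPState(TypedDict, total=False):
    """Shared working memory for one agent run; checkpointed after every node."""
    goal: str                      # company name or ticker from the analyst
    bloomberg_data: dict           # financials, ratings, news
    exposure_summary: dict         # existing loan-book exposure
    past_memos: list               # comparable memos from the RAG index
    risk_ratios: dict              # leverage, coverage, debt/EBITDA
    basel3_result: Optional[dict]  # populated only when leverage > 5.5x
    draft_memo: str                # the assembled CAP draft
    approved: bool                 # set by the human-in-the-loop gate

# Nodes accumulate into the same object as the run progresses:
state: CAPState = {"goal": "ACME Corp"}
state["risk_ratios"] = {"leverage": 6.2}
```

Because the state is a plain serializable structure, checkpointing it to PostgreSQL after each node is straightforward — which is what makes pause-and-resume possible.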

Semantic Memory

Vector Index (your existing RAG system)

Past credit memos, credit policies, covenant templates. Exposed via the RAG MCP server. Your existing vector index becomes the semantic memory of your agents — the RAG investment does not go away, it becomes more valuable.

search_past_memos('leveraged buyout tech sector'), get_credit_policy('concentration_risk')

Episodic Memory

Agent run history (PostgreSQL / LangSmith)

Records of past agent runs. Which tool sequences produced high-quality memos? What failure patterns have occurred? Enables the system to learn from experience over time.

Past deal runs with analyst acceptance rates, revision patterns, escalation reasons

Procedural Memory

System prompt + prompt versioning

How to do the task: the CAP template structure, covenant threshold definitions, Basel III calculation rules, citation format requirements. Injected at the start of every run.

CAP template, leverage thresholds, Basel III capital consumption formula
07

Human-in-the-Loop Design

Why HITL Is Non-Negotiable in Enterprise
Enterprise agents operate in systems with real consequences. A mistaken database write, an incorrect credit recommendation logged to a deal management system, or a compliance flag missed by the agent — these are not bugs to patch. They are business failures with regulatory and financial consequences. HITL is not a concession to distrust of AI. It is a risk management architecture.

Review & Approve

When: Agent completes full task draft, then interrupts for human decision before any write operation

Padala Bank use: the primary pattern — the analyst reviews the full CAP before submission to the deal system

Exception Escalation

When: Agent escalates to human when it encounters anomalies or hard-stop conditions mid-run

Padala Bank use: if leverage > 7.0x OR Bloomberg data is missing → escalate before drafting

Streaming Review

When: Human watches agent reasoning in real time and can intervene at any step

Padala Bank use: For complex or novel deals where analyst wants to guide the research direction

08

Evaluation & Observability for Agents

Agent evaluation is fundamentally harder than RAG evaluation because agents are non-deterministic and multi-step. The same goal can produce different tool call sequences and different final outputs on different runs. You are not evaluating a single answer — you are evaluating an execution trace.

Step-Level Evaluation

Evaluate each individual tool call and node output in isolation.

  • Tool call correctness — did the agent call the right tool with the right arguments?
  • Tool return quality — did the MCP server return valid, complete data?
  • Node output validity — does the node output match expected schema?
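Step-level checks can be expressed as plain assertions over one recorded tool call from a trace. A minimal sketch — the record shape is an assumption, matching nothing more than "tool name plus arguments":

```python
def check_tool_call(actual: dict, expected: dict) -> list[str]:
    """Compare one recorded tool call against the golden call for this step.
    Returns a list of failure reasons; an empty list means the step passed."""
    failures = []
    if actual["tool"] != expected["tool"]:
        failures.append(f"wrong tool: {actual['tool']} != {expected['tool']}")
    for key, value in expected["arguments"].items():
        if actual["arguments"].get(key) != value:
            failures.append(f"wrong argument {key!r}: got {actual['arguments'].get(key)!r}")
    return failures

# A recorded step from a trace vs. the golden expectation:
actual = {"tool": "query_exposure", "arguments": {"borrower": "ACME Corp"}}
expected = {"tool": "query_exposure", "arguments": {"borrower": "ACME Corp"}}
step_failures = check_tool_call(actual, expected)
```

Returning reasons rather than a boolean matters in practice: aggregated over a golden dataset, the failure strings tell you *which* argument the agent gets wrong, not just that it failed.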

Trace-Level Evaluation

Evaluate the full execution trace for a single agent run.

  • Tool call sequence — did the agent take the right path? (Did Basel fire when leverage > 5.5x?)
  • Conditional routing accuracy — did conditional edges fire correctly?
  • Latency per node — where are bottlenecks?

Outcome-Level Evaluation

Evaluate the final output against ground truth and business requirements.

  • Memo completeness — all required sections present?
  • Financial accuracy — do reported figures match source data?
  • Citation coverage — every claim has a traceable source?
  • Analyst acceptance rate — approved without major revision?
Tracing: The Foundational Observability Primitive
For agents, tracing is not optional. A trace records every step: which nodes executed, in what order, what tools were called with what arguments, what each tool returned, how long each step took, and the final output. Without traces, debugging a failed agent run is guesswork. LangSmith integrates natively with LangGraph and captures full execution traces automatically — enabling it is a one-line configuration.
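Enabling that one-line configuration is environment-variable setup. A sketch — the variable names follow LangSmith's documented convention, the key is a placeholder, and the project name is illustrative:

```python
import os

# LangSmith tracing is opt-in via environment variables; once set, LangGraph
# runs are traced automatically with no application code changes.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "lsv2-..."             # from LangSmith settings
os.environ["LANGCHAIN_PROJECT"] = "padala-credit-agent"  # groups traces per agent
```

Setting a distinct project per agent keeps the Phase 3 agent catalog debuggable: each agent's traces land in its own bucket.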
09

The Hardest Problems in Enterprise Agents

Non-Determinism & Auditability

THE PROBLEM

The same goal submitted twice may produce different tool call sequences and different final outputs. This is not a bug — it is a property of LLM reasoning. But it creates fundamental challenges for compliance teams expecting consistent, reproducible outputs.

THE SOLUTION

Make the audit trail richer than the output. Every tool call is logged with its exact arguments and return values. The analyst approval is recorded with timestamp and notes. The final memo is stored with its full provenance chain. Even if two runs produce slightly different memos, compliance can trace exactly how each was produced.
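One way to make the trail "richer than the output" is to log every tool call as a structured, timestamped record ready to persist as a JSON row. A sketch — the field set and helper names are assumptions:

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class ToolCallRecord:
    """One entry in a run's provenance chain: what was called, with what
    arguments, what came back, and when."""
    run_id: str
    tool: str
    arguments: dict
    result_summary: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

audit_log: list[ToolCallRecord] = []  # stand-in for an append-only store

def record_tool_call(run_id: str, tool: str, arguments: dict, result_summary: str) -> dict:
    rec = ToolCallRecord(run_id, tool, arguments, result_summary)
    audit_log.append(rec)
    return asdict(rec)  # serializable, ready to persist

entry = record_tool_call(
    "deal-42", "query_exposure",
    {"borrower": "ACME Corp"}, "3 facilities, $120M total exposure",
)
```

Even when two runs draft slightly different memos, joining the memo's citations back to these records reconstructs exactly how each claim was produced.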

Tool Failure & Blast Radius

THE PROBLEM

When a tool call fails — Bloomberg API is down, SQL query times out, the RAG index returns empty results — the agent must handle it gracefully. Without explicit error handling, the agent will either hallucinate data or produce an incomplete output without surfacing the failure.

THE SOLUTION

Wrap every tool call in a retry-with-fallback handler. If a tool fails after retries, return a structured error state rather than raising an exception. Check status before using result — if the tool failed, set leverage_flagged conservatively and escalate to human review rather than proceeding with incomplete data.
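The retry-with-fallback handler can be a small generic wrapper. A sketch, assuming nothing about the tool beyond "callable that may raise" — in production you would catch specific tool errors rather than bare `Exception`:

```python
import time

def call_with_fallback(tool_fn, args: dict, retries: int = 3, backoff: float = 0.5):
    """Retry a tool call with exponential backoff; on exhaustion, return a
    structured error state instead of raising, so the graph can route to
    human escalation rather than crash mid-run."""
    last_err = None
    for attempt in range(retries):
        try:
            return {"status": "ok", "data": tool_fn(**args)}
        except Exception as err:  # production code: catch specific tool errors
            last_err = err
            time.sleep(backoff * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return {"status": "failed", "error": str(last_err), "data": None}

# Downstream nodes check status before using the result:
result = call_with_fallback(lambda ticker: {"ebitda": 100}, {"ticker": "ACME"})
if result["status"] != "ok":
    pass  # set leverage_flagged conservatively and escalate to human review
```

The crucial design choice is the structured error state: the agent loop sees `{"status": "failed"}` as data it can reason about, instead of an exception that terminates the run.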

Prompt Injection & Tool Misuse

THE PROBLEM

Malicious content in retrieved data (e.g. a Bloomberg news article or past credit memo) could attempt to override the agent's instructions. An MCP server returning crafted responses could attempt to redirect tool calls. These are real attack surfaces in production.

THE SOLUTION

Treat all tool outputs as untrusted data — never inject them directly into the system prompt context without sanitization. Safety checks live inside the MCP server, not in the agent. The server is the last line of defence for its system boundary.

Cost & Latency at Scale

THE PROBLEM

Multi-step agents make multiple LLM calls per task. A single Padala Bank deal evaluation makes approximately 8–12 LLM calls. At claude-sonnet rates, that is $0.15–0.40 per deal. At 500 deals/day, caching, parallelization, and model tiering become critical architectural decisions.

THE SOLUTION

Run independent data source calls in parallel (Bloomberg + SQL + RAG can be fetched simultaneously). Use model tiering: smaller/faster models for routing and planning nodes; full models for draft and synthesis. Implement semantic caching for repeated queries against static data.
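Because the three sources are independent, their calls can run concurrently with `asyncio.gather`. A sketch — the fetcher names mirror the graph nodes above, but the bodies are stubs standing in for real MCP client calls:

```python
import asyncio

async def fetch_bloomberg(ticker: str) -> dict:
    await asyncio.sleep(0.1)  # stand-in for a real MCP call's network latency
    return {"source": "bloomberg", "ticker": ticker}

async def fetch_loanbook_exposure(ticker: str) -> dict:
    await asyncio.sleep(0.1)
    return {"source": "loanbook", "ticker": ticker}

async def search_past_memos(ticker: str) -> dict:
    await asyncio.sleep(0.1)
    return {"source": "rag", "ticker": ticker}

async def gather_sources(ticker: str) -> list[dict]:
    # Wall-clock time is roughly the slowest call, not the sum of all three.
    return await asyncio.gather(
        fetch_bloomberg(ticker),
        fetch_loanbook_exposure(ticker),
        search_past_memos(ticker),
    )

results = asyncio.run(gather_sources("ACME"))
```

With three calls of similar latency, this alone cuts the data-gathering phase to roughly a third — before any caching or model tiering is applied.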

10

Phased Rollout Plan

Phase 1

Foundation — Build the Tool Layer

Weeks 1–8

Build and validate the MCP server layer before writing a single line of agent code.

The single biggest mistake: building the orchestration layer before the tool layer is stable. Tools that return malformed data or timeout inconsistently will make the agent appear to hallucinate — when the real problem is the tool.
  • 1. Build the RAG MCP Server — wrap your existing vector index (lowest-effort, highest-value first)
  • 2. Build the SQL MCP Server — read-only access with audit logging from day one
  • 3. Build the Calculator MCP Server — pure Python functions for leverage, coverage, Basel III
  • 4. Write a tool test suite — automated tests for every tool
  • 5. Deploy as containerized HTTP+SSE services in the dev VPC. Test with a simple MCP client before connecting an agent.

Phase 2

Single Agent Pilot

Weeks 9–16

Ship the credit analyst agent to a pilot group of 10 analysts. Full HITL. Every run reviewed.

  • 1. Implement the LangGraph state graph with all nodes and conditional edges
  • 2. Connect MCP servers using the agent config pattern
  • 3. Build the analyst review UI with streaming, flag surfacing, and approval workflow
  • 4. Instrument full tracing via LangSmith or equivalent
  • 5. Run 20 test deals against historical data before pilot launch
  • 6. Launch to 10 analysts. Target: >85% analyst acceptance rate by week 16.

Phase 3

Multi-Agent Orchestration

Weeks 17–28

Expand from one agent to an agent catalog. Multiple agents share the same MCP server layer.

Phase 3 is where the MCP investment pays its first compounding dividend. The SQL, RAG, and Calculator MCP servers built in Phase 1 can now be reused by a risk monitoring agent, a portfolio review agent, and a regulatory reporting agent — with zero incremental tool-layer work.
  • 1. Risk Monitoring Agent — daily scan of the loan portfolio for covenant breaches
  • 2. Portfolio Review Agent — weekly portfolio risk aggregation and reporting
  • 3. Regulatory Reporting Agent — automated Basel III capital consumption reports
  • 4. Agent Router — classify incoming tasks and dispatch to the right specialist agent

Phase 4

Production Operations

Ongoing

Operate agents at scale with production SLAs, cost controls, and continuous evaluation.

  • 1. Evaluation pipeline running weekly against a golden dataset, with automated regression alerts
  • 2. Cost dashboard: LLM spend per agent per run, with anomaly detection
  • 3. SLA monitoring: P95 latency, tool availability, agent success rate
  • 4. Prompt versioning: every change to system prompts is versioned and A/B tested before rollout
  • 5. Quarterly red team exercises: security team attempts prompt injection and data exfiltration via agent tools
11

Team Structure & Build vs Buy

Team Roles (Phase 2+)

Role | Responsibilities
AI Platform Engineer (2) | LangGraph orchestration, agent architecture, framework ownership
MCP Server Engineers (2) | Tool layer; one per high-complexity integration (Bloomberg)
ML/Eval Engineer (1) | Evaluation pipeline, golden dataset, tracing infrastructure
Product Manager (1) | Analyst workflow, pilot coordination, success metrics
Security / Compliance (0.5) | Audit trail design, RBAC, data residency, red team coordination

Build vs Buy Decision Framework

Data sensitivity & regulatory requirements

Regulated data + strict sovereignty requirements → build
Standard enterprise data, no special compliance → buy may work

Workflow customization

Bespoke conditional logic, custom HITL flows → build
Standard workflows that fit platform templates → buy

Engineering team capability

Strong AI/ML engineering team → build
Limited ML expertise → managed platform reduces overhead

Vendor lock-in tolerance

Low tolerance → build on open-source (LangGraph + MCP SDK)
Acceptable → Azure AI Foundry, Salesforce Agentforce

For Padala Bank: regulated data, strong engineering, bespoke workflows, low vendor lock-in tolerance → build on LangGraph + MCP SDK.

From Reactive to Autonomous

The shift from RAG to agents is not primarily a model upgrade — it is an architectural and organizational shift. The locus of control moves from your code to the model. The failure modes change from wrong answers to wrong actions. The stakes are higher, the rewards are greater, and the engineering discipline required is different.

Build the tool layer before the agent layer. Design HITL into the architecture from day one, not as an afterthought. Make the audit trail richer than the output. And measure at the trace level, not just the outcome level.

Build the MCP tool layer first — before writing agent code
Design human-in-the-loop from day one, not as an afterthought
Make the audit trail richer than the output
Measure at trace level, not just outcome level
Reuse MCP servers across multiple agents — that's the compounding value
Non-determinism is a property, not a bug — manage it with tracing

Shashank Padala

Founder, Kirak Labs · Ex-Amazon TPM II · AI Product Leader

AI Product & Transformation Leader with 8+ years of experience building production LLM systems across enterprise platforms and founder-led AI startups. This playbook is a companion to Enterprise GenAI Strategy: From Data to Production and reflects real-world experience shipping agentic systems at scale.
