Agentic AI · MCP · LangGraph · Enterprise Orchestration

Enterprise Agentic AI:
From Chatbot to Autonomous Agent

A Practitioner's Guide to Production-Grade Agentic Workflows

Shashank Padala · Founder, Kirak Labs · 22 min read · Mar 17, 2026 · Companion to: Enterprise GenAI Strategy

Audience: Engineering Leaders · CTOs · AI Platform Teams · Senior TPMs
Assumes: Familiarity with RAG systems — this playbook covers what changes when you move beyond them

Executive Summary

Enterprise AI has crossed a threshold. RAG systems are valuable — but fundamentally reactive. A user asks, the system retrieves and answers. The control flow is fixed. The system cannot plan, cannot take action, and cannot adapt based on what it finds.

Agentic AI changes this. An agent is an LLM that operates in a loop: it reasons about a goal, chooses and executes actions using tools, observes the results, and decides what to do next — until the task is complete. The control flow is determined at runtime by the model, not hardcoded by the developer. This is not an incremental improvement over RAG. It is a fundamentally different architectural pattern.

The reference example throughout this playbook is the Padala Bank Credit Analyst Agent — a multi-tool, multi-step agent that automates the assembly of commercial credit approval packages. It captures the full complexity of enterprise agentic deployment: sensitive data, compliance requirements, multiple internal systems, and consequential outputs requiring human oversight.

01

From RAG to Agents: What Actually Changes

RAG SYSTEM

1. User query
2. Retrieve context
3. Augment prompt
4. Generate answer
5. Return response

Fixed control flow — developer defines path at build time

AGENTIC SYSTEM

Receive Goal
Reason → Plan
Execute Tool Call
Observe Result
Adapt & Loop / Complete

Dynamic control flow — model decides execution path at runtime

RAG vs Agent: Decision Matrix

Agents are not always the right choice. The additional complexity is only justified when the task genuinely requires it.

Dimension | Use RAG | Use an Agent
Task type | Question answering, summarization | Multi-step workflows, planning, decision-making
Data sources | Single vector index | Multiple heterogeneous systems (DB, API, files)
Actions required | None — read-only | Write operations, calculations, API calls
Control flow | Always the same path | Varies based on what the agent finds
Latency tolerance | Sub-second responses | Minutes acceptable for complex tasks
Failure modes | Wrong answer, hallucination | Tool failure, blast radius, audit trail
Example | Internal knowledge assistant | Credit package assembly, report generation

02

Reference Example: Padala Bank Credit Analyst Agent

The Problem

The Corporate Credit team at Padala Bank evaluates commercial lending opportunities. For each deal, an analyst must assemble a Credit Approval Package (CAP) — covering the borrower's financials, existing exposure, comparable past deals, risk ratios, and regulatory capital impact. Today, this takes 2–3 days of manual work across Bloomberg terminals, internal SQL databases, a shared drive of past memos, and a Basel III capital calculator spreadsheet.

A RAG chatbot cannot solve this. It cannot pull live Bloomberg data, query the loan book, run calculations, or synthesize a structured memo from 8 different data sources. This is precisely the right problem for an agent.

FUNCTIONAL REQUIREMENTS

  • Accept a company name or ticker → produce a complete Credit Approval Package
  • Pull live financials, credit ratings, and sector news from Bloomberg
  • Query Padala Bank's internal loan book for existing exposure and deal history
  • Retrieve comparable past credit memos from the internal RAG index
  • Execute leverage, coverage, and debt/EBITDA calculations against covenant thresholds
  • Run Basel III regulatory capital analysis when leverage exceeds threshold
  • Draft the memo in Padala Bank's standard CAP template with citations for every claim
  • Require explicit analyst approval before logging to the deal management system

NON-FUNCTIONAL REQUIREMENTS

  • All data must remain within Padala Bank's private VPC — no data egress to public APIs
  • Every data claim must be traceable to a source (full audit trail for compliance)
  • Full tool call log: what was called, arguments passed, what was returned, when
  • Human approval required before any write operation to downstream systems
  • P95 end-to-end latency under 3 minutes for standard deals
  • Support for parallel tool execution where data sources are independent
03

Reference Architecture: Enterprise Agentic System

A production enterprise agent is not a single service. It is five interconnected layers, each with distinct responsibilities.

L1

Interface Layer

Where the analyst interacts with the agent. Can be a web application, Microsoft Teams bot, or direct API.

Must support bidirectional interaction — the analyst submits a goal, watches agent reasoning in real time (streaming), and interrogates intermediate results before approving the final output. Unlike a chatbot, this surfaces the agent's step-by-step reasoning as it progresses. Essential for building analyst trust in compliance-sensitive environments.

L2

Orchestration Layer (LangGraph)

The brain of the system. Hosts the state graph, manages task state, and coordinates tool calls.

Responsible for the agent loop: Reason → Act → Observe → repeat. The planner node is a Claude call that receives the current state and decides what to do next — genuine LLM reasoning over the task graph, not heuristic logic. The model reads the current state (what has been retrieved so far) and the goal, then outputs a structured action plan.

L3

Tool Layer (MCP Servers)

Every external capability the agent can use is exposed through an MCP server running inside Padala Bank's VPC.

The agent never calls Bloomberg or the SQL server directly — it calls the MCP server, which handles authentication, data formatting, and error handling. One MCP server per system boundary: Bloomberg MCP, SQL Loan Book MCP, RAG Knowledge MCP, Calculator MCP.

L4

Memory Layer

Significantly more complex than in RAG. Four distinct memory types must be managed.

Working Memory (LangGraph state — current task data), Semantic Memory (existing RAG vector index for past memos and policies), Episodic Memory (past agent runs — similar deal patterns), Procedural Memory (system prompt — CAP template, covenant thresholds, Basel rules).

L5

Human-in-the-Loop Gate

Not optional in enterprise agentic systems. This is where human judgment is inserted before any consequential action.

LangGraph implements this via graph interrupts: execution pauses at a designated node, serializes current state to persistent storage, and waits for a human signal to resume. The analyst reviews the full CAP memo, can ask clarifying questions, request revisions, and only then approves submission to the deal management system.
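LangGraph provides this natively (graph interrupts plus a checkpointer); the sketch below illustrates the underlying pattern in plain Python — serialize state at the gate, stop, and resume only on an explicit approval signal. All names here are illustrative, and the in-memory store stands in for the persistent storage a real deployment would use.

```python
import json

CHECKPOINT_STORE: dict[str, str] = {}  # stand-in for persistent storage (e.g. PostgreSQL)

def pause_for_review(run_id: str, state: dict) -> None:
    """The interrupt: serialize the current state and stop executing."""
    CHECKPOINT_STORE[run_id] = json.dumps(state)

def resume_after_review(run_id: str, approved: bool, notes: str = "") -> dict:
    """The resume: reload state and record the human decision."""
    state = json.loads(CHECKPOINT_STORE[run_id])
    state["approved"] = approved
    state["review_notes"] = notes
    return state

# Execution pauses after the memo draft is complete...
pause_for_review("deal-42", {"draft_memo": "...", "risk_ratios": {"leverage": 6.2}})
# ...and resumes only when the analyst signs off.
resumed = resume_after_review("deal-42", approved=True, notes="Section 3 tightened")
```

The key property is that nothing downstream of the gate runs until a human signal arrives — the write to the deal management system is structurally unreachable without it.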

04

MCP: The Tool Connectivity Standard

Mental Model: MCP is USB-C for AI Agents
Before USB-C, every device had a proprietary connector. After it, one cable works everywhere. MCP does the same for tool connectivity. Before MCP, connecting Claude to Bloomberg required different logic than connecting the same agent to a SQL database — every pairing was bespoke. MCP solves this with a universal contract based on JSON-RPC 2.0.
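The universal contract is visible on the wire. A minimal sketch of what a tool invocation looks like as a JSON-RPC 2.0 request — the `tools/call` method and the `name`/`arguments` parameter shape follow the MCP specification, while the specific tool and arguments are from this playbook's Bloomberg example:

```python
import json

# A JSON-RPC 2.0 request as the MCP host would send it to a server.
# "tools/call" is the MCP method for invoking a tool; the tool name and
# arguments are illustrative (the bloomberg-mcp server from this playbook).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "get_financials",
        "arguments": {"ticker": "ACME", "period": "FY2025"},
    },
}

wire_payload = json.dumps(request)
```

Because every tool call — Bloomberg, SQL, calculator — is this same envelope, the host needs exactly one client implementation regardless of how many servers it talks to.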

MCP Host

The agent process — the client that initiates connections and sends tool call requests. In our case, the LangGraph orchestrator.

MCP Server

A lightweight service that exposes tools. Each server has exactly two responsibilities: declare what tools it exposes, and handle calls to those tools.

Tool

A named, typed function the agent can call. Tools have descriptions (written in plain English for the model to read), typed input schemas, and return typed responses.

How Many MCP Servers Should You Build?

One MCP server per system boundary is the correct pattern — reflecting real security and operational requirements:

MCP Server | System | Tools Exposed | Key Security Control
bloomberg-mcp | Bloomberg Terminal API | get_financials, get_ratings, get_news | API key isolation, VPC-only egress
sql-loanbook-mcp | Internal SQL Database | query_exposure, get_deal_history | Read-only; non-SELECT queries blocked
rag-knowledge-mcp | Vector Index (past memos + policies) | search_past_memos, get_credit_policy | permission_group filtering at query time
calculator-mcp | Pure Python Functions | calc_leverage, calc_coverage, calc_basel3 | No external calls — deterministic outputs

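The official MCP SDKs handle the transport and schema plumbing; the sketch below strips that away to show just the two server responsibilities named earlier — declaring typed tools and handling calls — using calculator-mcp-style tools as the example. The function bodies and schema shapes are illustrative, not the SDK's actual API.

```python
# Minimal illustration of an MCP-style server's two responsibilities:
# (1) declare typed tools, (2) handle calls to them. Transport (JSON-RPC
# over stdio or HTTP+SSE) is omitted; a real server would use the MCP SDK.
TOOLS = {
    "calc_leverage": {
        "description": "Total debt divided by EBITDA.",
        "input_schema": {"total_debt": "number", "ebitda": "number"},
        "fn": lambda args: args["total_debt"] / args["ebitda"],
    },
    "calc_coverage": {
        "description": "EBITDA divided by interest expense.",
        "input_schema": {"ebitda": "number", "interest_expense": "number"},
        "fn": lambda args: args["ebitda"] / args["interest_expense"],
    },
}

def list_tools():
    """What the server returns for a tools/list request."""
    return [
        {"name": n, "description": t["description"], "input_schema": t["input_schema"]}
        for n, t in TOOLS.items()
    ]

def call_tool(name: str, arguments: dict):
    """What the server does with a tools/call request."""
    if name not in TOOLS:
        return {"is_error": True, "content": f"unknown tool: {name}"}
    return {"is_error": False, "content": TOOLS[name]["fn"](arguments)}
```

Note that the descriptions are written for the model, not for humans — they are what the planner reads when deciding which tool fits the current step.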
MCP vs CLI: When Each Is Right
The case for plain CLI tools is sound for personal agents on a developer's laptop, where tools are shell commands. It is the wrong choice when you need auth isolation, structured data contracts, remote tool access inside a VPC, compliance audit trails, and reusable tool servers across multiple agents. For enterprise deployments like Padala Bank, MCP is not optional — it is the architecture.
05

Orchestration: LangGraph In Depth

WHY LANGGRAPH FOR ENTERPRISE

  • Persistent state: later nodes know what earlier nodes found
  • Cycles: the graph can loop — essential for retry logic and iterative refinement
  • Checkpointing: pause execution, store state, resume after human review
  • Streaming: surface agent reasoning to the analyst UI in real time
  • Subgraphs: compose complex workflows from reusable sub-agents

THE PADALA BANK STATE GRAPH

START → receive_goal
→ fetch_bloomberg
→ fetch_loanbook_exposure
→ search_past_memos (RAG)
→ calculate_risk_ratios
→ run_basel3_analysis (conditional: only if leverage > 5.5x)
→ draft_cap_memo
→ HUMAN REVIEW (interrupt)
→ log_to_deal_system (on approval)

The Line That Makes This an Agent
After calculating risk ratios, the graph examines the result and decides what to do next. If leverage exceeds 5.5x, it routes to the Basel node. If not, it skips it entirely. The developer did not hardcode this decision — the model's observation of retrieved data drives the execution path. This is the agent loop in practice.
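In LangGraph this decision lives in a conditional edge: a plain routing function reads the state and returns the name of the next node. Here is that function standalone — the node names come from the graph above, while the state shape is an assumption for illustration (in a real graph it would be wired via `add_conditional_edges`):

```python
# The routing function behind the conditional edge after calculate_risk_ratios.
# The 5.5x threshold is the one used throughout this playbook's example.
BASEL_LEVERAGE_THRESHOLD = 5.5

def route_after_risk_ratios(state: dict) -> str:
    """Return the name of the next node based on what the agent found."""
    if state["risk_ratios"]["leverage"] > BASEL_LEVERAGE_THRESHOLD:
        return "run_basel3_analysis"
    return "draft_cap_memo"
```

A 6.2x-levered deal routes to the Basel node; a 3.1x deal skips straight to drafting. The function is trivial — the point is that its *input* was produced by the model's tool calls, so the execution path is data-driven rather than hardcoded.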
06

Memory Architecture for Enterprise Agents

Memory in agentic systems is far richer than in RAG. A RAG system has one memory type: the vector index. A production agent must manage four distinct memory types.

Working Memory

LangGraph State (in-memory + PostgreSQL checkpoint)

The CAP State object shared across all nodes. Every node reads from and writes to it. Accumulates throughout the agent run. Checkpointed after every node — agent can be stopped and resumed without losing any retrieved data.

bloomberg_data, exposure_summary, risk_ratios, draft_memo...
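A working-memory object like this is naturally expressed as a typed dictionary that every node reads from and writes to. A sketch of what the CAP state might look like — field names beyond the four listed above are assumptions:

```python
from typing import Optional, TypedDict

class CAPState(TypedDict, total=False):
    """Shared working memory for one agent run; checkpointed after every node."""
    goal: str                      # company name or ticker from the analyst
    bloomberg_data: dict           # financials, ratings, news
    exposure_summary: dict         # existing loan-book exposure
    past_memos: list               # comparable memos from the RAG index
    risk_ratios: dict              # leverage, coverage, debt/EBITDA
    basel3_result: Optional[dict]  # populated only when leverage > 5.5x
    draft_memo: str                # the assembled CAP draft
    approved: bool                 # set by the human-in-the-loop gate

# Nodes accumulate into the same object as the run progresses:
state: CAPState = {"goal": "ACME Corp"}
state["risk_ratios"] = {"leverage": 6.2}
```

Because the state is a plain serializable structure, checkpointing it to PostgreSQL after each node is straightforward — which is what makes pause-and-resume possible.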

Semantic Memory

Vector Index (your existing RAG system)

Past credit memos, credit policies, covenant templates. Exposed via the RAG MCP server. Your existing vector index becomes the semantic memory of your agents — the RAG investment does not go away, it becomes more valuable.

search_past_memos('leveraged buyout tech sector'), get_credit_policy('concentration_risk')

Episodic Memory

Agent run history (PostgreSQL / LangSmith)

Records of past agent runs. Which tool sequences produced high-quality memos? What failure patterns have occurred? Enables the system to learn from experience over time.

Past deal runs with analyst acceptance rates, revision patterns, escalation reasons

Procedural Memory

System prompt + prompt versioning

How to do the task: the CAP template structure, covenant threshold definitions, Basel III calculation rules, citation format requirements. Injected at the start of every run.

CAP template, leverage thresholds, Basel III capital consumption formula
07

Human-in-the-Loop Design

Why HITL Is Non-Negotiable in Enterprise
Enterprise agents operate in systems with real consequences. A mistaken database write, an incorrect credit recommendation logged to a deal management system, or a compliance flag missed by the agent — these are not bugs to patch. They are business failures with regulatory and financial consequences. HITL is not a concession to distrust of AI. It is a risk management architecture.

Review & Approve

When: Agent completes full task draft, then interrupts for human decision before any write operation

Padala Bank use: the primary pattern — the analyst reviews the full CAP before submission to the deal system

Exception Escalation

When: Agent escalates to human when it encounters anomalies or hard-stop conditions mid-run

Padala Bank use: if leverage > 7.0x OR Bloomberg data is missing → escalate before drafting

Streaming Review

When: Human watches agent reasoning in real time and can intervene at any step

Padala Bank use: For complex or novel deals where analyst wants to guide the research direction

08

Evaluation & Observability for Agents

Agent evaluation is fundamentally harder than RAG evaluation because agents are non-deterministic and multi-step. The same goal can produce different tool call sequences and different final outputs on different runs. You are not evaluating a single answer — you are evaluating an execution trace.

Step-Level Evaluation

Evaluate each individual tool call and node output in isolation.

  • Tool call correctness — did the agent call the right tool with the right arguments?
  • Tool return quality — did the MCP server return valid, complete data?
  • Node output validity — does the node output match expected schema?
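Step-level checks can be expressed as plain assertions over one recorded tool call from a trace. A minimal sketch — the record shape is an assumption, matching nothing more than "tool name plus arguments":

```python
def check_tool_call(actual: dict, expected: dict) -> list[str]:
    """Compare one recorded tool call against the golden call for this step.
    Returns a list of failure reasons; an empty list means the step passed."""
    failures = []
    if actual["tool"] != expected["tool"]:
        failures.append(f"wrong tool: {actual['tool']} != {expected['tool']}")
    for key, value in expected["arguments"].items():
        if actual["arguments"].get(key) != value:
            failures.append(f"wrong argument {key!r}: got {actual['arguments'].get(key)!r}")
    return failures

# A recorded step from a trace vs. the golden expectation:
actual = {"tool": "query_exposure", "arguments": {"borrower": "ACME Corp"}}
expected = {"tool": "query_exposure", "arguments": {"borrower": "ACME Corp"}}
step_failures = check_tool_call(actual, expected)
```

Returning reasons rather than a boolean matters in practice: aggregated over a golden dataset, the failure strings tell you *which* argument the agent gets wrong, not just that it failed.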

Trace-Level Evaluation

Evaluate the full execution trace for a single agent run.

  • Tool call sequence — did the agent take the right path? (Did Basel fire when leverage > 5.5x?)
  • Conditional routing accuracy — did conditional edges fire correctly?
  • Latency per node — where are bottlenecks?

Outcome-Level Evaluation

Evaluate the final output against ground truth and business requirements.

  • Memo completeness — all required sections present?
  • Financial accuracy — do reported figures match source data?
  • Citation coverage — every claim has a traceable source?
  • Analyst acceptance rate — approved without major revision?
Tracing: The Foundational Observability Primitive
For agents, tracing is not optional. A trace records every step: which nodes executed, in what order, what tools were called with what arguments, what each tool returned, how long each step took, and the final output. Without traces, debugging a failed agent run is guesswork. LangSmith integrates natively with LangGraph and captures full execution traces automatically — enabling it is a one-line configuration.
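Enabling that one-line configuration is environment-variable setup. A sketch — the variable names follow LangSmith's documented convention, the key is a placeholder, and the project name is illustrative:

```python
import os

# LangSmith tracing is opt-in via environment variables; once set, LangGraph
# runs are traced automatically with no application code changes.
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "lsv2-..."             # from LangSmith settings
os.environ["LANGCHAIN_PROJECT"] = "padala-credit-agent"  # groups traces per agent
```

Setting a distinct project per agent keeps the Phase 3 agent catalog debuggable: each agent's traces land in its own bucket.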
09

The Hardest Problems in Enterprise Agents

Non-Determinism & Auditability

THE PROBLEM

The same goal submitted twice may produce different tool call sequences and different final outputs. This is not a bug — it is a property of LLM reasoning. But it creates fundamental challenges for compliance teams expecting consistent, reproducible outputs.

THE SOLUTION

Make the audit trail richer than the output. Every tool call is logged with its exact arguments and return values. The analyst approval is recorded with timestamp and notes. The final memo is stored with its full provenance chain. Even if two runs produce slightly different memos, compliance can trace exactly how each was produced.
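One way to make the trail "richer than the output" is to log every tool call as a structured, timestamped record ready to persist as a JSON row. A sketch — the field set and helper names are assumptions:

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class ToolCallRecord:
    """One entry in a run's provenance chain: what was called, with what
    arguments, what came back, and when."""
    run_id: str
    tool: str
    arguments: dict
    result_summary: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

audit_log: list[ToolCallRecord] = []  # stand-in for an append-only store

def record_tool_call(run_id: str, tool: str, arguments: dict, result_summary: str) -> dict:
    rec = ToolCallRecord(run_id, tool, arguments, result_summary)
    audit_log.append(rec)
    return asdict(rec)  # serializable, ready to persist

entry = record_tool_call(
    "deal-42", "query_exposure",
    {"borrower": "ACME Corp"}, "3 facilities, $120M total exposure",
)
```

Even when two runs draft slightly different memos, joining the memo's citations back to these records reconstructs exactly how each claim was produced.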

Tool Failure & Blast Radius

THE PROBLEM

When a tool call fails — Bloomberg API is down, SQL query times out, the RAG index returns empty results — the agent must handle it gracefully. Without explicit error handling, the agent will either hallucinate data or produce an incomplete output without surfacing the failure.

THE SOLUTION

Wrap every tool call in a retry-with-fallback handler. If a tool fails after retries, return a structured error state rather than raising an exception. Check status before using result — if the tool failed, set leverage_flagged conservatively and escalate to human review rather than proceeding with incomplete data.
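The retry-with-fallback handler can be a small generic wrapper. A sketch, assuming nothing about the tool beyond "callable that may raise" — in production you would catch specific tool errors rather than bare `Exception`:

```python
import time

def call_with_fallback(tool_fn, args: dict, retries: int = 3, backoff: float = 0.5):
    """Retry a tool call with exponential backoff; on exhaustion, return a
    structured error state instead of raising, so the graph can route to
    human escalation rather than crash mid-run."""
    last_err = None
    for attempt in range(retries):
        try:
            return {"status": "ok", "data": tool_fn(**args)}
        except Exception as err:  # production code: catch specific tool errors
            last_err = err
            time.sleep(backoff * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return {"status": "failed", "error": str(last_err), "data": None}

# Downstream nodes check status before using the result:
result = call_with_fallback(lambda ticker: {"ebitda": 100}, {"ticker": "ACME"})
if result["status"] != "ok":
    pass  # set leverage_flagged conservatively and escalate to human review
```

The crucial design choice is the structured error state: the agent loop sees `{"status": "failed"}` as data it can reason about, instead of an exception that terminates the run.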

Prompt Injection & Tool Misuse

THE PROBLEM

Malicious content in retrieved data (e.g. a Bloomberg news article or past credit memo) could attempt to override the agent's instructions. An MCP server returning crafted responses could attempt to redirect tool calls. These are real attack surfaces in production.

THE SOLUTION

Treat all tool outputs as untrusted data — never inject them directly into the system prompt context without sanitization. Safety checks live inside the MCP server, not in the agent. The server is the last line of defence for its system boundary.

Cost & Latency at Scale

THE PROBLEM

Multi-step agents make multiple LLM calls per task. A single Padala Bank deal evaluation makes approximately 8–12 LLM calls. At claude-sonnet rates, that is $0.15–0.40 per deal. At 500 deals/day, caching, parallelization, and model tiering become critical architectural decisions.

THE SOLUTION

Run independent data source calls in parallel (Bloomberg + SQL + RAG can be fetched simultaneously). Use model tiering: smaller/faster models for routing and planning nodes; full models for draft and synthesis. Implement semantic caching for repeated queries against static data.
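Because the three sources are independent, their calls can run concurrently with `asyncio.gather`. A sketch — the fetcher names mirror the graph nodes above, but the bodies are stubs standing in for real MCP client calls:

```python
import asyncio

async def fetch_bloomberg(ticker: str) -> dict:
    await asyncio.sleep(0.1)  # stand-in for a real MCP call's network latency
    return {"source": "bloomberg", "ticker": ticker}

async def fetch_loanbook_exposure(ticker: str) -> dict:
    await asyncio.sleep(0.1)
    return {"source": "loanbook", "ticker": ticker}

async def search_past_memos(ticker: str) -> dict:
    await asyncio.sleep(0.1)
    return {"source": "rag", "ticker": ticker}

async def gather_sources(ticker: str) -> list[dict]:
    # Wall-clock time is roughly the slowest call, not the sum of all three.
    return await asyncio.gather(
        fetch_bloomberg(ticker),
        fetch_loanbook_exposure(ticker),
        search_past_memos(ticker),
    )

results = asyncio.run(gather_sources("ACME"))
```

With three calls of similar latency, this alone cuts the data-gathering phase to roughly a third — before any caching or model tiering is applied.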

10

Phased Rollout Plan

Phase 1

Foundation — Build the Tool Layer

Weeks 1–8

Build and validate the MCP server layer before writing a single line of agent code.

The single biggest mistake: building the orchestration layer before the tool layer is stable. Tools that return malformed data or timeout inconsistently will make the agent appear to hallucinate — when the real problem is the tool.
  • 1. Build the RAG MCP Server — wrap your existing vector index (lowest-effort, highest-value first)
  • 2. Build the SQL MCP Server — read-only access with audit logging from day one
  • 3. Build the Calculator MCP Server — pure Python functions for leverage, coverage, Basel III
  • 4. Write a tool test suite — automated tests for every tool
  • 5. Deploy as containerized HTTP+SSE services in the dev VPC. Test with a simple MCP client before connecting an agent.

Phase 2

Single Agent Pilot

Weeks 9–16

Ship the credit analyst agent to a pilot group of 10 analysts. Full HITL. Every run reviewed.

  • 1. Implement the LangGraph state graph with all nodes and conditional edges
  • 2. Connect MCP servers using the agent config pattern
  • 3. Build the analyst review UI with streaming, flag surfacing, and approval workflow
  • 4. Instrument full tracing via LangSmith or equivalent
  • 5. Run 20 test deals against historical data before pilot launch
  • 6. Launch to 10 analysts. Target: >85% analyst acceptance rate by week 16.

Phase 3

Multi-Agent Orchestration

Weeks 17–28

Expand from one agent to an agent catalog. Multiple agents share the same MCP server layer.

Phase 3 is where the MCP investment pays its first compounding dividend. The SQL, RAG, and Calculator MCP servers built in Phase 1 can now be reused by a risk monitoring agent, a portfolio review agent, and a regulatory reporting agent — with zero incremental tool-layer work.
  • 1. Risk Monitoring Agent — daily scan of the loan portfolio for covenant breaches
  • 2. Portfolio Review Agent — weekly portfolio risk aggregation and reporting
  • 3. Regulatory Reporting Agent — automated Basel III capital consumption reports
  • 4. Agent Router — classify incoming tasks and dispatch to the right specialist agent

Phase 4

Production Operations

Ongoing

Operate agents at scale with production SLAs, cost controls, and continuous evaluation.

  • 1. Evaluation pipeline running weekly against a golden dataset, with automated regression alerts
  • 2. Cost dashboard: LLM spend per agent per run, with anomaly detection
  • 3. SLA monitoring: P95 latency, tool availability, agent success rate
  • 4. Prompt versioning: every change to system prompts is versioned and A/B tested before rollout
  • 5. Quarterly red team exercises: security team attempts prompt injection and data exfiltration via agent tools
11

Team Structure & Build vs Buy

Team Roles (Phase 2+)

Role | Responsibilities
AI Platform Engineer (2) | LangGraph orchestration, agent architecture, framework ownership
MCP Server Engineers (2) | Tool layer; one per high-complexity integration (Bloomberg)
ML/Eval Engineer (1) | Evaluation pipeline, golden dataset, tracing infrastructure
Product Manager (1) | Analyst workflow, pilot coordination, success metrics
Security / Compliance (0.5) | Audit trail design, RBAC, data residency, red team coordination

Build vs Buy Decision Framework

Data sensitivity & regulatory requirements

Regulated data + strict sovereignty requirements → build
Standard enterprise data, no special compliance → buy may work

Workflow customization

Bespoke conditional logic, custom HITL flows → build
Standard workflows that fit platform templates → buy

Engineering team capability

Strong AI/ML engineering team → build
Limited ML expertise → managed platform reduces overhead

Vendor lock-in tolerance

Low tolerance → build on open-source (LangGraph + MCP SDK)
Acceptable → Azure AI Foundry, Salesforce Agentforce

For Padala Bank: regulated data, strong engineering, bespoke workflows, low vendor lock-in tolerance → build on LangGraph + MCP SDK.

From Reactive to Autonomous

The shift from RAG to agents is not primarily a model upgrade — it is an architectural and organizational shift. The locus of control moves from your code to the model. The failure modes change from wrong answers to wrong actions. The stakes are higher, the rewards are greater, and the engineering discipline required is different.

Build the tool layer before the agent layer. Design HITL into the architecture from day one, not as an afterthought. Make the audit trail richer than the output. And measure at the trace level, not just the outcome level.

Build the MCP tool layer first — before writing agent code
Design human-in-the-loop from day one, not as an afterthought
Make the audit trail richer than the output
Measure at trace level, not just outcome level
Reuse MCP servers across multiple agents — that's the compounding value
Non-determinism is a property, not a bug — manage it with tracing

Shashank Padala

Founder, Kirak Labs · Ex-Amazon TPM II · AI Product Leader

AI Product & Transformation Leader with 8+ years of experience building production LLM systems across enterprise platforms and founder-led AI startups. This playbook is a companion to Enterprise GenAI Strategy: From Data to Production and reflects real-world experience shipping agentic systems at scale.
