How I Deployed Agentic AI at a Major Bank – What Actually Worked (and What Didn’t)

I didn’t read about this in a whitepaper. I lived it.

Most content you’ll find on agentic AI in banking is written by consultants who sat across the table from the people doing the actual work. It’s polished, it’s safe, and it tells you almost nothing useful. This post is different – because I was the one in the room.

I’m Meenakshi Thanikachalam. I serve as the Chief Data and AI Officer at a $75 billion financial institution, reporting directly to the EVP and Group CIO. My mandate isn’t to run experiments. My mandate is to transform the enterprise into an autonomous, intelligence-driven bank – at scale, inside one of the most regulated industries on the planet.

That context matters. Because deploying agentic AI in a regulated financial institution is nothing like deploying it at a SaaS startup. There are compliance walls. There are model risk governance frameworks. There are audit requirements that force you to rethink how an AI agent even makes a decision. And there are real people – risk officers, legal teams, business unit heads – whose sign-off you need before a single agent touches a production workflow.

This post covers exactly what happened when we moved from talking about agentic AI to actually shipping it:

  • How we scoped the first agent – the discovery process, the decision mapping, the data audit
  • How we governed it – compliance requirements, explainability mandates, the audit trail architecture
  • How we shipped it – the tech stack, the integration work, the timeline
  • What the results actually were – the numbers, the wins, and the one rollback that taught us more than any success did

By the end of this post, you will know exactly how agentic AI gets scoped, governed, and shipped inside a regulated financial institution – not in theory, but from someone who did it.

If you are an AI leader, a data officer, or a technologist trying to move your first real AI agent deployment in banking past the pilot stage and into production, this is the post I wish had existed when I started.

Let’s get into it.

What "Agentic AI" Actually Means Inside a Bank

Before we built anything, we had to agree on a definition. Inside a regulated institution, that word carries legal weight.

When I first brought the concept of agentic AI in banking to our internal risk and compliance team, the conversation stalled immediately - not because the technology was unfamiliar, but because the word "agent" had no agreed-upon meaning inside the institution.

That ambiguity is more dangerous than most technology leaders realize. In a regulated financial environment, what you call something determines how it gets governed, audited, and approved. If your definition is vague, your governance framework will be vague - and a vague governance framework does not survive a regulatory review.

So before we wrote a single line of code, we wrote a definition.

An AI agent is any system that can take autonomous action across multiple steps without requiring human approval at each step - where those actions produce real-world outcomes inside a live operational environment.

The most important line we drew early was the distinction between AI agents and AI automation. These are not the same thing - and in banking, treating them as the same is a governance failure waiting to happen.

Traditional Automation

Executes a predefined script. Every action is fixed at design time. It does exactly what it was told to do - no more, no less. The risk is known and bounded.

Agentic AI

Makes contextual decisions at runtime. Adapts based on new inputs. Can initiate downstream actions - flagging, drafting, routing - without a human trigger at each step. The risk surface is dynamic.

That distinction changed everything - not just for engineering, but for legal, compliance, and audit. An automation project follows one approval path. A deployment of AI agents in financial services requires a different risk classification, a different explainability standard, and a different oversight model. Once our risk team understood the distinction, the governance conversation became much more productive.

The 3-Criteria Internal Framework

To prevent ambiguity from creeping back in at the project level, we built a three-criteria framework. Every AI initiative we evaluated had to meet all three criteria to be classified as agentic AI - and therefore enter the agentic AI governance track. If it failed even one criterion, it was reclassified as automation and handled separately.

01

Perceive Context

The system must read and interpret situational context at runtime - not just process structured data fields. It must understand what is happening, not just what the record says.

02

Plan Across Steps

The system must sequence multiple actions toward a goal before executing any of them. Single-step decisions are automation. Multi-step planning with contextual sequencing is agentic behavior.

03

Initiate Real-World Action

The system must be capable of triggering an outcome in a live environment - updating a state, routing a case, generating a communication - without a human initiating each step.

This framework did something important beyond classification - it forced every project team to think precisely about what their system was actually doing. Vague AI projects rarely survive precise definitional criteria. The ones that did were the ones worth building.
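
To make the classification rule concrete, here is a minimal Python sketch of the all-three-or-automation check. The class, field names, and example initiatives are illustrative assumptions, not our actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Initiative:
    """One AI initiative under review (all fields are illustrative)."""
    name: str
    perceives_context: bool   # Criterion 01: interprets situational context at runtime
    plans_across_steps: bool  # Criterion 02: sequences multiple actions toward a goal
    initiates_action: bool    # Criterion 03: triggers outcomes in a live environment


def classify(initiative: Initiative) -> str:
    """Return 'agentic' only if all three criteria hold, otherwise 'automation'."""
    criteria = (
        initiative.perceives_context,
        initiative.plans_across_steps,
        initiative.initiates_action,
    )
    return "agentic" if all(criteria) else "automation"


# Hypothetical examples: one meets every criterion, one fails two of them.
fraud_review = Initiative("fraud-review-assistant", True, True, True)
report_export = Initiative("nightly-report-export", False, False, True)

print(classify(fraud_review))   # agentic
print(classify(report_export))  # automation
```

Failing a single criterion is enough to route an initiative to the automation track, which is why the check uses `all` rather than a score.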

Why This Definition Changed Our Entire Governance Approach

Once we had a precise definition and a classification framework, three things changed immediately.

First, the risk conversation shifted from "is this safe?" to "what is the authority boundary?" That is a far more productive question inside a bank. It is specific, it is auditable, and it produces a clear design requirement rather than an open-ended debate.

Second, the compliance team could engage constructively from Day 1. When compliance understands that an AI agent has a defined scope - what it can read, what it can draft, what it can never execute - they can write a governance policy. When the definition is vague, compliance defaults to "no." A precise definition removes the most common blocker to deploying AI in regulated industries.

Third, it established my team as the internal authority on agentic AI governance - not just the delivery team, but the team that defined the rules. That positioning mattered throughout the entire 18-month deployment. When definition questions arose - and they did, repeatedly - there was one place the institution looked for answers.

The principle I took from this: In banking, the definition of a technology is not a semantic exercise. It is a governance document. Get it wrong, and everything downstream - the risk framework, the audit trail, the approval process - is built on an unstable foundation. Get it right, and LLM deployment in enterprise banking becomes a structured engineering problem rather than an open compliance risk.

How We Scoped the First Use Case

The first decision we made about deploying agentic AI in banking had nothing to do with technology. It had everything to do with the problem we were trying to solve.

That distinction sounds obvious. In practice, most enterprise AI programs get it backwards. A vendor demonstrates a capability, leadership gets excited, and the team spends six months trying to find a workflow that fits the demo. We did not do that.

We started by looking at where human time was being consumed at the highest volume and where the cost of a wrong decision was measurable. That led us directly to credit operations - specifically, the fraud review workflow.

"The question is never 'Can an AI agent do this?' The right question is: 'Which parts of this workflow should an AI agent do - and where must a human remain accountable?'"

- Meenakshi Thanikachalam, Chief Data and AI Officer

The Baseline Problem

Before a single line of agentic code was written, we documented the workflow in full. For every transaction flagged for potential fraud, an analyst had to pull context from five separate systems - transaction history, customer profile, prior dispute records, velocity data, and network signals - synthesize that information, make a judgment call, and either escalate or resolve the case.

Average handling time: 14 minutes per case. We were processing thousands of flags per day. The volume was not sustainable, and the quality of decisions was inconsistent because analysts had variable access to context depending on which systems they prioritized.

  • 14 min: average analyst handling time per flag
  • 5: systems pulled manually per case
  • 1,000s: flags processed every day

The Core Scoping Question

Deploying AI agents in financial services is not a binary decision. You do not hand a workflow to an agent and walk away. You map every step, every decision point, and every downstream consequence - and then you draw a line between what the agent should do and where a human must remain accountable.

That line is not a technical boundary. It is a regulatory one. In banking, the accountability question determines your audit posture, your compliance exposure, and your model risk rating. We mapped the entire workflow against two questions: "Can an agent do this reliably?" and "Does a human need to own this outcome?"

Here is the table that came out of that scoping exercise. This became the template we used for every subsequent AI agent deployment in banking we scoped after this one.

Workflow Step | Agent Capable | Human Required | Reason
Pull data from 5 systems simultaneously | Yes | No | Deterministic retrieval - fully auditable, no judgment required
Identify pattern against prior fraud cases | Yes | No | Model decision - logged, explainable, and reproducible
Draft resolution recommendation | Yes | No | Drafted only - not executed; analyst reviews before action
Final approval and case action | No | Yes | Regulatory accountability - human sign-off required by policy
Customer communication | Partial | Yes | Agent drafts; human reviews for tone, edge cases, and empathy

This table became our agentic AI scoping policy. Every new use case had to fill it out before engineering started. If the team could not answer "Human Required" with a clear regulatory or quality reason, the step went to the agent. If they could, the human stayed in the loop - by design, not by accident.
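
The scoping rule can be written down as data plus one small function. This is a sketch under stated assumptions: the `WorkflowStep` type, the `assign_owner` function, and the treatment of "Partial" as agent-capable are illustrative choices, not the template we actually used.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkflowStep:
    name: str
    agent_capable: bool          # "Partial" is treated as True in this sketch
    human_reason: Optional[str]  # None means no regulatory or quality reason was given


def assign_owner(step: WorkflowStep) -> str:
    """A human owns a step only when a concrete reason is on record."""
    if step.human_reason:
        return "agent drafts, human approves" if step.agent_capable else "human"
    # No recorded reason for a human: the step goes to the agent if it can do it.
    return "agent" if step.agent_capable else "needs rescoping"


STEPS = [
    WorkflowStep("Pull data from 5 systems", True, None),
    WorkflowStep("Identify pattern against prior fraud cases", True, None),
    WorkflowStep("Draft resolution recommendation", True, None),
    WorkflowStep("Final approval and case action", False, "Regulatory accountability"),
    WorkflowStep("Customer communication", True, "Tone, edge cases, and empathy"),
]

for step in STEPS:
    print(f"{step.name}: {assign_owner(step)}")
```

Recording the reason as a field, rather than a yes/no flag, forces the team to write down why a human is required before the step can be kept out of the agent's scope.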

The Governance Framework We Built from Zero

This is the section that most articles on AI automation in banking operations skip entirely. It is also the section that determines whether your deployment survives its first compliance audit - or quietly becomes a liability.

When we began this program in 2023, there was no established governance standard for agentic AI in a regulated financial institution. The OCC's model risk guidance was built for traditional ML models. Our internal AI policy covered algorithmic decision-making. Neither was designed for systems that plan across multiple steps, take autonomous actions, and operate at scale without human approval at each stage.

We had to build the framework ourselves.

The Three Source Frameworks We Combined

Rather than starting from a blank page, we identified the three existing frameworks that came closest to what we needed - and then documented every gap explicitly.

1. Internal AI Model Risk Policy

Covered model development, validation, and monitoring for traditional ML. We adapted its explainability and audit requirements for multi-step agent decision chains.

2. Enterprise Data Access Governance Layer

Defined who could access what data and under what conditions. We extended this to define what an agent could access, at what query frequency, and with what audit trail.

3. OCC 2021 Model Risk Management Guidance

Provided the regulatory baseline for model accountability and independent review. We mapped every agentic component to a corresponding OCC requirement - and wrote new policy where no mapping existed.

We spent four months on governance design before a single line of agentic code was written. That timeline is not overhead. It is what makes everything downstream defensible in a regulated environment.

The 5 Components of Our Agentic AI Governance Policy

What follows is the exact policy structure we built for deploying AI in regulated industries. Every agent in our production environment operates under all five components simultaneously. None of them are optional.

Component 01

Agent Registry

Every agent deployed in production is registered with a unique ID, a named owner, a defined scope of authority, data access permissions, and audit trail requirements. No agent operates in production without a registry entry. This is the foundation of all other governance components.

Component 02

Authority Boundaries

Every agent has a defined and enforced scope of action. It can read, recommend, and draft. It cannot execute financial transactions, update customer records, or initiate external communications without an explicit human trigger. Authority limits are set at the registry level and enforced technically - not just by policy.
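
A registry entry and a technically enforced authority check might look like the sketch below. Everything here is hypothetical: the `Action` names, the `AgentRegistry` class, and the example agent ID are assumptions made for illustration, not our production schema.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    READ = "read"
    RECOMMEND = "recommend"
    DRAFT = "draft"
    EXECUTE_TRANSACTION = "execute_transaction"
    UPDATE_CUSTOMER_RECORD = "update_customer_record"
    SEND_EXTERNAL_COMMUNICATION = "send_external_communication"


# Actions that always need an explicit human trigger, whatever the agent's scope.
HUMAN_TRIGGER_REQUIRED = frozenset({
    Action.EXECUTE_TRANSACTION,
    Action.UPDATE_CUSTOMER_RECORD,
    Action.SEND_EXTERNAL_COMMUNICATION,
})


@dataclass(frozen=True)
class RegistryEntry:
    agent_id: str
    owner: str
    allowed_actions: frozenset  # scope of authority
    data_sources: frozenset     # data access permissions


class AuthorityViolation(Exception):
    """Raised when an agent attempts an action outside its registered scope."""


class AgentRegistry:
    def __init__(self) -> None:
        self._entries = {}

    def register(self, entry: RegistryEntry) -> None:
        self._entries[entry.agent_id] = entry

    def authorize(self, agent_id: str, action: Action, human_approved: bool = False) -> None:
        entry = self._entries.get(agent_id)
        if entry is None:
            raise AuthorityViolation(f"{agent_id} is not registered")
        if action in HUMAN_TRIGGER_REQUIRED:
            if not human_approved:
                raise AuthorityViolation(f"{action.value} needs an explicit human trigger")
            return
        if action not in entry.allowed_actions:
            raise AuthorityViolation(f"{action.value} is outside the scope of {agent_id}")


registry = AgentRegistry()
registry.register(RegistryEntry(
    agent_id="fraud-review-01",
    owner="credit-operations",
    allowed_actions=frozenset({Action.READ, Action.RECOMMEND, Action.DRAFT}),
    data_sources=frozenset({"transaction_history", "customer_profile"}),
))
registry.authorize("fraud-review-01", Action.DRAFT)  # within scope, passes silently
```

The point of raising an exception rather than logging a warning is that the boundary is enforced in code: an unregistered agent or an out-of-scope action cannot proceed.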

Component 03

Explainability Requirements

Every decision an agent makes that influences a human outcome must produce a human-readable explanation. We implemented chain-of-thought logging at the model layer. Compliance does not accept a confidence score as an explanation. They need a narrative - and we built the translation layer to produce one.

Component 04

Escalation Protocol

Any scenario the agent encounters that falls outside its training distribution triggers an automatic human escalation. The agent does not guess in edge cases. It stops, flags the case, and hands it to a human with a summary of what it observed. Silence is not an acceptable agent behavior.
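
As a simplified illustration of the stop-flag-hand-off behavior, the sketch below escalates whenever an input is missing or falls outside ranges seen in training. The feature names and ranges are invented for the example, and a range check is only a stand-in: real out-of-distribution detection is considerably more involved.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    outcome: str  # "recommend" or "escalate"
    summary: str  # what the agent observed, written for the human reviewer


# Hypothetical value ranges observed in training data.
TRAINING_RANGES = {
    "amount_usd": (0.0, 25_000.0),
    "txns_last_hour": (0.0, 40.0),
}


def review(case: dict) -> Decision:
    """Escalate instead of guessing when any input is missing or out of range."""
    problems = []
    for feature, (low, high) in TRAINING_RANGES.items():
        value = case.get(feature)
        if value is None or not (low <= value <= high):
            problems.append(f"{feature}={value!r}")
    if problems:
        return Decision("escalate", "Inputs outside known ranges: " + ", ".join(problems))
    return Decision("recommend", "All inputs within known ranges")


print(review({"amount_usd": 120.0, "txns_last_hour": 3}).outcome)      # recommend
print(review({"amount_usd": 900_000.0, "txns_last_hour": 3}).outcome)  # escalate
```

Note that every path returns a `Decision` with a summary, so the agent never goes silent on a case it cannot handle.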

Component 05

Quarterly Audit

Every registered agent is reviewed quarterly against its original defined scope. Scope drift - where an agent gradually begins operating outside the boundaries it was designed for - is the highest-risk failure mode in long-running agentic deployments. The quarterly audit exists specifically to catch it before it becomes a compliance event.
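
One simple way to surface scope drift is to compare the actions an agent actually logged against its registered scope. The function name, action labels, and sample log below are assumptions for illustration only.

```python
from collections import Counter


def find_scope_drift(registered_scope: set, action_log: list) -> Counter:
    """Count logged actions that fall outside the agent's registered scope."""
    return Counter(action for action in action_log if action not in registered_scope)


# Hypothetical registry scope and one quarter of logged actions.
registered = {"read", "recommend", "draft"}
log = [
    "read", "draft", "read",
    "update_customer_record",  # never part of the registered scope
    "recommend",
    "update_customer_record",
]

drift = find_scope_drift(registered, log)
for action, count in drift.items():
    print(f"Scope drift: {action} observed {count} times")
```

An empty result means the agent stayed inside its boundaries for the period reviewed; anything else is an audit finding to investigate.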

The Principle Behind the Framework

Governance is not a constraint on agentic AI. It is the foundation of it. Every bank that has tried to bolt governance onto a live agentic deployment has paid for that sequencing in rework, audit findings, or both. Build the framework first. The technology becomes the easy part.

The Three Failures That Almost Killed the Project

The section vendor case studies never include

I am including this section because it is the most useful part of this article. If you are planning an agentic AI deployment in banking, the three failures below will save you months. None of them appear in vendor decks. All of them are real.

Failure 01

Context Window Limitations in Multi-Step Tasks

Our first production agent worked cleanly in a controlled test environment. In live operations, it failed on complex fraud review cases - and it took us two weeks to understand why.

The problem was context retention. A multi-system retrieval workflow requires the agent to hold context from Step 1 while executing Step 6. Early LLM architectures handled this inconsistently. In low-complexity cases, the agent performed well. In cases that required pulling data from five systems across a 12-step workflow, it was losing critical context mid-process - and making incomplete recommendations as a result.

How we fixed it: We redesigned the agent architecture into bounded sub-agents. Each sub-agent owned one clearly scoped step in the workflow. Context was passed explicitly between sub-agents using a structured handoff schema - not assumed to carry forward through the model. This added engineering complexity. It eliminated the failure mode entirely.
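
The pattern of bounded sub-agents with an explicit handoff can be sketched in a few lines. The `Handoff` schema, the two sub-agent functions, and the placeholder values are hypothetical; the real schema carried far more than this.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class Handoff:
    """Everything one sub-agent passes to the next; nothing is carried implicitly."""
    case_id: str
    facts: dict = field(default_factory=dict)
    trail: tuple = ()  # ordered names of the sub-agents that have already run


def retrieve_transactions(h: Handoff) -> Handoff:
    facts = {**h.facts, "txn_count_24h": 17}  # placeholder value, not real data
    return replace(h, facts=facts, trail=h.trail + ("retrieve_transactions",))


def score_velocity(h: Handoff) -> Handoff:
    # Fail loudly if an upstream step did not supply the required context.
    if "txn_count_24h" not in h.facts:
        raise ValueError("txn_count_24h missing from handoff")
    facts = {**h.facts, "velocity_flag": h.facts["txn_count_24h"] > 10}
    return replace(h, facts=facts, trail=h.trail + ("score_velocity",))


def run_pipeline(case_id: str, steps: list) -> Handoff:
    handoff = Handoff(case_id=case_id)
    for step in steps:
        handoff = step(handoff)
    return handoff


result = run_pipeline("case-001", [retrieve_transactions, score_velocity])
print(result.trail)
print(result.facts)
```

Because each sub-agent validates the fields it depends on, lost context shows up as an immediate error at the handoff rather than as a quietly incomplete recommendation.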

Failure 02

Data Governance Layer Not Built for Agent-Speed Queries

Our existing data governance infrastructure was built for human-speed access. An analyst pulling a report generates one or two governed queries per session. When we deployed an agent at scale, it generated query volumes our governance layer had never been designed to handle.

The symptom was latency. Audit logs were falling behind real-time. In a regulated environment, that is not a performance issue - it is a compliance issue. If your audit log cannot keep pace with your agent's actions, you cannot demonstrate accountability. That is a deployment-stopping problem.

How we fixed it: We rebuilt the data access layer with agent-native architecture - asynchronous audit logging, query rate controls per agent, and a dedicated governance pipeline that processed agent-generated queries separately from human-generated ones. The rebuild took six weeks. It should have been designed this way from the start.
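
Two of those ideas, per-agent rate controls and asynchronous audit logging, can be illustrated with the standard library alone. This is a sketch, not our data access layer: the class names are invented, the in-memory list stands in for a durable audit store, and the frozen clock exists only to make the demo deterministic.

```python
import queue
import threading
import time


class RateLimiter:
    """Token bucket: about `rate` queries per second, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic) -> None:
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = float(burst)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class AsyncAuditLog:
    """Callers enqueue records; a background thread writes them in order."""

    def __init__(self) -> None:
        self._queue = queue.Queue()
        self.records = []  # stand-in for a durable audit store
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self) -> None:
        while True:
            record = self._queue.get()
            self.records.append(record)
            self._queue.task_done()

    def write(self, agent_id: str, query: str) -> None:
        self._queue.put({"agent_id": agent_id, "query": query, "ts": time.time()})

    def flush(self) -> None:
        self._queue.join()  # block until every queued record has been written


audit = AsyncAuditLog()
limiter = RateLimiter(rate=5, burst=2, clock=lambda: 0.0)  # frozen clock for the demo
allowed = []
for i in range(3):
    ok = limiter.allow()
    allowed.append(ok)
    if ok:
        audit.write("fraud-review-01", f"query-{i}")
audit.flush()
print(allowed, len(audit.records))
```

The agent never waits on the audit write, but `flush` gives a point at which you can assert that the log has caught up with the agent's actions.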

Failure 03

Compliance and Engineering Misaligned on "Explainability"

This is the failure I think about most - because it was entirely preventable, and because I see it replicated in almost every LLM deployment in enterprise banking I hear about today.

Our compliance team required a plain-English explanation of every agent decision that influenced a customer outcome. That is a reasonable and legally necessary requirement. Our engineering team delivered a model confidence score and a feature importance list. These are technically accurate outputs. They are not the same thing as a compliance-ready explanation.

We spent six weeks in meetings between compliance and engineering before we had a shared definition of what "explainability" meant in a regulatory context. Six weeks. On a definition.

How we fixed it: We built a translation layer - a structured module that converts raw model outputs (confidence scores, attention weights, feature values) into templated narrative explanations that satisfy regulatory documentation requirements. That translation layer is now one of the most valuable components in our entire AI stack. It took three weeks to build once we agreed on the spec. The alignment conversation took twice as long as the engineering work.
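
In its simplest form, a translation layer is a template filled from structured model output. The sketch below assumes a hypothetical output dictionary (`pattern`, `matched_case`, `confidence`, `feature_importance`, `escalation_threshold`); the field names, sample values, and wording are illustrative, not our production spec.

```python
def explain(output: dict) -> str:
    """Turn raw model output into a templated plain-English narrative."""
    importance = output["feature_importance"]
    top_feature = max(importance, key=importance.get)
    text = (
        f"The agent flagged this transaction because the {output['pattern']} pattern "
        f"matched historical case {output['matched_case']} with "
        f"{output['confidence']:.0%} confidence. "
        f"The strongest contributing factor was {top_feature}."
    )
    if output["confidence"] > output["escalation_threshold"]:
        text += (
            f" It escalated the case because the "
            f"{output['escalation_threshold']:.0%} escalation threshold was exceeded."
        )
    return text


# Invented sample output for illustration only.
sample = {
    "pattern": "rapid small transfers",
    "matched_case": "FR-0001",
    "confidence": 0.91,
    "feature_importance": {"txn_velocity": 0.62, "new_payee": 0.27, "amount": 0.11},
    "escalation_threshold": 0.85,
}

narrative = explain(sample)
print(narrative)
```

Templating keeps the narrative reproducible: the same model output always yields the same sentence, which is easier to defend in an audit than free-form generated text.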

Key Takeaway

None of these failures were caused by the AI. All three were caused by gaps in architecture, infrastructure, and alignment. Fix those three things before you write the first line of agentic code.

The Results After Full Deployment

After 18 months - from first scoping call to full production - here is what the agentic AI deployment delivered inside credit operations. These are not projections or pilot metrics. These are live production numbers from a system serving millions of banking customers.

  • 77% reduction in handling time (14 min to 3.2 min per analyst review)
  • 34% drop in false positive escalations (richer context retrieval per case)
  • 100% compliance audit pass rate (3 consecutive quarterly audits)
  • 99.96% agent uptime (within core platform reliability standard)

The 77% reduction in analyst handling time was the headline number. But it is not the most important result.

When an analyst spends 3.2 minutes instead of 14 minutes on a routine fraud flag, the 10.8 minutes recaptured does not disappear. It gets redirected toward complex cases - the ones that require actual human judgment, nuanced customer context, and experienced decision-making. The quality of those high-stakes decisions improved measurably, because analysts were no longer exhausted by the volume of routine work that an agent could handle more accurately than a fatigued human.
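
The arithmetic behind the headline number is easy to check. The daily volume of 1,000 flags below is an assumption chosen only to show the scale; the actual figure was "thousands" per day.

```python
before_min = 14.0  # average handling time before deployment
after_min = 3.2    # average handling time after deployment

saved_min = before_min - after_min
reduction = saved_min / before_min
print(f"{saved_min:.1f} minutes saved per case, a {reduction:.0%} reduction")

# Illustrative assumption: 1,000 flags per day.
flags_per_day = 1_000
hours_recaptured = saved_min * flags_per_day / 60
print(f"{hours_recaptured:.0f} analyst hours recaptured per day")
```

At that assumed volume, the recaptured time is roughly 180 analyst hours a day available for complex cases.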

The 100% compliance audit pass rate across three consecutive quarters is the result I am most proud of - because it reflects the governance framework, not the model. Any organization deploying AI in regulated industries can build a high-performing model. Very few build a deployment that survives three compliance audits without a single finding. That outcome is a direct product of the four months we spent building governance before we wrote a single line of agentic code.

Second-Order Impact

Human decision quality improved - not because analysts were replaced, but because they were freed.

The most underreported result of a well-deployed AI agent program in financial services is what happens to the humans working alongside it. When routine cognitive load is removed, experienced analysts apply deeper judgment to the cases that actually warrant it. The agent did not replace human intelligence in our deployment. It protected it.

What I Would Do Differently If I Started Today

Three decisions shaped this entire deployment - two were right from the start, one was not. If you are planning to deploy agentic AI in banking or any regulated industry, these are the lessons I wish I had in writing before we began.

Lesson 01

Build Governance Before the Proof of Concept

We built governance in parallel with the proof of concept. That created expensive rework. In a regulated environment, your governance framework is not overhead - it is the foundation that every downstream decision sits on. When we had to retroactively apply authority boundaries and audit requirements to an agent already in testing, we lost six weeks and rebuilt two core modules from scratch.

The right sequence: governance first, PoC second. The PoC will move faster, not slower, because every design decision already has a clear boundary to work within. This is the single most important lesson I carry from deploying AI agents in a live financial services environment.

Lesson 02

Invest in Agent Observability from Day 1

You cannot manage what you cannot see. By the time we had mature observability tooling in production, two of our three major failures had already happened. We were flying blind on context utilization, escalation trigger rates, and decision path tracing.

Real-time monitoring of agent decision paths is not a nice-to-have for agentic AI in banking operations - it is a compliance requirement waiting to be enforced. Build it in from Day 1. Log every step, every context handoff, every escalation. When your compliance team asks what the agent did and why, you need an answer in seconds - not a post-mortem that takes three days.

Lesson 03

Align Compliance and Engineering in Week 1 - Not Month 6

The explainability gap nearly derailed our entire project. Compliance needed plain-English decision narratives. Engineering delivered model confidence scores and feature importance outputs. Both teams were right - they were solving different versions of the same problem.

We spent six weeks agreeing on a definition and three more building a translation layer that converted model outputs into compliant narrative explanations. That layer is now one of the most valuable components in our AI stack. But it should have been scoped in Week 1, not discovered as a gap in Month 6. Put compliance requirements inside the engineering specification before any model work begins. That single change saves more time than any technical optimization you will make later.

Closing Thesis
"Agentic AI in banking is a governance, architecture, and alignment problem that happens to involve advanced technology."

Get the first three right - governance, architecture, alignment - and the technology becomes the straightforward part. Most teams I speak with are spending 80% of their effort on model selection and 20% on the organizational conditions that determine whether the model ever ships. That ratio needs to flip. Deploying AI agents in financial services is fundamentally a people and process challenge. The model is the last mile.

Frequently Asked Questions

These are the questions I get asked most often after speaking on agentic AI in banking and AI agent deployment in regulated financial environments.

Scope drift. This is where an AI agent gradually begins operating outside its original defined authority boundaries - and no one notices until there is a compliance event. Preventing scope drift requires a registered, audited agent inventory and mandatory quarterly scope reviews. Without this, compliance exposure accumulates silently across every deployed agent in your environment.

From first scoping call to full production, our deployment took 18 months. That broke down into four months of governance framework design, six months of engineering and phased testing, and eight months of production rollout with compliance checkpoints at each stage. Organizations with mature data governance infrastructure can reduce the governance phase - but not significantly if they are doing it correctly. Rushing governance in a regulated environment is how you create a larger problem two years later.

Not for every step. Our earliest agents used deterministic planning layers with LLMs only at the reasoning and synthesis steps - where contextual understanding and explanation generation were genuinely required. LLMs add power and cost. Match the tool to the task. A rules-based retrieval step does not need an LLM. A step requiring nuanced judgment about an ambiguous transaction pattern probably does. Design the architecture around the decision type, not the model capability.
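
A sketch of matching the tool to the task: the retrieval-style step is a plain rule, and only the synthesis step calls a model. The `llm` callable is a stand-in that you would replace with a real client; the threshold of 10 and all names are hypothetical.

```python
from typing import Callable


def check_velocity(case: dict) -> dict:
    """Deterministic rule: no model involved, fully reproducible."""
    return {"velocity_flag": case["txns_last_hour"] > 10}


def make_synthesis_step(llm: Callable[[str], str]) -> Callable[[dict], dict]:
    """Wrap an LLM call for the one step that needs contextual reasoning."""
    def synthesize(case: dict) -> dict:
        prompt = f"Summarize the fraud signals for case {case['case_id']}: {case}"
        return {"summary": llm(prompt)}
    return synthesize


def fake_llm(prompt: str) -> str:
    """Stub used for the demo; a real deployment would call a model here."""
    return f"[stub summary for a prompt of {len(prompt)} characters]"


case = {"case_id": "case-001", "txns_last_hour": 14}
case.update(check_velocity(case))                 # rules-based step
case.update(make_synthesis_step(fake_llm)(case))  # LLM-backed step
print(case["velocity_flag"], case["summary"])
```

Passing the model in as a callable keeps the deterministic steps testable without any model at all.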

Automation follows a fixed, predefined script. It executes the same steps in the same order regardless of context. Agentic AI makes runtime decisions across multiple steps based on context it perceives in the moment. This distinction matters because agentic systems require a different risk framework, different explainability standards, and different oversight models than traditional RPA or process automation. Using the wrong governance model for the wrong system type is one of the most common mistakes I see in financial services AI programs.

We built a translation layer between model outputs and compliance requirements. The model produces a confidence score and a feature importance list. The translation layer converts that into a plain-English narrative - "The agent flagged this transaction because X pattern matched Y historical case with Z confidence, and escalated because threshold T was exceeded." Every step in the agent's decision path is logged, timestamped, and stored against the case record. This is what a compliance audit actually needs. A confidence score alone does not satisfy regulatory explainability standards.

We built a custom observability layer on top of our existing data platform rather than relying on a single vendor tool - because no off-the-shelf product in 2023 had mature support for multi-step agent tracing inside a regulated financial environment. What we monitored: decision path logs per agent run, context utilization at each step, escalation trigger rate, authority boundary violations (even attempted ones), and model confidence distribution over time. If any metric moved outside its baseline band, an alert fired before a human had to notice it manually.
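
A baseline-band alert can be as simple as a mean plus or minus a few standard deviations. The sketch below uses that definition as an assumption; the band width `k`, the metric name, and the sample history are invented for illustration.

```python
from statistics import mean, stdev
from typing import Optional


def baseline_band(history: list, k: float = 3.0) -> tuple:
    """Band of mean +/- k standard deviations over historical values."""
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma


def check_metric(name: str, value: float, history: list, k: float = 3.0) -> Optional[str]:
    """Return an alert message if the value leaves its band, else None."""
    low, high = baseline_band(history, k)
    if low <= value <= high:
        return None
    return f"ALERT {name}: {value} outside [{low:.4f}, {high:.4f}]"


# Hypothetical daily escalation trigger rates.
escalation_rate_history = [0.040, 0.042, 0.038, 0.041, 0.039]

print(check_metric("escalation_rate", 0.041, escalation_rate_history))  # None
print(check_metric("escalation_rate", 0.090, escalation_rate_history))
```

The same check can run over any of the monitored metrics, as long as each has enough history to define a meaningful band.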

Every agent in our registry has a written scope of action - what it can read, what it can recommend, what it can draft, and what it is explicitly prohibited from executing without a human trigger. For financial services, the hard line is this: an agent cannot initiate a financial transaction, update a customer record, or send a customer-facing communication without explicit human approval. Everything above that line can be agent-driven. Everything below it must be human-authorized. That boundary is non-negotiable and it is reviewed at every quarterly audit.

Technically, yes. Practically, it creates significant downstream risk. One of our three major failures was directly caused by deploying an agent on top of a governance layer that was not designed for agent-speed query volumes. The audit logging broke under load. In a regulated environment, broken audit logging is a compliance event - not just a technical bug. If your data governance infrastructure is immature, address that before deploying agentic AI in any customer-facing or decision-making workflow.

Four metrics from production. Analyst review time dropped from 14 minutes to 3.2 minutes per case - a 77% reduction. False positive escalation rate fell by 34% because the agent pulled richer context than analysts typically had time to access manually. Compliance audit pass rate held at 100% across three consecutive quarterly audits. Agent uptime reached 99.96%. The second-order impact was equally important: analysts recaptured time that was redirected to high-complexity cases. The quality of human decisions improved because humans were no longer exhausted by volume.

Three things. First - define what "agentic" means inside your institution before you select a use case. The definition drives the governance requirements. Second - bring compliance into the room in Week 1, not after the first demo. The cost of fixing a compliance misalignment in production is ten times the cost of preventing it in design. Third - start with a workflow where a human already makes a high-volume, low-variance decision repeatedly. That is where agentic AI in banking delivers the clearest ROI and creates the least organizational risk. Win there first. Then expand.
