Three-quarters of financial institutions are still stuck in isolated experimental AI projects. Implementing Sentient Banking as a new operating model is not merely a software challenge. Alongside the integration of algorithmic solutions, it also requires breaking down data silos between departments and raising organizational UX maturity. The final part of our Sentient Banking series explains why the infrastructure of trust and the responsible governance of autonomous agents are fundamental prerequisites for truly intelligent banking.

 

A 2025 Boston Consulting Group survey summarizes the true state of AI developments in the banking sector in a single number: 75% of financial institutions are still stuck in isolated pilot projects and closed proof-of-concept programs. This is not because the technological direction is uncertain – the principles of semantic design systems, adaptive interface architectures, and intent-based interaction are now well documented – but because organizational inertia hinders meaningful transformation.

Previously in this series, we discussed the limits of the self-service model, the philosophy of intent-based interaction, the methodology for hallucination-free AI integration, and the specific patterns of Sentient UX. Now, we focus on the missing part of the equation: the infrastructure built not in code, but within the organization.

The Sentient Triangle from Josh Clark's Sentient Design framework – the triad of grounded, interoperable, and adaptive principles – originally emerged as a software requirement system. From an organizational perspective, however, all three dimensions are simultaneously technological and management tasks. A grounded system demands that AI draw exclusively from verified, auditable sources. This requires clean data architecture and decision-making discipline from the organization. The interoperable pillar urges the breakdown of data silos between risk management, retail banking, and the legal department: the machine cannot build context if the organization itself is fragmented. The adaptive dimension means not only that the interface adapts to the user, but also that the underlying organization is flexible enough not to filter customer needs through the constraints of existing processes.

Accordingly, the implementation of Sentient Banking requires structural decisions in three areas: the transparency of AI decisions, the governance of autonomous agents, and a compliance framework capable of managing both the AI-driven and the manual mode of operation at the same time.

The Return on Transition: What Do the Numbers Show?

Transitioning to a semantic design system is a major organizational decision, but not a leap into the dark. The joint project between IBM iX and the German health insurer BARMER is one of the best-documented examples of how an AI-optimized component library yields tangible returns. During the project, they created a component library – a unified system of digital building blocks – that AI can independently interpret and use. The result of the transition speaks for itself: design work was reduced by 60%, and frontend development effort by 43%. The system's two key applications now serve more than two million customers with accelerated development cycles; the project also won the German Digital Award and the Red Dot Design Award.

[Image: BARMER's design system]

While harder to quantify, the organizational impact is just as significant: such a component library forces the bank to break down the walls that have previously isolated the IT, compliance, and design teams from one another. If an interface element can also be assembled by an algorithm, every element has to be validated in advance, and that requirement creates a shared standard, both technologically and organizationally. In this context, breaking down silos is not a catchy slogan, but an architectural imperative.

The Glass Cockpit: AI Decision Transparency as an Entry Condition

If there is one thing systematically missing from the discourse on banking AI projects so far, it is the question of how to verify everything the machine does. The opacity of a content recommendation algorithm is merely annoying at worst. However, in the case of a system that rejects a loan application, modifies an investment decision, or locks a transaction, a black-box operation poses simultaneous legal, ethical, and business risks.

Explainable AI (XAI) now has a concrete methodological toolkit. The three most commonly used approaches (LIME, SHAP, and counterfactual explanations) answer different questions and serve different organizational stakeholders.

The essence of the LIME (Local Interpretable Model-agnostic Explanations) method is that instead of understanding the model as a whole, it focuses on a single, individual case. It examines which variables tipped the scales in that specific situation, thus providing a quick, intuitive answer in real time to the question "Why was this specific decision made?" For example, a security analyst monitoring transactions only needs to know that the system "flagged the transaction based on the combination of a different location and an unusual amount."

Grounded in game theory, SHAP (SHapley Additive exPlanations) analysis uncovers a deeper layer. It not only examines isolated cases but can also show the global weight of all determining factors, meaning the extent to which individual parameters collectively contribute to the machine's decisions. This method provides the system-level, comprehensive detail required for compliance audits and regulatory reporting.
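To make the game-theoretic idea concrete, here is a minimal sketch that computes exact Shapley values for a toy transaction risk score. It does not use the SHAP library, which relies on faster approximations. The scoring rule, the feature names, and the baseline transaction are all hypothetical and chosen only for illustration.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over every possible order in which the features can be revealed."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)            # start from the reference transaction
        previous = model(current)
        for name in order:
            current[name] = instance[name]  # reveal one real feature value
            score = model(current)
            totals[name] += score - previous
            previous = score
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_risk_score(tx):
    """Hypothetical risk score, not a real bank model."""
    score = 0.1
    if tx["foreign_location"]:
        score += 0.2
    if tx["amount"] > 5000:
        score += 0.3
    if tx["foreign_location"] and tx["amount"] > 5000:
        score += 0.3                        # the combination is extra suspicious
    return score


# Assumed reference point: an ordinary domestic daytime payment.
baseline = {"foreign_location": False, "amount": 100, "night_time": False}
flagged = {"foreign_location": True, "amount": 9000, "night_time": True}

contributions = shapley_values(toy_risk_score, flagged, baseline)
```

The contributions add up exactly to the gap between the flagged transaction's score and the baseline score, which is the property that makes this family of explanations suitable for audit reporting. Enumerating every ordering only works for a handful of features; production tools approximate the same quantity.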

The counterfactual explanation is the most customer-oriented tool. It doesn’t just state why the system rejected a loan application; it also shows what conditions (such as higher income or a lower credit limit) would have needed to be met for it to be accepted.
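A counterfactual can be found by searching for the smallest change that flips the decision. The sketch below assumes a deliberately simple, hypothetical approval rule (card balance at most 35% of annual income) and varies only one field; a real credit model would be far more complex and would need a search over several features.

```python
def toy_approval(applicant):
    """Hypothetical rule: approve when the card balance is at most 35% of
    annual income. Integer arithmetic avoids floating-point edge cases."""
    return applicant["card_balance"] * 100 <= 35 * applicant["annual_income"]


def counterfactual_balance_reduction(applicant, decide, step=100):
    """Return the smallest balance reduction (in multiples of `step`) that
    turns a rejection into an approval, 0 if already approved, or None if
    no reduction of the balance is enough."""
    if decide(applicant):
        return 0
    reduction = step
    while reduction <= applicant["card_balance"]:
        candidate = dict(applicant, card_balance=applicant["card_balance"] - reduction)
        if decide(candidate):
            return reduction
        reduction += step
    return None


# Illustrative applicant, amounts in euros.
applicant = {"annual_income": 30000, "card_balance": 12000}
needed = counterfactual_balance_reduction(applicant, toy_approval)
```

For this applicant the search returns 1500: the feedback to the customer becomes "reduce the card balance by €1,500" instead of a bare rejection.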

Interactive Dialogue Instead of Algorithmic Rejection

This counterfactual principle fundamentally rewrites the customer relationship. Instead of a loan rejection remaining an anonymous, relationship-ending message, the system provides personalized feedback: "If the credit card balance is reduced by €1,500, approval is expected with a 94% probability in the next 30 days." The rejection itself still stands, but it becomes a customer retention tool with a measurable impact on customer lifetime value. Capital One calls this approach "UX accountability." The AI decision does not end at the machine output; it is the beginning of a dialogue.

JPMorgan's COIN system – which analyzes 12,000 credit agreements in seconds, replacing approximately 360,000 hours of manual administration annually – is credible precisely because it does not merely automate: it follows a strictly documented, auditable logic. FICO's responsible AI implementation complements this with blockchain-based model governance, in which every predictive decision leaves a traceable, immutable record.

However, all this only becomes truly usable if the explanation doesn’t remain hidden in the backend systems but also appears on the interface in an easily understandable format for both the administrator and the retail customer. It is not enough for the backend to know the reasons; they must be shaped into a human-scale narrative. The logic of the previously discussed Sculptor pattern applies here as well. Complex algorithmic decisions must be made tangible using interactive sliders, dynamic "what if" simulations, and visual scenarios. If an administrator can see in real time which parameter change would tip the machine's decision, they will not only understand the system better, but they are also more likely to trust it. The Glass Cockpit concept is therefore not merely a backend architecture, but a design task as well.

The Autonomous Agent as a Synthetic Employee

As soon as banks leave chatbot logic behind and deploy truly autonomous digital agents – ones that access databases, initiate processes, and make decisions – the governance model must also change. According to McKinsey's analysis, these systems often inherit the same access and system permissions that human employees work with, which opens up entirely new types of security risks.

The most pragmatic defense is for institutions to treat these systems as synthetic employees. HSBC, for example, assigns a dedicated business owner to every deployed agent and defines access privileges exactly as they would when onboarding a new colleague. The agent does not receive unlimited system rights and does not remain unsupervised. Citigroup's internal Stylus Workspaces initiative is based on a similar logic: employees work alongside agents on large datasets and financial processes, but the framework accurately records what the agent can perform independently and what requires human approval.
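The synthetic-employee idea can be expressed as a small permission model. The sketch below is not how HSBC or Citigroup implement it; the class, field, and action names are hypothetical. It only shows the three elements described above: a named business owner, a list of actions the agent may perform alone, and a list that requires human sign-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentProfile:
    """Onboarding record for an agent, analogous to a new colleague's access form."""
    agent_id: str
    business_owner: str                  # the human accountable for this agent
    autonomous_actions: frozenset        # allowed without supervision
    approval_required_actions: frozenset # allowed only with human sign-off


def authorize(profile, action, approved_by=None):
    """Decide whether the agent may perform `action` right now."""
    if action in profile.autonomous_actions:
        return "allowed"
    if action in profile.approval_required_actions:
        return "allowed" if approved_by else "needs_human_approval"
    return "denied"                      # anything not listed is off limits


reconciliation_agent = AgentProfile(
    agent_id="agent-017",
    business_owner="head.of.operations",
    autonomous_actions=frozenset({"read_ledger", "draft_report"}),
    approval_required_actions=frozenset({"initiate_transfer"}),
)
```

Denying everything that is not explicitly listed keeps the agent from inheriting broad employee-style permissions by default.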

[Image: The leader of such a hybrid team is a curator of collaborative intelligence]

This new type of workforce cannot be managed in isolated experimental projects. For system-level supervision, a centralized AI center of excellence is essential, providing unified standards and, if necessary, immediate shutdown capabilities. The leader of such a hybrid team is no longer a traditional manager, but a curator of collaborative intelligence who must constantly recalibrate the division of labor, knowing where the machine's synthesizing work ends, and where responsible human decision-making begins.

The Agent UI Governance Gap: When the Escalator Becomes a Regulatory Issue

An earlier post introduced the escalator principle: if the AI layer fails, the system must fall back seamlessly into a traditional, manually manageable state, just as a stopped escalator becomes an ordinary staircase. Once organizational and legal questions enter the picture, this turns out to be a much more complex problem.

If a bank's interface is capable of operating simultaneously in AI-driven and manual modes, ensuring the continuity of the audit log becomes a legal issue rather than a technical one. Who bears responsibility for a decision made by the AI layer before the fallback mechanism kicks in? If the generative interface drops out and the user lands on a static, manual form, the agent's security check – which might have only been valid in generative mode – becomes circumventable. The industry identifies this vulnerability as the Agent UI Governance Gap: the control gap created at the moment of switching between the two operating modes.
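One way to think about closing this gap is a single audit log shared by both modes, in which the mode switch is itself a recorded event and any security check passed in AI mode is invalidated on fallback. The following is a minimal sketch under those assumptions; the event names and session fields are hypothetical, and a hash chain is only one of several ways to make a log tamper-evident.

```python
import hashlib
import json

GENESIS = "0" * 64


def _digest(body):
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class AuditLog:
    """Append-only log where each entry is chained to the previous one."""

    def __init__(self):
        self.entries = []

    def append(self, mode, event, detail):
        previous_hash = self.entries[-1]["hash"] if self.entries else GENESIS
        body = {"seq": len(self.entries), "mode": mode, "event": event,
                "detail": detail, "previous_hash": previous_hash}
        self.entries.append({**body, "hash": _digest(body)})

    def verify(self):
        """Return True only if no entry was altered, removed, or reordered."""
        previous_hash = GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["previous_hash"] != previous_hash or _digest(body) != entry["hash"]:
                return False
            previous_hash = entry["hash"]
        return True


def fall_back_to_manual(log, session):
    """Switch to the static form and force the security check to be repeated."""
    log.append("ai", "fallback_triggered", {"session": session["id"]})
    updated = dict(session, mode="manual", security_check_passed=False)
    log.append("manual", "security_check_required", {"session": session["id"]})
    return updated


log = AuditLog()
session = {"id": "s-42", "mode": "ai", "security_check_passed": True}
log.append("ai", "security_check_passed", {"session": session["id"]})
session = fall_back_to_manual(log, session)
```

Because both modes write to the same chain, an auditor can see which decisions were made before the switch and confirm that the manual form did not inherit a check it never performed.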

What is mentioned even less often is that maintaining the two systems in parallel inherently requires significant expenditure. An AI-driven interface and the static fallback behind it must be tested, maintained, and audited independently of each other – this involves duplicating testing resources. Building the infrastructure of trust thus demands a massive initial investment due to architectural redundancy. In the long run, however, this is the only way to avoid critical downtimes and regulatory fines: the doubled maintenance cost pales in comparison with what a governance gap can cause if it surfaces in a live production environment.

This is exactly why Harriet Rees, CIO at Starling Bank and the UK government's financial AI envoy, proposed creating a centralized banking AI testing regime in the UK. The initiative aims to ensure that foreign frontier models can only integrate into critical banking infrastructure if they meet a unified British minimum standard. This development indicates that regulatory thinking is increasingly focusing on managing the transition between the two operating modes, not just on whether the AI is reliable in itself.

The Metrics Revolution

One of the least visible yet inescapable consequences of these organizational changes is that earlier digital banking success metrics lose their validity. Time on App was a positive indicator in the self-service era; in the age of autonomous systems, an increase in this metric signals that something is not working.

The metrics that replace them reflect a different logic. The Intent Match Rate measures the share of interactions in which the system correctly identifies what the user wants – this is the benchmark for the basic performance of a sentient system.

First-Touch Resolution shows how often a customer's need is resolved in a single interaction, without human intervention. The AI Output Acceptance Rate indicates whether the agent’s decision proposals are accepted by customers and administrators; this is the most direct numerical imprint of trust.
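All three metrics are simple ratios over an interaction log. The sketch below assumes a hypothetical record format (the field names are illustrative, not an industry standard) and measures the acceptance rate only over interactions in which the agent actually made a proposal.

```python
def share(records, predicate, eligible=lambda record: True):
    """Fraction of eligible records that satisfy `predicate`,
    or None when no record is eligible."""
    pool = [r for r in records if eligible(r)]
    if not pool:
        return None
    return sum(1 for r in pool if predicate(r)) / len(pool)


def sentient_metrics(records):
    return {
        "intent_match_rate": share(
            records, lambda r: r["predicted_intent"] == r["actual_intent"]),
        "first_touch_resolution": share(
            records,
            lambda r: r["resolved"] and r["touches"] == 1 and not r["human_intervention"]),
        "ai_output_acceptance_rate": share(
            records, lambda r: r["proposal_accepted"],
            eligible=lambda r: r["proposal_made"]),
    }


# Made-up sample log for illustration only.
interactions = [
    {"predicted_intent": "pay_bill", "actual_intent": "pay_bill", "resolved": True,
     "touches": 1, "human_intervention": False, "proposal_made": True, "proposal_accepted": True},
    {"predicted_intent": "raise_limit", "actual_intent": "raise_limit", "resolved": True,
     "touches": 2, "human_intervention": True, "proposal_made": True, "proposal_accepted": False},
    {"predicted_intent": "block_card", "actual_intent": "dispute_charge", "resolved": False,
     "touches": 1, "human_intervention": False, "proposal_made": False, "proposal_accepted": False},
    {"predicted_intent": "open_savings", "actual_intent": "open_savings", "resolved": True,
     "touches": 1, "human_intervention": False, "proposal_made": True, "proposal_accepted": True},
]

metrics = sentient_metrics(interactions)
```

On this sample the intent match rate is 0.75, first-touch resolution is 0.5, and two of the three proposals were accepted. The hard part in practice is not the arithmetic but establishing the actual intent, which usually requires labeled samples or follow-up signals.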

If these numbers become the new compass, the focus of product development will shift. A product manager who previously thought in terms of features and screens will now think in terms of intents and outcomes. Success won’t be that the user found the button, but that the system did the work for them.

About the authors

Balázs Szalai
Content Strategist

Balázs has been working in content for more than 20 years. He was an editor at one of the first and largest news sites and later helped establish the content marketing business for media publishers and agencies. Today, Balázs serves as content producer at Ergomania Ltd.