AI in Financial Services: Compliance-Ready Implementation
Financial services AI faces unique regulatory constraints. Here's how we build compliant AI systems — from data handling to audit trails to model governance — without sacrificing capability.
The Regulatory Landscape
Financial services AI doesn't operate in a regulatory vacuum. In the US, SOX requirements mean that any AI touching financial reporting must have audit trails and demonstrable controls. FINRA rules govern how broker-dealers communicate with clients — an AI chatbot giving investment guidance falls squarely within this scope. In Canada, OSFI B-13 guidelines on technology and cyber risk specifically address AI/ML model governance. GDPR and its North American equivalents impose restrictions on automated decision-making that affects individuals.
None of these regulations prohibit AI. What they require is that AI systems be governed with the same rigor applied to any other material business process — documentation, controls, testing, human oversight, and the ability to explain decisions after the fact. The compliance challenge isn't philosophical; it's engineering.
Data Handling: What You Can and Cannot Send to an LLM API
The first question every fintech team asks us is: "Can we send customer data to the Claude API?" The answer is: it depends on what data, what jurisdiction, and what DPA you have in place with Anthropic. Anthropic offers enterprise agreements with data processing addendums that specify how data is handled, whether it's used for training, and where it's processed.
For most compliance purposes, PII should be pseudonymized before it touches any external API. Replace names with token identifiers. Replace account numbers with internal reference IDs. Replace addresses with region codes. The AI model rarely needs the raw PII to do its job — it needs the relevant facts, which can be represented without identifying information.
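A minimal sketch of that pseudonymization step, in Python. The field names, the salted-hash scheme, and the region-code convention are our assumptions for illustration, not a prescribed standard; in production the salt would live in a secrets manager and the token mapping in a vault, not in code.

```python
import hashlib

# Assumption: a per-environment salt, stored in a secrets manager in practice.
SALT = b"rotate-me-per-environment"

def pseudonym(value: str, prefix: str) -> str:
    """Derive a stable, non-reversible token for a PII value."""
    digest = hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()[:12]
    return f"{prefix}_{digest}"

def scrub_record(record: dict) -> dict:
    """Replace direct identifiers with tokens; keep the facts the model needs."""
    return {
        "customer": pseudonym(record["name"], "CUST"),
        "account": pseudonym(record["account_number"], "ACCT"),
        "region": record["postal_code"][:3],  # region code, not a full address
        "balance": record["balance"],         # non-identifying fact stays raw
    }

raw = {"name": "Jane Doe", "account_number": "003-55512",
       "postal_code": "M5V 2T6", "balance": 1042.17}
safe = scrub_record(raw)  # safe to include in an external API prompt
```

Because the hash is salted and deterministic, the same customer maps to the same token across calls, so the model can still correlate facts within a conversation without ever seeing the underlying identity.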
- Data residency: confirm that API calls are processed in jurisdictions consistent with your data governance policy. Anthropic's enterprise tier supports region-specific deployments.
- Retention policies: API call logs containing customer data must be subject to the same retention and deletion policies as other systems. Build this into your infrastructure from day one.
- Training opt-out: enterprise API agreements typically include provisions preventing your data from being used in model training. Verify this explicitly in your contract.
Audit Trail Requirements
In regulated financial services, every consequential decision must be reconstructible. If an AI system declined a loan application, flagged a transaction as suspicious, or recommended a specific product — you must be able to reconstruct exactly what the AI was shown, what it produced, and who acted on that output.
This requires logging at the prompt level, not just the application level. Store the complete prompt (system prompt + context + user input), the complete model response, the model version, the timestamp, the user ID, and the downstream action taken. This log is not just for debugging — it's a regulatory artifact that may be requested in an audit or enforcement action.
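A sketch of what one such prompt-level record might look like. The schema and field names here are illustrative, not a regulatory standard; the point is that every element listed above is captured together, atomically, at decision time.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

# Hypothetical record schema: one row per consequential AI decision.
@dataclass(frozen=True)
class AIDecisionRecord:
    system_prompt: str
    context: str
    user_input: str
    model_response: str
    model_version: str
    user_id: str
    downstream_action: str
    timestamp: str

def capture_decision(system_prompt: str, context: str, user_input: str,
                     model_response: str, model_version: str,
                     user_id: str, downstream_action: str) -> str:
    """Assemble the complete record and serialize it as one JSON line."""
    record = AIDecisionRecord(
        system_prompt=system_prompt,
        context=context,
        user_input=user_input,
        model_response=model_response,
        model_version=model_version,
        user_id=user_id,
        downstream_action=downstream_action,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    return json.dumps(asdict(record), sort_keys=True)

line = capture_decision("You are a loan assistant.", "applicant facts...",
                        "Should we approve?", "Decline: DTI too high.",
                        "model-version-string", "analyst-42", "declined")
```

Serializing to a single JSON line makes each record easy to ship to the append-only audit store described below, and `sort_keys` keeps the serialization stable for hashing.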
Immutable audit logs
AI decision logs should be written to an append-only store with cryptographic integrity. Standard application logs are mutable — someone with database access can change them. Regulatory-grade audit logs need to be tamper-evident. Write to a dedicated audit log service separate from your operational database, with access controls that prevent modification after write.
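One common way to make a log tamper-evident is hash chaining: each entry stores the hash of the previous entry, so altering any earlier record invalidates everything after it. The sketch below is an in-memory illustration of the idea, not a production audit service, which would also need durable storage and write-only access controls.

```python
import hashlib
import json

class TamperEvidentLog:
    """Append-only log where each entry commits to the one before it."""

    GENESIS = "0" * 64

    def __init__(self):
        self._entries = []
        self._prev_hash = self.GENESIS

    def append(self, payload: dict) -> dict:
        body = json.dumps(payload, sort_keys=True)
        entry_hash = hashlib.sha256((self._prev_hash + body).encode()).hexdigest()
        entry = {"payload": payload, "prev_hash": self._prev_hash, "hash": entry_hash}
        self._entries.append(entry)
        self._prev_hash = entry_hash
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited payload or broken link fails."""
        prev = self.GENESIS
        for entry in self._entries:
            body = json.dumps(entry["payload"], sort_keys=True)
            if entry["prev_hash"] != prev:
                return False
            if hashlib.sha256((prev + body).encode()).hexdigest() != entry["hash"]:
                return False
            prev = entry["hash"]
        return True
```

Periodically anchoring the latest chain hash somewhere outside the log service (a separate system, or even a printed report) makes wholesale replacement of the chain detectable as well.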
Model Explainability and Human Oversight
Regulators want to understand why an AI system made a decision. LLMs are not inherently explainable — you can't extract a decision tree from Claude's weights. What you can do is require the model to produce its reasoning as part of the output: "explain your reasoning before giving your final recommendation." The chain-of-thought output becomes the explanation artifact.
Human oversight requirements in financial services typically center on the concept of "material decisions." OSFI, the OCC, and FINRA all have frameworks for identifying which AI-assisted decisions require mandatory human review. Build your system with a routing layer: decisions above a risk threshold, decisions in high-stakes categories (credit denial, account suspension, large transactions), and decisions where the model expresses uncertainty all route to human review queues.
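The routing layer can be as simple as a few explicit gates. In this sketch the category names, thresholds, and queue labels are placeholders your own risk function would define, not values taken from any regulator's guidance:

```python
# Assumed policy values, for illustration only.
HIGH_STAKES = {"credit_denial", "account_suspension", "large_transaction"}
RISK_THRESHOLD = 0.7      # decisions scored above this always get a human
CONFIDENCE_FLOOR = 0.8    # low model confidence also escalates

def route_decision(category: str, risk_score: float,
                   model_confidence: float) -> str:
    """Return the queue a decision lands in: any gate can force review."""
    if category in HIGH_STAKES:
        return "human_review"
    if risk_score >= RISK_THRESHOLD:
        return "human_review"
    if model_confidence < CONFIDENCE_FLOOR:
        return "human_review"
    return "auto_approve"
```

Keeping the gates as explicit, ordered rules (rather than folding them into a single score) makes the routing logic itself auditable: you can state exactly which gate sent a decision to a human.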
Model Risk Management: SR 11-7 Guidance
The Federal Reserve's SR 11-7 guidance on model risk management predates LLMs, but its principles apply directly. SR 11-7 requires model validation by parties independent from model development, ongoing monitoring of model performance, and documentation sufficient to support independent replication. For AI systems at regulated institutions, this means building a validation process before go-live.
Validation for LLM-based systems differs from traditional quantitative models. You can't run statistical backtests on language model outputs. Instead: define performance benchmarks on representative test datasets, run bias testing across demographic groups, conduct adversarial testing to probe for failure modes, and document model limitations explicitly. The validation report is a deliverable, not a rubber stamp.
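The benchmark and bias-testing steps can share one harness: score the system on a labeled test set, broken out by demographic group, and report the worst-case spread between groups. The case schema and the `predict` callable below are placeholders for your own evaluation setup.

```python
from collections import defaultdict

def evaluate_by_group(cases, predict):
    """Pass rate per demographic group, plus the max-min spread.

    `cases` is a list of {"input", "expected", "group"} dicts;
    `predict` is the system under validation (illustrative interface).
    """
    totals = defaultdict(int)
    passes = defaultdict(int)
    for case in cases:
        totals[case["group"]] += 1
        if predict(case["input"]) == case["expected"]:
            passes[case["group"]] += 1
    rates = {g: passes[g] / totals[g] for g in totals}
    spread = max(rates.values()) - min(rates.values())
    return rates, spread
```

A validation report would run this on each release candidate and fail the release if the spread exceeds a documented tolerance, which turns "bias testing" from a one-off exercise into a repeatable control.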
Incident Response for Regulated AI
When an AI system makes a mistake in a regulated context — gives wrong advice, makes an incorrect credit decision, flags the wrong transaction — you need a defined incident response process. This includes: immediate containment (stop or restrict the affected AI function), impact assessment (identify all affected decisions and customers), root cause analysis, regulatory notification if required, and remediation (correcting affected records, notifying affected customers).
Write this process before you need it. Define the severity tiers, the escalation path, the regulatory notification threshold, and who has authority to suspend a production AI system. A well-designed AI system in financial services is one where you can answer the question "what happens when this goes wrong" with a specific, practiced plan.
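Writing the process down can mean literally encoding it. The tier names, thresholds, and roles below are placeholders to be replaced by your own incident response policy; the value of a sketch like this is that the classification rules become testable rather than tribal knowledge.

```python
# Hypothetical severity policy: replace every value with your own.
SEVERITY_TIERS = {
    "SEV1": {"suspend_system": True,  "notify_regulator": True,
             "escalate_to": "chief_risk_officer"},
    "SEV2": {"suspend_system": True,  "notify_regulator": False,
             "escalate_to": "head_of_model_risk"},
    "SEV3": {"suspend_system": False, "notify_regulator": False,
             "escalate_to": "on_call_engineer"},
}

def classify_incident(customers_affected: int,
                      wrong_regulated_decision: bool) -> str:
    """Map impact facts to a severity tier (illustrative thresholds)."""
    if wrong_regulated_decision and customers_affected > 0:
        return "SEV1"
    if wrong_regulated_decision or customers_affected > 100:
        return "SEV2"
    return "SEV3"

def incident_actions(tier: str) -> dict:
    """Look up the actions and escalation path the tier mandates."""
    return SEVERITY_TIERS[tier]
```

Once the tiers are code, a tabletop exercise becomes a unit test: feed in last quarter's near-misses and check that each one classifies to the tier you would have wanted.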
Want to talk through your project?
We're always happy to discuss real problems. No sales pitch.
Book a Discovery Call