Legacy System Integration with AI: A Practical Approach
Most enterprise AI projects hit the same wall: connecting modern AI capabilities to legacy infrastructure that wasn't designed to talk to anything. Here's how we approach it.
The Legacy Integration Challenge
Legacy systems — mainframes running COBOL, 20-year-old SAP deployments, Oracle ERP instances that predate the iPhone, custom-built claims processing systems that nobody fully understands anymore — are the reality of enterprise IT. They're also where the critical business data lives. Any AI system that can't access this data is limited to surface-level use cases.
The integration challenge is structural. Legacy systems were built to be used by humans through terminals, or to exchange batch files with other systems on a schedule. They weren't designed for real-time API access. They often have undocumented behavior, limited error handling, fragile authentication mechanisms, and connection limits that predate the expectation of high-frequency programmatic access.
The good news is that this is a solved engineering problem — solved repeatedly by teams integrating modern web applications with enterprise systems over the past two decades. AI integration uses the same patterns, applied with an understanding of the specific demands AI systems place on data sources.
The Spectrum of Integration Approaches
The right integration approach depends on what the legacy system can support, what latency is acceptable, and whether the AI system needs read access, write access, or both. The major patterns:
- API gateway pattern: Build a modern REST or GraphQL API layer in front of the legacy system. The gateway handles authentication, request transformation, error normalization, and rate limiting. The AI system talks to the gateway, which talks to the legacy system. This adds a layer of infrastructure but dramatically simplifies the AI system's integration requirements.
- Database synchronization: Replicate the data the AI needs from the legacy system into a modern database optimized for AI access — a vector database for semantic search, a SQL database for structured queries. The AI system reads from this replica rather than the legacy system directly. Synchronization lag is the tradeoff: the replica may be seconds or minutes behind the source of truth.
- Event streaming: If the legacy system can emit events (database change events, message queue entries, file drops), an event-driven architecture decouples the AI system from the legacy system's availability and performance. The AI system subscribes to events and processes them at its own pace.
- RPA bridge: When there's genuinely no API and no database access, robotic process automation — screen scraping, keyboard automation — can bridge the gap. This is a last resort; it's fragile and breaks when the UI changes. But for truly inaccessible systems, it's sometimes the only path.
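To make the first pattern concrete, here is a minimal sketch of a gateway layer. All names (`LegacyGateway`, `fake_backend`, the `get_claim` operation) are hypothetical; the point is the shape: the AI side sees one uniform response type, while rate limiting and error normalization happen in the gateway, not in the AI system or the legacy code.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class GatewayResponse:
    """Uniform response shape the AI system consumes."""
    ok: bool
    data: Optional[dict]
    error: Optional[str]

class LegacyGateway:
    """Thin gateway in front of a legacy backend (modeled here as a callable)."""

    def __init__(self, backend, max_calls_per_sec=5):
        self.backend = backend                      # callable: (operation, params) -> dict
        self.min_interval = 1.0 / max_calls_per_sec
        self._last_call = 0.0

    def call(self, operation, params):
        # Rate limiting: space calls at least min_interval apart, so the
        # legacy system never sees a burst it wasn't built to handle.
        wait = self.min_interval - (time.monotonic() - self._last_call)
        if wait > 0:
            time.sleep(wait)
        self._last_call = time.monotonic()
        try:
            return GatewayResponse(True, self.backend(operation, params), None)
        except Exception as exc:
            # Error normalization: any legacy failure becomes one uniform error.
            return GatewayResponse(False, None, f"{operation}: {exc}")

def fake_backend(operation, params):
    """Stand-in for the legacy system; raises like an undocumented API would."""
    if operation == "get_claim":
        return {"claim_id": params["id"], "status": "open"}
    raise ValueError("unknown operation")

gateway = LegacyGateway(fake_backend)
```

In production this sits behind real authentication and request transformation; the sketch shows only the two behaviors that most directly protect the legacy system.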
The Read-Only Integration Pattern
The safest starting point for legacy AI integration is read-only access. Give the AI system the ability to query and read from the legacy data, but require all writes to go through a human-operated interface or a separate, explicitly authorized write pathway. This reduces the blast radius of AI errors — if the AI misinterprets something, it makes a wrong recommendation, not a wrong data change.
This isn't just a safety measure; it's also a performance strategy. Legacy systems often have very different read vs. write performance characteristics. A read-only replica can be tuned for query performance without affecting the write path that the business depends on. Separating read and write concerns at the integration layer yields both the safety and the performance benefit.
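One way to enforce read-only access is to reject anything that isn't a `SELECT` at the integration layer, in addition to (not instead of) database-level permissions. A minimal sketch, using sqlite3 as a stand-in for the replica database; the table and class names are illustrative:

```python
import sqlite3

class ReadOnlyReplica:
    """Query interface that refuses write-like statements outright.

    Defense in depth: the database user should also lack write grants;
    this check just fails fast with a clear error.
    """

    def __init__(self, conn):
        self.conn = conn

    def query(self, sql, params=()):
        head = sql.lstrip().split(None, 1)[0].lower()
        if head != "select":
            raise PermissionError(f"write-like statement rejected: {head!r}")
        return self.conn.execute(sql, params).fetchall()

# Build a tiny in-memory replica to demonstrate.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claims (id TEXT, status TEXT)")
conn.execute("INSERT INTO claims VALUES ('C-1', 'open')")
replica = ReadOnlyReplica(conn)
rows = replica.query("SELECT status FROM claims WHERE id = ?", ("C-1",))
```

Real SQL parsing is more involved than a first-keyword check (CTEs, comments), so treat this as the fast-fail layer, with the database's own grants as the actual guarantee.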
Production example: legacy claims system integration
We integrated Claude with a 15-year-old claims processing system for an insurance client. The system had no external API, a proprietary database schema, and was accessed by 400 users through a 1990s-era terminal interface. Our approach: a nightly ETL process that extracted structured claim data into a modern PostgreSQL instance, supplemented by a document ingestion pipeline for the PDF attachments. Claude queries the Postgres replica and processes the documents via RAG. The legacy system is never touched by the AI — all AI interactions are read-only against the replica. Claim handlers use Claude to draft responses and identify missing documentation, then execute actions through the existing terminal interface.
Security and Access Control
Legacy systems often have coarse-grained access controls — a service account either has access to everything or nothing. This is a problem when the AI system should only see data relevant to specific users or roles. The API gateway or synchronization layer is where you implement the finer-grained access controls the legacy system can't provide.
For AI systems specifically, apply the principle of least privilege aggressively. The AI should only have access to the data categories it actually needs to perform its function. If an AI assistant helps customer service reps handle billing inquiries, it should have access to billing records — not to the broader account database, not to internal financial data, not to other customer data. Scope the access precisely and document why each data category is required.
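Scoping like this can be made explicit and auditable with a per-function allow-list checked at the gateway or synchronization layer. A minimal sketch; the function and category names are illustrative:

```python
# Hypothetical scope map: each AI function gets an explicit allow-list
# of data categories. Documenting *why* each entry exists belongs in
# review comments or an adjacent policy file.
ALLOWED_CATEGORIES = {
    "billing_assistant": {"billing_records", "payment_history"},
}

def check_access(function_name, category):
    """Raise unless this AI function is explicitly allowed to read this category."""
    allowed = ALLOWED_CATEGORIES.get(function_name, set())
    if category not in allowed:
        raise PermissionError(f"{function_name} may not read {category}")
    return True
```

The default-deny shape matters: an AI function absent from the map gets nothing, and adding access requires an explicit, reviewable change.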
Common Failure Modes
Legacy integrations fail in specific, predictable ways. Knowing them in advance lets you design around them:
- Timeouts: Legacy systems often have long response times under load. Design all integration code with explicit timeout handling and graceful degradation — the AI returns a partial answer or escalates to human review rather than hanging indefinitely.
- Connection limits: Old systems were often designed for dozens of concurrent connections, not hundreds. Connection pooling at the gateway layer is mandatory.
- Data format surprises: Legacy systems often have data quality issues — null values where values are expected, inconsistent date formats, encoding problems. Build robust data normalization into the integration layer and test against real production data, not synthetic samples.
- Authentication complexity: Legacy auth mechanisms (NTLM, Kerberos, proprietary session tokens) don't map cleanly to modern API patterns. Allow significant engineering time for authentication alone.
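The timeout failure mode above is worth sketching, because the key design decision is what happens *after* the deadline passes. A minimal version using a worker thread with a hard deadline; `fetch` stands in for any legacy lookup, and the fallback status is an invented placeholder:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

def query_with_deadline(fetch, timeout_s, fallback):
    """Run a legacy lookup under a hard deadline.

    If the call exceeds timeout_s, return a degraded response instead
    of hanging; the worker thread is left to finish in the background.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fetch)
        try:
            return {"status": "ok", "data": future.result(timeout=timeout_s)}
        except FutureTimeout:
            return {"status": "degraded", "data": fallback}
    finally:
        # Don't block on the stalled call; let it drain on its own.
        pool.shutdown(wait=False)

# A backend that responds quickly vs. one that stalls.
fast = lambda: {"claim": "C-1"}
slow = lambda: (time.sleep(0.5), {"claim": "C-2"})[1]
```

The `"degraded"` branch is where a partial answer or human escalation is wired in; the non-negotiable part is that the AI-facing call always returns within the deadline.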
Want to talk through your project?
We're always happy to discuss real problems. No sales pitch.
Book a Discovery Call