Your Legacy Code Has Vulnerabilities You've Never Found. AI Just Changed That.
Most businesses running software built more than five years ago are also running security risks they've never properly audited. Not because no one cares, but because legacy code audits are expensive, slow, and require specialists who understand architectures most developers no longer work with.
AI can now read old code, even machine code from 40 years ago, and find bugs.
Microsoft Azure CTO Mark Russinovich demonstrated this last week, using Claude to decompile and analyze a utility he wrote in 1986 for the Apple II. The code was 6502 machine language, the native instruction set of the Apple II's processor, which most developers today have never touched. Claude found a genuine vulnerability: a silent failure mode where the program would set pointers incorrectly instead of reporting errors.
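Russinovich hasn't published the exact routine, but the bug class is easy to illustrate. A minimal Python sketch (hypothetical names, nothing to do with the original 6502 code): the buggy version "succeeds" with a plausible-looking value when it should fail, so callers never learn anything went wrong.

```python
def find_record_offset_buggy(table: list[str], name: str) -> int:
    """Silent failure: a missing record returns offset 0, which is
    indistinguishable from a genuine match at offset 0. The caller
    happily reads the wrong data and no error is ever reported."""
    for offset, entry in enumerate(table):
        if entry == name:
            return offset
    return 0  # wrong: failure masquerades as success

def find_record_offset(table: list[str], name: str) -> int:
    """Fixed: failure is explicit and the caller must handle it."""
    for offset, entry in enumerate(table):
        if entry == name:
            return offset
    raise LookupError(f"record {name!r} not found")
```

The fix costs one line; the point is that this pattern is invisible in normal operation, which is exactly why it can sit unnoticed in a codebase for decades.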
That's not just a nostalgic exercise. Countless embedded systems and legacy microcontrollers run firmware written decades ago that has never been formally audited. Industrial equipment, older enterprise software, internal tools nobody has touched in years. This code runs real businesses.
A traditional security audit for legacy systems is expensive. A single engagement can run $20,000 to $100,000 or more depending on scope. Most small and mid-sized businesses skip it. They run the risk because the alternative feels out of reach.
Static analysis tools like Semgrep and CodeQL have been doing automated vulnerability detection for years, but they struggle with legacy architectures and obscure assembly dialects. That's the gap AI fills: it can reason about code it was never explicitly trained to parse.
AI code auditing doesn't replace a full penetration test. But it lowers the cost of the first pass considerably. Russinovich's experiment showed Claude could identify a real vulnerability in 40-year-old assembly code, without any specialist knowledge from the person running the audit.
Anthropic's own red team confirmed this capability when releasing the model: it "found high-severity vulnerabilities, some that had gone undetected for decades" in well-tested codebases with years of fuzzing data behind them.
This opens up a triage approach to legacy security that wasn't really feasible before.
Take a critical legacy component: a specific module, a firmware binary, an old internal service. Paste it into Claude or GPT-4 and ask it to identify potential security issues, silent failure modes, and unhandled edge cases. You'll get a list of things worth a closer look. Then bring in a human specialist only for what's flagged as high severity.
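For anyone who wants to script that first pass rather than paste by hand, here is a minimal sketch against Anthropic's Messages API using only the Python standard library. The model id and the prompt wording are assumptions to substitute with your own; it expects an `ANTHROPIC_API_KEY` environment variable.

```python
import json
import os
import urllib.request

API_URL = "https://api.anthropic.com/v1/messages"
MODEL = "claude-sonnet-4-20250514"  # assumption: swap in a current model id

# Hypothetical prompt; tune the wording for your own components.
AUDIT_PROMPT = (
    "Act as a security auditor. Review the following legacy code for "
    "potential security issues, silent failure modes, and unhandled edge "
    "cases. Rank each finding by severity and explain why it matters.\n\n"
    "{code}"
)

def build_request(source: str) -> dict:
    """Assemble the Messages API payload for a one-shot triage pass."""
    return {
        "model": MODEL,
        "max_tokens": 2048,
        "messages": [
            {"role": "user", "content": AUDIT_PROMPT.format(code=source)}
        ],
    }

def audit(source: str) -> str:
    """Send one component for review; requires ANTHROPIC_API_KEY."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(build_request(source)).encode(),
        headers={
            "x-api-key": os.environ["ANTHROPIC_API_KEY"],
            "anthropic-version": "2023-06-01",
            "content-type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"][0]["text"]
```

Reading the response back as a severity-ranked list is the whole point: the output feeds directly into the triage step below, not into a report.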
This works best on isolated components, not entire codebases. Large systems need to be broken into auditable chunks first. And like any AI output, the results need human review. Treat it as a triage list, not a final report. The AI will surface real issues, but it will also flag false positives.
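That chunking step can itself be scripted. A minimal sketch, assuming plain-text source files and a rough per-request character budget (the budget, file extensions, and split-at-blank-lines heuristic are all placeholders to tune for your codebase and model):

```python
from pathlib import Path

MAX_CHARS = 12_000  # rough per-request budget; tune to the model's context

def chunk_file(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split one file into audit-sized chunks at blank-line boundaries,
    so functions and blocks are less likely to be cut in half."""
    chunks, current, size = [], [], 0
    for block in text.split("\n\n"):
        if size + len(block) > max_chars and current:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(block)
        size += len(block) + 2
    if current:
        chunks.append("\n\n".join(current))
    return chunks

def iter_audit_chunks(root: str, exts=(".c", ".h", ".py", ".asm")):
    """Yield (path, chunk_index, chunk) for every auditable unit under root."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in exts:
            text = path.read_text(errors="replace")
            for i, chunk in enumerate(chunk_file(text)):
                yield str(path), i, chunk
```

Each yielded chunk becomes one AI request, and each response becomes one entry on the triage list a human then reviews.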
The goal isn't to replace a specialist. It's to give the specialist a head start, which makes the engagement shorter and more focused than starting from scratch.
There's a catch. The same capability that helps defenders also helps attackers. The bar for finding vulnerabilities in old systems is dropping for everyone, not just the good guys.
The businesses that audit their own legacy code before someone else does will be glad they didn't wait.
If you don't know where your oldest code lives or when it was last reviewed, that's the question to start with. The answer is probably more interesting than you'd like.