Lean proved this program was correct; then I found a bug
This is IMO very exciting: code that was formally verified and proven correct using Lean was then exercised heavily using Claude, which managed to turn up a totally unexpected runtime bug:
The positive result here is actually the remarkable one. Across 105 million executions, the application code (that is, excluding the runtime) had zero heap buffer overflows, zero use-after-free, zero stack buffer overflows, zero undefined behaviour (UBSan clean), and zero out-of-bounds array reads in the Lean-generated C code. [...]
The two bugs that were found both sat outside the boundary of what the proofs cover. The denial-of-service was a missing specification. The heap overflow was a deeper issue in the trusted computing base, the C++ runtime that the entire proof edifice assumes is correct (and now has a PR addressing).
Overall verification resulted in a remarkably robust and rigorous codebase. AFL and Claude had a really hard time finding errors. But they did still find issues. Verification is only as strong as the questions you think to ask and the foundations you choose to trust.
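To make the "missing specification" point concrete, here is a minimal Lean 4 sketch. It is a hypothetical illustration, not code from the verified project: the function `replicateByte` and the theorem `replicateByte_length` are names invented here, and the actual denial-of-service in the original write-up may look quite different.

```lean
-- Hypothetical example: not taken from the verified codebase.
-- Build a list containing `n` copies of the byte `b`.
def replicateByte (n : Nat) (b : UInt8) : List UInt8 :=
  List.replicate n b

-- The question we thought to ask: is the output exactly `n` bytes long?
-- Lean accepts this proof, so the function is "correct" against this spec.
theorem replicateByte_length (n : Nat) (b : UInt8) :
    (replicateByte n b).length = n := by
  simp [replicateByte]

-- The question nobody asked: is `n` bounded?
-- Nothing above stops a caller from passing an attacker-controlled `n`
-- (say, a length field read from a file header), so a proven-correct
-- function can still be driven to exhaust memory.
```

The theorem is true and the proof checks, yet the resource-exhaustion behaviour sits entirely outside what was stated. Closing that kind of gap would mean adding a precondition or a separate bound to the specification, which is a design decision the proof assistant cannot make for you.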
Tags: programming coding future lean formal-methods correctness linting bugs zip verification testing