Beyond Code Reasoning: Specification-Anchored Auditing of Multi-Implementation Distributed Protocols

Masato Kamba, Hirotake Murakami, Akiyoshi Sannai

Code-driven auditing fails when correctness depends on what the specification requires rather than how the code is written. Production blockchain networks expose this directly: byzantine consensus runs many independent clients of a shared specification, so a specification-divergence defect in one client can fork the network or halt finality. Existing tools reason one repository at a time, with no shared baseline held constant across implementations. We present SPECA, an LLM-driven audit framework that derives explicit, categorized security properties (invariants, pre/postconditions, trust assumptions) from natural-language specifications and reuses them across implementations. SPECA enables controlled cross-implementation comparison, detections grounded in specification invariants no code pattern encodes, and false positives traceable to a specific pipeline phase rather than opaque model errors. On the Sherlock Ethereum Fusaka Audit Contest (10 targets, 366 submissions), SPECA recovers all 15 in-scope H/M/L vulnerabilities expert-augmented (8/15 automated-only) and surfaces 4 fix-confirmed bugs, including a cryptographic-invariant violation missed by every adjudicated finding. On the RepoAudit C/C++ benchmark, SPECA reaches 88.9% precision at 100% recall (F1=0.94) and surfaces 12 author-validated bugs beyond ground truth, two externally validated. SPECA also flags 5 of RepoAudit's 40 published bugs as defensive-coding fixes with no reachable exploit path. False positives trace to three pipeline-pinned root causes; a multi-model study identifies property-generation quality as the binding constraint. End-to-end cost is ~$30 per H/M/L bug (~42 min wall-clock under parallel execution).

picture_as_pdf flag

Knowledge Graph

arrow_drop_up

Comments

Sign up or login to leave a comment