Skip to content

26 — qrysm ML-ADSA integration: security & operational audit

Read-only audit of the ML-ADSA additions to the QRL 2.0 qrysm fork (the vendored mladsa package, the mladsa-devnet command) and the upstream qrysm surfaces the integration touches. Each finding is classified: severity, security vs operational, and easily patchable (localized fix) vs not (design-level/cross-cutting). The crypto core (NTT, rounding, Merkle domain separation, hint value-validation) is carefully written and CIRCL/go-qrllib-cross-checked; the exploitable gaps are at the trust boundaries (decode/deserialize/public-input acceptance) and integration semantics.

Status legend: [PATCHED] fixed in this pass · [EASY] localized fix, recommended · [DESIGN] cross-cutting, not a quick patch.

High

# Finding (file:line) Sec/Ops Easy? Status
H1/H2 Verify decoded pk/sig with no length check → adversarial short input panics in PkDecode/SigDecode/unpack/hintUnpack (mldsa87.go Verify/278/322/259/370). Remote-DoS / node-crash primitive in consensus. Security EASY [PATCHED] — added len(pk)==2592 && len(sig)==4627 guard at the top of Verify (both go-mladsa & qrysm copies) + regression TestVerify_RejectsMalformedInput. Valid-input behavior unchanged ⇒ proofs/KATs/CIRCL still pass.
H3 Devnet/wire deserialize [][]int64 polynomials (commitMsg.T/W, respMsg.Z, UnmarshalMember) with no dimension/range check before NTT/coeffSum → remote panic or cross-node divergence; unbounded JSON size → memory-amplification DoS. Security/Ops EASY (shape/range gate) / DESIGN (size cap) [EASY] add validateVec(v, dim, N) + coeff∈[0,Q) before any deserialized poly enters the math; cap message size. (Size-cap properly = SSZ wire format, see H5.)
H4 Decentralized combine trusts committed t_i with only Merkle membership, no well-formedness/norm bound (decentralized.go SharedChallenge/CombineFromPublic). A registered-but-malicious member can commit an arbitrary t_i shifting pk*. Bound (Power2Round(Σtᵢ).t1==pk*.t1) is only in the optimistic ProvenanceVerifyF, not the hot path. Security done [DONE] — the consensus verify path now reconstructs pk* ONLY from epoch-key-tree-committed, registration-authenticated keys via mladsaconsensus.EpochKeyTreeStore.Resolve (wired into attestation.AggregateAttestationPubkey): per attester it checks VerifyEpochRoot (root signed by the reg key) + regpk==beacon-state pubkey + VerifyContentInTree (F-C9 membership for the predictable (slot,committee) label) + coeff-range, then AggPkStar. An injected/equivocating tᵢ therefore cannot enter pk*. (Earlier coeff-range gate in VerifyAggregatedAttestation closed the DoS panic.) Test TestEpochKeyTreeResolver_EndToEnd + 4 negative controls. Remaining: populate the store from beacon-state/p2p at epoch boundaries (state plumbing).
H5 Replacing repeated bytes signatures (per-attester list, invariant len(sigs)==len(bits) enforced at attestation_utils.go:68/138/150, signature.go:203) with one ML-ADSA aggregate changes SSZ hash-tree-root shape, per-index verification, slashing/equivocation detection, fork-choice, and the aggregator gossip topic. Security/Ops NOT [PHASE 1+2 DONE] — Phase 1: mladsa/consensus.go + mladsaconsensus/. Phase 2 (docs/29): the proto/SSZ field is now a single fixed 4627 B bytes signature by construction (proto×4 + regenerated pb.go/generated.ssz.go — new HTR layout, hard fork), and the verification (VerifyIndexedAttestationSigs/createAttestationSignatureBatch → one aggregate vs reconstructed pk*, ctx "ZOND") and aggregation (AggregatePair/max-cover → single-aggregate dedup, distinct-merge errors) are rewritten. Whole non-test module builds clean. Tests: SSZ+proto round-trip, real-crypto verify-path (accept + nil-resolver/wrong-pk*/tamper reject). Remaining: H4-full epoch-tree pk* wiring, H6 batch-verify, slasher load-path, _test.go suite migration.
H6 Each slot every node runs a full ML-ADSA Verify (NTT-heavy, 4627 B) — materially costlier than BLS; attacker-floodable gossip with no batch-verify / rate-limit ⇒ DoS surface. (Core ordering is OK: Verify does cheap z-bound/weight checks before ExpandA; CombineFromPublic checks maxAbs first.) Ops/Security NOT [DESIGN] — batch verification + gossip rate-limiting are integration work.

Medium

# Finding Sec/Ops Easy? Status
M2 OneTimeGuard was an in-memory map (construction_f.go). On node restart it reset → the same (id, content) could be re-signed with the same deterministic nonce → two sigs, same y, different c ⇒ secret recovery. (Audit only catches distinct t for the same (id,C), not same-t nonce reuse.) Security EASY (turned out localized) [PATCHED] — added DurableOneTimeGuard (onetime_durable.go, both copies): append-only fsync'd used-set with mark-before-sign (write-ahead) and torn-record-safe replay, so reuse is refused across restarts. AggregateF now takes the oneTime interface (in-mem or durable). Tests TestDurableOneTime_SurvivesRestart, TestAggregateF_DurableReuseAcrossRestart. (Optional bounded variant — a monotonic per-epoch content counter à la SP 800-208 — remains a future optimization; the durable used-set is correct for arbitrary contents.)
M5 On maxAttempts exhaustion AggregateF errors and the devnet calls die() (crash); AggregateRejfree/AggregateM1 return (nil,nil) which callers (SignSingle/MemberKeyGen/EpochKeyTreeBuild) don't nil-check. Ops EASY [PATCHED] — the devnet's combine-bounds failure (the cited liveness event) now abstains from the slot (return + log) instead of die(); the signing root is fresh next slot. A self-produced aggregate that fails verification (a correctness violation, not liveness) stays loud in the devnet (prod would log+abstain+alert). The n==0 (nil,nil) paths are unreachable here (committee always non-empty) and already guarded upstream (M6/M7).
M6 AggregateM1: box := (GAMMA1-BETA)/ndivision-by-zero panic if n==0 (reachable if all participants are filtered/excluded). Security EASY [EASY] guard if n==0 { return nil,nil } (and reject empty participant sets upstream).
M7 Empty participant set: SharedChallenge/CombineFromPublic over empty ids/ts/zs produce a degenerate all-zero aggregate ("aggregate of nobody") rather than rejecting. Security EASY [EASY] require len(participants) > 0 in AggregateF/CombineFromPublic.
M1 highbits is not reduced mod m while useHint is; consistency relies on decompose never returning r1==m (the q−1 special case prevents it) — fragile, unasserted. Security EASY [EASY] add % m to highbits or an assertion + test.

Low

# Finding Sec/Ops Easy? Status
L1 rejNTTPoly uses a fixed 4096-byte SHAKE buffer; on (astronomically rare) heavy rejection it over-reads → panic. Deterministic in ρ (fixed/honest), so not attacker-triggerable. (rejBoundedPoly already grows its buffer correctly.) Ops EASY [EASY] stream or grow the buffer like rejBoundedPoly.
L2 wire.go MarshalMember serializes the master seed + s1/s2 (secrets) to JSON; UnmarshalMember does no validation. Unused on the devnet wire (only public data is gossiped) but a loaded gun. Security EASY [EASY] restrict to public fields / remove; never serialize secrets.
L3 Devnet file-bulletin: commit/resp/agg files are not individually authenticated (only reg is PoP-bound); fine as a P2P stand-in, does not translate to production. Sec/Ops n/a [DESIGN] production needs authenticated p2p (sender-auth, equivocation handling).
L4 Verification comparisons are non-constant-time — benign (public data). Security n/a acceptable.
L5 epochBytes (big-endian) vs slotBytes/upstream IsAggregator (little-endian) — consistent per use but mixing BE/LE across shared hashes is fragile. Ops EASY [EASY] standardize endianness.

Disposition

  • Patched now: H1/H2 (remote-DoS Verify guard), M2 (durable one-time state), M6 (n==0 div-0 guard in AggregateM1/Rejfree), M7 (empty-participant guard in AggregateF/ CombineFromPublic), L1 (rejNTTPoly streaming — no over-read; byte-identical output, KATs unchanged), L2 (removed dead wire.go that serialized the master seed), H3 (shape/range ValidVec gate added + wired into the devnet commit/response path), and M1 (full-sweep test proving highbits(r)∈[0,m) — the useHint/highbits invariant). Tests: TestVerify_RejectsMalformedInput, TestHighbits_InRange, TestValidVec, TestDurableOneTime_*. KATs unchanged (no behavior drift).
  • Also patched this pass: M5 (devnet combine-bounds failure now abstains per slot instead of die()), H4 partial (coeff-range/shape gate on committed tᵢ at the consensus verify boundary — closes the AggPkStar remote-DoS panic; tests TestMalformedAttesterKey_RejectedNotPanic).
  • Still recommended (localized, not yet applied): L5 (BE/LE endianness standardization — deferred: changing epochBytes would churn the pinned KATs for no functional gain), H3 message size cap (structural once the Phase-2 fixed-size bytes signature field lands, H5/docs/29).
  • H5 Phase 1 implemented (docs/29): a single ML-ADSA aggregate now occupies the existing signatures field of the real qrysmpb.Attestation (mladsa/consensus.go, mladsaconsensus/), proven to compress 12–128× with intact SSZ hash-tree-root and an unmodified FIPS-204 verifier, and to reject tampering / sybil key-substitution. Phase 2 (proto/SSZ hard-fork to a single bytes signature, verification + aggregation + slasher rewrite) remains design-level and is gated on H4/H6.
  • Design-level (not quick patches): H4 (t_i well-formedness binding), H5 Phase 2 (proto/SSZ/slashing/fork-choice swap — see docs/29), H6 (batch-verify/rate-limit), L3 (authenticated p2p). These are the real integration milestones, tracked in qrl-integration/README.md and docs/23.

Bottom line: the inner cryptography is sound and cross-checked; the audit's actionable surface is input validation (mostly easy — the critical Verify panic now fixed) and the consensus-integration semantics (design-level). The most dangerous item, M2 (durable one-time state — deterministic-nonce reuse on restart ⇒ key recovery), is now fixed with a fsync'd, mark-before-sign durable used-set. H5 Phase 1 is implemented (single aggregate in the real Attestation, docs/29) and H4 has its DoS-panic surface closed; the remaining substantial work is the Phase-2 consensus-integration design items: H4-full (mandatory provenance binding of tᵢ), H5 Phase 2 (proto/SSZ hard-fork + verification/aggregation/slasher rewrite), H6 (batch-verify/rate-limit), and L3 (authenticated p2p).