
ML-ADSA-87 — Benchmarks and Sizes

Companion to the specification (docs/30) and the verification dossier (docs/31). All numbers below were measured with go test -bench on the actual reference implementation. They are not estimates.

Measurement conditions (honest):

- Platform: Apple M5 (arm64); Go reference implementation in qrl-integration/ml-adsa/qrysm/mladsa/ (byte-identical to the canonical go-mladsa/).
- Command: go test ./mladsa/ -run='^$' -bench=. -benchmem (default benchtime), with deterministic KAT inputs (bench_test.go reuses the kat_test.go conditions).
- Scope: this is the reference implementation, which is portable, unoptimized, and not yet constant-time hardened. AggregateF includes Fiat-Shamir-with-aborts rejection sampling, so per-op timing has real variance. An optimized (AVX2/assembly, batched) implementation would be substantially faster (docs/32 item #6). The numbers indicate the algorithm's shape, not the performance of a production-tuned build.


1. Primitive operations

| Operation | ns/op | ≈ time | B/op | allocs/op |
|---|---:|---:|---:|---:|
| ExpandA(ρ) (matrix expansion) | 166,835 | 0.167 ms | 121,664 | 177 |
| ContentKeyDerive (per-content key refresh, L1) | 101,076 | 0.101 ms | 203,937 | 147 |
| MemberKeyGen (registration keypair + PoP) | 807,869 | 0.81 ms | 920,402 | 589 |
| Verify (in-house, = FIPS-204 verify) | 433,461 | 0.43 ms | 356,668 | 318 |
| Verify (go-qrllib native ML-DSA-87) | 137,960 | 0.138 ms | 1,158 | 4 |

MemberKeyGen is a one-time, per-epoch-registration cost (it generates an ML-DSA registration keypair and a proof-of-possession). ContentKeyDerive is the recurring per-content refresh.
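The split between a one-time and a recurring cost amounts to a simple per-signer model. The sketch below assumes a signer registers once per epoch and then refreshes its content key m times. m and epochSignerCostNs are hypothetical, the constants are the measured ns/op values from the primitive-operations table, and signing and aggregation work are excluded.

```go
package main

import "fmt"

// Measured reference-implementation costs (Apple M5, ns/op).
const (
	memberKeyGenNs     = 807_869 // one-time, per epoch registration
	contentKeyDeriveNs = 101_076 // recurring, once per content
)

// epochSignerCostNs models one signer's key-management cost for an epoch in
// which it refreshes its content key m times: one MemberKeyGen plus m
// ContentKeyDerive calls. Signing/aggregation work is not included.
func epochSignerCostNs(m int) int64 {
	return memberKeyGenNs + int64(m)*contentKeyDeriveNs
}

func main() {
	for _, m := range []int{1, 32, 1000} {
		fmt.Printf("m=%4d contents: %.2f ms\n", m, float64(epochSignerCostNs(m))/1e6)
	}
}
```

With these figures one MemberKeyGen costs about as much as eight ContentKeyDerive calls (807,869 / 101,076 ≈ 8.0).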

2. Aggregation (AggregateF) by committee size N

Includes rejection sampling; the output is a single ML-DSA-87 signature regardless of N.

| N (signers) | ns/op | ≈ time | per-signer | B/op | allocs/op |
|---:|---:|---:|---:|---:|---:|
| 1 | 399,119 | 0.40 ms | 0.40 ms | 708,286 | 521 |
| 4 | 851,372 | 0.85 ms | 0.21 ms | 1,470,873 | 1,139 |
| 16 | 4,983,889 | 4.98 ms | 0.31 ms | 9,011,744 | 7,107 |
| 64 | 18,927,026 | 18.9 ms | 0.30 ms | 16,729,178 | 13,393 |
| 128 | 23,825,097 | 23.8 ms | 0.19 ms | 33,002,311 | 26,452 |

Aggregation cost grows roughly linearly in N (one A·y and one response per signer, plus the shared combine), modulated by rejection retries. This explains the per-op variance and why the per-signer column (0.19–0.40 ms) is not perfectly monotone. Crucially, this is a one-time combine per slot/content, and in the decentralized deployment it is split across the committee: each signer computes its own ContentParts/ContentResponse, and any party can run the public combine.
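The per-signer column is the total ns/op divided by N. The sketch below recomputes it from the measured totals. The names (aggRow, aggregateF, perSignerMs) are hypothetical; the data are the table's values.

```go
package main

import "fmt"

// aggRow is one row of the AggregateF table: committee size and ns/op.
type aggRow struct {
	n    int
	nsOp int64
}

// Measured AggregateF ns/op (reference build, Apple M5).
var aggregateF = []aggRow{
	{1, 399_119}, {4, 851_372}, {16, 4_983_889}, {64, 18_927_026}, {128, 23_825_097},
}

// perSignerMs divides the whole-aggregation time by the committee size,
// which is how the per-signer column is derived.
func perSignerMs(r aggRow) float64 {
	return float64(r.nsOp) / float64(r.n) / 1e6
}

func main() {
	for _, r := range aggregateF {
		fmt.Printf("N=%3d total=%6.2f ms per-signer=%.2f ms\n",
			r.n, float64(r.nsOp)/1e6, perSignerMs(r))
	}
}
```

The per-signer figure stays between about 0.19 and 0.40 ms while the total grows with N, which is the roughly linear shape, with rejection-sampling noise, described above.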

3. Verification cost is O(1) in N — the headline result

The verifier does one ML-DSA-87 verify against pk* regardless of how many signers aggregated:

| Scheme | Bytes on the wire | Verify cost |
|---|---|---|
| Per-attester signature list (naive PQ port) | N × 4627 (≤ 592,256 at N=128) | N verifies, O(N) |
| ML-ADSA aggregate | 4627 (constant) | 1 verify (~0.14–0.43 ms), O(1) |

This is the BLS-like win: constant signature size and constant verification, independent of committee size.
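To express the O(N) and O(1) rows in milliseconds, the sketch below extrapolates from the measured native verify time. It assumes the naive list is verified sequentially at 137,960 ns per signature with no batching. The function names are hypothetical.

```go
package main

import "fmt"

// Measured go-qrllib native ML-DSA-87 verify time (Apple M5), in ns.
const nativeVerifyNs = 137_960

// naiveListVerifyNs estimates verifying a per-attester list of n signatures,
// assuming n sequential, independent verifies at the native per-verify cost.
func naiveListVerifyNs(n int) int64 { return int64(n) * nativeVerifyNs }

// aggregateVerifyNs is one verify against pk*; the committee size does not enter.
func aggregateVerifyNs(_ int) int64 { return nativeVerifyNs }

func main() {
	for _, n := range []int{16, 64, 128} {
		fmt.Printf("N=%3d naive list: %6.2f ms  aggregate: %.3f ms\n",
			n, float64(naiveListVerifyNs(n))/1e6, float64(aggregateVerifyNs(n))/1e6)
	}
}
```

At N=128 the sequential estimate is about 17.7 ms of verification, compared with a single 0.138 ms verify for the aggregate.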

4. Sizes

| Object | Size | Note |
|---|---|---|
| Aggregate public key pk* | 2592 B | a valid ML-DSA-87 public key (constant) |
| Aggregate signature σ* | 4627 B | a valid ML-DSA-87 signature (constant, any N) |
| Aggregation bits | ⌈N/8⌉ + 1 B (17 B at N=128) | already present upstream; identifies the signer set |
| Master secret seed | 32 B | the only long-term secret per signer |

Compression vs the per-attester list

| Committee N | List size | ML-ADSA | Reduction |
|---:|---:|---:|---:|
| 8 | 37,016 B | 4,627 B | 8× |
| 16 | 74,032 B | 4,627 B | 16× |
| 64 | 296,128 B | 4,627 B | 64× |
| 128 (max) | 592,256 B | 4,627 B | 128× |
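The compression table follows from the fixed ML-DSA-87 signature size of 4627 B (FIPS 204). The list grows as N × 4627 B, the aggregate stays at 4627 B, and so the reduction factor is exactly N. A small sketch with hypothetical helper names:

```go
package main

import "fmt"

const mldsa87SigBytes = 4627 // ML-DSA-87 signature size in bytes (FIPS 204)

// perAttesterListBytes is the wire size of the naive port: one signature per signer.
func perAttesterListBytes(n int) int { return n * mldsa87SigBytes }

// aggregateBytes is the wire size of the ML-ADSA aggregate: one signature for any n.
func aggregateBytes(_ int) int { return mldsa87SigBytes }

func main() {
	for _, n := range []int{8, 16, 64, 128} {
		list, agg := perAttesterListBytes(n), aggregateBytes(n)
		fmt.Printf("N=%3d list=%7d B aggregate=%d B reduction=%dx\n", n, list, agg, list/agg)
	}
}
```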

No information the verifier needs is lost: the signer set is given by the public aggregation bits, the key is the sum Σ tᵢ of the signers' epoch-tree tᵢ, and validity is one FIPS-204 verify.
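The "key = Σ tᵢ" bookkeeping can be illustrated with a minimal sketch. It assumes each signer's tᵢ is a vector of k = 8 polynomials with 256 coefficients mod q = 8380417 (the ML-DSA-87 parameters in FIPS 204), and that the aggregation bits select which tᵢ enter the sum. aggregateKey, poly, and polyVecK are hypothetical names. The sketch does not show how the implementation derives tᵢ from the epoch tree or how it encodes pk*; an ML-DSA public key carries ρ and the high bits t₁, not t itself.

```go
package main

import "fmt"

const (
	q = 8380417 // ML-DSA modulus q (FIPS 204)
	n = 256     // coefficients per polynomial
	k = 8       // length of the t vector for ML-DSA-87
)

type poly [n]uint32   // coefficients reduced to [0, q)
type polyVecK [k]poly // one signer's t vector

// aggregateKey sums the t vectors of the signers whose aggregation bit is set,
// coefficient by coefficient, modulo q. Keys are public, so the branch in the
// reduction leaks nothing secret.
func aggregateKey(ts []polyVecK, bits []bool) polyVecK {
	if len(bits) != len(ts) {
		panic("aggregation bits and key list differ in length")
	}
	var acc polyVecK
	for s := range ts {
		if !bits[s] {
			continue // signer s did not take part
		}
		for i := 0; i < k; i++ {
			for j := 0; j < n; j++ {
				v := acc[i][j] + ts[s][i][j] // < 2q, fits in uint32
				if v >= q {
					v -= q
				}
				acc[i][j] = v
			}
		}
	}
	return acc
}

func main() {
	var t1, t2 polyVecK
	t1[0][0], t2[0][0] = q-1, 5
	pk := aggregateKey([]polyVecK{t1, t2}, []bool{true, true})
	fmt.Println(pk[0][0]) // (q-1 + 5) mod q
}
```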


4b. After optimization pass (pure-Go, byte-identical, same machine)

The pure-Go arithmetic was optimized with branchless, constant-time modQ/cabs and with a fused, allocation-free multiply-accumulate, pwacc, which replaces padd(acc, pw(...)) in the A·y and A·z matrix products. Output is byte-identical (all KATs unchanged). Measured deltas (Apple M5):

| Op | Time before | Time after | allocs/op (before → after) | B/op (before → after) |
|---|---:|---:|---:|---:|
| ContentKeyDerive | 0.101 ms | 0.077 ms | 147 → 91 | 204 KB → 89 KB |
| AggregateF N=16 | 4.98 ms | 5.3 ms¹ | 7,107 → 5,202 | 9.0 MB → 5.1 MB |
| AggregateF N=64 | 18.9 ms | 15.8 ms | 13,393 → 9,753 | 16.7 MB → 9.3 MB |
| AggregateF N=128 | 23.8 ms | 19.4 ms | 26,452 → 19,228 | 33.0 MB → 18.2 MB |
| Verify (in-house) | 0.43 ms | 0.37 ms | 318 → 262 | 357 KB → 242 KB |

¹ The N=16 time is within rejection-sampling noise.

The durable win is memory: allocations per op dropped ~18–38% and bytes per op ~32–56% (about 43–45% less memory traffic for AggregateF). Excluding the noisy N=16 row, the measured operations are ~14–24% faster, with byte-for-byte identical output. Further gains (in-place NTT, lazy reduction, AVX2 asm) are in docs/34.
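The change from padd(acc, pw(...)) to pwacc can be sketched as follows. This is a hypothetical reconstruction, not the implementation's code. It assumes the operands are already in the NTT domain, so each matrix-product term is a pointwise product. It uses a plain % reduction in place of whatever reduction the implementation uses (Go's % compiles to a division, which is not guaranteed constant-time), and it guesses at the signatures of pw, padd, pwacc, and cabs. condSubQ is a hypothetical helper.

```go
package main

import "fmt"

const q = 8380417 // ML-DSA modulus (FIPS 204)

type poly [256]int32 // coefficients in [0, q)

// pw returns the pointwise product a∘b in a freshly allocated poly (unfused form).
func pw(a, b *poly) *poly {
	r := new(poly)
	for i := range r {
		r[i] = int32(int64(a[i]) * int64(b[i]) % q)
	}
	return r
}

// padd returns a+b in another freshly allocated poly (unfused form).
func padd(a, b *poly) *poly {
	r := new(poly)
	for i := range r {
		r[i] = condSubQ(a[i] + b[i])
	}
	return r
}

// pwacc computes acc += a∘b in place: one pass, no allocation per call.
func pwacc(acc, a, b *poly) {
	for i := range acc {
		acc[i] = condSubQ(acc[i] + int32(int64(a[i])*int64(b[i])%q))
	}
}

// condSubQ maps x in [0, 2q) to [0, q) without a data-dependent branch:
// the sign of x-q, spread into an all-ones/all-zeros mask, decides whether
// q is added back.
func condSubQ(x int32) int32 {
	x -= q
	return x + ((x >> 31) & q)
}

// cabs returns |x| for a centered representative without branching.
func cabs(x int32) int32 {
	m := x >> 31 // all ones if x < 0, else zero
	return (x ^ m) - m
}

func main() {
	var a, y, acc poly
	for i := range a {
		a[i], y[i] = int32(i), 2
	}
	// One row of an A·y product for ML-DSA-87 (l = 7 columns), accumulated in place.
	for col := 0; col < 7; col++ {
		pwacc(&acc, &a, &y)
	}
	fmt.Println(acc[3], cabs(-5))
}
```

In this sketch the unfused form allocates two temporary polys for every term of the matrix product, while the fused form writes straight into acc. That is the kind of saving reflected in the allocs/op and B/op columns above.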

5. Reproduce

```sh
cd qrl-integration/ml-adsa/qrysm
go test ./mladsa/ -run='^$' -bench=. -benchmem            # all benchmarks above
```
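To turn the raw output into the ns/op, B/op and allocs/op columns used above, a minimal parser for go test -bench result lines might look like this. It assumes the standard result-line layout: name, iteration count, then value/unit pairs. parseBenchLine and benchResult are hypothetical. The benchmark name and iteration count in the sample line are made up; its metric values are the ExpandA row of the primitive-operations table.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// benchResult holds the metrics from one `go test -bench -benchmem` result line.
type benchResult struct {
	Name       string
	Iterations int64
	Metrics    map[string]float64 // unit -> value, e.g. "ns/op" -> 166835
}

// parseBenchLine parses lines of the form
// "BenchmarkName-P  <iters>  <value> <unit>  <value> <unit> ...".
func parseBenchLine(line string) (benchResult, bool) {
	f := strings.Fields(line)
	if len(f) < 4 || len(f)%2 != 0 || !strings.HasPrefix(f[0], "Benchmark") {
		return benchResult{}, false
	}
	iters, err := strconv.ParseInt(f[1], 10, 64)
	if err != nil {
		return benchResult{}, false
	}
	r := benchResult{Name: f[0], Iterations: iters, Metrics: map[string]float64{}}
	for i := 2; i+1 < len(f); i += 2 {
		v, err := strconv.ParseFloat(f[i], 64)
		if err != nil {
			return benchResult{}, false
		}
		r.Metrics[f[i+1]] = v
	}
	return r, true
}

func main() {
	// Hypothetical sample line: name and iteration count are made up.
	line := "BenchmarkExpandA-10   7196   166835 ns/op   121664 B/op   177 allocs/op"
	if r, ok := parseBenchLine(line); ok {
		fmt.Printf("%s: %.3f ms/op, %.0f B/op, %.0f allocs/op\n",
			r.Name, r.Metrics["ns/op"]/1e6, r.Metrics["B/op"], r.Metrics["allocs/op"])
	}
}
```

For statistically meaningful before/after comparisons across repeated -count runs, the benchstat tool from golang.org/x/perf is the usual choice.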

Indicative production-tuning headroom (not yet done; see docs/32): NTT/poly arithmetic in assembly/AVX2, batched verification, further allocation reduction (the reference still allocates liberally), and a constant-time hardened build. These would primarily speed up AggregateF (the rejection-sampling loop) and bring the in-house Verify closer to the go-qrllib native figure.