LMSA — Forward/Backward Analysis of ML-DSA-87 Aggregation¶
Historical / iteration note (2026-06-11). This document is part of the research/design trail and reflects an earlier iteration; some counts, status labels, and construction details predate the current Construction F. The authoritative current specification is
docs/30, the verification status and tallies are indocs/31and reproducible viaformal/count-artifacts.sh(29 artifacts, 134 lemmas, 33/33 genuineness, 6 Gobra), and the cross-document reconciliation isdocs/35. Numbers below are preserved as the historical record.Goal: determine, from first principles (not by citation), whether many independent ML-DSA-87 signatures can be combined into one object accepted by the standard ML-DSA verifier, while remaining unforgeable and granting no power beyond the original keys. We work backward from the verifier and forward from known mechanisms, and we test the transform-domain (NTT / FFT / convolution-theorem) idea directly.
0. Exact setup (so every claim below is checkable)¶
Ring R_q = Z_q[X]/(X^256 + 1), q = 8380417 = 2^23 - 2^13 + 1, n = 256.
ML-DSA-87 (Dilithium5): (k, ℓ) = (8, 7), η = 2, τ = 60, β = τη = 120,
γ1 = 2^19 = 524288, γ2 = (q-1)/32 = 261888, ω = 75, d = 13.
Key. ρ is a 32-byte seed. The matrix is A = ExpandA(ρ) ∈ R_q^{8×7}, where
ExpandA is a SHAKE-based hash expansion (this single fact is load-bearing —
see §3). Secret s1 ∈ R_q^7, s2 ∈ R_q^8, both with ‖·‖∞ ≤ η.
t = A·s1 + s2; split t = t1·2^d + t0. Public key pk = (ρ, t1).
Sign(μ). Sample y, ‖y‖∞ < γ1. w = A·y; w1 = HighBits(w, 2γ2).
c̃ = H(μ ‖ w1), c = SampleInBall(c̃) (sparse, exactly τ coeffs in ±1).
z = y + c·s1. Reject unless ‖z‖∞ < γ1 − β and the low-bits/hint conditions hold.
h = MakeHint(...). Signature σ = (c̃, z, h).
Verify(pk, μ, σ). A = ExpandA(ρ), c = SampleInBall(c̃),
w' = A·z − c·t1·2^d, w1' = UseHint(h, w', 2γ2). Accept iff
(V1) ‖z‖∞ < γ1 − β
(V2) c = SampleInBall(c̃) is sparse ±1, weight τ
(V3) c̃ = H( μ ‖ w1' ) (Fiat–Shamir fixed point)
(V4) #ones(h) ≤ ω
The identity that makes honest signatures verify:
w' = A(y+cs1) − c t1 2^d = w + c(t − s2) − c t1 2^d = w − c·s2 + c·t0, and
UseHint strips the c·t0 term so w1' = HighBits(w − c s2) = HighBits(w) = w1.
Note w1 is fixed by w = A·y alone, before c exists. That ordering is the
whole game.
1. Track B — backward from the verifier¶
For the standard verifier to accept an aggregate (c̃*, z*, h*) under some
pk* = (ρ*, t1*) and message μ*, all of (V1)–(V4) must hold for A* = ExpandA(ρ*).
Rearrange (V3): the aggregator must exhibit c̃* such that
c̃* appears on both sides (right side through c* = SampleInBall(c̃*)). With H
a random oracle, the only way to satisfy this is to fix everything that feeds the hash
before querying it — i.e. fix w1* first, then define c̃* = H(μ* ‖ w1*). But w1*
is derived from w'* = A*z* − c* t1* 2^d, which depends on c*, which is
SampleInBall(c̃*). The loop closes only if there is a way to choose z* after
c* is known that (a) keeps ‖z*‖∞ < γ1−β and (b) leaves HighBits(w'*) equal to the
pre-committed w1*. In honest signing this is done by z = y + c·s1: the secret s1
absorbs c while rejection sampling keeps z short. A public aggregator holds no
s1* for pk*. So Track B reduces to: produce (z*, c*) short and FS-consistent
for a key whose secret you don't have. That is exactly the ML-DSA forgery problem for
pk* (§3 makes the reduction precise).
This is not lattice-specific. The identical wall exists for classical Schnorr:
nobody can publicly combine independent Schnorr signatures into one Schnorr signature,
because Schnorr is also Fiat–Shamir and s = k + e·x needs the secret x. BLS
escapes only because it is not Fiat–Shamir — it is pairing-based and deterministic
(σ = x·H(m)), so Σσ_i is itself a signature under Σ pk_i. The property the
project wants is a pairing property, and ML-DSA is a Fiat–Shamir scheme. So "no one
did this before" is not a gap in cleverness; FS signatures structurally do not
self-aggregate, classically or post-quantum.
2. Track A — forward from known mechanisms¶
(a) Random-linear-combination batch verification. With hash-derived small e_i,
Σ_i e_i ( A_i z_i − c_i t1_i 2^d ) = Σ_i e_i w_i' holds and Schwartz–Zippel/forking
gives soundness. This compresses verifier work, not size: every A_i, z_i, c_i,
h_i is still needed. Each A_i multiplies its own z_i; there is no single matrix.
(b) The short vector is real. z_agg = Σ_i e_i z_i with bounded-weight e_i is
genuinely short: ‖z_agg‖∞ ≤ (Σ_i ‖e_i‖_1)·γ1, linear in M. Never disputed.
(c) Folding with cross-term commitments (Bulletproofs/LaBRADOR style). When you
collapse toward one object you incur cross terms (§3). The standard fix commits the
cross terms with short Module-SIS commitments (L/R), the verifier subtracts the
openings, completeness holds, and forking on the challenges extracts the leaf
witnesses. What this builds is a self-contained PoK with a relaxed, ML-DSA-shaped
verifier — its own check (linear relation + relaxed norm + hash-binding of the
residual). It is NOT the standard verifier: the extra openings are inputs the FIPS-204
verifier has no slot for, and (V1)/(V2) are not met (see §3). This is the strongest
object public, non-interactive aggregation of independent sigs can reach.
3. The two walls, derived (and why the transform domain cannot move them)¶
Wall 1 — the matrix is pinned to a seed by a hash¶
To use the standard verifier you must hand it pk* = (ρ*, t1*); it then computes
A* = ExpandA(ρ*). So A* is forced to be a hash expansion of one 32-byte seed.
You cannot set A* = Σ_i e_i A_i, because that is not in the image of ExpandA (it is
a SHAKE output). The only way several signers can share matrix structure under the
standard verifier is to share one ρ (one global A). Distinct keys ⇒ distinct
ExpandA(ρ_i) ⇒ no single A* respects the products A_i z_i.
The cross terms make this concrete. Even ignoring Wall 1, forming A_agg = Σ_i e_i A_i
and z_agg = Σ_j e_j z_j:
A_agg · z_agg = Σ_{i,j} e_i e_j (A_i z_j)
= Σ_i e_i^2 (A_i z_i) + Σ_{i≠j} e_i e_j (A_i z_j)
└ the part you want ┘ └ off-diagonal garbage ┘
The off-diagonal A_i z_j mix one signer's matrix with another's response; they are
full-norm and do not cancel. (Same failure in (Σe_i c_i)(Σe_i t1_i) vs the needed
Σ e_i c_i t1_i.)
Does NTT / the convolution theorem dissolve this? No — and this is worth being
exact about, since it was the motivating hope. NTT is a ring isomorphism
R_q ≅ (F_q)^256 (or over an extension to get all roots), and it is linear. It
turns one convolution into pointwise products — wonderful for speed (this is exactly
why ML-DSA and FN-DSA compute in the transform domain). But per NTT slot k:
NTT(A_agg)[k] · NTT(z_agg)[k]
= ( Σ_i e_i[k] A_i[k] ) ( Σ_j e_j[k] z_j[k] )
= Σ_{i,j} e_i[k] e_j[k] A_i[k] z_j[k] ← cross terms in EVERY slot
A linear, invertible change of basis cannot remove a bilinear mismatch: the map
(A_i, z_j) ↦ A_i z_j is quadratic in the data, and the obstruction is that we are
trying to express a sum of M distinct products as a single product. The convolution
theorem (its continuous cousin is the Laplace transform turning f∗g into F·G) helps
when you have one convolution to diagonalize; it does nothing when the problem is
that you have Σ_i A_i ∗ z_i and want A_agg ∗ z_agg. The control-theory framing
("treat each signer as a plant, design a compensator") is mathematically the same
linear-systems view and runs into the same fact: superposition holds within one LTI
channel, but the M signers are M different channels (different A_i), and no input
filter built from public data makes Σ_i A_i z_i equal A_agg z_agg for a single
seed-consistent A_agg. Transform domain is the right implementation tool, not a
soundness-barrier breaker.
Wall 2 — norm growth vs. a fixed acceptance radius¶
Even with a shared A (Wall 1 satisfied), summing responses grows the norm:
z_agg = Σ y_i + c·Σ s1_i has ‖Σ y_i‖∞ up to ~M·γ1, which violates (V1)
‖z‖∞ < γ1 − β for M ≥ 2. Schnorr/BLS have no norm bound, so their aggregates
stay valid; lattice FS-with-aborts has one, so aggregation either (a) enlarges γ1
(then it is an ML-DSA verifier with different parameters, not the deployed
ML-DSA-87), or (b) uses interaction + careful joint rejection sampling to keep the sum
in range for small M. This wall is pure arithmetic and, again, transform-invariant
(‖·‖∞ is not simplified by NTT).
The reduction (so this is a theorem, not an assertion)¶
A standard-verifier-accepting (c̃*, z*, h*) under (ρ*, t1*), μ* is, in the ROM,
a short z* plus sparse c* with c̃* = H(μ* ‖ HighBits(A* z* − c* t1* 2^d)) for
A* = ExpandA(ρ*). For an honestly-derived ρ* (any combination of real seeds run
through a hash), A* is computationally MSIS-hard. Producing such (z*, c*) for a
fresh μ* without the trapdoor is precisely EUF-CMA forgery against pk*; embedding
the EUF-CMA challenge key as one component of the derivation turns any public,
non-interactive, distinct-key aggregator with this property into an ML-DSA forger.
Hence:
Trilemma (derived). For aggregation of independently produced, distinct-key ML-DSA signatures you can have at most two of: (1) output accepted by the standard ML-DSA-87 verifier, (2) aggregation public & non-interactive, (3) unforgeability. The forbidden corner is
(1)+(2)+(3)— it yields an MSIS solver.
What a genuine break would require (the exact lock to pick, if one insists): a public
map sending {short z_i : A_i z_i ≈ u_i} over distinct A_i = ExpandA(ρ_i) to a
single short z* with ExpandA(ρ*) z* ≈ u* and an FS-consistent sparse c* — i.e. a
norm-preserving homomorphism across independent hash-expanded matrices. That object
would itself break the MSIS/ROM structure ML-DSA's own security rests on. So the wall
is not borrowed from literature; it is the same assumption that makes ML-DSA a
signature at all.
4. What this BUYS us — the constructive results¶
The trilemma says drop one corner. Each drop is a real, buildable scheme.
Path A (drop 1): public, non-interactive, unforgeable — relaxed ML-DSA-shaped verifier¶
This is LMSA-as-PoK: small-coefficient folding (z_agg = Σ e_i z_i) + per-level
Module-SIS cross-term commitments + hash-binding of the residual to the batch S.
Self-contained (extractor → leaf witnesses, no fraud window), recursively
re-aggregatable, PQ via Module-SIS, ~tens of KB. Verifier is structurally ML-DSA
(linear check → high-bits reconstruction → hash binding) but a thin new routine with
relaxed norm and the openings. Best for: arbitrary distinct-key, distinct-message,
already-signed transactions.
Path B (drop 2): exact-shaped verifier under an aggregate key — shared-A lattice multisignature¶
Make A = ExpandA(ρ) a global system parameter (one ρ for the whole of QRL 2.0).
Then Wall 1 is gone. Signers run an interactive (2–3 round) protocol on a common
message: agree on joint commitment w = A·Σy_i, derive one challenge
c = H(μ ‖ HighBits(w)) (sparse — satisfies V2/V3!), each returns z_i = y_i + c s1_i,
and the aggregate is z* = Σ z_i = Σy_i + c·(Σ s1_i) under aggregate key
t1* = HighBits(Σ t_i), secret s1* = Σ s1_i. This is a genuine response to a
sparse challenge — (V2),(V3) hold by construction. Remaining cost is Wall 2: ‖z*‖∞
grows with the number of signers, so you need joint rejection sampling and, for large
cohorts, an enlarged γ1 (⇒ an ML-DSA verifier of a new parameter set, not byte-
identical FIPS-204). Best for: many validators signing the same block / epoch
(consensus) — the exact niche BLS fills in Ethereum, and where interaction is already
present. Satisfies "capabilities strictly derived from the keys" literally (s1* =
Σ s1_i; n-of-n grants no new power) and "the aggregate key is derived from and bounded
by the component keys."
Path C (drop 2, non-interactive flavor): synchronized aggregation¶
If all signers share A and a per-slot common challenge derivation (e.g. one message
per round/epoch, c from the round's public data), responses can be combined with
small e_i without a live protocol, at the cost of the synchronization assumption and
the same Wall-2 norm budget. A middle point between A and B.
5. Mapping to QRL 2.0 (recommendation)¶
- Consensus / validator set signing the same block → Path B (or C): a shared-
Alattice multisignature is the true PQ analog of BLS-in-consensus, and the aggregate is verified by an ML-DSA-shaped verifier under an aggregate key. Decide explicitly whether "ML-DSA verifier" means the algorithm at adjusted params (achievable) or the exact FIPS-204 ML-DSA-87 binary (only for tiny cohorts within the norm slack). - Arbitrary user transactions (distinct keys, distinct messages, already signed) → Path A: relaxed-verifier folding PoK. No interaction, succinct, recursive.
- A system can ship both: B for the validator layer, A for the tx layer.
6. "Can the proof be used as a key?"¶
The aggregate (residual + commitments) is a capability object whose power is exactly
the collective signing ability of the inputs and no more — it re-enters the next fold
(Path A) or is the aggregate key/signature (Path B), but it cannot sign anything the
original set could not jointly sign. It is not fresh secret-key material: public
aggregation yields combined responses/commitments, never new trapdoors. In Path B,
the derived s1* = Σ s1_i is a real aggregate secret, but it exists only as the
distributed sum across the signers — no single party (and no public aggregator) holds
it. That is the precise, sound reading of "used as a key": an aggregate public key
under which only the cohort, acting together, can sign.
7. Immediate next artifacts¶
- Path A folding step, written out over
R_q: exact relaxed relation, the cross-term (L/R) Module-SIS commitment, per-level norm accounting (ε-weight, fan-in), extraction sketch, and the standalone relaxed verifier. → Rust prototype proving honest completeness + measuring norms/sizes. - Path B protocol, written out: shared-
Aparameter generation, the 2–3 round joint-signing with norm-controlled rejection sampling, aggregate-key derivation, and the exact verifier parameters needed (and the max cohort size that stays inside ML-DSA-87's nativeγ1). - Fold in background-research concrete sizes (LaBRADOR cross-term costs, lattice multisig norm/round bounds) when the workflow lands.