Evidence store

SQLite, append-only, paired SHA-256 + SHA3-256 hash chain, optionally PQC-signed.

What it is

One file: evidence.db. SQLite. Every artifact and finding is a row. Every row carries two independent content hashes (data_sha256 and data_sha3_256) and two independent chain hashes (chain_sha256 and chain_sha3_256) that thread through the table in parallel. Any post-hoc modification breaks both chains from that point forward, so a PQC signature taken over the tip no longer matches the recomputed chain.

Why two algorithms in parallel?

Defense in depth. SHA-256 (FIPS 180-4) and SHA3-256 (FIPS 202) use structurally independent constructions: Merkle-Damgård vs. Keccak sponge. A future cryptanalytic break against one family is unlikely to break the other. To tamper undetectably, an attacker would need modified content that reproduces the original digest under both algorithms simultaneously, that is, a second preimage for SHA-256 and SHA3-256 at once.

SHA-256 stays in place for ecosystem interoperability — VirusTotal, MalwareBazaar, signature-base, IOC feeds, git, sigstore: every external consumer of evidence hashes speaks SHA-256. SHA3-256 hardens digger's own integrity layer without losing that compatibility.

The PQC signature emitted by `digger pqc sign` covers the chain tip JSON, which includes both digests and the algorithm list, so a single signature attests to both chains.

Source: digger/core/evidence.py.

Tables

`artifacts`

Column	Notes
`id`	Auto-increment row id (NOT the artifact UUID)
`artifact_uuid`	UUIDv4, unique. Used as the foreign-key target for findings.
`collector`	The collector's `name` (e.g. `processes`, `macos.launchd`)
`category`	Coarse grouping (`process`, `persistence`, `network`, …)
`subject`	Human-readable subject (`pid=312 chrome`, `HKLM\…\Run`)
`ts`	Wall-clock timestamp when the artifact was collected
`data_json`	Canonical JSON serialization of the data dict
`data_sha256`	SHA-256 of `collector|category|subject|data_json`
`data_sha3_256`	SHA3-256 of the same payload
`chain_sha256`	SHA-256 of `prev_chain_sha256 || data_sha256`
`chain_sha3_256`	SHA3-256 of `prev_chain_sha3_256 || data_sha3_256`
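
The paired content hashes in the table can be sketched in a few lines of Python. This is an illustration, not the code in digger/core/evidence.py: it assumes that "canonical JSON" means sorted keys with compact separators, and that the four fields are joined with a literal `|` and encoded as UTF-8. The helper names `canonical_json` and `content_hashes` are made up for this example.

```python
import hashlib
import json


def canonical_json(data: dict) -> str:
    # Assumption: canonical form = sorted keys, compact separators, no ASCII escaping.
    return json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def content_hashes(collector: str, category: str, subject: str, data: dict) -> dict:
    # Assumption: the fields are joined with a literal "|" and UTF-8 encoded.
    payload = "|".join([collector, category, subject, canonical_json(data)]).encode("utf-8")
    return {
        "data_sha256": hashlib.sha256(payload).hexdigest(),
        "data_sha3_256": hashlib.sha3_256(payload).hexdigest(),
    }


hashes = content_hashes("processes", "process", "pid=312 chrome", {"pid": 312, "name": "chrome"})
```

Because the keys are sorted before hashing, two dicts with the same contents in a different insertion order produce the same pair of digests.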

`findings`

Column	Notes
`finding_uuid`	UUIDv4, unique
`detector`	Detector `name` that emitted it
`severity`	Enum: `info`, `low`, `med`, `high`, `crit`
`title` / `summary`	Headline + 2-5 sentence explanation
`artifact_refs`	JSON array of artifact UUIDs implicated
`evidence_json`	Free-form evidence dict
`mitre`	Primary MITRE ATT&CK technique ID (e.g. `T1059.001`)
`data_sha256` / `data_sha3_256`	Paired content hashes (same scheme as artifacts)
`chain_sha256` / `chain_sha3_256`	Paired chain hashes (same scheme as artifacts)
`triage_json`	The AI triage output (NULL if not yet triaged). NOT covered by the chain.

`case_meta`

Simple key/value store for case-level metadata: case_id, host fingerprint, classification, tlp, ai_case_summary, ai_triage_run, etc.

`files`

Index of preserved evidence files (path, size, SHA-256, link to owning artifact).

`log`

Append-only operational log written by collectors and detectors during a run.

The hash chain

For every row, we compute:

chain_sha256[n]   = SHA-256  ( chain_sha256[n-1]   || data_sha256[n]   )
chain_sha3_256[n] = SHA3-256 ( chain_sha3_256[n-1] || data_sha3_256[n] )

where data_sha256[n] and data_sha3_256[n] are the SHA-256 and SHA3-256 digests of row n's canonical content. The very first row uses an empty prev-hash (zero-length input prefix).
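
The recurrence above can be sketched as follows. One detail is an assumption: the hex digests are concatenated as ASCII text before hashing, whereas the real implementation may concatenate raw bytes instead. `extend_chain` and `build_chains` are hypothetical names used only here.

```python
import hashlib

ALGOS = ("sha256", "sha3_256")


def extend_chain(prev_chain: str, data_hash: str, algo: str) -> str:
    # Assumption: hex digests are concatenated as ASCII text, then hashed.
    return hashlib.new(algo, (prev_chain + data_hash).encode("ascii")).hexdigest()


def build_chains(payloads: list[bytes]) -> list[dict]:
    rows = []
    prev = {algo: "" for algo in ALGOS}  # first row: empty prev-hash
    for payload in payloads:
        row = {}
        for algo in ALGOS:
            data_hash = hashlib.new(algo, payload).hexdigest()
            prev[algo] = extend_chain(prev[algo], data_hash, algo)
            row[f"data_{algo}"] = data_hash
            row[f"chain_{algo}"] = prev[algo]
        rows.append(row)
    return rows
```

The two chains never mix: each one only ever consumes digests produced by its own algorithm, which is what keeps them independent.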

store.verify_chain() recomputes both content hashes and both chain hashes for every row. Any divergence is recorded in result["errors"]:

$ digger verify --case-dir ./case-1
{
  "artifacts_ok": { "sha256": true, "sha3_256": true, "all": true },
  "findings_ok":  { "sha256": true, "sha3_256": true, "all": true },
  "errors": []
}

The CLI exits non-zero if any chain check fails.
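
For scripted use (a CI gate, say), the JSON and the exit code can be checked together. The sketch below assumes that stdout contains exactly the JSON document shown above and nothing else; the function names are hypothetical.

```python
import json
import subprocess


def chains_intact(verify_stdout: str) -> bool:
    """Return True only if both tables pass and no errors are listed."""
    result = json.loads(verify_stdout)
    return bool(
        result["artifacts_ok"]["all"]
        and result["findings_ok"]["all"]
        and not result["errors"]
    )


def verify_case(case_dir: str) -> bool:
    # Assumption: `digger verify` prints only the JSON result on stdout.
    proc = subprocess.run(
        ["digger", "verify", "--case-dir", case_dir],
        capture_output=True,
        text=True,
    )
    return proc.returncode == 0 and chains_intact(proc.stdout)
```

Checking the exit code as well as the parsed result means a crash or truncated output is treated as a failure rather than a pass.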

Tamper detection in practice

Suppose someone edits an artifact's data_json directly with sqlite3 to scrub evidence of a malicious process. The next call to verify_chain() will report:

artifact id=42 content hash mismatch
artifact id=43 chain hash mismatch     ← cascades to every later row
artifact id=44 chain hash mismatch
...

The cascade is the important part: the first mismatch reported marks the precise row where tampering began.
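
The cascade can be reproduced on a toy table. Everything below is a simplified stand-in: the schema has only the columns needed for the demonstration, the content hash covers data_json alone, and the verifier is an assumption about how such a check could work, not a copy of verify_chain().

```python
import hashlib
import sqlite3

ALGOS = ("sha256", "sha3_256")


def digest(algo: str, text: str) -> str:
    return hashlib.new(algo, text.encode("utf-8")).hexdigest()


def make_toy_store(payloads):
    # Toy schema for illustration only; the real artifacts table has more columns.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE artifacts (id INTEGER PRIMARY KEY, data_json TEXT, "
        "data_sha256 TEXT, data_sha3_256 TEXT, chain_sha256 TEXT, chain_sha3_256 TEXT)"
    )
    prev = {algo: "" for algo in ALGOS}
    for payload in payloads:
        data = {algo: digest(algo, payload) for algo in ALGOS}
        prev = {algo: digest(algo, prev[algo] + data[algo]) for algo in ALGOS}
        db.execute(
            "INSERT INTO artifacts (data_json, data_sha256, data_sha3_256, "
            "chain_sha256, chain_sha3_256) VALUES (?, ?, ?, ?, ?)",
            (payload, data["sha256"], data["sha3_256"], prev["sha256"], prev["sha3_256"]),
        )
    return db


def verify(db):
    errors = []
    prev = {algo: "" for algo in ALGOS}
    rows = db.execute(
        "SELECT id, data_json, data_sha256, data_sha3_256, chain_sha256, chain_sha3_256 "
        "FROM artifacts ORDER BY id"
    )
    for row_id, payload, d256, d3, c256, c3 in rows:
        recomputed = {algo: digest(algo, payload) for algo in ALGOS}
        # The running chain always uses recomputed values, so damage propagates.
        prev = {algo: digest(algo, prev[algo] + recomputed[algo]) for algo in ALGOS}
        if recomputed != {"sha256": d256, "sha3_256": d3}:
            errors.append(f"artifact id={row_id} content hash mismatch")
        elif prev != {"sha256": c256, "sha3_256": c3}:
            errors.append(f"artifact id={row_id} chain hash mismatch")
    return errors


db = make_toy_store(['{"pid": 1}', '{"pid": 2}', '{"pid": 3}'])
# Simulate someone editing a row directly, without touching the stored hashes.
db.execute("UPDATE artifacts SET data_json = ? WHERE id = ?", ('{"pid": 999}', 2))
errors = verify(db)
```

In this toy run, row 2 is reported with a content hash mismatch and row 3 with a chain hash mismatch, mirroring the shape of the output above.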

Signing the chain tip

For long-term integrity, sign the tip with a NIST PQC algorithm:

digger pqc sign --case-dir ./case-1 \
       --algorithm ML-DSA-65 \
       --key /secrets/digger.sk    # auto-generated if missing

This writes case_signature.json containing the signed payload (the concatenated artifacts + findings tips and the case_id), the algorithm OID, and the public key. To verify later:

digger pqc verify --case-dir ./case-1

Signing defaults to ML-DSA-65 (FIPS 204, NIST Level 3) when `--algorithm` is omitted. See post-quantum crypto for the full algorithm catalog.
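
To see why one signature can attest to both chains, it helps to picture the signed payload as a single canonical byte string that embeds both tips. The sketch below only builds such a payload and does not sign anything. Every key name in it is invented for illustration; the actual layout is whatever digger writes to case_signature.json.

```python
import json


def tip_payload(case_id: str, artifacts_tip: dict, findings_tip: dict) -> bytes:
    # Hypothetical layout: all key names here are made up for this example.
    doc = {
        "case_id": case_id,
        "algorithms": ["sha256", "sha3_256"],
        "artifacts_tip": artifacts_tip,
        "findings_tip": findings_tip,
    }
    # Sorted keys and compact separators give a byte-for-byte stable encoding.
    return json.dumps(doc, sort_keys=True, separators=(",", ":")).encode("utf-8")


payload = tip_payload(
    "case-1",
    {"sha256": "aa" * 32, "sha3_256": "bb" * 32},
    {"sha256": "cc" * 32, "sha3_256": "dd" * 32},
)
```

Since the signature is computed over these bytes, changing any one of the four tip digests changes the payload and makes a previously issued signature fail verification.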

What's not in the chain

triage_json — AI triage output is written after the fact and is not part of the hash chain. This is intentional: it lets you re-run triage with a different model without invalidating the forensic chain. If you want to attest to a specific triage run, snapshot the case directory and sign the snapshot.
case_meta, files, log — these change during normal operation (e.g. when you re-render reports) and aren't load-bearing for forensic integrity.
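
One possible way to snapshot a case directory before signing is to record both digests of every file in a manifest. This is an assumption about what a snapshot could look like, not a digger feature, and `snapshot_manifest` is a hypothetical helper.

```python
import hashlib
from pathlib import Path


def snapshot_manifest(case_dir: str) -> dict:
    """Map each file's relative path to its SHA-256 and SHA3-256 digests."""
    root = Path(case_dir)
    manifest = {}
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        data = path.read_bytes()  # fine for a sketch; stream large files in practice
        manifest[path.relative_to(root).as_posix()] = {
            "sha256": hashlib.sha256(data).hexdigest(),
            "sha3_256": hashlib.sha3_256(data).hexdigest(),
        }
    return manifest
```

A manifest like this captures evidence.db together with its current triage_json values, so signing the manifest pins a specific triage run without pulling it into the hash chain.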

Working with the DB directly

If you want to slice the data outside digger, the schema is standard SQLite. Example queries:

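# Every high-or-critical finding with its detector and title.
# The findings table lists the top severity as crit; both spellings are
# matched here in case the stored value is abbreviated.
sqlite3 case-1/evidence.db <<'SQL'
SELECT severity, detector, title FROM findings
WHERE severity IN ('high', 'crit', 'critical')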
ORDER BY id;
SQL

# Every process artifact with its parent
sqlite3 case-1/evidence.db <<'SQL'
SELECT json_extract(data_json, '$.pid'),
       json_extract(data_json, '$.ppid'),
       json_extract(data_json, '$.name'),
       json_extract(data_json, '$.cmdline')
FROM artifacts WHERE collector='processes' LIMIT 20;
SQL

Read-only access is safe. Writes are not — they break the chain.
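
When scripting against the file, an accidental write can be ruled out by opening it through a SQLite URI with mode=ro. A minimal sketch using Python's standard sqlite3 module (`open_read_only` is a hypothetical helper name):

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: str) -> sqlite3.Connection:
    # mode=ro makes SQLite reject every write attempted on this connection.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)
```

With a connection from `open_read_only("case-1/evidence.db")`, SELECT statements behave as usual, while any INSERT, UPDATE, or DELETE raises sqlite3.OperationalError instead of silently breaking the chain.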