Evidence store
SQLite, append-only, paired SHA-256 + SHA3-256 hash chain, optionally PQC-signed.
What it is
One file: evidence.db. SQLite. Every artifact and finding is
a row. Every row carries two independent content hashes
(data_sha256 and data_sha3_256) and two
independent chain hashes (chain_sha256 and
chain_sha3_256) that thread through the table in parallel.
Any post-hoc modification breaks both chains from that
point forward, and any PQC signature already issued over the tip no longer verifies.
Why two algorithms in parallel?
Defense in depth. SHA-256 (FIPS 180-4) and SHA3-256 (FIPS 202) use structurally independent constructions: Merkle-Damgård vs. the Keccak sponge. A future cryptanalytic break against one family is unlikely to carry over to the other. Once the chain tip has been signed or otherwise recorded out of band, undetectable tampering would require a modified row whose SHA-256 and SHA3-256 digests both match the originals, i.e. a second preimage in both algorithms simultaneously.
SHA-256 stays in place for ecosystem interoperability — VirusTotal, MalwareBazaar, signature-base, IOC feeds, git, sigstore: every external consumer of evidence hashes speaks SHA-256. SHA3-256 hardens digger's own integrity layer without losing that compatibility.
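Both families ship in Python's standard hashlib, so producing the pair over the same bytes is cheap. The example below is a minimal illustration, not digger's code. It uses the FIPS test input "abc", and paired_digest is a hypothetical helper:

```python
import hashlib


def paired_digest(payload: bytes) -> tuple[str, str]:
    """Return (SHA-256, SHA3-256) hex digests of the same bytes."""
    return (
        hashlib.sha256(payload).hexdigest(),    # Merkle-Damgård construction (FIPS 180-4)
        hashlib.sha3_256(payload).hexdigest(),  # Keccak sponge construction (FIPS 202)
    )


sha2, sha3 = paired_digest(b"abc")
print(sha2)  # ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
print(sha3)  # 3a985da74fe225b2045c172d6bd390bd855f086e3e9d525b46bfe24511431532
```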
The PQC signature emitted by digger pqc sign covers the
chain tip JSON, which includes both digests and the algorithm list, so a
single signature attests to both chains.
Source: digger/core/evidence.py.
Tables
artifacts
| Column | Notes |
|---|---|
| id | Auto-increment row id (NOT the artifact UUID) |
| artifact_uuid | UUIDv4, unique. Used as the foreign-key target for findings. |
| collector | The collector's name (e.g. processes, macos.launchd) |
| category | Coarse grouping (process, persistence, network, …) |
| subject | Human-readable subject (pid=312 chrome, HKLM\…\Run) |
| ts | Wall-clock timestamp when the artifact was collected |
| data_json | Canonical JSON serialization of the data dict |
| data_sha256 | SHA-256 of collector\|category\|subject\|data_json |
| data_sha3_256 | SHA3-256 of the same payload |
| chain_sha256 | SHA-256 of prev_chain_sha256 \|\| data_sha256 |
| chain_sha3_256 | SHA3-256 of prev_chain_sha3_256 \|\| data_sha3_256 |
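The sketch below shows how the two content-hash columns could be derived from one artifact. Some details are assumptions inferred from the column notes above and have not been checked against digger/core/evidence.py:

- the canonical-JSON settings (sorted keys, compact separators);
- the UTF-8 "|" join of the four fields;
- the names canonical_json and artifact_content_hashes, which are hypothetical.

```python
import hashlib
import json


def canonical_json(data: dict) -> str:
    # Assumption: sorted keys and no insignificant whitespace. digger's real
    # canonicalization may differ.
    return json.dumps(data, sort_keys=True, separators=(",", ":"))


def artifact_content_hashes(
    collector: str, category: str, subject: str, data: dict
) -> tuple[str, str, str]:
    """Return (data_json, data_sha256, data_sha3_256) for one artifact row."""
    data_json = canonical_json(data)
    # Assumption: the four fields are joined with a literal "|" and UTF-8 encoded.
    payload = "|".join([collector, category, subject, data_json]).encode("utf-8")
    return (
        data_json,
        hashlib.sha256(payload).hexdigest(),
        hashlib.sha3_256(payload).hexdigest(),
    )


data_json, h2, h3 = artifact_content_hashes(
    "processes", "process", "pid=312 chrome", {"pid": 312, "name": "chrome"}
)
print(data_json)  # {"name":"chrome","pid":312}
```

Canonicalizing before hashing keeps the digest stable. The same dict with its keys in a different order yields identical hashes. If you adapt this, note that a bare "|" join is ambiguous when a field can itself contain "|". Length-prefixing each field avoids that.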
findings
| Column | Notes |
|---|---|
| finding_uuid | UUIDv4, unique |
| detector | Detector name that emitted it |
| severity | Enum: info, low, med, high, crit |
| title / summary | Headline + 2-5 sentence explanation |
| artifact_refs | JSON array of artifact UUIDs implicated |
| evidence_json | Free-form evidence dict |
| mitre | Primary MITRE ATT&CK technique ID (e.g. T1059.001) |
| data_sha256 / data_sha3_256 | Paired content hashes (same scheme as artifacts) |
| chain_sha256 / chain_sha3_256 | Paired chain hashes (same scheme as artifacts) |
| triage_json | The AI triage output (NULL if not yet triaged). NOT covered by the chain. |
case_meta
Simple key/value store for case-level metadata: case_id,
host fingerprint, classification, tlp,
ai_case_summary, ai_triage_run, etc.
files
Index of preserved evidence files (path, size, SHA-256, link to owning artifact).
log
Append-only operational log written by collectors and detectors during a run.
The hash chain
For every row, we compute:
```
chain_sha256[n]   = SHA-256  ( chain_sha256[n-1]   || data_sha256[n] )
chain_sha3_256[n] = SHA3-256 ( chain_sha3_256[n-1] || data_sha3_256[n] )
```
where || denotes concatenation, and data_sha256[n] and data_sha3_256[n]
are the SHA-256 and SHA3-256 of row n's canonical content. The very
first row uses an empty prev-hash (zero-length input prefix).
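The recurrence can be sketched as a fold over the per-row content hashes. The sketch assumes the hex digests are concatenated as text before hashing. digger may concatenate raw digest bytes instead, which changes the values but not the structure. The names extend and build_chains are hypothetical.

```python
import hashlib


def extend(prev: tuple[str, str], digest: tuple[str, str]) -> tuple[str, str]:
    """One step of both chains: H(prev || content) for each algorithm."""
    # Assumption: hex digests are concatenated as text and UTF-8 encoded.
    return (
        hashlib.sha256((prev[0] + digest[0]).encode("utf-8")).hexdigest(),
        hashlib.sha3_256((prev[1] + digest[1]).encode("utf-8")).hexdigest(),
    )


def build_chains(content_hashes: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Map each row's (data_sha256, data_sha3_256) to (chain_sha256, chain_sha3_256)."""
    prev = ("", "")  # first row: empty prev-hash, i.e. zero-length prefix
    chains = []
    for digest in content_hashes:
        prev = extend(prev, digest)
        chains.append(prev)
    return chains
```

Every step feeds the previous chain value back in. As a result, the final pair (the tip) commits to every row and to their order.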
store.verify_chain() recomputes both content hashes and both chain
hashes for every row. Any divergence is recorded in
result["errors"]:
```console
$ digger verify --case-dir ./case-1
{
  "artifacts_ok": { "sha256": true, "sha3_256": true, "all": true },
  "findings_ok":  { "sha256": true, "sha3_256": true, "all": true },
  "errors": []
}
```
The CLI exits non-zero if any chain check fails.
Tamper detection in practice
Suppose someone edits an artifact's data_json directly with
sqlite3 to scrub evidence of a malicious process. The next call to
verify_chain() will report:
```
artifact id=42 content hash mismatch
artifact id=43 chain hash mismatch    ← cascades to every later row
artifact id=44 chain hash mismatch
...
```
The cascade is the important part: the first mismatching row pinpoints where tampering began.
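The cascade can be reproduced end to end against an in-memory SQLite table. This is a simplified sketch, not digger's verify_chain(). Its assumptions are:

- the content payload is the four fields joined with "|" and UTF-8 encoded;
- chain steps concatenate hex strings;
- only the artifacts table is covered;
- at most one error is reported per row.

The detail that produces the cascade is that verification chains over the recomputed content hashes, not the stored ones.

```python
import hashlib
import json
import sqlite3

SCHEMA = """CREATE TABLE artifacts (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    collector TEXT, category TEXT, subject TEXT, data_json TEXT,
    data_sha256 TEXT, data_sha3_256 TEXT,
    chain_sha256 TEXT, chain_sha3_256 TEXT)"""


def content(collector, category, subject, data_json):
    # Assumption: "|"-joined UTF-8 payload hashed with both algorithms.
    p = "|".join([collector, category, subject, data_json]).encode("utf-8")
    return hashlib.sha256(p).hexdigest(), hashlib.sha3_256(p).hexdigest()


def link(prev, digest):
    # One step of each chain: H(prev || digest), hex strings concatenated.
    return (hashlib.sha256((prev[0] + digest[0]).encode()).hexdigest(),
            hashlib.sha3_256((prev[1] + digest[1]).encode()).hexdigest())


def append(db, collector, category, subject, data):
    data_json = json.dumps(data, sort_keys=True, separators=(",", ":"))
    tip = db.execute("SELECT chain_sha256, chain_sha3_256 FROM artifacts "
                     "ORDER BY id DESC LIMIT 1").fetchone() or ("", "")
    d = content(collector, category, subject, data_json)
    c = link(tuple(tip), d)
    db.execute("INSERT INTO artifacts (collector, category, subject, data_json, "
               "data_sha256, data_sha3_256, chain_sha256, chain_sha3_256) "
               "VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
               (collector, category, subject, data_json, *d, *c))


def verify_chain(db):
    errors, prev = [], ("", "")
    for row in db.execute("SELECT * FROM artifacts ORDER BY id"):
        d = content(row["collector"], row["category"], row["subject"], row["data_json"])
        # Chain over the *recomputed* content hashes so one edit cascades forward.
        prev = link(prev, d)
        if d != (row["data_sha256"], row["data_sha3_256"]):
            errors.append(f"artifact id={row['id']} content hash mismatch")
        elif prev != (row["chain_sha256"], row["chain_sha3_256"]):
            errors.append(f"artifact id={row['id']} chain hash mismatch")
    return errors


db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.execute(SCHEMA)
for pid in range(1, 6):
    append(db, "processes", "process", f"pid={pid}", {"pid": pid})
assert verify_chain(db) == []

# Scrub row 2 directly, the way someone with sqlite3 access might.
db.execute("UPDATE artifacts SET data_json = ? WHERE id = 2", ('{"pid":999}',))
for line in verify_chain(db):
    print(line)
```

Running it prints a content mismatch for id=2 followed by chain mismatches for ids 3 through 5. An attacker who also rewrites the stored hashes and recomputes every later chain value would pass this check. Catching that is the job of a tip that has been signed or recorded elsewhere.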
Signing the chain tip
For long-term integrity, sign the tip with a NIST PQC algorithm:
```bash
digger pqc sign --case-dir ./case-1 \
  --algorithm ML-DSA-65 \
  --key /secrets/digger.sk   # auto-generated if missing
```
This writes case_signature.json containing the signed payload
(the concatenated artifacts + findings tips and the case_id), the algorithm
OID, and the public key. To verify later:
```bash
digger pqc verify --case-dir ./case-1
```
The algorithm defaults to ML-DSA-65 (FIPS 204, NIST security category 3). See
the post-quantum crypto page for the full algorithm catalog.
What's not in the chain
- triage_json: AI triage output is written after the fact and is not part of the hash chain. This is intentional: it lets you re-run triage with a different model without invalidating the forensic chain. If you want to attest to a specific triage run, snapshot the case directory and sign the snapshot.
- case_meta, files, log: these change during normal operation (e.g. when you re-render reports) and aren't load-bearing for forensic integrity.
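One way to get the behavior described for triage_json is to hash an explicit allow-list of columns. The column list and the JSON encoding of the payload are assumptions for illustration. The findings table only says findings use the same scheme as artifacts. COVERED and finding_hashes are hypothetical names.

```python
import hashlib
import json

# Hypothetical set of chain-covered finding columns; triage_json is deliberately absent.
COVERED = ("finding_uuid", "detector", "severity", "title", "summary",
           "artifact_refs", "evidence_json", "mitre")


def finding_hashes(row: dict) -> tuple[str, str]:
    """Paired content hashes computed over covered columns only."""
    payload = json.dumps({k: row[k] for k in COVERED}, sort_keys=True,
                         separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest(), hashlib.sha3_256(payload).hexdigest()


finding = {
    "finding_uuid": "00000000-0000-4000-8000-000000000000",
    "detector": "example_detector", "severity": "high",
    "title": "Example", "summary": "Example summary.",
    "artifact_refs": "[]", "evidence_json": "{}", "mitre": "T1059.001",
    "triage_json": None,
}
before = finding_hashes(finding)
finding["triage_json"] = '{"model": "re-run"}'  # re-triage: hashes unchanged
assert finding_hashes(finding) == before
finding["title"] = "Edited"                      # covered column: hashes change
assert finding_hashes(finding) != before
```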
Working with the DB directly
If you want to slice the data outside digger, the schema is standard SQLite. Example queries:
```bash
# Every high-or-critical finding with its detector and rationale
sqlite3 case-1/evidence.db <<'SQL'
SELECT severity, detector, title, summary FROM findings
WHERE severity IN ('high', 'critical')
ORDER BY id;
SQL

# Every process artifact with its parent PID
sqlite3 case-1/evidence.db <<'SQL'
SELECT json_extract(data_json, '$.pid'),
       json_extract(data_json, '$.ppid'),
       json_extract(data_json, '$.name'),
       json_extract(data_json, '$.cmdline')
FROM artifacts WHERE collector='processes' LIMIT 20;
SQL
```
Read-only access is safe, and sqlite3 -readonly enforces it. Writes are not: they break the chain.
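From Python, the standard sqlite3 module can open the file through a URI with mode=ro, so SQLite itself rejects writes. The sketch below mirrors the process query above. open_readonly and process_rows are hypothetical helpers. json_extract needs a SQLite build with JSON support, which is built in since SQLite 3.38.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def open_readonly(db_path) -> sqlite3.Connection:
    """Open evidence.db so that SQLite rejects any write on this connection."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def process_rows(db_path, limit: int = 20) -> list[tuple]:
    """Return (pid, ppid, name, cmdline) for process artifacts."""
    with closing(open_readonly(db_path)) as db:  # closing() actually closes the handle
        return db.execute(
            "SELECT json_extract(data_json, '$.pid'), json_extract(data_json, '$.ppid'), "
            "json_extract(data_json, '$.name'), json_extract(data_json, '$.cmdline') "
            "FROM artifacts WHERE collector = 'processes' LIMIT ?",
            (limit,),
        ).fetchall()
```

For example, process_rows("case-1/evidence.db") returns up to 20 (pid, ppid, name, cmdline) tuples. Any INSERT, UPDATE or DELETE on a connection from open_readonly raises sqlite3.OperationalError instead of touching the chain.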