Evidence store

SQLite, append-only, paired SHA-256 + SHA3-256 hash chain, optionally PQC-signed.

What it is

One file: evidence.db. SQLite. Every artifact and finding is a row. Every row carries two independent content hashes (data_sha256 and data_sha3_256) and two independent chain hashes (chain_sha256 and chain_sha3_256) that thread through the table in parallel. Any post-hoc modification breaks both chains from that point forward, so a PQC signature taken over the tip before the change no longer verifies.

Why two algorithms in parallel?

Defense in depth. SHA-256 (FIPS 180-4) and SHA3-256 (FIPS 202) use structurally independent constructions: Merkle-Damgård vs. Keccak sponge. A future cryptanalytic break against one family is unlikely to break the other. To tamper undetectably, an attacker would need a modified row whose content hashes match the originals under both algorithms simultaneously.

SHA-256 stays in place for ecosystem interoperability — VirusTotal, MalwareBazaar, signature-base, IOC feeds, git, sigstore: every external consumer of evidence hashes speaks SHA-256. SHA3-256 hardens digger's own integrity layer without losing that compatibility.

The PQC signature emitted by digger pqc sign covers the chain tip JSON, which includes both digests and the algorithm list, so a single signature attests to both chains.

Source: digger/core/evidence.py.

Tables

artifacts

Column          Notes
id              Auto-increment row id (NOT the artifact UUID)
artifact_uuid   UUIDv4, unique. Used as the foreign-key target for findings.
collector       The collector's name (e.g. processes, macos.launchd)
category        Coarse grouping (process, persistence, network, …)
subject         Human-readable subject (pid=312 chrome, HKLM\…\Run)
ts              Wall-clock timestamp when the artifact was collected
data_json       Canonical JSON serialization of the data dict
data_sha256     SHA-256 of collector|category|subject|data_json
data_sha3_256   SHA3-256 of the same payload
chain_sha256    SHA-256 of prev_chain_sha256 || data_sha256
chain_sha3_256  SHA3-256 of prev_chain_sha3_256 || data_sha3_256
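
The paired content hashes in the table can be sketched in a few lines of Python. This is a minimal illustration, not the code in digger/core/evidence.py: the helper names canonical_json and content_hashes are hypothetical, and the exact canonicalization (sorted keys, compact separators, UTF-8) is an assumption, since the table only says "canonical JSON".

```python
import hashlib
import json


def canonical_json(data: dict) -> str:
    # Assumption: "canonical" means sorted keys and compact separators.
    return json.dumps(data, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


def content_hashes(collector: str, category: str, subject: str, data: dict) -> dict:
    # Payload layout from the table: collector|category|subject|data_json
    payload = "|".join([collector, category, subject, canonical_json(data)])
    raw = payload.encode("utf-8")  # Assumption: UTF-8 encoding before hashing
    return {
        "data_sha256": hashlib.sha256(raw).hexdigest(),
        "data_sha3_256": hashlib.sha3_256(raw).hexdigest(),
    }


hashes = content_hashes(
    "processes", "process", "pid=312 chrome", {"pid": 312, "name": "chrome"}
)
print(hashes)
```

Because the JSON is canonicalized before hashing, two dicts with the same keys and values in a different order produce the same pair of digests under this sketch.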

findings

Column                         Notes
finding_uuid                   UUIDv4, unique
detector                       Detector name that emitted it
severity                       Enum: info, low, med, high, crit
title / summary                Headline + 2-5 sentence explanation
artifact_refs                  JSON array of artifact UUIDs implicated
evidence_json                  Free-form evidence dict
mitre                          Primary MITRE ATT&CK technique ID (e.g. T1059.001)
data_sha256 / data_sha3_256    Paired content hashes (same scheme as artifacts)
chain_sha256 / chain_sha3_256  Paired chain hashes (same scheme as artifacts)
triage_json                    The AI triage output (NULL if not yet triaged). NOT covered by the chain.

case_meta

Simple key/value store for case-level metadata: case_id, host fingerprint, classification, tlp, ai_case_summary, ai_triage_run, etc.

files

Index of preserved evidence files (path, size, SHA-256, link to owning artifact).

log

Append-only operational log written by collectors and detectors during a run.

The hash chain

For every row, we compute:

chain_sha256[n]   = SHA-256  ( chain_sha256[n-1]   || data_sha256[n]   )
chain_sha3_256[n] = SHA3-256 ( chain_sha3_256[n-1] || data_sha3_256[n] )

where data_sha256[n] and data_sha3_256[n] are the SHA-256 and SHA3-256 of the row's canonical content. The very first row uses an empty prev-hash (zero-length input prefix).
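
As a rough sketch, the two recurrences can be written in Python as follows. The function name extend_chain is hypothetical, and treating the digests as hex strings that are concatenated and UTF-8 encoded is an assumption; the real implementation may concatenate raw bytes instead.

```python
import hashlib


def extend_chain(prev_sha256: str, prev_sha3_256: str,
                 data_sha256: str, data_sha3_256: str) -> tuple:
    # Assumption: "||" means concatenating hex digest strings.
    chain_sha256 = hashlib.sha256(
        (prev_sha256 + data_sha256).encode("utf-8")
    ).hexdigest()
    chain_sha3_256 = hashlib.sha3_256(
        (prev_sha3_256 + data_sha3_256).encode("utf-8")
    ).hexdigest()
    return chain_sha256, chain_sha3_256


# Stand-in row contents; real rows hash their canonical content instead.
payloads = [b"row-1", b"row-2", b"row-3"]

tip = ("", "")  # the first row uses an empty prev-hash
history = []
for payload in payloads:
    tip = extend_chain(
        tip[0],
        tip[1],
        hashlib.sha256(payload).hexdigest(),
        hashlib.sha3_256(payload).hexdigest(),
    )
    history.append(tip)

print("tip:", tip)
```

Each chain only ever consumes digests from its own algorithm, so the two chains stay independent all the way to the tip.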

store.verify_chain() recomputes the content hashes and the chain hashes for every row, under both algorithms. Any divergence is recorded in result["errors"]:

$ digger verify --case-dir ./case-1
{
  "artifacts_ok": { "sha256": true, "sha3_256": true, "all": true },
  "findings_ok":  { "sha256": true, "sha3_256": true, "all": true },
  "errors": []
}

The CLI exits non-zero if any chain check fails.

Tamper detection in practice

Suppose someone edits an artifact's data_json directly with sqlite3 to scrub evidence of a malicious process. The next call to verify_chain() will report:

artifact id=42 content hash mismatch
artifact id=43 chain hash mismatch     ← cascades to every later row
artifact id=44 chain hash mismatch
...

The cascade is the important part: every row after the edit fails its chain check, and the first failing row marks the precise point where tampering began.
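
The cascade can be reproduced with a small in-memory model. This is an illustrative sketch only: build and verify are hypothetical helpers, rows are plain dicts rather than SQLite rows, and the choice to carry the recomputed chain value forward (which is what produces the cascade here) is an assumption about how verify_chain() behaves.

```python
import hashlib

ALGOS = ("sha256", "sha3_256")


def h(algo: str, text: str) -> str:
    return hashlib.new(algo, text.encode("utf-8")).hexdigest()


def build(payloads):
    """Create rows with paired content and chain hashes."""
    rows, prev = [], {a: "" for a in ALGOS}
    for i, payload in enumerate(payloads, start=1):
        row = {"id": i, "payload": payload}
        for a in ALGOS:
            row[f"data_{a}"] = h(a, payload)
            prev[a] = row[f"chain_{a}"] = h(a, prev[a] + row[f"data_{a}"])
        rows.append(row)
    return rows


def verify(rows):
    """Recompute both chains and report the first kind of mismatch per row."""
    errors, prev = [], {a: "" for a in ALGOS}
    for row in rows:
        content_bad = chain_bad = False
        for a in ALGOS:
            data = h(a, row["payload"])
            prev[a] = h(a, prev[a] + data)  # carry the recomputed value forward
            content_bad |= data != row[f"data_{a}"]
            chain_bad |= prev[a] != row[f"chain_{a}"]
        if content_bad:
            errors.append(f"artifact id={row['id']} content hash mismatch")
        elif chain_bad:
            errors.append(f"artifact id={row['id']} chain hash mismatch")
    return errors


rows = build([f"row-{i}" for i in range(1, 6)])
clean = verify(rows)
rows[2]["payload"] = "scrubbed"  # edit row id=3, leaving stored hashes untouched
errors = verify(rows)
for line in errors:
    print(line)
```

In this model the untouched rows 1 and 2 still verify, row 3 reports a content hash mismatch, and rows 4 and 5 report chain hash mismatches.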

Signing the chain tip

For long-term integrity, sign the tip with a NIST PQC algorithm:

digger pqc sign --case-dir ./case-1 \
       --algorithm ML-DSA-65 \
       --key /secrets/digger.sk    # auto-generated if missing

This writes case_signature.json containing the signed payload (the concatenated artifacts + findings tips and the case_id), the algorithm OID, and the public key. To verify later:

digger pqc verify --case-dir ./case-1

Defaults to ML-DSA-65 (FIPS 204, NIST Level 3). See post-quantum crypto for the full algorithm catalog.

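What's not in the chain

The triage_json column on findings rows is not covered by either chain (see the findings table above).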

Working with the DB directly

If you want to slice the data outside digger, the schema is standard SQLite. Example queries:

# Every high-or-critical finding with its detector and rationale
sqlite3 case-1/evidence.db <<'SQL'
SELECT severity, detector, title, summary FROM findings
WHERE severity IN ('high', 'crit')
ORDER BY id;
SQL

# Every process artifact with its parent
sqlite3 case-1/evidence.db <<'SQL'
SELECT json_extract(data_json, '$.pid'),
       json_extract(data_json, '$.ppid'),
       json_extract(data_json, '$.name'),
       json_extract(data_json, '$.cmdline')
FROM artifacts WHERE collector='processes' LIMIT 20;
SQL

Read-only access is safe. Writes are not — they break the chain.