Architecture

One evidence store, many components. Every module reads or writes it; nothing communicates out-of-band.

Big picture

digger pipeline: collectors produce artifacts into a hash-chained SQLite evidence store; detectors query the store and write findings; AI triage augments findings; outputs include HTML/MD/JSON reports, STIX/MISP/ATT&CK Navigator exports, and per-framework compliance reports

Three rules govern how the parts fit together:

Append-only. Once an artifact or finding is written it is never modified. Updates are forbidden by the hash chain (any modification invalidates downstream hashes and any PQC signature over the chain tip). The single exception is the per-finding triage_json column, which is populated by the AI triage step and not covered by the chain.
Single source of truth. Detectors don't talk to collectors directly. AI triage doesn't see raw OS state. Reports don't re-collect. Everything operates on rows in the evidence DB. This lets you run any stage offline against an existing case directory.
Algorithm-agile, data-driven. Detector behavior is shaped by YAML rule files (digger/rules/) and live cached feeds (digger/intel/). Compliance behavior is shaped by YAML control catalogs (digger/compliance/frameworks/). PQC algorithms come from what liboqs exposes at runtime, not a hard-coded list.
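The append-only rule is easiest to see with a toy hash chain. The following is a minimal sketch, assuming each row's hash covers the previous row's hash plus a canonical JSON encoding of the row. The function names, the SHA-256 choice, and the all-zero genesis value are assumptions made for illustration; they are not taken from digger's actual schema.

```python
import hashlib
import json
from typing import List, Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first row


def chain_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous link together with a canonical encoding of the row."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def build_chain(rows: List[dict]) -> List[str]:
    """Return one hash per row; the last entry is the chain tip."""
    hashes = []
    prev = GENESIS
    for row in rows:
        prev = chain_hash(prev, row)
        hashes.append(prev)
    return hashes


def verify_chain(rows: List[dict], hashes: List[str]) -> Optional[int]:
    """Return the index of the first row that no longer matches, or None."""
    prev = GENESIS
    for i, (row, stored) in enumerate(zip(rows, hashes)):
        if chain_hash(prev, row) != stored:
            return i
        prev = stored
    return None
```

Editing any stored row changes its hash, which changes every later hash and therefore the tip. That is why a signature over the tip is enough to detect tampering anywhere in the chain.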

Module layout

Module	Role	Talks to
`digger/core/`	Evidence store, base Collector class, runner, hashing, platform detection	SQLite
`digger/collectors/`	~30 platform-aware artifact collectors	OS APIs (psutil), CLI tools, plist/registry
`digger/detectors/`	32 detectors over artifacts: behavioral, YARA, IOC, Sigma, C2, supply-chain, threat-actor, service-version CVE, plus 12 Decepticon countermeasures, timeline	Evidence store, `digger/rules/`, `digger/intel/`
`digger/firewall/`	Unified pf / nftables / iptables / ufw / firewalld / WFP audit + remediation	Evidence store, `digger.ethics.contract`
`digger/ethics/`	Codified ethical contract (10 principles, programmatically enforced) + pre-engagement scope attestation	Used by every state-modifying feature
`digger/intel/`	15 live threat-intel feeds + background scheduler + composite multi-URL fetchers (NVD, SigmaHQ, MITRE ATT&CK STIX)	HTTP (CISA, abuse.ch, Spamhaus, GitHub, OpenSSF, NVD, mitre-attack, sigmahq, …)
`digger/ai/`	OpenAI-compatible client, ICD-203-compliant triage prompts and schema	HTTP (llama.cpp / ollama / vllm)
`digger/crypto/`	liboqs-backed NIST PQC; sign, verify, hybrid PQC-KEM + AES-256-GCM	oqs-python, cryptography
`digger/fips/`	FIPS 140-3 mode + KAT self-test + algorithm gating	—
`digger/compliance/`	18 framework catalogs + control assessor + reports	Evidence store
`digger/tradecraft/`	ICD 203 estimative probability, NATO Admiralty, TLP, ACH	—
`digger/exchange/`	STIX 2.1, MISP, ATT&CK Navigator, TAXII 2.1, Sigma loader	HTTP for TAXII; pure Python otherwise
`digger/coc/`	ISO/IEC 27037 + NIST SP 800-86 chain-of-custody record	JSON sidecar file
`digger/report/`	JSON, Markdown, self-contained HTML reports	Evidence store
`digger/cli.py`	argparse-driven entry point; sub-commands are thin wrappers over the modules above	All of the above

Pipeline stages

Collect

The runner pulls the list of collectors appropriate to the current OS from digger/collectors/__init__.py:all_collectors(). For each collector, it calls Collector.run(), which:

Checks supported_os and requires_admin; logs a skip if not satisfied.
Iterates the collector's collect() generator, calling store.add_artifact() for each Artifact.
Catches any unexpected exception, logs it, and continues.
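The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real base class: the `run()` parameters (`current_os`, `is_admin`), the `Artifact` fields, and the in-memory `ListStore` are hypothetical stand-ins. Only the names `supported_os`, `requires_admin`, `collect()`, and `store.add_artifact()` come from the description above.

```python
import logging
from dataclasses import dataclass
from typing import Iterator

log = logging.getLogger("digger.sketch")


@dataclass
class Artifact:
    collector: str  # hypothetical field: which collector produced the artifact
    data: dict      # hypothetical field: the collected payload


class ListStore:
    """In-memory stand-in for the evidence store (hypothetical)."""

    def __init__(self) -> None:
        self.artifacts = []

    def add_artifact(self, artifact: Artifact) -> None:
        self.artifacts.append(artifact)


class Collector:
    name = "base"
    supported_os = ("Linux", "Darwin", "Windows")
    requires_admin = False

    def collect(self) -> Iterator[Artifact]:
        raise NotImplementedError

    def run(self, store, current_os: str, is_admin: bool) -> int:
        """Run one collector; return how many artifacts were stored."""
        # Step 1: check preconditions and log a skip if they are not met.
        if current_os not in self.supported_os:
            log.info("skip %s: unsupported OS %s", self.name, current_os)
            return 0
        if self.requires_admin and not is_admin:
            log.info("skip %s: requires admin", self.name)
            return 0
        count = 0
        try:
            # Step 2: drain the generator, storing each artifact as it arrives.
            for artifact in self.collect():
                store.add_artifact(artifact)
                count += 1
        except Exception:
            # Step 3: one failing collector must not abort the whole run.
            log.exception("collector %s failed; continuing", self.name)
        return count
```

Because artifacts are stored as they are yielded, anything a collector produced before it failed stays in the store.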

The runner also writes case metadata, opens a chain-of-custody record, and appends collection_started / collection_finished events.

Scan

Detectors implement detect(store) -> Iterable[Finding]. They read artifacts back from the store via store.iter_artifacts(collector=...) and emit Findings, each tagged with a severity, a MITRE ATT&CK technique, artifact references, and a free-form evidence dict.

Detectors are independent: their order in the registry doesn't matter, with one exception. TimelineBuilder runs last and consumes the other detectors' findings to synthesize a chronological event view.
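A detector in this style might look like the sketch below. Only `detect(store)` and `store.iter_artifacts(collector=...)` are taken from the description above. The `Finding` and `Artifact` field names, the `MemoryStore` stand-in, the `"processes"` collector name, and the detection rule itself are hypothetical, and the technique ID is an illustrative tag rather than a vetted mapping.

```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List, Optional


@dataclass
class Artifact:
    id: int
    collector: str
    data: dict


@dataclass
class Finding:
    title: str
    severity: str
    technique: str            # MITRE ATT&CK technique ID
    artifact_ids: List[int]   # references back into the store
    evidence: dict = field(default_factory=dict)


class MemoryStore:
    """In-memory stand-in for the evidence store (hypothetical)."""

    def __init__(self, artifacts: Iterable[Artifact]) -> None:
        self._artifacts = list(artifacts)

    def iter_artifacts(self, collector: Optional[str] = None) -> Iterator[Artifact]:
        for art in self._artifacts:
            if collector is None or art.collector == collector:
                yield art


class TmpExecDetector:
    """Flags processes whose executable lives under /tmp (toy rule)."""

    def detect(self, store) -> Iterable[Finding]:
        for art in store.iter_artifacts(collector="processes"):
            exe = art.data.get("exe", "")
            if exe.startswith("/tmp/"):
                yield Finding(
                    title="Process running from /tmp",
                    severity="medium",
                    technique="T1059",  # illustrative tag only
                    artifact_ids=[art.id],
                    evidence={"exe": exe},
                )
```

The detector touches nothing but the store, so it can run offline against any existing case.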

Triage

TriageRunner walks the findings above a configurable severity threshold and POSTs each one, together with its referenced artifacts, to the LLM. The response must conform to a JSON schema that is enforced by the prompt and, where the server supports it, by structured output. The triage payload is stored in findings.triage_json. Finally, a case-wide executive summary is generated and persisted to case metadata.

The schema requires IC-grade outputs: estimative probability, analytic confidence, source/info reliability, TLP marking, assumptions, alternative hypotheses.
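To make the schema requirement concrete, here is a minimal sketch of a client-side check on a triage response. The key names (`estimative_probability`, `analytic_confidence`, and so on) are hypothetical and not taken from digger's schema. The allowed values follow the ICD 203 estimative terms, the NATO Admiralty A-F / 1-6 scales, and the TLP 2.0 labels.

```python
import json

ICD203_TERMS = (
    "almost no chance", "very unlikely", "unlikely", "roughly even chance",
    "likely", "very likely", "almost certain",
)
CONFIDENCE_LEVELS = ("low", "moderate", "high")
SOURCE_RELIABILITY = ("A", "B", "C", "D", "E", "F")  # Admiralty source scale
INFO_CREDIBILITY = (1, 2, 3, 4, 5, 6)                # Admiralty information scale
TLP_LABELS = ("TLP:CLEAR", "TLP:GREEN", "TLP:AMBER", "TLP:AMBER+STRICT", "TLP:RED")


def validate_triage(raw: str) -> dict:
    """Parse an LLM response and reject it if required fields are off-schema."""
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("triage response must be a JSON object")
    errors = []

    def check_enum(key, allowed):
        if payload.get(key) not in allowed:
            errors.append(f"{key}: expected one of {list(allowed)}")

    check_enum("estimative_probability", ICD203_TERMS)
    check_enum("analytic_confidence", CONFIDENCE_LEVELS)
    check_enum("source_reliability", SOURCE_RELIABILITY)
    check_enum("info_credibility", INFO_CREDIBILITY)
    check_enum("tlp", TLP_LABELS)

    for key in ("assumptions", "alternative_hypotheses"):
        value = payload.get(key)
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            errors.append(f"{key}: expected a list of strings")

    if errors:
        raise ValueError("; ".join(errors))
    return payload
```

A check like this is useful even when the server enforces structured output, because it also catches servers that silently ignore the schema.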

Report / export

Reports and exports never re-derive evidence. They only read the SQLite store and the case_meta table.

JSON: Full structured dump for downstream tooling.
Markdown: Print-friendly, embeddable in tickets.
HTML: Self-contained, embedded SVG, severity-filtered finding cards.
STIX 2.1: Bundle with incident, indicators, attack-patterns, TLP marking definitions.
MISP: Event JSON with MITRE ATT&CK galaxy tags.
ATT&CK Navigator: Layer JSON for the matrix viewer.
Compliance: Per-framework JSON/MD/HTML with pass/fail/manual/partial per control.
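Since reports only read the store, a report-style consumer can open the database read-only. The sketch below uses SQLite's `mode=ro` URI parameter through the standard `sqlite3` module. The `findings` table and its `triage_json` column are mentioned above; the other column names (`id`, `severity`, `title`) and both function names are assumptions for illustration.

```python
import sqlite3
from pathlib import Path
from typing import List


def open_readonly(db_path: str) -> sqlite3.Connection:
    """Open the evidence DB so that any write attempt fails."""
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    return sqlite3.connect(uri, uri=True)


def findings_by_severity(db_path: str, severity: str) -> List[dict]:
    """Return findings of one severity as plain dicts, ordered by id."""
    conn = open_readonly(db_path)
    try:
        conn.row_factory = sqlite3.Row
        rows = conn.execute(
            # Column names other than triage_json are assumed for this sketch.
            "SELECT id, severity, title, triage_json "
            "FROM findings WHERE severity = ? ORDER BY id",
            (severity,),
        ).fetchall()
        return [dict(row) for row in rows]
    finally:
        conn.close()
```

Opening read-only means a buggy exporter fails loudly instead of mutating evidence.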

What does not happen automatically

Some design choices are deliberate non-features:

No cloud telemetry. digger never phones home, never sends the case anywhere. Intel feed fetches are the only outbound traffic, and they're explicit opt-in (you have to run digger intel update or digger intel watch).
No automatic remediation. Findings describe what was observed and recommend next steps. They never modify the host. Killing processes, removing files, or quarantining is your call.
No live blocking. digger is a forensic snapshot tool, not an EDR agent. It does not hook syscalls or inject into kernel space. Run it periodically or on suspicion.
No automatic Internet IOC enrichment of every artifact. IOCs from cached feeds are matched against artifacts locally. We do not submit hashes / URLs / IPs to VirusTotal or similar by default.

Dependency graph

cli.py
  ├── core.runner         core.evidence, coc.record, collectors.*, fips
  ├── detectors.*         core.evidence, detectors._rules_io ←─ rules/*, intel/feeds
  ├── ai.triage           ai.llama_client, ai.prompts, core.evidence
  ├── crypto.pqc          fips.mode (gating)
  ├── compliance.assessor compliance.frameworks/*.yaml, core.evidence
  ├── exchange.*          tradecraft.tlp, core.evidence
  └── report.*            assets, core.evidence

There are no circular imports. Modules under digger/compliance/, digger/exchange/, digger/intel/, and digger/ai/ can each be used standalone if you import them directly, without ever touching the collector or detector code.

Further reading

Evidence store →

Collectors →

Detectors →

Extending →