AI triage Local LLM, OpenAI-compatible API. Output schema enforces IC analytic tradecraft.

What it does

For every finding above a configurable severity threshold, digger sends the finding and its referenced artifacts to a local LLM and parses the response into a structured triage record. The record is written into the finding's triage_json column (the chain hash does not cover it — see evidence store).

A second pass produces a case-wide executive summary aggregated from the per-finding judgments. That summary is stored in case metadata and rendered prominently at the top of the HTML report.

Source: digger/ai/triage.py, digger/ai/prompts.py, digger/ai/llama_client.py.

Compatible backends

Anything that speaks POST /v1/chat/completions in OpenAI's format. The base URL is set with --llm-base-url or DIGGER_LLM_BASE_URL.

llama.cpp: llama-server at http://127.0.0.1:8080/v1. Use --jinja for proper chat templates.
ollama: ollama serve at http://127.0.0.1:11434/v1.
vllm: OpenAI-compatible by default at http://127.0.0.1:8000/v1.
OpenAI / Anthropic / others: Technically work, but defeat the local-only design.

The output contract

The triage prompt requires the LLM to respond as a JSON object with specific fields. Wherever the server supports it (llama.cpp grammar, OpenAI JSON-schema mode), digger forces structured output via response_format. If parsing fails it salvages by hunting matching braces.

Required fields for each finding:

Field	Type	Notes
`verdict`	enum	`false_positive` · `likely_benign` · `needs_investigation` · `likely_malicious` · `confirmed_malicious`
`estimative_probability`	enum	IC seven-step ladder (ICD 203). See tradecraft.
`analytic_confidence`	enum	`low` · `moderate` · `high` — distinct from the probability itself
`source_reliability`	enum	NATO Admiralty letter A-F
`info_credibility`	enum	NATO Admiralty digit 1-6
`tlp`	enum	CLEAR / GREEN / AMBER / AMBER+STRICT / RED
`severity`	enum	Reassessed severity, may differ from the detector's call
`one_line`	str	≤120 char ticket headline
`rationale`	str	2-5 sentences. Sources vs. inferences explicitly separated.
`assumptions`	list[str]	Explicit assumptions the judgment depends on (ICD 203 §I.B)
`alternative_hypotheses`	list[str]	At least 2 competing explanations including benign ones (Heuer SAT 5 / ACH)
`next_steps`	list[str]	Ordered concrete investigative actions
`attribution`	str?	Named threat-actor or family if multiple signals converge; else null
`iocs`	obj	`{sha256, ipv4, domain, url, path}` extracted from evidence
`mitre_attack`	list[str]	Relevant ATT&CK technique IDs
`compliance_impact`	list[str]	Control families implicated (e.g. `NIST 800-53 SI-4`)

Why IC tradecraft?

An unconstrained LLM will produce confidently-worded but uncalibrated analysis. The schema forces three separations the IC analytic standards require:

Probability vs. confidence. "Likely malicious with low analytic confidence" is meaningfully different from "roughly even chance with high confidence." Both are honest answers; conflating them isn't.
Source vs. inference. The rationale must distinguish what was observed (artifact content) from what is inferred (judgment). The source_reliability grade tracks the former, the analytic_confidence tracks the latter.
Hypothesis competition. Listing at least two competing explanations defends against confirmation bias and surfaces the evidence that would discriminate between them (Heuer's Analysis of Competing Hypotheses).

The case-wide summary

After per-finding triage, a second prompt sees only the consolidated verdicts and produces:

overall_severity: The case's effective severity
overall_estimative_probability: How likely the host is currently compromised, on the IC ladder
overall_confidence: Low / moderate / high
tlp: Default TLP for the case
one_paragraph: Executive prose (≤1500 chars)
key_judgments: 3-5 most consequential calls, each with a probability label
assumptions / alternative_explanations: As above, case-wide
top_actions: 3-7 next actions
if_compromised: What to do FIRST if the worst finding is real
attribution_hint: Named adversary if signals converge
iocs_to_share: Consolidated IOC dict (respect TLP when sharing)
compliance_implications: Frameworks/controls implicated

What gets sent to the LLM

Per-finding prompt = the host fingerprint + the finding's detector / severity / title / summary / MITRE / evidence + up to 5 referenced artifacts (truncated to 6 KB each). No raw file contents unless they were already captured as part of an artifact.

The case-summary prompt sees only the host + the consolidated per-finding triage records, not the underlying artifacts.

Note. If you send digger to a remote LLM, your case artifacts go to that endpoint. Use a local model for sensitive cases.

Calibrating the run

Flag	Default	Use
`--skip-below`	`low`	Set to `medium` for quick triage of only the suspicious stuff.
`--max N`	—	Cap total findings sent. Useful when iterating on prompts/models.
`--only DETECTORS`	—	Triage only findings from named detectors.
`--no-case-summary`	—	Skip the case-wide pass (save tokens).
`--force`	—	Continue past LLM health-check failure.
`DIGGER_LLM_TEMPERATURE`	`0.2`	Lower for stricter calibration; rarely raise above 0.4.
`DIGGER_LLM_MAX_TOKENS`	`1024`	Raise if the model truncates long rationales.

Cost & latency

One HTTP request per non-skipped finding plus one for the case summary. A 24-finding case at temp 0.2 on a 14B Q4 model averages ~30 s end-to-end on a recent Apple Silicon Mac. Throughput is dominated by the model's prefill speed since each prompt is independent — running the LLM with larger batch size doesn't help much.