AI triage Local LLM, OpenAI-compatible API. Output schema enforces IC analytic tradecraft.
What it does
For every finding above a configurable severity threshold, digger sends the
finding and its referenced artifacts to a local LLM and parses the response
into a structured triage record. The record is written into the finding's
triage_json column (the chain hash does not cover it — see
evidence store).
A second pass produces a case-wide executive summary aggregated from the per-finding judgments. That summary is stored in case metadata and rendered prominently at the top of the HTML report.
Source: digger/ai/triage.py, digger/ai/prompts.py,
digger/ai/llama_client.py.
Compatible backends
Anything that speaks POST /v1/chat/completions in OpenAI's
format. The base URL is set with --llm-base-url or
DIGGER_LLM_BASE_URL.
- llama.cpp
llama-serverathttp://127.0.0.1:8080/v1. Use--jinjafor proper chat templates.- ollama
ollama serveathttp://127.0.0.1:11434/v1.- vllm
- OpenAI-compatible by default at
http://127.0.0.1:8000/v1. - OpenAI / Anthropic / others
- Technically work, but defeat the local-only design.
The output contract
The triage prompt requires the LLM to respond as a JSON object with
specific fields. Wherever the server supports it (llama.cpp grammar,
OpenAI JSON-schema mode), digger forces structured output via
response_format. If parsing fails it salvages by hunting
matching braces.
Required fields for each finding:
| Field | Type | Notes |
|---|---|---|
verdict | enum | false_positive · likely_benign · needs_investigation · likely_malicious · confirmed_malicious |
estimative_probability | enum | IC seven-step ladder (ICD 203). See tradecraft. |
analytic_confidence | enum | low · moderate · high — distinct from the probability itself |
source_reliability | enum | NATO Admiralty letter A-F |
info_credibility | enum | NATO Admiralty digit 1-6 |
tlp | enum | CLEAR / GREEN / AMBER / AMBER+STRICT / RED |
severity | enum | Reassessed severity, may differ from the detector's call |
one_line | str | ≤120 char ticket headline |
rationale | str | 2-5 sentences. Sources vs. inferences explicitly separated. |
assumptions | list[str] | Explicit assumptions the judgment depends on (ICD 203 §I.B) |
alternative_hypotheses | list[str] | At least 2 competing explanations including benign ones (Heuer SAT 5 / ACH) |
next_steps | list[str] | Ordered concrete investigative actions |
attribution | str? | Named threat-actor or family if multiple signals converge; else null |
iocs | obj | {sha256, ipv4, domain, url, path} extracted from evidence |
mitre_attack | list[str] | Relevant ATT&CK technique IDs |
compliance_impact | list[str] | Control families implicated (e.g. NIST 800-53 SI-4) |
Why IC tradecraft?
An unconstrained LLM will produce confidently-worded but uncalibrated analysis. The schema forces three separations the IC analytic standards require:
- Probability vs. confidence. "Likely malicious with low analytic confidence" is meaningfully different from "roughly even chance with high confidence." Both are honest answers; conflating them isn't.
- Source vs. inference. The
rationalemust distinguish what was observed (artifact content) from what is inferred (judgment). Thesource_reliabilitygrade tracks the former, theanalytic_confidencetracks the latter. - Hypothesis competition. Listing at least two competing explanations defends against confirmation bias and surfaces the evidence that would discriminate between them (Heuer's Analysis of Competing Hypotheses).
The case-wide summary
After per-finding triage, a second prompt sees only the consolidated verdicts and produces:
- overall_severity
- The case's effective severity
- overall_estimative_probability
- How likely the host is currently compromised, on the IC ladder
- overall_confidence
- Low / moderate / high
- tlp
- Default TLP for the case
- one_paragraph
- Executive prose (≤1500 chars)
- key_judgments
- 3-5 most consequential calls, each with a probability label
- assumptions / alternative_explanations
- As above, case-wide
- top_actions
- 3-7 next actions
- if_compromised
- What to do FIRST if the worst finding is real
- attribution_hint
- Named adversary if signals converge
- iocs_to_share
- Consolidated IOC dict (respect TLP when sharing)
- compliance_implications
- Frameworks/controls implicated
What gets sent to the LLM
Per-finding prompt = the host fingerprint + the finding's detector / severity / title / summary / MITRE / evidence + up to 5 referenced artifacts (truncated to 6 KB each). No raw file contents unless they were already captured as part of an artifact.
The case-summary prompt sees only the host + the consolidated per-finding triage records, not the underlying artifacts.
Note. If you send digger to a remote LLM, your case artifacts go to that endpoint. Use a local model for sensitive cases.
Calibrating the run
| Flag | Default | Use |
|---|---|---|
--skip-below | low | Set to medium for quick triage of only the suspicious stuff. |
--max N | — | Cap total findings sent. Useful when iterating on prompts/models. |
--only DETECTORS | — | Triage only findings from named detectors. |
--no-case-summary | — | Skip the case-wide pass (save tokens). |
--force | — | Continue past LLM health-check failure. |
DIGGER_LLM_TEMPERATURE | 0.2 | Lower for stricter calibration; rarely raise above 0.4. |
DIGGER_LLM_MAX_TOKENS | 1024 | Raise if the model truncates long rationales. |
Cost & latency
One HTTP request per non-skipped finding plus one for the case summary. A 24-finding case at temp 0.2 on a 14B Q4 model averages ~30 s end-to-end on a recent Apple Silicon Mac. Throughput is dominated by the model's prefill speed since each prompt is independent — running the LLM with larger batch size doesn't help much.