AI triage

Local LLM, OpenAI-compatible API. Output schema enforces IC analytic tradecraft.

What it does

For every finding above a configurable severity threshold, digger sends the finding and its referenced artifacts to a local LLM and parses the response into a structured triage record. The record is written into the finding's triage_json column, which the chain hash does not cover (see evidence store).
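The threshold check amounts to an ordered comparison on severity. A minimal sketch, assuming a hypothetical five-level scale (info < low < medium < high < critical) and a hypothetical Finding type. It also folds in the --only and --max flags from the calibration table; applying the cap after filtering is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical severity scale, lowest to highest; digger's labels may differ.
SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]
RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}


@dataclass
class Finding:
    id: int
    detector: str
    severity: str


def select_findings(
    findings: Iterable[Finding],
    skip_below: str = "low",
    only: Optional[set[str]] = None,
    max_n: Optional[int] = None,
) -> list[Finding]:
    """Keep findings at or above `skip_below`, optionally restricted to the
    named detectors, then cap the result at `max_n` (hypothetical ordering)."""
    threshold = RANK[skip_below]
    selected = [
        f
        for f in findings
        if RANK[f.severity] >= threshold and (only is None or f.detector in only)
    ]
    return selected if max_n is None else selected[:max_n]
```

With skip_below="medium", info and low findings never reach the LLM, which is where the token savings come from.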

A second pass produces a case-wide executive summary aggregated from the per-finding judgments. That summary is stored in case metadata and rendered prominently at the top of the HTML report.

Source: digger/ai/triage.py, digger/ai/prompts.py, digger/ai/llama_client.py.

Compatible backends

Anything that speaks POST /v1/chat/completions in OpenAI's format. The base URL is set with --llm-base-url or DIGGER_LLM_BASE_URL.

llama.cpp
llama-server at http://127.0.0.1:8080/v1. Use --jinja for proper chat templates.
ollama
ollama serve at http://127.0.0.1:11434/v1.
vllm
OpenAI-compatible by default at http://127.0.0.1:8000/v1.
OpenAI / Anthropic / others
Technically work, but defeat the local-only design.
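Because every backend exposes the same route, switching servers only changes the base URL. A sketch of how a client might resolve it, assuming the flag takes precedence over the environment variable and that llama-server's address is the fallback. Both the precedence and the default are assumptions, and the function names are hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumption: fall back to llama-server's default address when nothing is set.
DEFAULT_BASE_URL = "http://127.0.0.1:8080/v1"


def resolve_base_url(
    cli_value: Optional[str], env: Mapping[str, str] = os.environ
) -> str:
    """--llm-base-url wins over DIGGER_LLM_BASE_URL, which wins over the default."""
    base = cli_value or env.get("DIGGER_LLM_BASE_URL") or DEFAULT_BASE_URL
    return base.rstrip("/")  # normalize so path joining is predictable


def chat_completions_url(base_url: str) -> str:
    # Every compatible backend serves the same path under its /v1 base.
    return f"{base_url.rstrip('/')}/chat/completions"
```

For example, pointing at ollama is just DIGGER_LLM_BASE_URL=http://127.0.0.1:11434/v1; the request path stays /chat/completions.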

The output contract

The triage prompt requires the LLM to respond with a JSON object containing specific fields. Wherever the server supports it (llama.cpp grammars, OpenAI JSON-schema mode), digger enforces structured output via response_format. If parsing still fails, digger salvages the response by searching the raw text for matching braces and parsing the object between them.
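A brace-matching salvage pass can be sketched as follows. This is an illustration, not digger's implementation: it tries a strict parse first, then scans for balanced {...} spans (ignoring braces inside quoted strings) and returns the first one that parses as a JSON object. The function names are hypothetical.

```python
import json
from typing import Any, Iterator, Optional


def _balanced_objects(text: str) -> Iterator[str]:
    """Yield each top-level {...} span, skipping braces inside JSON strings."""
    depth = 0
    start = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if depth == 0:
            # Outside any object only an opening brace matters; prose quotes are ignored.
            if ch == "{":
                start, depth = i, 1
            continue
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                yield text[start : i + 1]


def parse_llm_json(raw: str) -> Optional[dict[str, Any]]:
    """Strict parse first; otherwise return the first balanced span that parses."""
    try:
        value = json.loads(raw)
        if isinstance(value, dict):
            return value
    except json.JSONDecodeError:
        pass
    for candidate in _balanced_objects(raw):
        try:
            value = json.loads(candidate)
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None
```

Tracking string literals matters because rationales often quote paths or code that contain braces.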

Required fields for each finding:

Field | Type | Notes
--- | --- | ---
verdict | enum | false_positive · likely_benign · needs_investigation · likely_malicious · confirmed_malicious
estimative_probability | enum | IC seven-step ladder (ICD 203). See tradecraft.
analytic_confidence | enum | low · moderate · high; distinct from the probability itself
source_reliability | enum | NATO Admiralty letter, A-F
info_credibility | enum | NATO Admiralty digit, 1-6
tlp | enum | CLEAR / GREEN / AMBER / AMBER+STRICT / RED
severity | enum | Reassessed severity; may differ from the detector's call
one_line | str | Ticket headline, ≤120 characters
rationale | str | 2-5 sentences, with sources and inferences explicitly separated
assumptions | list[str] | Explicit assumptions the judgment depends on (ICD 203 §I.B)
alternative_hypotheses | list[str] | At least 2 competing explanations, including benign ones (Heuer SAT 5 / ACH)
next_steps | list[str] | Ordered, concrete investigative actions
attribution | str? | Named threat actor or family if multiple signals converge; else null
iocs | obj | {sha256, ipv4, domain, url, path} extracted from evidence
mitre_attack | list[str] | Relevant ATT&CK technique IDs
compliance_impact | list[str] | Controls or control families implicated (e.g. NIST 800-53 SI-4)
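Most of this contract can be checked mechanically before a record is accepted. A sketch of such a check, assuming fields arrive as plain JSON values (a letter and a digit for the Admiralty grades, arrays for the list[str] fields). The exact enum spellings digger uses for estimative_probability and severity are not listed here, so the sketch leaves those two unchecked. All names are hypothetical.

```python
from typing import Any

VERDICTS = {"false_positive", "likely_benign", "needs_investigation",
            "likely_malicious", "confirmed_malicious"}
CONFIDENCE = {"low", "moderate", "high"}
TLP = {"CLEAR", "GREEN", "AMBER", "AMBER+STRICT", "RED"}
ADMIRALTY_LETTERS = set("ABCDEF")  # source reliability
ADMIRALTY_DIGITS = set("123456")   # information credibility
REQUIRED = ["verdict", "estimative_probability", "analytic_confidence",
            "source_reliability", "info_credibility", "tlp", "severity",
            "one_line", "rationale", "assumptions", "alternative_hypotheses",
            "next_steps", "attribution", "iocs", "mitre_attack",
            "compliance_impact"]


def check_triage_record(rec: dict[str, Any]) -> list[str]:
    """Return contract violations; an empty list means the record passes."""
    missing = [f"missing field: {k}" for k in REQUIRED if k not in rec]
    if missing:
        return missing
    problems: list[str] = []
    if rec["verdict"] not in VERDICTS:
        problems.append(f"bad verdict: {rec['verdict']!r}")
    if rec["analytic_confidence"] not in CONFIDENCE:
        problems.append(f"bad analytic_confidence: {rec['analytic_confidence']!r}")
    if rec["tlp"] not in TLP:
        problems.append(f"bad tlp: {rec['tlp']!r}")
    # str() lets the digit grade arrive as either 2 or "2".
    if str(rec["source_reliability"]) not in ADMIRALTY_LETTERS:
        problems.append("source_reliability must be one letter A-F")
    if str(rec["info_credibility"]) not in ADMIRALTY_DIGITS:
        problems.append("info_credibility must be one digit 1-6")
    if len(rec["one_line"]) > 120:
        problems.append("one_line exceeds 120 characters")
    if len(rec["alternative_hypotheses"]) < 2:
        problems.append("need at least 2 alternative_hypotheses")
    if rec["attribution"] is not None and not isinstance(rec["attribution"], str):
        problems.append("attribution must be a string or null")
    return problems
```

Rejecting a record with a readable list of violations also gives a natural re-prompt message if you choose to retry.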

Why IC tradecraft?

An unconstrained LLM will produce confidently worded but uncalibrated analysis. The schema forces three separations the IC analytic standards require:

  1. Probability vs. confidence. "Likely malicious with low analytic confidence" is meaningfully different from "roughly even chance with high confidence." Both are honest answers; conflating them isn't.
  2. Source vs. inference. The rationale must distinguish what was observed (artifact content) from what is inferred (judgment). The source_reliability grade tracks the former; analytic_confidence tracks the latter.
  3. Hypothesis competition. Listing at least two competing explanations defends against confirmation bias and surfaces the evidence that would discriminate between them (Heuer's Analysis of Competing Hypotheses).
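The ICD 203 ladder pairs each term with an approximate probability range, which makes the probability/confidence split concrete: the term comes from the estimate, while confidence is recorded separately. A sketch using the ICD 203 ranges; how digger spells the terms in its enum is an assumption, as are the helper and type names.

```python
from dataclasses import dataclass

# ICD 203 expressions of likelihood with their approximate ranges.
LADDER = [
    ("almost no chance", 0.01, 0.05),
    ("very unlikely", 0.05, 0.20),
    ("unlikely", 0.20, 0.45),
    ("roughly even chance", 0.45, 0.55),
    ("likely", 0.55, 0.80),
    ("very likely", 0.80, 0.95),
    ("almost certain", 0.95, 0.99),
]


def ladder_term(p: float) -> str:
    """Map a probability to its ICD 203 term. Shared endpoints go to the
    higher step; values outside 1-99% clamp to the end terms."""
    for term, _low, high in LADDER:
        if p < high:
            return term
    return LADDER[-1][0]


@dataclass(frozen=True)
class Judgment:
    probability: str  # ICD 203 term: how likely the claim is
    confidence: str   # low / moderate / high: how solid the evidential basis is


# The two answers from point 1: a probable call on thin evidence, and a
# coin flip on solid evidence. Neither axis can be derived from the other.
thin_basis = Judgment(ladder_term(0.70), "low")
solid_basis = Judgment(ladder_term(0.50), "high")
```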

The case-wide summary

After per-finding triage, a second prompt sees only the consolidated verdicts and produces:

overall_severity
The case's effective severity
overall_estimative_probability
How likely the host is currently compromised, on the IC ladder
overall_confidence
Low / moderate / high
tlp
Default TLP for the case
one_paragraph
Executive prose (≤1500 chars)
key_judgments
3-5 most consequential calls, each with a probability label
assumptions / alternative_explanations
As above, case-wide
top_actions
3-7 next actions
if_compromised
What to do FIRST if the worst finding is real
attribution_hint
Named adversary if signals converge
iocs_to_share
Consolidated IOC dict (respect TLP when sharing)
compliance_implications
Frameworks/controls implicated
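Several of these fields carry numeric bounds that can be verified after the summary pass returns. A sketch, assuming lowercase confidence values and a hypothetical key_judgments shape in which each entry is an object with a probability key; the function name is also hypothetical.

```python
from typing import Any

CONFIDENCE = {"low", "moderate", "high"}


def check_case_summary(summary: dict[str, Any]) -> list[str]:
    """Return bound violations in a case summary; empty means it passes."""
    problems: list[str] = []
    if len(summary.get("one_paragraph", "")) > 1500:
        problems.append("one_paragraph exceeds 1500 characters")
    judgments = summary.get("key_judgments", [])
    if not 3 <= len(judgments) <= 5:
        problems.append(f"expected 3-5 key_judgments, got {len(judgments)}")
    # Hypothetical shape: each key judgment is an object with a "probability" label.
    if any(not isinstance(j, dict) or "probability" not in j for j in judgments):
        problems.append("every key judgment needs a probability label")
    actions = summary.get("top_actions", [])
    if not 3 <= len(actions) <= 7:
        problems.append(f"expected 3-7 top_actions, got {len(actions)}")
    if summary.get("overall_confidence") not in CONFIDENCE:
        problems.append("overall_confidence must be low, moderate or high")
    return problems
```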

What gets sent to the LLM

The per-finding prompt contains the host fingerprint; the finding's detector, severity, title, summary, MITRE mapping, and evidence; and up to 5 referenced artifacts, each truncated to 6 KB. Raw file contents are included only if they were already captured as part of an artifact.
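The artifact limits translate into a count cap and a byte-level cut. A sketch, assuming "6 KB" means 6 × 1024 bytes of UTF-8 and that the prompt is serialized as JSON; digger's actual template lives in digger/ai/prompts.py and may look quite different, and the helper names here are hypothetical.

```python
import json
from typing import Any

MAX_ARTIFACTS = 5
MAX_ARTIFACT_BYTES = 6 * 1024  # assumption: "6 KB" read as 6 KiB of UTF-8


def truncate_utf8(text: str, limit: int = MAX_ARTIFACT_BYTES) -> str:
    """Cut to at most `limit` UTF-8 bytes without leaving a broken character."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return text
    # errors="ignore" drops a multi-byte character split by the cut.
    return data[:limit].decode("utf-8", errors="ignore")


def build_finding_prompt(
    host: dict[str, Any], finding: dict[str, Any], artifacts: list[str]
) -> str:
    """Hypothetical JSON layout: host fingerprint, finding fields, capped artifacts."""
    payload = {
        "host": host,
        "finding": {
            k: finding.get(k)
            for k in ("detector", "severity", "title", "summary", "mitre", "evidence")
        },
        "artifacts": [truncate_utf8(a) for a in artifacts[:MAX_ARTIFACTS]],
    }
    return json.dumps(payload, indent=2)
```

Cutting on bytes rather than characters keeps the budget predictable for non-ASCII artifacts.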

The case-summary prompt sees only the host and the consolidated per-finding triage records, not the underlying artifacts.
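Keeping the summary pass away from raw artifacts amounts to projecting each triage record down to its judgment fields. A sketch, assuming IOC values are lists of strings keyed as in the iocs field; which fields digger actually forwards is not documented here, so SUMMARY_FIELDS and the function name are hypothetical.

```python
from typing import Any

# Hypothetical subset of triage fields forwarded to the case-summary pass.
SUMMARY_FIELDS = ("verdict", "estimative_probability", "analytic_confidence",
                  "severity", "one_line", "attribution", "mitre_attack")
IOC_KEYS = ("sha256", "ipv4", "domain", "url", "path")


def consolidate(host: dict[str, Any], records: list[dict[str, Any]]) -> dict[str, Any]:
    """Keep only judgment fields per record and merge IOCs, de-duplicated in
    first-seen order. No artifact content enters this structure."""
    findings = [{k: r.get(k) for k in SUMMARY_FIELDS} for r in records]
    iocs: dict[str, list[str]] = {k: [] for k in IOC_KEYS}
    for r in records:
        for kind, values in (r.get("iocs") or {}).items():
            bucket = iocs.setdefault(kind, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)
    return {"host": host, "findings": findings, "iocs": iocs}
```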

Note. If you point digger at a remote LLM, your case artifacts are sent to that endpoint. Use a local model for sensitive cases.
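A pre-flight check can flag a non-local endpoint before any artifact leaves the machine. This is a hypothetical helper, not a digger flag or function: it treats localhost and loopback addresses as local and everything else, including LAN hosts, as remote, which is deliberately conservative.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_endpoint(base_url: str) -> bool:
    """True if the base URL's host is localhost or a loopback address."""
    host = urlparse(base_url).hostname or ""  # brackets around IPv6 are stripped
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # any other hostname is treated as remote
```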

Calibrating the run

Flag / env var | Default | Use
--- | --- | ---
--skip-below | low | Set to medium to triage only the more suspicious findings.
--max N | | Cap the total number of findings sent. Useful when iterating on prompts or models.
--only DETECTORS | | Triage only findings from the named detectors.
--no-case-summary | | Skip the case-wide pass (saves tokens).
--force | | Continue past an LLM health-check failure.
DIGGER_LLM_TEMPERATURE | 0.2 | Lower for stricter calibration; rarely raise above 0.4.
DIGGER_LLM_MAX_TOKENS | 1024 | Raise if the model truncates long rationales.
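The two environment variables and the structured-output setting end up in the body of each chat-completions request. A sketch of that body, assuming the OpenAI json_schema form of response_format; backends differ in which response_format shapes they accept, and the model name "local", the schema name "triage", and the function name are placeholders.

```python
import os
from typing import Any, Mapping


def build_request(
    messages: list[dict[str, str]],
    schema: dict[str, Any],
    model: str = "local",  # placeholder; some servers require a real model name
    env: Mapping[str, str] = os.environ,
) -> dict[str, Any]:
    """Assemble a /v1/chat/completions body from the documented env defaults."""
    temperature = float(env.get("DIGGER_LLM_TEMPERATURE", "0.2"))
    max_tokens = int(env.get("DIGGER_LLM_MAX_TOKENS", "1024"))
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "max_tokens": max_tokens,
        # OpenAI-style structured output; other servers may expect another shape.
        "response_format": {
            "type": "json_schema",
            "json_schema": {"name": "triage", "schema": schema, "strict": True},
        },
    }
```

With neither variable set, the body carries temperature 0.2 and max_tokens 1024, matching the table's defaults.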

Cost & latency

One HTTP request per non-skipped finding, plus one for the case summary. A 24-finding case at temperature 0.2 on a 14B Q4 model averages ~30 s end-to-end on a recent Apple Silicon Mac. Throughput is dominated by the model's prefill speed, since each prompt is independent; running the LLM with a larger batch size doesn't help much.
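The request count follows directly from the rule above, so a rough wall-clock estimate only needs a per-request latency measured on your own hardware. A sketch, assuming requests run sequentially at roughly constant latency; the function names are hypothetical.

```python
def request_count(triaged_findings: int, case_summary: bool = True) -> int:
    """One request per non-skipped finding, plus one for the case summary."""
    return triaged_findings + (1 if case_summary else 0)


def estimated_seconds(
    triaged_findings: int, seconds_per_request: float, case_summary: bool = True
) -> float:
    # Assumes sequential requests at a roughly constant per-request latency.
    return request_count(triaged_findings, case_summary) * seconds_per_request
```

If all 24 findings in the example case are triaged, request_count(24) gives 25 requests; case_summary=False (the --no-case-summary equivalent) drops it to 24.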