Case diff

What changed on this host since the last collection, surfaced as added, removed, and modified artifacts and new or resolved findings.

Why this exists

The most common question in repeat-collection forensics is "what's new here since the baseline?" A point-in-time report doesn't answer it — you'd have to manually cross-reference two cases. digger answers it directly: collect twice, diff the two case directories, and get a structured report.

Source: digger/diff/comparator.py.

Run it

# Collect on Monday
digger collect --case-dir /var/lib/digger/cases/2026-06-08

# Collect on Tuesday
digger collect --case-dir /var/lib/digger/cases/2026-06-09

# Diff
digger diff --base /var/lib/digger/cases/2026-06-08 \
            --new  /var/lib/digger/cases/2026-06-09 \
            --out  /var/lib/digger/diffs/2026-06-09.html

Format options: --format html (default), md, json.

What you get back

Category             Description
Artifacts added      Present in the new case, not in the base. New process running. New LaunchAgent plist. New SSH key in authorized_keys. New npm dependency in a lockfile. This is the high-signal category.
Artifacts removed    Present in the base, gone in the new. A service stopped, a launchd plist deleted, a package uninstalled, an SSH key revoked.
Artifacts modified   Same identity, different content. A persistence plist's ProgramArguments changed. A sudoers file edited. A package upgraded to a new version.
Findings new         A detector fired in the new case that hadn't fired in the base. Worth attention.
Findings resolved    A detector that had been firing on the base is silent on the new. Usually means the underlying state was fixed (or hidden).
Findings modified    The same finding fires in both but with different evidence, e.g. severity escalated.
Findings persisted   Unchanged between runs. Useful as a sanity check that ongoing issues are still being tracked.
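The four finding categories amount to a keyed comparison of two finding lists. Here is a minimal sketch, assuming each finding is a dict and that its identity is a (detector, subject) pair. Those field names and the classify_findings function are hypothetical illustrations, not digger's actual schema or API.

```python
from typing import Any

Finding = dict[str, Any]


def finding_key(finding: Finding) -> tuple:
    # Hypothetical identity: which detector fired, and on what.
    return (finding["detector"], finding["subject"])


def classify_findings(base: list[Finding], new: list[Finding]) -> dict[str, list[Finding]]:
    """Sort findings into new / resolved / modified / persisted."""
    base_by_key = {finding_key(f): f for f in base}
    new_by_key = {finding_key(f): f for f in new}
    report: dict[str, list[Finding]] = {
        "new": [], "resolved": [], "modified": [], "persisted": [],
    }
    for key, finding in new_by_key.items():
        old = base_by_key.get(key)
        if old is None:
            report["new"].append(finding)        # fired only in the new case
        elif old != finding:
            report["modified"].append(finding)   # same finding, different evidence
        else:
            report["persisted"].append(finding)  # unchanged between runs
    for key, finding in base_by_key.items():
        if key not in new_by_key:
            report["resolved"].append(finding)   # fired only in the base case
    return report


base = [
    {"detector": "sudoers_edit", "subject": "/etc/sudoers", "severity": "low"},
    {"detector": "rogue_key", "subject": "~/.ssh/authorized_keys", "severity": "high"},
    {"detector": "old_pkg", "subject": "left-pad", "severity": "medium"},
]
new = [
    {"detector": "sudoers_edit", "subject": "/etc/sudoers", "severity": "high"},
    {"detector": "old_pkg", "subject": "left-pad", "severity": "medium"},
    {"detector": "new_agent", "subject": "com.example.plist", "severity": "critical"},
]
report = classify_findings(base, new)
print({category: len(items) for category, items in report.items()})
# {'new': 1, 'resolved': 1, 'modified': 1, 'persisted': 1}
```

A resolved finding only says the detector went silent; as the table notes, that can mean the state was fixed or that it was hidden.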

Identity matching

Two artifacts from different runs are "the same artifact" iff their collector-specific identity tuple matches. Volatile fields like pid, ppid, create_time, ephemeral laddr port, and uptime are excluded so that a re-spawned Chrome doesn't show up as a new process.

Collector                      Identity fields
processes                      (name, exe, cmdline, username)
network                        (raddr, status, type)
users                          (user, uid, gid)
ssh_keys                       (path, name)
macos.launchd                  (path,)
windows.registry_persistence   (hive, subkey)
npm_packages                   (project,)
github_workflows               (path,)

See digger/diff/comparator.py:IDENTITY_FIELDS.
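Identity matching can be sketched as building a tuple per row and comparing the two sets of tuples. The code below is an illustration under assumptions, not the contents of comparator.py: the dict shape of IDENTITY_FIELDS, the VOLATILE_FIELDS set, and the identity and diff_rows helpers are all hypothetical. Only the field names come from the table and paragraph above.

```python
from typing import Any

Row = dict[str, Any]

# Assumed shape; a subset of the table above.
IDENTITY_FIELDS: dict[str, tuple[str, ...]] = {
    "processes": ("name", "exe", "cmdline", "username"),
    "ssh_keys": ("path", "name"),
    "macos.launchd": ("path",),
}

# Volatile fields named in the text; ignored when comparing content.
VOLATILE_FIELDS = frozenset({"pid", "ppid", "create_time", "uptime"})


def _freeze(value: Any) -> Any:
    # Lists (e.g. cmdline) are not hashable, so convert them to tuples.
    if isinstance(value, (list, tuple)):
        return tuple(_freeze(v) for v in value)
    return value


def identity(collector: str, row: Row) -> tuple:
    return tuple(_freeze(row.get(field)) for field in IDENTITY_FIELDS[collector])


def diff_rows(collector: str, base: list[Row], new: list[Row]) -> dict[str, list]:
    base_by_id = {identity(collector, r): r for r in base}
    new_by_id = {identity(collector, r): r for r in new}

    def stable(row: Row) -> Row:
        return {k: v for k, v in row.items() if k not in VOLATILE_FIELDS}

    return {
        "added": [r for k, r in new_by_id.items() if k not in base_by_id],
        "removed": [r for k, r in base_by_id.items() if k not in new_by_id],
        "modified": [
            (base_by_id[k], r)
            for k, r in new_by_id.items()
            if k in base_by_id and stable(base_by_id[k]) != stable(r)
        ],
    }


chrome = {"name": "chrome", "exe": "/opt/chrome", "cmdline": ["chrome"], "username": "ana"}
base = [{**chrome, "pid": 100}]
new = [
    {**chrome, "pid": 200},  # re-spawned: same identity, new pid
    {"name": "nc", "exe": "/usr/bin/nc", "cmdline": ["nc", "-l", "4444"],
     "username": "ana", "pid": 300},
]
result = diff_rows("processes", base, new)
print([r["name"] for r in result["added"]])  # ['nc']
```

One limitation of this sketch: rows that share an identity tuple (for example, several identical worker processes) collapse into a single dictionary entry, so a real comparator would need to decide how to count duplicates.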

Per-collector diff modes

Some collectors emit high-churn output that drowns the diff. Each collector has a mode:

track
Full per-row diff. Default.
summarize
Just shows the artifact-count delta. Used for recent_files, browsers, system, macos.quarantine, linux.auth_logs, linux.audit.
ignore
Skipped entirely. Used for windows.event_logs and macos.unified_logs — these are always different by nature.
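The three modes can be read as a dispatch step that runs before any per-row comparison. The following is a rough sketch under assumptions: DIFF_MODES, diff_collector, and the track_fn callback are hypothetical names, and only the mode assignments are taken from the list above.

```python
from typing import Any, Callable, Optional

TRACK, SUMMARIZE, IGNORE = "track", "summarize", "ignore"

# Mode assignments from the list above; unlisted collectors default to track.
DIFF_MODES: dict[str, str] = {
    "recent_files": SUMMARIZE,
    "browsers": SUMMARIZE,
    "system": SUMMARIZE,
    "macos.quarantine": SUMMARIZE,
    "linux.auth_logs": SUMMARIZE,
    "linux.audit": SUMMARIZE,
    "windows.event_logs": IGNORE,
    "macos.unified_logs": IGNORE,
}


def diff_collector(
    collector: str,
    base_rows: list,
    new_rows: list,
    track_fn: Callable[[list, list], dict[str, Any]],
) -> Optional[dict[str, Any]]:
    mode = DIFF_MODES.get(collector, TRACK)
    if mode == IGNORE:
        return None  # skipped entirely
    if mode == SUMMARIZE:
        # Only the artifact-count delta, no per-row detail.
        return {
            "mode": SUMMARIZE,
            "base_count": len(base_rows),
            "new_count": len(new_rows),
            "delta": len(new_rows) - len(base_rows),
        }
    # Full per-row diff, delegated to the caller-supplied comparison.
    return {"mode": TRACK, **track_fn(base_rows, new_rows)}


summary = diff_collector("browsers", [{}, {}], [{}, {}, {}], lambda b, n: {})
print(summary["delta"])  # 1
```

Passing the per-row comparison in as a callback keeps the mode decision separate from the identity logic, so a collector can change mode without touching how rows are matched.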

Hunting workflow

The intended weekly hunting loop:

#!/usr/bin/env bash
# /usr/local/bin/digger-hunt.sh
set -euo pipefail

HOST=$(hostname -s)
ROOT=/var/lib/digger
DATE=$(date -u +%Y-%m-%d)

LATEST="$ROOT/cases/$HOST/$DATE"
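# Most recent existing case. "|| true" keeps set -e / pipefail from aborting
# the first run, when no case directory exists yet and ls fails.
PREV=$(ls -1d "$ROOT/cases/$HOST"/* 2>/dev/null | tail -n 1 || true)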

digger --no-banner collect --case-dir "$LATEST"
digger --no-banner scan    --case-dir "$LATEST"

if [[ -n "${PREV:-}" && "$PREV" != "$LATEST" ]]; then
    digger --no-banner diff --base "$PREV" --new "$LATEST" \
           --out "$ROOT/diffs/$HOST-$DATE.html"
    # Anything to report? Emit new critical and high findings for alerting.
    digger --no-banner diff --base "$PREV" --new "$LATEST" --format json \
        | jq '.findings.new[] | select(.severity == "critical" or .severity == "high")'
fi

Same-host check

If the two case directories have different host.node or host.machine, the diff still runs but the report carries a warning at the top. Diffing across hosts is rarely meaningful — process names match, but the underlying baselines don't.
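The same-host check reduces to comparing two fields and warning without aborting. Here is a minimal sketch, assuming each case's host metadata has already been loaded into a dict with node and machine keys. How digger stores that metadata is not shown here, and host_mismatch_warning is a hypothetical helper.

```python
from typing import Optional


def host_mismatch_warning(base_host: dict, new_host: dict) -> Optional[str]:
    """Return a warning string if the two cases look like different hosts."""
    mismatched = [
        field for field in ("node", "machine")
        if base_host.get(field) != new_host.get(field)
    ]
    if not mismatched:
        return None
    details = ", ".join(
        f"host.{field}: {base_host.get(field)!r} -> {new_host.get(field)!r}"
        for field in mismatched
    )
    # The diff still runs; the caller places this at the top of the report.
    return f"Warning: base and new cases appear to come from different hosts ({details})."


warning = host_mismatch_warning(
    {"node": "web-01", "machine": "x86_64"},
    {"node": "web-02", "machine": "x86_64"},
)
print(warning)
```

Returning the warning instead of raising an exception mirrors the behavior described above: a mismatch is flagged in the report, but the diff is not blocked.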

What the diff cannot tell you