Live threat intel

15 continuously polled feeds with per-feed cadences, conditional fetches, and composite multi-URL fetchers. The live-first convention is statically enforced.

What's wired up

| Feed | Source | Refresh | What it gives detectors |
| --- | --- | --- | --- |
| cisa_kev | cisa.gov | 24 h | Known Exploited Vulnerabilities catalog. Used by supply_chain to cross-check installed-software inventory. |
| urlhaus_recent | abuse.ch | 15 min | Recently observed malware URLs. Used by c2 + browser cross-reference. |
| malwarebazaar_recent | abuse.ch | 15 min | Recent malware-sample hashes (SHA-256, MD5). Used by c2 to flag running exes by hash. |
| threatfox_recent | abuse.ch | 15 min | Fresh IOCs (IPs, domains, URLs, hashes). Used by c2 + browser cross-reference. |
| tor_exit_list | torproject.org | 1 h | Tor bulk exit-node list. |
| spamhaus_drop | spamhaus.org | 12 h | Hijacked IP space (DROP list). |
| spamhaus_edrop | spamhaus.org | 12 h | Extended DROP (sub-allocations). |
| emerging_threats_compromised | emergingthreats.net | 6 h | Compromised-IPs blocklist. |
| openssf_malicious_packages | OpenSSF | 12 h | OSV-formatted malicious-package dataset (npm, PyPI, …). Authoritative for supply_chain; bundled file is fallback only. |
| shai_hulud_packages | Aikido (community-maintained) | 1 h | Shai-Hulud worm IOCs: compromised packages + worm marker tiers + exfil URL patterns + worm workflow filename. Authoritative per-tier; bundled file is fallback per tier. |
| github_advisory_npm | api.github.com | 3 h | GitHub Advisory DB, npm ecosystem. |
| github_advisory_pip | api.github.com | 3 h | GitHub Advisory DB, PyPI ecosystem. |
| nvd_service_cves (composite) | NVD API 2.0 (~30 CPEs) | 24 h | CPE-keyed CVE corpus paginated across the curated service-product list. Used by service_cve. Honors $NVD_API_KEY for the 50 req/30s rate tier. |
| sigmahq_corpus (composite) | SigmaHQ master tarball | 24 h | Community detection rules filtered to 8 attack categories. SigmaLoader auto-extends its search path with the live cache; existing Sigma detector picks them up without changes. |
| mitre_attack_groups (composite) | MITRE ATT&CK Enterprise STIX 2.1 | 7 d | Threat-actor groups + associated software + techniques normalized into the actors-list shape threat_actor consumes. Authoritative for threat_actor; bundled file is supplemental. |

Live-first convention

Detectors that load bundled rule data MUST also call load_intel(...) for the live equivalent first; the live feed is authoritative when present, bundled is fallback only. This is statically enforced by tests/test_data_freshness.py — an AST guardrail that walks every detector file and asserts the call order.

The escape hatch for digger-native data with no upstream counterpart (e.g., the unpatched-Chromium-bug corpus) is a per-file comment:

# live-first-ok: <reason explaining why no live feed exists>
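
The guardrail idea can be sketched with the standard ast module. This is a minimal illustration, not the contents of tests/test_data_freshness.py: the bundled-loader name load_bundled is hypothetical, and the real test may match different call names or apply stricter rules.

```python
import ast

ESCAPE_MARKER = "# live-first-ok:"


def _called_name(node: ast.Call) -> str:
    """Return the bare function name for foo(...) or mod.foo(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def check_live_first(source: str, live_fn: str = "load_intel",
                     bundled_fn: str = "load_bundled") -> bool:
    """Return True if the source satisfies the live-first convention.

    bundled_fn is a hypothetical name used only for this sketch.
    """
    # A per-file escape-hatch comment exempts the whole file.
    if any(line.strip().startswith(ESCAPE_MARKER) for line in source.splitlines()):
        return True

    live_pos, bundled_pos = [], []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = _called_name(node)
            # (line, column) lets us order two calls on the same line.
            pos = (node.lineno, node.col_offset)
            if name == live_fn:
                live_pos.append(pos)
            elif name == bundled_fn:
                bundled_pos.append(pos)

    if not bundled_pos:
        return True  # no bundled data is loaded, so nothing to enforce
    # The first live call must appear before the first bundled call.
    return bool(live_pos) and min(live_pos) < min(bundled_pos)
```

A test suite could run a check like this over every detector file and fail on the first file that returns False.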

Composite feeds (multi-URL fetchers)

Feed.fetch_fn overrides the single-URL GET path for sources that need pagination, GitHub-API tree walks, or tarball extraction. Currently used by NVD (paginates per CPE), SigmaHQ (downloads master tarball + extracts subset), and MITRE ATT&CK (downloads + parses STIX 2.1 bundle).

Implementations live under digger/intel/sources/ and each calls digger.opsec.airgap.assert_network_allowed first so air-gap mode refuses cleanly.
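
A rough sketch of how a fetch_fn override and the air-gap check could fit together follows. Only Feed.fetch_fn and the name assert_network_allowed come from this page; the other Feed fields, the airgapped flag, AirGapError, and paginated_fetcher are assumptions made for illustration, and the real signatures may differ.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class AirGapError(RuntimeError):
    """Hypothetical error raised when the network is off-limits."""


def assert_network_allowed(airgapped: bool) -> None:
    # Stand-in for digger.opsec.airgap.assert_network_allowed; the real
    # function's signature is not shown in this document.
    if airgapped:
        raise AirGapError("network access refused: air-gap mode is on")


@dataclass
class Feed:
    name: str
    url: str
    interval_s: int
    # When set, replaces the single-URL GET path (composite feeds).
    fetch_fn: Optional[Callable[["Feed"], bytes]] = None


def paginated_fetcher(urls: List[str], http_get: Callable[[str], bytes],
                      airgapped: bool = False) -> Callable[[Feed], bytes]:
    """Build a composite fetcher that pulls several URLs and joins them."""
    def _fetch(feed: Feed) -> bytes:
        assert_network_allowed(airgapped)  # checked before any I/O
        return b"\n".join(http_get(u) for u in urls)
    return _fetch


def fetch(feed: Feed, http_get: Callable[[str], bytes]) -> bytes:
    """Use the feed's composite fetcher if present, else one plain GET."""
    if feed.fetch_fn is not None:
        return feed.fetch_fn(feed)
    return http_get(feed.url)
```

Passing http_get in as a parameter keeps the sketch testable without network access; a real fetcher would more likely call its HTTP client directly.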

Source: digger/intel/feeds.py. Each feed declares a URL, a cadence, and a parser function. To add a feed, add an entry to the FEEDS list (see Extending).

How fetches work

Every feed has a .meta.json sidecar in the cache directory that stores the last ETag and Last-Modified headers. Subsequent fetches send If-None-Match / If-Modified-Since, and the server returns 304 Not Modified when nothing has changed, which keeps polling cheap and polite.
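
The conditional-fetch flow might look roughly like this with the standard library. The sidecar key names (etag, last_modified) and both function names are assumptions for this sketch; digger's actual sidecar schema is not shown here.

```python
import json
import urllib.error
import urllib.request
from pathlib import Path
from typing import Optional


def conditional_headers(meta: dict) -> dict:
    """Turn stored validators into conditional request headers."""
    headers = {}
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    return headers


def fetch_if_changed(url: str, meta_path: Path) -> Optional[bytes]:
    """Return the new body, or None when the server answers 304."""
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}
    req = urllib.request.Request(url, headers=conditional_headers(meta))
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            body = resp.read()
            new_meta = {
                "etag": resp.headers.get("ETag"),
                "last_modified": resp.headers.get("Last-Modified"),
            }
    except urllib.error.HTTPError as err:
        # urllib surfaces 304 Not Modified as an HTTPError.
        if err.code == 304:
            return None  # cached copy is still current
        raise
    # Remember the validators for the next poll.
    meta_path.write_text(json.dumps(new_meta))
    return body
```

On a first fetch the sidecar does not exist, so no conditional headers are sent and the full body is downloaded.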

The cache directory defaults to ~/.cache/digger/intel/ but can be overridden with the DIGGER_INTEL_DIR environment variable.

$ ls ~/.cache/digger/intel/
cisa_kev.json          cisa_kev.meta.json          cisa_kev.raw
urlhaus_recent.json    urlhaus_recent.meta.json    urlhaus_recent.raw
threatfox_recent.json  threatfox_recent.meta.json  threatfox_recent.raw
...

File suffixes:

.json       Parsed, normalized form that detectors load.
.raw        Untouched response body, for re-parsing or forensic record.
.meta.json  ETag, Last-Modified, fetched-at timestamp, size.
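
As an illustration of the layout above, the cache location and the three per-feed paths could be resolved as follows. The helper names intel_dir and cache_paths are hypothetical; only the default directory, the DIGGER_INTEL_DIR variable, and the file suffixes come from this page.

```python
import os
from pathlib import Path


def intel_dir() -> Path:
    """Resolve the cache directory, honoring DIGGER_INTEL_DIR if set."""
    override = os.environ.get("DIGGER_INTEL_DIR")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "digger" / "intel"


def cache_paths(feed_name: str) -> dict:
    """Return the parsed, raw, and sidecar paths for one feed."""
    base = intel_dir()
    return {
        "parsed": base / f"{feed_name}.json",
        "raw": base / f"{feed_name}.raw",
        "meta": base / f"{feed_name}.meta.json",
    }
```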

Three ways to keep the cache fresh

On-demand

digger intel update            # refresh everything whose interval has elapsed
digger intel update --force    # bypass intervals
digger intel update --only cisa_kev,threatfox_recent

Daemon mode (foreground)

digger intel watch

Runs the IntelScheduler in the foreground — one thread polling each feed on its cadence. Ctrl-C exits cleanly.
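
The one-thread-per-feed model can be sketched as below. MiniScheduler is a hypothetical stand-in written for this page, not the real IntelScheduler, whose internals are not shown here.

```python
import threading
from typing import Callable, Dict, List


class MiniScheduler:
    """Poll each feed on its own thread at its own interval (sketch)."""

    def __init__(self, intervals: Dict[str, float], fetch: Callable[[str], None]):
        self._intervals = intervals
        self._fetch = fetch
        self._stop = threading.Event()
        self._threads: List[threading.Thread] = []

    def _poll(self, name: str, interval: float) -> None:
        while not self._stop.is_set():
            try:
                self._fetch(name)
            except Exception as exc:
                # One failing feed should not kill its polling thread.
                print(f"{name}: fetch failed: {exc}")
            # Event.wait acts as a sleep that stop() can interrupt.
            self._stop.wait(interval)

    def start(self) -> None:
        for name, interval in self._intervals.items():
            thread = threading.Thread(target=self._poll, args=(name, interval), daemon=True)
            thread.start()
            self._threads.append(thread)

    def stop(self) -> None:
        self._stop.set()
        for thread in self._threads:
            thread.join()
```

A foreground runner would call start(), block in the main thread, and call stop() from a KeyboardInterrupt handler so that Ctrl-C shuts every poller down without waiting out a long interval.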

Cron / systemd-timer

For longer-running deployments, schedule digger intel update externally:

# crontab
*/30 * * * *  /usr/local/bin/digger intel update --no-banner

# systemd timer
[Timer]
OnBootSec=2m
OnUnitActiveSec=30m

Status

$ digger intel status
  [fresh]  cisa_kev                       fetched         53s ago  size=1473939
  [fresh]  threatfox_recent               fetched         48s ago  size=8200
  [STALE]  malwarebazaar_recent           fetched       3601s ago  size=92114
  [STALE]  github_advisory_pip            fetched          never   size=?

A feed is stale when its age exceeds the configured interval, or when it has never been fetched. Detectors transparently fall back to bundled YAML snapshots under digger/rules/ when a feed's cache is empty.
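
The staleness rule reduces to a small predicate. The function names and the use of epoch seconds are assumptions for this sketch; the intervals in the usage note are taken from the feed table.

```python
from typing import Optional


def is_stale(fetched_at: Optional[float], interval_s: float, now: float) -> bool:
    """A feed is stale if never fetched or older than its interval."""
    if fetched_at is None:
        return True
    return (now - fetched_at) > interval_s


def status_tag(fetched_at: Optional[float], interval_s: float, now: float) -> str:
    """Render the tag shown at the start of a status line."""
    return "[STALE]" if is_stale(fetched_at, interval_s, now) else "[fresh]"
```

With the sample output above, malwarebazaar_recent (15 min interval, 3601 s old) is stale, while cisa_kev (24 h interval, 53 s old) is fresh.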

API keys

Most feeds don't require auth. The GitHub Advisory endpoints accept an optional GITHUB_TOKEN environment variable to lift rate limits, and nvd_service_cves honors NVD_API_KEY for the 50 req/30s rate tier.

export GITHUB_TOKEN=ghp_…
digger intel update --only github_advisory_npm,github_advisory_pip
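
For illustration, a fetcher could attach the token only when it is present. github_headers is a hypothetical helper; the Accept value and Bearer scheme follow GitHub's REST API conventions, and how digger actually builds its requests is not shown here.

```python
import os
from typing import Mapping, Optional


def github_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build request headers, adding auth only if GITHUB_TOKEN is set."""
    env = os.environ if env is None else env
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        # Authenticated requests get a higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```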

Detectors that consume live intel

| Detector | Feeds |
| --- | --- |
| supply_chain | openssf_malicious_packages, cisa_kev |
| shai_hulud | shai_hulud_packages |
| c2 | threatfox_recent, urlhaus_recent, malwarebazaar_recent |

Loading happens lazily — each detector calls load_intel(feed_name) at detect() time, so updates between scans are picked up automatically without restarting digger.
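
A minimal sketch of the lazy-load pattern follows, assuming the parsed cache is a JSON list of indicators. The cache_dir parameter, ExampleDetector, and the in-memory bundled list are stand-ins for illustration; the real load_intel takes a feed name and the bundled data lives in YAML snapshots.

```python
import json
from pathlib import Path
from typing import Any, Optional


def load_intel(feed_name: str, cache_dir: Path) -> Optional[Any]:
    """Return the parsed live cache for a feed, or None if missing or empty."""
    path = cache_dir / f"{feed_name}.json"
    if not path.exists() or path.stat().st_size == 0:
        return None
    return json.loads(path.read_text())


class ExampleDetector:
    """Hypothetical detector that matches observed values against IOCs."""

    def __init__(self, cache_dir: Path, bundled: list):
        self.cache_dir = cache_dir
        self.bundled = bundled  # stand-in for a bundled snapshot

    def detect(self, observed: set) -> set:
        # Load at call time, not in __init__, so a refresh that lands
        # between two scans is picked up without a restart.
        live = load_intel("threatfox_recent", self.cache_dir)
        indicators = set(live) if live else set(self.bundled)
        return observed & indicators
```

Because nothing is cached on the detector instance, each scan pays one small file read in exchange for always seeing the latest feed data.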