Live threat intel

15 continuously polled feeds with per-feed cadences, conditional fetches, and composite multi-URL fetchers. The live-first convention is statically enforced.

What's wired up

Feed	Source	Refresh	What it gives detectors
`cisa_kev`	cisa.gov	24 h	Known Exploited Vulnerabilities catalog. Used by `supply_chain` to cross-check installed-software inventory.
`urlhaus_recent`	abuse.ch	15 min	Recently observed malware URLs. Used by `c2` + `browser` cross-reference.
`malwarebazaar_recent`	abuse.ch	15 min	Recent malware-sample hashes (SHA-256, MD5). Used by `c2` to flag running exes by hash.
`threatfox_recent`	abuse.ch	15 min	Fresh IOCs (IPs, domains, URLs, hashes). Used by `c2` + `browser` cross-reference.
`tor_exit_list`	torproject.org	1 h	Tor bulk exit-node list.
`spamhaus_drop`	spamhaus.org	12 h	Hijacked IP space (DROP list).
`spamhaus_edrop`	spamhaus.org	12 h	Extended DROP (sub-allocations).
`emerging_threats_compromised`	emergingthreats.net	6 h	Compromised-IPs blocklist.
`openssf_malicious_packages`	OpenSSF	12 h	OSV-formatted malicious-package dataset (npm, PyPI, …). Authoritative for `supply_chain`; bundled file is fallback only.
`shai_hulud_packages`	Aikido (community-maintained)	1 h	Shai-Hulud worm IOCs: compromised packages + worm marker tiers + exfil URL patterns + worm workflow filename. Authoritative per-tier; bundled file is fallback per tier.
`github_advisory_npm`	api.github.com	3 h	GitHub Advisory DB, npm ecosystem.
`github_advisory_pip`	api.github.com	3 h	GitHub Advisory DB, PyPI ecosystem.
`nvd_service_cves` composite	NVD API 2.0 (~30 CPEs)	24 h	CPE-keyed CVE corpus paginated across the curated service-product list. Used by `service_cve`. Honors `$NVD_API_KEY` for the 50 req/30s rate tier.
`sigmahq_corpus` composite	SigmaHQ master tarball	24 h	Community detection rules filtered to 8 attack categories. `SigmaLoader` auto-extends its search path with the live cache; existing Sigma detector picks them up without changes.
`mitre_attack_groups` composite	MITRE ATT&CK Enterprise STIX 2.1	7 d	Threat-actor groups + associated software + techniques normalized into the actors-list shape `threat_actor` consumes. Authoritative for `threat_actor`; bundled file is supplemental.

Live-first convention

Detectors that load bundled rule data MUST also call load_intel(...) for the live equivalent first; the live feed is authoritative when present, bundled is fallback only. This is statically enforced by tests/test_data_freshness.py — an AST guardrail that walks every detector file and asserts the call order.

The escape hatch for digger-native data with no upstream counterpart (e.g., the unpatched-Chromium-bug corpus) is a per-file comment:

# live-first-ok: <reason explaining why no live feed exists>
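To illustrate the kind of check such a guardrail could perform, here is a minimal sketch using Python's `ast` module. It is not the contents of tests/test_data_freshness.py: the bundled-loader name `load_bundled` is a hypothetical stand-in, and "call order" is approximated by source position rather than execution order.

```python
import ast

ESCAPE_MARKER = "# live-first-ok:"
# Hypothetical name for a function that loads bundled rule data.
BUNDLED_LOADERS = {"load_bundled"}


def _call_name(node: ast.Call) -> str:
    """Return the bare function name for calls like f(...) or mod.f(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def violates_live_first(source: str) -> bool:
    """True if a bundled loader appears before any load_intel call."""
    if ESCAPE_MARKER in source:
        return False  # per-file opt-out comment is present
    calls = [n for n in ast.walk(ast.parse(source)) if isinstance(n, ast.Call)]
    # ast.walk is breadth-first, so order the calls by source position.
    calls.sort(key=lambda n: (n.lineno, n.col_offset))
    seen_live = False
    for call in calls:
        name = _call_name(call)
        if name == "load_intel":
            seen_live = True
        elif name in BUNDLED_LOADERS and not seen_live:
            return True
    return False
```

A test built on this idea would read each detector file and assert that `violates_live_first` returns False for it.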

Composite feeds (multi-URL fetchers)

Feed.fetch_fn overrides the single-URL GET path for sources that need pagination, GitHub-API tree walks, or tarball extraction. Currently used by NVD (paginates per CPE), SigmaHQ (downloads master tarball + extracts subset), and MITRE ATT&CK (downloads + parses STIX 2.1 bundle).

Implementations live under digger/intel/sources/ and each calls digger.opsec.airgap.assert_network_allowed first so air-gap mode refuses cleanly.

Source: digger/intel/feeds.py. Each feed declares a URL, a cadence, and a parser function. To add a feed, add an entry to the FEEDS list (see Extending).
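As a rough picture of how a feed declaration and the fetch_fn override might fit together, consider the sketch below. The field names (`interval_s`, `parser`), the `fetch` helper, and the injected `http_get` callable are assumptions for illustration; the real definitions in digger/intel/feeds.py may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Feed:
    """Hypothetical shape of a feed declaration."""
    name: str
    url: str
    interval_s: int                      # refresh cadence in seconds
    parser: Callable[[bytes], Any]       # raw body -> normalized form
    # Optional override for sources that need more than one GET.
    fetch_fn: Optional[Callable[["Feed"], bytes]] = None


def fetch(feed: Feed, http_get: Callable[[str], bytes]) -> Any:
    """Use the feed's own fetcher when present, else a single-URL GET."""
    raw = feed.fetch_fn(feed) if feed.fetch_fn else http_get(feed.url)
    return feed.parser(raw)
```

Passing `http_get` in as a parameter keeps the sketch testable without network access; a composite fetcher would do its pagination or tarball extraction inside `fetch_fn` and return the combined bytes.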

How fetches work

Every feed has a .meta.json sidecar in the cache directory that stores the last ETag and Last-Modified headers. Subsequent fetches send If-None-Match / If-Modified-Since, and the server returns 304 Not Modified when the content is unchanged, which keeps polling cheap and polite.
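The conditional-request step can be sketched as follows. The sidecar key names `etag` and `last_modified` are assumptions; only the HTTP header names are standard.

```python
def conditional_headers(meta: dict) -> dict:
    """Build conditional-request headers from a parsed .meta.json sidecar.

    The keys "etag" and "last_modified" are assumed names, not
    necessarily the ones digger writes.
    """
    headers = {}
    if meta.get("etag"):
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):
        headers["If-Modified-Since"] = meta["last_modified"]
    return headers
```

On a first fetch the sidecar is empty, so no conditional headers are sent and the server returns the full body. On a 304 response the cached files can be left untouched and only the fetched-at timestamp needs updating.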

The cache directory defaults to ~/.cache/digger/intel/ but can be overridden with the DIGGER_INTEL_DIR environment variable.

$ ls ~/.cache/digger/intel/
cisa_kev.json          cisa_kev.meta.json          cisa_kev.raw
urlhaus_recent.json    urlhaus_recent.meta.json    urlhaus_recent.raw
threatfox_recent.json  threatfox_recent.meta.json  threatfox_recent.raw
...

.json: Parsed, normalized form that detectors load.
.raw: Untouched response body, for re-parsing or forensic record.
.meta.json: ETag, Last-Modified, fetched-at timestamp, size.
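The layout above can be expressed as a small path helper. This is a sketch under the naming shown in the listing; the function names `intel_dir` and `cache_paths` are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping


def intel_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Resolve the cache directory, honoring DIGGER_INTEL_DIR if set."""
    override = env.get("DIGGER_INTEL_DIR")
    if override:
        return Path(override)
    return Path.home() / ".cache" / "digger" / "intel"


def cache_paths(feed_name: str, base: Path) -> dict:
    """Return the three per-feed files described above."""
    return {
        "parsed": base / f"{feed_name}.json",
        "raw": base / f"{feed_name}.raw",
        "meta": base / f"{feed_name}.meta.json",
    }
```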

Three ways to keep the cache fresh

On-demand

digger intel update            # refresh everything whose interval has elapsed
digger intel update --force    # bypass intervals
digger intel update --only cisa_kev,threatfox_recent
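One way to read these three invocations is as a selection step over the known feed names. The sketch below is an interpretation, not digger's code: `select_feeds` and `is_due` are hypothetical names, and it assumes `--only` narrows the set but still respects intervals unless `--force` is also given.

```python
from typing import Callable, Iterable, Optional


def select_feeds(
    known: Iterable[str],
    is_due: Callable[[str], bool],
    force: bool = False,
    only: Optional[str] = None,
) -> list:
    """Pick which feeds to refresh for one update run."""
    names = list(known)
    if only:
        # --only takes a comma-separated list of feed names.
        wanted = [n.strip() for n in only.split(",") if n.strip()]
        unknown = [n for n in wanted if n not in names]
        if unknown:
            raise ValueError(f"unknown feed(s): {', '.join(unknown)}")
        names = wanted
    # --force bypasses the interval check entirely.
    return [n for n in names if force or is_due(n)]
```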

Daemon mode (foreground)

digger intel watch

Runs the IntelScheduler in the foreground — one thread polling each feed on its cadence. Ctrl-C exits cleanly.

Cron / systemd-timer

For longer-running deployments, schedule digger intel update externally:

# crontab
*/30 * * * *  /usr/local/bin/digger intel update --no-banner

# systemd timer
[Timer]
OnBootSec=2m
OnUnitActiveSec=30m

Status

$ digger intel status
  [fresh]  cisa_kev                       fetched         53s ago  size=1473939
  [fresh]  threatfox_recent               fetched         48s ago  size=8200
  [STALE]  malwarebazaar_recent           fetched       3601s ago  size=92114
  [STALE]  github_advisory_pip            fetched          never   size=?

A feed is stale when its age exceeds the configured interval, or when it has never been fetched. Detectors transparently fall back to bundled YAML snapshots under digger/rules/ when a feed's cache is empty.
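The staleness rule reduces to a short comparison. A minimal sketch, assuming timestamps are Unix seconds and that a never-fetched feed is represented as `None` (both are assumptions about the representation, and `status_label` is a hypothetical name):

```python
from typing import Optional


def status_label(fetched_at: Optional[float], interval_s: float, now: float) -> str:
    """Return "fresh" or "STALE" following the rule described above."""
    if fetched_at is None:
        return "STALE"  # never fetched
    age = now - fetched_at
    return "STALE" if age > interval_s else "fresh"
```

With the sample output above, malwarebazaar_recent (15 min cadence, 3601 s old) is stale, while cisa_kev (24 h cadence, 53 s old) is fresh.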

API keys

Most feeds don't require auth. The GitHub Advisory endpoints accept an optional GITHUB_TOKEN environment variable to lift rate limits, and nvd_service_cves honors NVD_API_KEY for the higher NVD rate tier (50 requests per 30 seconds).

export GITHUB_TOKEN=ghp_…
digger intel update --only github_advisory_npm,github_advisory_pip
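For reference, an optional token is typically attached to GitHub REST API requests as shown below. This is a sketch of the general pattern rather than digger's fetcher; the function name `github_headers` is hypothetical, while the `Authorization: Bearer` and `Accept` header values follow GitHub's REST API documentation.

```python
import os
from typing import Mapping


def github_headers(env: Mapping[str, str] = os.environ) -> dict:
    """Build request headers, adding auth only when GITHUB_TOKEN is set."""
    headers = {"Accept": "application/vnd.github+json"}
    token = env.get("GITHUB_TOKEN")
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Without a token the request still works, just under the lower unauthenticated rate limit.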

Detectors that consume live intel

Detector	Feeds
`supply_chain`	openssf_malicious_packages, cisa_kev
`shai_hulud`	shai_hulud_packages
`c2`	threatfox_recent, urlhaus_recent, malwarebazaar_recent
`browser`	urlhaus_recent, threatfox_recent
`service_cve`	nvd_service_cves
`threat_actor`	mitre_attack_groups

Loading happens lazily — each detector calls load_intel(feed_name) at detect() time, so updates between scans are picked up automatically without restarting digger.
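The lazy-loading pattern, combined with the bundled fallback, might look like the sketch below. The class `C2Detector`, its constructor arguments, and the shape of the data returned by `load_intel` (a list of hashes, or `None` when the cache is empty) are all assumptions for illustration.

```python
from typing import Any, Callable, Iterable, Optional


class C2Detector:
    """Hypothetical detector that reads live intel on every scan."""

    def __init__(self, load_intel: Callable[[str], Optional[Any]], bundled: set):
        self._load_intel = load_intel
        self._bundled = bundled  # fallback snapshot shipped with the tool

    def detect(self, running_hashes: Iterable[str]) -> list:
        # Load at detect() time, not at construction, so a cache refresh
        # between scans is picked up without a restart.
        live = self._load_intel("malwarebazaar_recent")
        known_bad = set(live) if live else self._bundled
        return sorted(h for h in running_hashes if h in known_bad)
```

Because nothing is cached on the instance, two consecutive `detect()` calls can see different intel if the feed was refreshed in between.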