Live threat intel
15 continuously polled feeds with per-feed cadences, conditional fetches, and composite multi-URL fetchers. The live-first convention is statically enforced.
What's wired up
| Feed | Source | Refresh | What it gives detectors |
|---|---|---|---|
| cisa_kev | cisa.gov | 24 h | Known Exploited Vulnerabilities catalog. Used by supply_chain to cross-check installed-software inventory. |
| urlhaus_recent | abuse.ch | 15 min | Recently observed malware URLs. Used by c2 + browser cross-reference. |
| malwarebazaar_recent | abuse.ch | 15 min | Recent malware-sample hashes (SHA-256, MD5). Used by c2 to flag running exes by hash. |
| threatfox_recent | abuse.ch | 15 min | Fresh IOCs (IPs, domains, URLs, hashes). Used by c2 + browser cross-reference. |
| tor_exit_list | torproject.org | 1 h | Tor bulk exit-node list. |
| spamhaus_drop | spamhaus.org | 12 h | Hijacked IP space (DROP list). |
| spamhaus_edrop | spamhaus.org | 12 h | Extended DROP (sub-allocations). |
| emerging_threats_compromised | emergingthreats.net | 6 h | Compromised-IPs blocklist. |
| openssf_malicious_packages | OpenSSF | 12 h | OSV-formatted malicious-package dataset (npm, PyPI, …). Authoritative for supply_chain; bundled file is fallback only. |
| shai_hulud_packages | Aikido (community-maintained) | 1 h | Shai-Hulud worm IOCs: compromised packages + worm marker tiers + exfil URL patterns + worm workflow filename. Authoritative per-tier; bundled file is fallback per tier. |
| github_advisory_npm | api.github.com | 3 h | GitHub Advisory DB, npm ecosystem. |
| github_advisory_pip | api.github.com | 3 h | GitHub Advisory DB, PyPI ecosystem. |
| nvd_service_cves (composite) | NVD API 2.0 (~30 CPEs) | 24 h | CPE-keyed CVE corpus paginated across the curated service-product list. Used by service_cve. Honors $NVD_API_KEY for the 50 req/30s rate tier. |
| sigmahq_corpus (composite) | SigmaHQ master tarball | 24 h | Community detection rules filtered to 8 attack categories. SigmaLoader auto-extends its search path with the live cache; existing Sigma detector picks them up without changes. |
| mitre_attack_groups (composite) | MITRE ATT&CK Enterprise STIX 2.1 | 7 d | Threat-actor groups + associated software + techniques normalized into the actors-list shape threat_actor consumes. Authoritative for threat_actor; bundled file is supplemental. |
Live-first convention
Detectors that load bundled rule data MUST also call
load_intel(...) for the live equivalent first; the live
feed is authoritative when present, bundled is fallback only. This is
statically enforced by tests/test_data_freshness.py — an AST
guardrail that walks every detector file and asserts the call order.
The escape hatch for digger-native data with no upstream counterpart (e.g., the unpatched-Chromium-bug corpus) is a per-file comment:
# live-first-ok: <reason explaining why no live feed exists>
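To make the guardrail idea concrete, here is a minimal sketch of an AST check in the same spirit. It is not the contents of tests/test_data_freshness.py: the bundled-loader name `load_bundled` is a hypothetical placeholder, and the check compares source order only, not execution order.

```python
import ast

LIVE_CALL = "load_intel"        # named in the convention above
BUNDLED_CALL = "load_bundled"   # hypothetical name for a bundled-data loader
OPT_OUT = "# live-first-ok:"    # per-file escape hatch


def _call_name(node: ast.Call) -> str:
    """Return the bare function name for foo(...) or module.foo(...)."""
    func = node.func
    if isinstance(func, ast.Name):
        return func.id
    if isinstance(func, ast.Attribute):
        return func.attr
    return ""


def violates_live_first(source: str) -> bool:
    """True if bundled data is loaded before any load_intel() call appears."""
    if OPT_OUT in source:
        return False
    # Order every call by its position in the file.
    calls = sorted(
        (node.lineno, node.col_offset, _call_name(node))
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
    )
    seen_live = False
    for _, _, name in calls:
        if name == LIVE_CALL:
            seen_live = True
        elif name == BUNDLED_CALL and not seen_live:
            return True
    return False
```

A test could run `violates_live_first` over the text of every detector file and fail on the first `True`.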
Composite feeds (multi-URL fetchers)
Feed.fetch_fn overrides the single-URL GET path for sources
that need pagination, GitHub-API tree walks, or tarball extraction.
Currently used by NVD (paginates per CPE), SigmaHQ (downloads master
tarball + extracts subset), and MITRE ATT&CK (downloads + parses
STIX 2.1 bundle).
Implementations live under digger/intel/sources/ and each
calls digger.opsec.airgap.assert_network_allowed first so
air-gap mode refuses cleanly.
Feed definitions live in digger/intel/feeds.py. Each feed declares a URL, cadence,
and parser function. Adding a feed means adding an entry to the FEEDS
list (see Extending).
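As an illustration of how a feed entry and the fetch_fn override might fit together, the sketch below models a feed as a small dataclass. The field names, the `fetch_raw` helper, the `network_allowed` flag (a stand-in for the air-gap check), and the example.invalid URL are all assumptions, not the actual definitions in digger/intel/feeds.py.

```python
import json
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Feed:
    name: str
    url: str
    interval_s: int                                  # refresh cadence, seconds
    parser: Callable[[bytes], Any]                   # raw body -> normalized form
    fetch_fn: Optional[Callable[[], bytes]] = None   # composite override


def fetch_raw(feed: Feed, http_get: Callable[[str], bytes],
              network_allowed: bool = True) -> bytes:
    """Return the raw body, preferring the composite fetcher when one is set."""
    if not network_allowed:
        # Stand-in for the air-gap guard: refuse before any network access.
        raise RuntimeError(f"air-gap mode: refusing to fetch {feed.name}")
    if feed.fetch_fn is not None:
        return feed.fetch_fn()        # pagination, tarball extraction, ...
    return http_get(feed.url)         # plain single-URL GET


# Hypothetical registry entries; the URL is a placeholder.
FEEDS = [
    Feed("demo_single", "https://example.invalid/demo.json", 15 * 60, json.loads),
    Feed("demo_composite", "", 24 * 3600, json.loads,
         fetch_fn=lambda: b'{"pages": 2}'),
]
```

Passing `http_get` in as a parameter keeps the dispatch logic testable without touching the network.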
How fetches work
Every feed has a .meta.json sidecar in the cache directory
storing the ETag and Last-Modified headers from the last response.
Subsequent fetches send If-None-Match / If-Modified-Since
and the server returns 304 when unchanged — cheap and polite.
The cache directory defaults to ~/.cache/digger/intel/ but
can be overridden with the DIGGER_INTEL_DIR environment variable.
$ ls ~/.cache/digger/intel/
cisa_kev.json cisa_kev.meta.json cisa_kev.raw
urlhaus_recent.json urlhaus_recent.meta.json urlhaus_recent.raw
threatfox_recent.json threatfox_recent.meta.json threatfox_recent.raw
...
- .json: Parsed, normalized form that detectors load.
- .raw: Untouched response body, for re-parsing or forensic record.
- .meta.json: ETag, Last-Modified, fetched-at timestamp, size.
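The conditional-fetch flow described in this section can be sketched with the standard library alone. The sidecar key names (`etag`, `last_modified`) and the helper names are assumptions. Only the DIGGER_INTEL_DIR variable, the default path, and the two request headers come from the text above.

```python
import json
import os
import urllib.error
import urllib.request
from pathlib import Path
from typing import Optional


def cache_dir() -> Path:
    """Honor DIGGER_INTEL_DIR, else use the documented default location."""
    override = os.environ.get("DIGGER_INTEL_DIR")
    return Path(override) if override else Path.home() / ".cache" / "digger" / "intel"


def load_meta(feed_name: str) -> dict:
    """Read the sidecar; a feed that was never fetched has no validators."""
    try:
        return json.loads((cache_dir() / f"{feed_name}.meta.json").read_text())
    except FileNotFoundError:
        return {}


def conditional_headers(meta: dict) -> dict:
    """Turn stored validators into conditional request headers."""
    headers = {}
    if meta.get("etag"):                      # assumed key name
        headers["If-None-Match"] = meta["etag"]
    if meta.get("last_modified"):             # assumed key name
        headers["If-Modified-Since"] = meta["last_modified"]
    return headers


def fetch_if_changed(url: str, meta: dict) -> Optional[bytes]:
    """Return the new body, or None when the server answers 304."""
    request = urllib.request.Request(url, headers=conditional_headers(meta))
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:                   # urllib raises on 304
            return None
        raise
```

With urllib, a 304 response surfaces as an `HTTPError`, so the unchanged case is handled in the `except` branch rather than by inspecting a status code.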
Three ways to keep the cache fresh
On-demand
digger intel update # refresh everything whose interval has elapsed
digger intel update --force # bypass intervals
digger intel update --only cisa_kev,threatfox_recent
Daemon mode (foreground)
digger intel watch
Runs the IntelScheduler in the foreground — one thread polling
each feed on its cadence. Ctrl-C exits cleanly.
Cron / systemd-timer
For longer-running deployments, schedule digger intel update
externally:
# crontab
*/30 * * * * /usr/local/bin/digger intel update --no-banner
# systemd timer
[Timer]
OnBootSec=2m
OnUnitActiveSec=30m
Status
$ digger intel status
[fresh] cisa_kev fetched 53s ago size=1473939
[fresh] threatfox_recent fetched 48s ago size=8200
[STALE] malwarebazaar_recent fetched 3601s ago size=92114
[STALE] github_advisory_pip fetched never size=?
A feed is stale when its age exceeds the configured interval, or
when it has never been fetched. Detectors transparently fall back to
bundled YAML snapshots under digger/rules/ when a feed's cache
is empty.
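The staleness rule is small enough to state as code. This is a sketch of the rule as worded above, with a hypothetical function name and timestamps expressed as seconds since the epoch.

```python
import time
from typing import Optional


def is_stale(fetched_at: Optional[float], interval_s: float,
             now: Optional[float] = None) -> bool:
    """Stale when never fetched, or when age exceeds the feed's interval."""
    if fetched_at is None:
        return True
    current = time.time() if now is None else now
    return (current - fetched_at) > interval_s
```

Applied to the sample status output: cisa_kev at 53 s against a 24 h interval is fresh, while malwarebazaar_recent at 3601 s against a 15 min interval is stale.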
API keys
Most feeds don't require auth. The GitHub Advisory endpoints accept an
optional GITHUB_TOKEN environment variable to lift rate limits, and
nvd_service_cves honors NVD_API_KEY for the higher NVD rate tier.
export GITHUB_TOKEN=ghp_…
digger intel update --only github_advisory_npm,github_advisory_pip
Detectors that consume live intel
| Detector | Feeds |
|---|---|
| supply_chain | openssf_malicious_packages, cisa_kev |
| shai_hulud | shai_hulud_packages |
| c2 | threatfox_recent, urlhaus_recent, malwarebazaar_recent |
Loading happens lazily — each detector calls
load_intel(feed_name) at detect() time, so updates
between scans are picked up automatically without restarting digger.
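A rough sketch of why lazy loading picks up updates: the cache file is read inside detect(), not at construction time. The `load_intel` below is a stand-in with an assumed signature (digger's real function takes only the feed name), and `ExampleDetector` is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, List, Optional


def load_intel(feed_name: str, cache: Path) -> Optional[Any]:
    """Stand-in loader: parsed cache contents, or None if missing/corrupt."""
    try:
        return json.loads((cache / f"{feed_name}.json").read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return None


class ExampleDetector:
    """Hypothetical detector matching observed values against one feed."""

    feed_name = "threatfox_recent"

    def __init__(self, cache: Path, bundled: List[str]) -> None:
        self.cache = cache
        self.bundled = bundled  # fallback snapshot, used when the cache is empty

    def detect(self, observed: List[str]) -> List[str]:
        # Read the cache at scan time so a refresh between scans is visible.
        live = load_intel(self.feed_name, self.cache)
        indicators = set(live) if live else set(self.bundled)
        return [value for value in observed if value in indicators]
```

Because nothing is cached on the instance, a second detect() call after a feed refresh sees the new data without the detector being rebuilt.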