discover files
Scan the filesystem for AI artifacts, credentials, and configuration files.
Synopsis
Description
The discover files command walks the specified directories and matches each file against YAML discovery rules. A rule can match on filename globs, path patterns (including ** for recursive directory matching), and content regex patterns. The command is designed to find:
- API keys (OpenAI, Anthropic, Hugging Face, Google, AWS, Pinecone, etc.)
- MCP configurations (Claude Desktop, VS Code, Cursor)
- Local LLM artifacts (GGUF models, Ollama configs, safetensors)
- Vector database data and configurations
- Fine-tuning datasets and RAG pipelines
- Jupyter AI notebooks
Flags
| Flag | Short | Required | Default | Description |
|---|---|---|---|---|
| --path | -p | Yes | | Path(s) to scan. Can be specified multiple times. |
| --rules-dir | | No | (none) | Additional rules directory. Loaded after built-in rules. |
See Common Flags for shared output and runtime controls.
Excluded Directories
The following directories are skipped by default:
- .git
- node_modules
- __pycache__
- venv / .venv
- .cache
- .npm / .yarn
- vendor
Summary Output
After scanning, a summary is printed to stderr with:
- Paths scanned
- Rules loaded
- Files considered
- Excluded/error skip counts
- Findings emitted
Examples
# Scan a directory for AI artifacts
./aipostex discover files --path /tmp/loot
# Scan with JSON output to file
./aipostex discover files --path /home/user --format json --output findings.json
# Stealth scan with streaming JSONL output
./aipostex discover files --path /home/user --stealth --format jsonl --output findings.jsonl
# Scan multiple paths
./aipostex discover files --path /home/user --path /opt/ai-services
# Use custom rules alongside built-ins
./aipostex discover files --path /tmp/loot --rules-dir ./my-rules
Path Canonicalization
When multiple --path values are provided, they are canonicalized before scanning:
- Paths are cleaned (trailing slashes removed, . resolved)
- Exact duplicates are removed
- Paths that are subdirectories of another provided path are subsumed (e.g., /loot and /loot/subdir become just /loot)
This prevents duplicate findings from overlapping scan roots.
Rule Loading
- Built-in rules are loaded from the embedded embed.FS
- If --rules-dir is specified, rules from that directory are loaded after embedded rules
- Invalid glob patterns in rules are rejected at load time with a descriptive error
- All rules are applied to every file encountered during the walk