scanc = scan c(ode)
A fast, pure‑Python project code‑scanner that outputs clean, AI‑ready Markdown or XML.
`scanc` helps you spill an entire codebase into an LLM prompt (or a file) in seconds, while keeping noise low, controlling token budgets, and giving you full visibility into what gets included.
| Feature | Description |
|---|---|
| Blazing Fast, Pure‑Python | Zero native dependencies; easy to install and run anywhere. |
| Smart Default Ignores | Automatically skips `node_modules`, `.venv`, `.git`, and more. |
| Flexible Filters | Include/exclude by extension, filename, or regex patterns. |
| Optional Directory Tree | Prepend a fenced tree diagram of your project structure. |
| Token Counter | Estimate LLM token costs with `tiktoken` before you paste. |
| Cross‑Platform CLI | Works on macOS, Linux, and Windows out of the box. |
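To give a feel for what the Optional Directory Tree feature produces, here is a small sketch of rendering a project tree in the style of the Unix `tree` command and wrapping it in a fence. It is an illustration only, not scanc's implementation: the ignore set is a partial assumption, directories are listed before files, symlinked folders are not followed, and `render_tree` / `tree_block` are hypothetical names.

```python
from pathlib import Path

# Assumed, partial ignore set for illustration; scanc's built-in list may differ.
IGNORED = {"node_modules", ".venv", ".git"}
FENCE = "`" * 3  # triple backtick, built here so this snippet stays fence-safe


def render_tree(root: Path, prefix: str = "") -> list[str]:
    """Return tree-style lines for the entries below `root`."""
    entries = sorted(
        (p for p in root.iterdir() if p.name not in IGNORED),
        key=lambda p: (not p.is_dir(), p.name),  # directories first, then by name
    )
    lines: list[str] = []
    for i, entry in enumerate(entries):
        last = i == len(entries) - 1
        connector = "└── " if last else "├── "
        suffix = "/" if entry.is_dir() else ""
        lines.append(prefix + connector + entry.name + suffix)
        # Recurse into real directories only; symlinked ones are not followed
        if entry.is_dir() and not entry.is_symlink():
            # Below the last entry, pad with spaces instead of a vertical bar
            lines.extend(render_tree(entry, prefix + ("    " if last else "│   ")))
    return lines


def tree_block(root: Path) -> str:
    """Wrap the tree for `root` in a fenced code block."""
    body = "\n".join([root.resolve().name + "/"] + render_tree(root))
    return f"{FENCE}\n{body}\n{FENCE}"
```

For example, `print(tree_block(Path(".")))` prints the current folder as a fenced tree, skipping the assumed ignore set.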
Install with pip:

```bash
# Optional: use a virtual environment
python3 -m venv --prompt scanc-env .venv
source .venv/bin/activate
pip install "scanc[tiktoken]"  # installs optional token‑counter support
```
Scan a directory (Markdown is the default output format):

```bash
scanc .                        # scan the current folder
scanc -e py,js --tree          # only .py and .js files, plus a directory tree
scanc -f xml                   # emit XML instead of Markdown (new in v1.2.0)
scanc -e py -x "tests" | less  # only .py files, excluding paths that match "tests"
scanc --tokens gpt-4o          # print only the token count for gpt-4o
scanc -e py | pbcopy           # scan and copy to the clipboard (macOS)
```
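`--tokens` relies on `tiktoken`. If you want a comparable number from your own Python code, the sketch below is one way to get it. `estimate_tokens` is a hypothetical helper, not part of scanc: it takes any `encode` callable and otherwise falls back to a rough four-characters-per-token heuristic, which is an assumption and can be well off for source code.

```python
from typing import Callable, Optional, Sequence


def estimate_tokens(
    text: str, encode: Optional[Callable[[str], Sequence[int]]] = None
) -> int:
    """Count tokens with `encode` when given, else estimate roughly."""
    if encode is not None:
        # Exact count for whatever tokenizer `encode` belongs to
        return len(encode(text))
    # Assumed heuristic: about 4 characters per token, rounded up
    return (len(text) + 3) // 4
```

With `tiktoken` installed, `estimate_tokens(text, tiktoken.encoding_for_model("gpt-4o").encode)` gives the exact count for that model's encoding (recent `tiktoken` releases map `gpt-4o` to `o200k_base`).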
Write output directly to a file:

```bash
scanc -e ts --tree -o scan.md src/
cat scan.md
```
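If you drive scanc from a Python script instead of a shell, you can capture its stdout rather than writing a file with `-o`. This sketch assumes `scanc` is on your `PATH` and uses only flags documented in this README; `build_scanc_args` and `run_scanc` are hypothetical helper names.

```python
import shutil
import subprocess
from typing import List, Optional, Sequence


def build_scanc_args(
    paths: Sequence[str] = (".",),
    exts: Sequence[str] = (),
    tree: bool = False,
    fmt: Optional[str] = None,
    exclude: Optional[str] = None,
) -> List[str]:
    """Assemble a scanc command line from the documented options."""
    args = ["scanc"]
    if exts:
        args += ["-e", ",".join(exts)]  # -e takes a comma-separated list
    if exclude:
        args += ["-x", exclude]  # exclude regex, e.g. "tests"
    if tree:
        args.append("--tree")
    if fmt:
        args += ["-f", fmt]  # e.g. "xml"; Markdown is the default
    return args + list(paths)  # paths come after the options


def run_scanc(**options) -> str:
    """Run scanc and return its stdout; raises if scanc is missing or fails."""
    if shutil.which("scanc") is None:
        raise FileNotFoundError("scanc is not on PATH")
    result = subprocess.run(
        build_scanc_args(**options), capture_output=True, text=True, check=True
    )
    return result.stdout
```

For example, `run_scanc(exts=["py"], tree=True)` runs `scanc -e py --tree .` and returns what it would have printed.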
```
scanc [OPTIONS] [PATHS...]
```

| Option | Description |
|---|---|
| `-e, --ext EXTS` | Comma‑separated extensions to include (e.g. `py,js`). |
| `-i, --include-regex` | Regex patterns to include (full path match). |
| `-x, --exclude-regex` | Regex patterns to exclude (full path match). |
| `--no-default-excludes` | Disable the built‑in ignore list. |
| `-t, --tree` | Prepend a directory tree (fenced code block). |
| `-T, --tokens MODEL` | Output only the token count for the given LLM model. |
| `--max-size BYTES` | Skip files larger than `BYTES` (default: 1 MiB). |
| `--follow-symlinks` | Traverse symlinks when scanning. |
| `-o, --out OUTFILE` | Write the result to `OUTFILE` instead of stdout. |
| `-f, --format FORMAT` | Output format: `markdown` (default) or `xml`. |
| `-V, --version` | Show the version and exit. |
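To make the filter options concrete, here is a rough sketch of how extension, regex, default-ignore, size, and symlink filters can combine during a directory walk. It is an illustration under assumptions, not scanc's code: the ignore set is a partial guess, regexes are applied with `re.search` to the file's full POSIX-style path, and `iter_files` is a hypothetical name.

```python
import os
import re
from pathlib import Path
from typing import Iterator, Sequence

# Assumed, partial ignore list; scanc's built-in list is likely longer.
DEFAULT_EXCLUDES = {"node_modules", ".venv", ".git", "__pycache__"}
ONE_MIB = 1024 * 1024  # documented --max-size default


def iter_files(
    root: str,
    exts: Sequence[str] = (),
    include: Sequence[str] = (),
    exclude: Sequence[str] = (),
    default_excludes: bool = True,
    max_size: int = ONE_MIB,
    follow_symlinks: bool = False,
) -> Iterator[Path]:
    """Yield files under `root` that pass every filter."""
    wanted = {"." + e.strip().lstrip(".") for e in exts}
    inc = [re.compile(p) for p in include]
    exc = [re.compile(p) for p in exclude]
    for dirpath, dirnames, filenames in os.walk(root, followlinks=follow_symlinks):
        if default_excludes:
            # Prune in place so os.walk never descends into ignored folders
            dirnames[:] = [d for d in dirnames if d not in DEFAULT_EXCLUDES]
        for name in sorted(filenames):
            path = Path(dirpath, name)
            full = path.as_posix()  # regexes are searched within the full path
            if wanted and path.suffix not in wanted:
                continue
            if inc and not any(r.search(full) for r in inc):
                continue
            if any(r.search(full) for r in exc):
                continue
            if path.is_symlink() and not follow_symlinks:
                continue
            if not path.is_file():  # e.g. a broken symlink
                continue
            if path.stat().st_size > max_size:
                continue
            yield path
```

Pruning `dirnames` in place is what keeps ignored folders like `node_modules` from being walked at all, which matters far more for speed than filtering their files afterwards.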
- Formatter Hook: Customize output by registering your own formatter through a Python entry point.
- Extras: Install `scanc[tiktoken]` to enable token counting; more extras may follow.
A ready-to-run container is published to GitHub Container Registry (GHCR). It runs as non-root and scans the mounted host directory by default.
```bash
docker pull ghcr.io/mqxym/scanc-cli:latest
```

Run it against your current directory:

```bash
# Linux/macOS (Bash/Zsh)
docker run --rm -v "$PWD":/work:ro ghcr.io/mqxym/scanc-cli:latest

# Windows PowerShell
docker run --rm -v "${PWD}:/work:ro" ghcr.io/mqxym/scanc-cli:latest
```
Because the container's `WORKDIR` is `/work` and its `ENTRYPOINT` is `scanc`, passing `.` scans your host's current folder.
To save the output, either redirect on the host:

```bash
docker run --rm -v "$PWD":/work:ro ghcr.io/mqxym/scanc-cli:latest -e py --tree > scan.md
```

...or mount the folder as writable and write into `/work`:

```bash
docker run --rm -v "$PWD":/work ghcr.io/mqxym/scanc-cli:latest -e py --tree -o /work/scan.md
```
Tip (Linux/macOS): preserve file ownership when writing by mapping your UID/GID:

```bash
docker run --rm \
  --user "$(id -u)":"$(id -g)" \
  -v "$PWD":/work ghcr.io/mqxym/scanc-cli:latest -o /work/scan.md
```
```bash
# Only Python and JS files, with a directory tree
docker run --rm -v "$PWD":/work:ro ghcr.io/mqxym/scanc-cli:latest -e py,js --tree

# Token count only (needs the optional tiktoken extra, which is preinstalled in the image)
docker run --rm -v "$PWD":/work:ro ghcr.io/mqxym/scanc-cli:latest --tokens gpt-4o
```
Released under the MIT Licence. See LICENCE for details.