Commit adda1ce

feat(config, cli): enhance config options, support env vars
Closes #285

* add support for environment variables (`GITINGEST_*`) to override `config.py` defaults
* implement a precedence hierarchy: CLI/Python args → environment variables → default values
* introduce new CLI options (`--max-files`, `--max-total-size`, `--max-directory-depth`)
* centralise environment variable utilities in `utils/config_utils.py` with functions `_get_str_env_var` and `_get_int_env_var`
* add configuration examples to `README.md`
* tidy and update docstrings
* update tests
* add missing `--tag` CLI flag
* remove `isort` in favour of `ruff.lint.isort`
* remove unused constants `BASE_DIR` and `TEMPLATE_DIR` in `tests/server/test_flow_integration.py`
* rename constant `templates` to `JINJA_TEMPLATES` in `src/server/server_config.py`
* move `Colors` from `src/server/server_utils.py` to `src/gitingest/utils/colors.py` to break circular import chain

Co-authored-by: Cheelax <thomas.belloc@gmail.com>
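The precedence hierarchy named in the commit message (explicit CLI/Python argument, then `GITINGEST_*` environment variable, then built-in default) can be sketched as below. This is a minimal illustration, not the project's code: `resolve_setting` is a hypothetical helper, and it takes the environment as a plain mapping so the lookup order is easy to exercise.

```python
from typing import Mapping, Optional


def resolve_setting(
    explicit: Optional[int],
    env: Mapping[str, str],
    name: str,
    default: int,
    prefix: str = "GITINGEST_",
) -> int:
    """Pick a value by precedence: explicit argument, then environment, then default.

    Hypothetical helper for illustration only; not part of gitingest.
    """
    # 1. An explicitly passed argument always wins (0 counts as a real value).
    if explicit is not None:
        return explicit
    # 2. Otherwise fall back to the prefixed environment variable, if present.
    raw = env.get(prefix + name)
    if raw is not None:
        return int(raw)  # assumption: a malformed value raises ValueError here
    # 3. Finally use the built-in default.
    return default


env = {"GITINGEST_MAX_FILES": "50000"}
print(resolve_setting(100, env, "MAX_FILES", 10_000))   # explicit argument
print(resolve_setting(None, env, "MAX_FILES", 10_000))  # environment variable
print(resolve_setting(None, {}, "MAX_FILES", 10_000))   # built-in default
```

In the commit itself the environment lookup appears to happen once in `config.py` rather than per call, so this sketch only mirrors the ordering, not where the lookup lives.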
1 parent a99089a commit adda1ce

19 files changed, 307 additions and 130 deletions

.pre-commit-config.yaml

Lines changed: 0 additions & 6 deletions
@@ -58,12 +58,6 @@ repos:
       - id: python-use-type-annotations
         description: 'Enforce that python3.6+ type annotations are used instead of type comments.'

-  - repo: https://github.com/PyCQA/isort
-    rev: 6.0.1
-    hooks:
-      - id: isort
-        description: 'Sort imports alphabetically, and automatically separated into sections and by type.'
-
   - repo: https://github.com/pre-commit/mirrors-eslint
     rev: v9.30.1
     hooks:

README.md

Lines changed: 57 additions & 0 deletions
@@ -144,12 +144,60 @@ By default, the digest is written to a text file (`digest.txt`) in your current
 - Use `--output/-o <filename>` to write to a specific file.
 - Use `--output/-o -` to output directly to `STDOUT` (useful for piping to other tools).

+### 🔧 Configure processing limits
+
+```bash
+# Set higher limits for large repositories
+gitingest https://github.com/torvalds/linux \
+  --max-files 100000 \
+  --max-total-size 2147483648 \
+  --max-directory-depth 25
+
+# Process only Python files up to 1MB each
+gitingest /path/to/project \
+  --include-pattern "*.py" \
+  --max-size 1048576 \
+  --max-files 1000
+```
+
 See more options and usage details with:

 ```bash
 gitingest --help
 ```

+### Configuration via Environment Variables
+
+You can configure various limits and settings using environment variables. All configuration environment variables start with the `GITINGEST_` prefix:
+
+#### File Processing Configuration
+
+- `GITINGEST_MAX_FILE_SIZE` - Maximum size of a single file to process *(default: 10485760 bytes, 10 MB)*
+- `GITINGEST_MAX_FILES` - Maximum number of files to process *(default: 10000)*
+- `GITINGEST_MAX_TOTAL_SIZE_BYTES` - Maximum size of output file *(default: 524288000 bytes, 500 MB)*
+- `GITINGEST_MAX_DIRECTORY_DEPTH` - Maximum depth of directory traversal *(default: 20)*
+- `GITINGEST_DEFAULT_TIMEOUT` - Default operation timeout in seconds *(default: 60)*
+- `GITINGEST_OUTPUT_FILE_NAME` - Default output filename *(default: "digest.txt")*
+- `GITINGEST_TMP_BASE_PATH` - Base path for temporary files *(default: system temp directory)*
+
+#### Server Configuration (for self-hosting)
+
+- `GITINGEST_MAX_DISPLAY_SIZE` - Maximum size of content to display in UI *(default: 300000 bytes)*
+- `GITINGEST_DELETE_REPO_AFTER` - Repository cleanup timeout in seconds *(default: 3600, 1 hour)*
+- `GITINGEST_MAX_FILE_SIZE_KB` - Maximum file size for UI slider in kB *(default: 102400, 100 MB)*
+- `GITINGEST_MAX_SLIDER_POSITION` - Maximum slider position in UI *(default: 500)*
+
+#### Example usage
+
+```bash
+# Configure for large scientific repositories
+export GITINGEST_MAX_FILES=50000
+export GITINGEST_MAX_FILE_SIZE=20971520 # 20 MB
+export GITINGEST_MAX_TOTAL_SIZE_BYTES=1073741824 # 1 GB
+
+gitingest https://github.com/some/large-repo
+```
+
 ## 🐍 Python package usage

 ```python
@@ -178,6 +226,15 @@ summary, tree, content = ingest("https://github.com/username/private-repo")

 # Include repository submodules
 summary, tree, content = ingest("https://github.com/username/repo-with-submodules", include_submodules=True)
+
+# Configure limits programmatically
+summary, tree, content = ingest(
+    "https://github.com/username/large-repo",
+    max_file_size=20 * 1024 * 1024,  # 20 MB per file
+    max_files=50000,  # 50k files max
+    max_total_size_bytes=1024**2,  # 1 MB total
+    max_directory_depth=30  # 30 levels deep
+)
 ```

 By default, this won't write a file but can be enabled with the `output` argument.
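The README hunks above quote several raw byte counts next to human-readable sizes. As a quick sanity check, the arithmetic can be reproduced with binary units (1 MB taken as 1024 × 1024 bytes, which is what the quoted defaults imply). The constant names `KIB`, `MIB`, and `GIB` and the `checks` table are introduced here purely for illustration.

```python
# Binary size units; these names are illustrative and not taken from gitingest.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

# (label, value computed from units, literal quoted in the README diff)
checks = [
    ("GITINGEST_MAX_FILE_SIZE default (10 MB)", 10 * MIB, 10485760),
    ("GITINGEST_MAX_TOTAL_SIZE_BYTES default (500 MB)", 500 * MIB, 524288000),
    ("GITINGEST_MAX_FILE_SIZE example (20 MB)", 20 * MIB, 20971520),
    ("GITINGEST_MAX_TOTAL_SIZE_BYTES example (1 GB)", 1 * GIB, 1073741824),
    ("--max-total-size example (2 GB)", 2 * GIB, 2147483648),
    ("--max-size example (1 MB)", 1 * MIB, 1048576),
    ("GITINGEST_MAX_FILE_SIZE_KB default (100 MB, in kB)", 100 * MIB // KIB, 102400),
]

for label, computed, literal in checks:
    status = "ok" if computed == literal else "MISMATCH"
    print(f"{status:8} {label}: {literal}")

# The Python example passes max_total_size_bytes=1024**2 (1 MB total),
# which is smaller than its own per-file limit of 20 MB.
print(1024**2 < 20 * MIB)
```

All of the quoted literals agree with their labels. The one thing worth a second look is the programmatic example, where the 1 MB total cap sits below the 20 MB per-file cap in the same call.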

pyproject.toml

Lines changed: 0 additions & 8 deletions
@@ -112,14 +112,6 @@ case-sensitive = true
 [tool.pycln]
 all = true

-# TODO: Remove this once we figure out how to use ruff-isort
-[tool.isort]
-profile = "black"
-line_length = 119
-remove_redundant_aliases = true
-float_to_top = true # https://github.com/astral-sh/ruff/issues/6514
-order_by_type = true
-filter_files = true

 # Test configuration
 [tool.pytest.ini_options]

src/gitingest/__main__.py

Lines changed: 45 additions & 6 deletions
@@ -9,16 +9,20 @@
 import click
 from typing_extensions import Unpack

-from gitingest.config import MAX_FILE_SIZE, OUTPUT_FILE_NAME
+from gitingest.config import MAX_DIRECTORY_DEPTH, MAX_FILES, MAX_FILE_SIZE, MAX_TOTAL_SIZE_BYTES, OUTPUT_FILE_NAME
 from gitingest.entrypoint import ingest_async


 class _CLIArgs(TypedDict):
     source: str
     max_size: int
+    max_files: int
+    max_total_size: int
+    max_directory_depth: int
     exclude_pattern: tuple[str, ...]
     include_pattern: tuple[str, ...]
     branch: str | None
+    tag: str | None
     include_gitignored: bool
     include_submodules: bool
     token: str | None
@@ -34,6 +38,24 @@ class _CLIArgs(TypedDict):
     show_default=True,
     help="Maximum file size to process in bytes",
 )
+@click.option(
+    "--max-files",
+    default=MAX_FILES,
+    show_default=True,
+    help="Maximum number of files to process",
+)
+@click.option(
+    "--max-total-size",
+    default=MAX_TOTAL_SIZE_BYTES,
+    show_default=True,
+    help="Maximum total size of all files in bytes",
+)
+@click.option(
+    "--max-directory-depth",
+    default=MAX_DIRECTORY_DEPTH,
+    show_default=True,
+    help="Maximum depth of directory traversal",
+)
 @click.option("--exclude-pattern", "-e", multiple=True, help="Shell-style patterns to exclude.")
 @click.option(
     "--include-pattern",
@@ -42,6 +64,7 @@ class _CLIArgs(TypedDict):
     help="Shell-style patterns to include.",
 )
 @click.option("--branch", "-b", default=None, help="Branch to clone and ingest")
+@click.option("--tag", default=None, help="Tag to clone and ingest")
 @click.option(
     "--include-gitignored",
     is_flag=True,
@@ -98,7 +121,7 @@ def main(**cli_kwargs: Unpack[_CLIArgs]) -> None:
         $ gitingest --include-pattern "*.js" --exclude-pattern "node_modules/*"

     Private repositories:
-        $ gitingest https://github.com/user/private-repo -t ghp_token
+        $ gitingest https://github.com/user/private-repo --token ghp_token
         $ GITHUB_TOKEN=ghp_token gitingest https://github.com/user/private-repo

     Include submodules:
@@ -112,9 +135,13 @@ async def _async_main(
     source: str,
     *,
     max_size: int = MAX_FILE_SIZE,
+    max_files: int = MAX_FILES,
+    max_total_size: int = MAX_TOTAL_SIZE_BYTES,
+    max_directory_depth: int = MAX_DIRECTORY_DEPTH,
     exclude_pattern: tuple[str, ...] | None = None,
     include_pattern: tuple[str, ...] | None = None,
     branch: str | None = None,
+    tag: str | None = None,
     include_gitignored: bool = False,
     include_submodules: bool = False,
     token: str | None = None,
@@ -132,21 +159,29 @@ async def _async_main(
         A directory path or a Git repository URL.
     max_size : int
         Maximum file size in bytes to ingest (default: 10 MB).
+    max_files : int
+        Maximum number of files to ingest (default: 10,000).
+    max_total_size : int
+        Maximum total size of output file in bytes (default: 500 MB).
+    max_directory_depth : int
+        Maximum depth of directory traversal (default: 20).
     exclude_pattern : tuple[str, ...] | None
         Glob patterns for pruning the file set.
     include_pattern : tuple[str, ...] | None
         Glob patterns for including files in the output.
     branch : str | None
-        Git branch to ingest. If ``None``, the repository's default branch is used.
+        Git branch to clone and ingest (default: the default branch).
+    tag : str | None
+        Git tag to clone and ingest. If ``None``, no tag is used.
     include_gitignored : bool
-        If ``True``, also ingest files matched by ``.gitignore`` or ``.gitingestignore`` (default: ``False``).
+        If ``True``, include files ignored by ``.gitignore`` and ``.gitingestignore`` (default: ``False``).
     include_submodules : bool
         If ``True``, recursively include all Git submodules within the repository (default: ``False``).
     token : str | None
         GitHub personal access token (PAT) for accessing private repositories.
         Can also be set via the ``GITHUB_TOKEN`` environment variable.
     output : str | None
-        The path where the output file will be written (default: ``digest.txt`` in current directory).
+        The path where the output file is written (default: ``digest.txt`` in current directory).
         Use ``"-"`` to write to ``stdout``.

     Raises
@@ -170,9 +205,13 @@ async def _async_main(
         summary, _, _ = await ingest_async(
             source,
             max_file_size=max_size,
-            include_patterns=include_patterns,
+            max_files=max_files,
+            max_total_size_bytes=max_total_size,
+            max_directory_depth=max_directory_depth,
             exclude_patterns=exclude_patterns,
+            include_patterns=include_patterns,
             branch=branch,
+            tag=tag,
             include_gitignored=include_gitignored,
             include_submodules=include_submodules,
             token=token,
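The CLI option names in this file differ from the keyword names that `ingest_async` accepts (`max_size` becomes `max_file_size`, `max_total_size` becomes `max_total_size_bytes`). The sketch below makes that mapping explicit. It uses the standard-library `argparse` as a stand-in for `click` so that it runs without third-party packages; `build_parser`, `to_ingest_kwargs`, and the `DEFAULTS` table are hypothetical names, and the default values are copied from the README text in this commit.

```python
import argparse
from typing import Dict

# Defaults as documented in the README diff; in the real CLI they come from gitingest.config.
DEFAULTS: Dict[str, int] = {
    "max_size": 10 * 1024 * 1024,
    "max_files": 10_000,
    "max_total_size": 500 * 1024 * 1024,
    "max_directory_depth": 20,
}


def build_parser(defaults: Dict[str, int]) -> argparse.ArgumentParser:
    """Build an argparse stand-in exposing the same limit options as the click CLI."""
    parser = argparse.ArgumentParser(prog="gitingest-sketch")
    parser.add_argument("source")
    parser.add_argument("--max-size", type=int, default=defaults["max_size"])
    parser.add_argument("--max-files", type=int, default=defaults["max_files"])
    parser.add_argument("--max-total-size", type=int, default=defaults["max_total_size"])
    parser.add_argument("--max-directory-depth", type=int, default=defaults["max_directory_depth"])
    return parser


def to_ingest_kwargs(args: argparse.Namespace) -> Dict[str, int]:
    """Translate CLI option names into the keyword names used by ingest_async."""
    return {
        "max_file_size": args.max_size,
        "max_files": args.max_files,
        "max_total_size_bytes": args.max_total_size,
        "max_directory_depth": args.max_directory_depth,
    }


args = build_parser(DEFAULTS).parse_args(
    ["/path/to/project", "--max-files", "1000", "--max-total-size", "2147483648"]
)
kwargs = to_ingest_kwargs(args)
print(kwargs)
```

One consequence visible in the diff: because every CLI option has a concrete default, the CLI always forwards integer values, so the `None` handling in `ingest_async` would only come into play for direct Python callers.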

src/gitingest/config.py

Lines changed: 9 additions & 7 deletions
@@ -3,12 +3,14 @@
 import tempfile
 from pathlib import Path

-MAX_FILE_SIZE = 10 * 1024 * 1024  # Maximum size of a single file to process (10 MB)
-MAX_DIRECTORY_DEPTH = 20  # Maximum depth of directory traversal
-MAX_FILES = 10_000  # Maximum number of files to process
-MAX_TOTAL_SIZE_BYTES = 500 * 1024 * 1024  # Maximum size of output file (500 MB)
-DEFAULT_TIMEOUT = 60  # seconds
+from gitingest.utils.config_utils import _get_int_env_var, _get_str_env_var

-OUTPUT_FILE_NAME = "digest.txt"
+MAX_FILE_SIZE = _get_int_env_var("MAX_FILE_SIZE", 10 * 1024 * 1024)  # Max file size to process in bytes (10 MB)
+MAX_FILES = _get_int_env_var("MAX_FILES", 10_000)  # Max number of files to process
+MAX_TOTAL_SIZE_BYTES = _get_int_env_var("MAX_TOTAL_SIZE_BYTES", 500 * 1024 * 1024)  # Max output file size (500 MB)
+MAX_DIRECTORY_DEPTH = _get_int_env_var("MAX_DIRECTORY_DEPTH", 20)  # Max depth of directory traversal

-TMP_BASE_PATH = Path(tempfile.gettempdir()) / "gitingest"
+DEFAULT_TIMEOUT = _get_int_env_var("DEFAULT_TIMEOUT", 60)  # Default timeout for git operations in seconds
+
+OUTPUT_FILE_NAME = _get_str_env_var("OUTPUT_FILE_NAME", "digest.txt")
+TMP_BASE_PATH = Path(_get_str_env_var("TMP_BASE_PATH", tempfile.gettempdir())) / "gitingest"
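Because these constants are assigned at module level, each environment variable is presumably read once, when `gitingest.config` is first imported, so a variable changed later in the same process would not affect them. The sketch below illustrates that behaviour with stand-in helpers. `int_from_env` and `str_from_env` are hypothetical names, and both the `GITINGEST_` prefixing inside the helper and the fallback on a non-integer value are assumptions, since the body of `utils/config_utils.py` is not part of the hunks shown here.

```python
import os
import tempfile
from pathlib import Path

ENV_PREFIX = "GITINGEST_"  # prefix documented in the README section of this commit


def str_from_env(name: str, default: str) -> str:
    """Return the prefixed environment variable, or the default when unset or empty."""
    value = os.environ.get(ENV_PREFIX + name)
    return value if value else default


def int_from_env(name: str, default: int) -> int:
    """Return the prefixed environment variable as an int, or the default.

    Assumption: a value that is not a valid integer falls back to the default.
    """
    raw = os.environ.get(ENV_PREFIX + name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


# Module-level constants are evaluated once, at this point.
os.environ["GITINGEST_MAX_FILES"] = "50000"
MAX_FILES = int_from_env("MAX_FILES", 10_000)
TMP_BASE_PATH = Path(str_from_env("TMP_BASE_PATH", tempfile.gettempdir())) / "gitingest"

# Changing the environment afterwards does not affect the already-computed constant.
os.environ["GITINGEST_MAX_FILES"] = "1"
print(MAX_FILES)           # still the value read above
print(TMP_BASE_PATH.name)  # the "gitingest" suffix is always appended
```

For Python callers this suggests setting `GITINGEST_*` variables before the first `import gitingest`, or passing limits as explicit arguments instead.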

src/gitingest/entrypoint.py

Lines changed: 47 additions & 19 deletions
@@ -33,8 +33,11 @@ async def ingest_async(
     source: str,
     *,
     max_file_size: int = MAX_FILE_SIZE,
-    include_patterns: str | set[str] | None = None,
+    max_files: int | None = None,
+    max_total_size_bytes: int | None = None,
+    max_directory_depth: int | None = None,
     exclude_patterns: str | set[str] | None = None,
+    include_patterns: str | set[str] | None = None,
     branch: str | None = None,
     tag: str | None = None,
     include_gitignored: bool = False,
@@ -51,17 +54,23 @@ async def ingest_async(
     Parameters
     ----------
     source : str
-        The source to analyze, which can be a URL (for a Git repository) or a local directory path.
+        A directory path or a Git repository URL.
     max_file_size : int
-        Maximum allowed file size for file ingestion. Files larger than this size are ignored (default: 10 MB).
-    include_patterns : str | set[str] | None
-        Pattern or set of patterns specifying which files to include. If ``None``, all files are included.
+        Maximum file size in bytes to ingest (default: 10 MB).
+    max_files : int | None
+        Maximum number of files to ingest (default: 10,000).
+    max_total_size_bytes : int | None
+        Maximum total size of output file in bytes (default: 500 MB).
+    max_directory_depth : int | None
+        Maximum depth of directory traversal (default: 20).
     exclude_patterns : str | set[str] | None
-        Pattern or set of patterns specifying which files to exclude. If ``None``, no files are excluded.
+        Glob patterns for pruning the file set.
+    include_patterns : str | set[str] | None
+        Glob patterns for including files in the output.
     branch : str | None
-        The branch to clone and ingest (default: the default branch).
+        Git branch to clone and ingest (default: the default branch).
     tag : str | None
-        The tag to clone and ingest. If ``None``, no tag is used.
+        Git tag to clone and ingest. If ``None``, no tag is used.
     include_gitignored : bool
         If ``True``, include files ignored by ``.gitignore`` and ``.gitingestignore`` (default: ``False``).
     include_submodules : bool
@@ -70,7 +79,7 @@ async def ingest_async(
         GitHub personal access token (PAT) for accessing private repositories.
         Can also be set via the ``GITHUB_TOKEN`` environment variable.
     output : str | None
-        File path where the summary and content should be written.
+        File path where the summary and content are written.
         If ``"-"`` (dash), the results are written to ``stdout``.
         If ``None``, the results are not written to a file.

@@ -107,6 +116,13 @@ async def ingest_async(
     if query.url:
         _override_branch_and_tag(query, branch=branch, tag=tag)

+    if max_files is not None:
+        query.max_files = max_files
+    if max_total_size_bytes is not None:
+        query.max_total_size_bytes = max_total_size_bytes
+    if max_directory_depth is not None:
+        query.max_directory_depth = max_directory_depth
+
     query.include_submodules = include_submodules

     async with _clone_repo_if_remote(query, token=token):
@@ -121,8 +137,11 @@ def ingest(
     source: str,
     *,
     max_file_size: int = MAX_FILE_SIZE,
-    include_patterns: str | set[str] | None = None,
+    max_files: int | None = None,
+    max_total_size_bytes: int | None = None,
+    max_directory_depth: int | None = None,
     exclude_patterns: str | set[str] | None = None,
+    include_patterns: str | set[str] | None = None,
     branch: str | None = None,
     tag: str | None = None,
     include_gitignored: bool = False,
@@ -139,17 +158,23 @@ def ingest(
     Parameters
     ----------
     source : str
-        The source to analyze, which can be a URL (for a Git repository) or a local directory path.
+        A directory path or a Git repository URL.
     max_file_size : int
-        Maximum allowed file size for file ingestion. Files larger than this size are ignored (default: 10 MB).
-    include_patterns : str | set[str] | None
-        Pattern or set of patterns specifying which files to include. If ``None``, all files are included.
+        Maximum file size in bytes to ingest (default: 10 MB).
+    max_files : int | None
+        Maximum number of files to ingest (default: 10,000).
+    max_total_size_bytes : int | None
+        Maximum total size of output file in bytes (default: 500 MB).
+    max_directory_depth : int | None
+        Maximum depth of directory traversal (default: 20).
     exclude_patterns : str | set[str] | None
-        Pattern or set of patterns specifying which files to exclude. If ``None``, no files are excluded.
+        Glob patterns for pruning the file set.
+    include_patterns : str | set[str] | None
+        Glob patterns for including files in the output.
     branch : str | None
-        The branch to clone and ingest (default: the default branch).
+        Git branch to clone and ingest (default: the default branch).
     tag : str | None
-        The tag to clone and ingest. If ``None``, no tag is used.
+        Git tag to clone and ingest. If ``None``, no tag is used.
     include_gitignored : bool
         If ``True``, include files ignored by ``.gitignore`` and ``.gitingestignore`` (default: ``False``).
     include_submodules : bool
@@ -158,7 +183,7 @@ def ingest(
         GitHub personal access token (PAT) for accessing private repositories.
         Can also be set via the ``GITHUB_TOKEN`` environment variable.
     output : str | None
-        File path where the summary and content should be written.
+        File path where the summary and content are written.
         If ``"-"`` (dash), the results are written to ``stdout``.
         If ``None``, the results are not written to a file.

@@ -179,8 +204,11 @@ def ingest(
         ingest_async(
             source=source,
             max_file_size=max_file_size,
-            include_patterns=include_patterns,
+            max_files=max_files,
+            max_total_size_bytes=max_total_size_bytes,
+            max_directory_depth=max_directory_depth,
             exclude_patterns=exclude_patterns,
+            include_patterns=include_patterns,
             branch=branch,
             tag=tag,
             include_gitignored=include_gitignored,
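The `is not None` checks added to `ingest_async` treat `None` as "caller did not specify", leaving whatever the query object already holds. A minimal sketch of that pattern follows; `SketchQuery` and `apply_overrides` are hypothetical stand-ins (the real query type is not shown in this diff), with field defaults copied from the documented values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchQuery:
    """Hypothetical stand-in for the query object; fields mirror the diff's attribute names."""

    max_files: int = 10_000
    max_total_size_bytes: int = 500 * 1024 * 1024
    max_directory_depth: int = 20


def apply_overrides(
    query: SketchQuery,
    max_files: Optional[int] = None,
    max_total_size_bytes: Optional[int] = None,
    max_directory_depth: Optional[int] = None,
) -> SketchQuery:
    """Overwrite only the limits the caller actually supplied."""
    # Comparing against None (rather than truthiness) keeps 0 usable as an explicit value.
    if max_files is not None:
        query.max_files = max_files
    if max_total_size_bytes is not None:
        query.max_total_size_bytes = max_total_size_bytes
    if max_directory_depth is not None:
        query.max_directory_depth = max_directory_depth
    return query


query = apply_overrides(SketchQuery(), max_files=50_000, max_directory_depth=0)
print(query)
```

Here `max_files` and `max_directory_depth` are replaced, while `max_total_size_bytes` keeps its pre-existing value because it was left as `None`.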
