Merged
2 changes: 1 addition & 1 deletion docs/source/en/guides/cli.md
@@ -614,7 +614,7 @@ model/microsoft/UserLM-8b be8f2069189bdf443e554c24e488ff3ff6952691 32.1G 4 da
Found 1 repo(s) for a total of 1 revision(s) and 32.1G on disk.
```

The command supports several output formats for scripting: `--format json` prints structured objects, `--format csv` writes comma-separated rows, and `--quiet` prints only IDs. Combine these with `--cache-dir` to target alternative cache locations. See the [Manage your cache](./manage-cache) guide for advanced workflows.
The command supports several output formats for scripting: `--format json` prints structured objects, `--format csv` writes comma-separated rows, and `--quiet` prints only IDs. Use `--sort` to order entries by `accessed`, `modified`, `name`, or `size` (append `:asc` or `:desc` to control order), and `--limit` to restrict results to the top N entries. Combine these with `--cache-dir` to target alternative cache locations. See the [Manage your cache](./manage-cache) guide for advanced workflows.
Contributor

Personal preference: I'd rather have `--order asc|desc` than a suffix (more intuitive imo). wdyt?

`hf cache ls --sort accessed --order desc`

Contributor Author

I tend to find it quite verbose to be honest.

At first I thought we should go for `--sort=accessed` / `--sort=-accessed` like some other CLIs, but I find it less explicit, hence the solution I suggested in this PR. If you feel strongly about it, I can change it.

Contributor

Not a strong opinion here; I was mainly suggesting this for better discoverability (and readability). `--sort accessed --order desc` reads more like plain English 😄 and fwiw, the `gh cache` CLI uses a combination of `--sort` and `--order` options. I was also thinking that a separate `--order` flag could help with shell completion (for `["asc", "desc"]`), but it turns out we already have that covered:

[Screen recording: Screen.Recording.2025-10-31.at.16.02.20.mov]

(👍 I definitely agree on not doing something like `--sort=accessed` / `--sort=-accessed`.)

Contributor Author
@Wauplin, Oct 31, 2025

> it turns out we already have that covered:

That's almost the only addition I made manually to the Cursor output (the auto-generation of the string enum) 😄

> gh cache CLI uses a combination of --sort and --order options.

Do you know what's happening when `--order` is passed without `--sort`? Is it just ignored? Also, if we have both arguments, someone could be tempted to do a `--order size` if they haven't seen the `--sort` parameter yet (that was my reason to avoid having several args).

Contributor

> Do you know what's happening when --order is passed without --sort? Is it just ignored?

Not sure, and I can't test locally (I don't have any gh cache 😄), but I think `gh` falls back to the default sort field (`last_accessed_at`): https://cli.github.com/manual/gh_cache_list

> Also if we have both arguments, someone could be tempted to do a --order size if they haven't seen the --sort parameter yet.

yes, makes sense!
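To make the trade-off discussed above concrete, here is a minimal sketch of how the two-flag style could be folded into the single `key:order` expression this PR parses. `normalize_sort_args` is hypothetical, and falling back to `accessed` when only `--order` is given is an assumption (loosely mirroring the `gh` default mentioned above), not behaviour of `hf cache ls`.

```python
from typing import Optional

# Hypothetical helper for illustration only; huggingface_hub does not ship it.
SORT_KEYS = {"accessed", "modified", "name", "size"}
ORDERS = {"asc", "desc"}
FALLBACK_KEY = "accessed"  # assumed default key when only --order is passed


def normalize_sort_args(sort: Optional[str], order: Optional[str]) -> Optional[str]:
    """Fold a (--sort, --order) pair into one 'key' or 'key:order' expression."""
    if order is not None and order not in ORDERS:
        # Catches the `--order size` mix-up raised in the review thread
        raise ValueError(f"--order must be 'asc' or 'desc', got {order!r}")
    if sort is None:
        if order is None:
            return None  # no sorting requested
        sort = FALLBACK_KEY
    if sort not in SORT_KEYS:
        raise ValueError(f"Unsupported sort key {sort!r}")
    return sort if order is None else f"{sort}:{order}"


print(normalize_sort_args("accessed", "desc"))  # accessed:desc
print(normalize_sort_args(None, "asc"))  # accessed:asc
print(normalize_sort_args("name", None))  # name
```

Validating `--order` against `asc`/`desc` is what would turn the `--order size` mistake into an explicit error instead of a silently ignored flag; the resulting string is the same shape that `compile_cache_sort` accepts.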


Delete cache entries selected with `hf cache ls -q` by piping the IDs into `hf cache rm`:

2 changes: 1 addition & 1 deletion docs/source/en/guides/manage-cache.md
@@ -401,7 +401,7 @@ Found 2 repo(s) for a total of 2 revision(s) and 3.0G on disk.

Need machine-friendly output? Use `--format json` to get structured objects or
`--format csv` for spreadsheets. Alternatively `--quiet` prints only identifiers (one
per line) so you can pipe them into other tooling. Combine these options with
per line) so you can pipe them into other tooling. Use `--sort` to order entries by `accessed`, `modified`, `name`, or `size` (append `:asc` or `:desc` to control order), and `--limit` to restrict results to the top N entries. Combine these options with
`--cache-dir` when you need to inspect a cache stored outside of `HF_HOME`.

**Filter with common shell tools**
2 changes: 2 additions & 0 deletions docs/source/en/package_reference/cli.md
@@ -171,6 +171,8 @@ $ hf cache ls [OPTIONS]
* `-f, --filter TEXT`: Filter entries (e.g. 'size>1GB', 'type=model', 'accessed>7d'). Can be used multiple times.
* `--format [table|json|csv]`: Output format. [default: table]
* `-q, --quiet`: Print only IDs (repo IDs or revision hashes).
* `--sort [accessed|accessed:asc|accessed:desc|modified|modified:asc|modified:desc|name|name:asc|name:desc|size|size:asc|size:desc]`: Sort entries by key. Supported keys: 'accessed', 'modified', 'name', 'size'. Append ':asc' or ':desc' to explicitly set the order (e.g., 'modified:asc'). Defaults: 'accessed', 'modified', 'size' default to 'desc' (newest/biggest first); 'name' defaults to 'asc' (alphabetical).
* `--limit INTEGER`: Limit the number of results returned. Returns only the top N entries after sorting.
* `--help`: Show this message and exit.

### `hf cache prune`
104 changes: 103 additions & 1 deletion src/huggingface_hub/cli/cache.py
@@ -62,6 +62,25 @@ class _DeletionResolution:
_FILTER_PATTERN = re.compile(r"^(?P<key>[a-zA-Z_]+)\s*(?P<op>==|!=|>=|<=|>|<|=)\s*(?P<value>.+)$")
_ALLOWED_OPERATORS = {"=", "!=", ">", "<", ">=", "<="}
_FILTER_KEYS = {"accessed", "modified", "refs", "size", "type"}
_SORT_KEYS = {"accessed", "modified", "name", "size"}
_SORT_PATTERN = re.compile(r"^(?P<key>[a-zA-Z_]+)(?::(?P<order>asc|desc))?$")
_SORT_DEFAULT_ORDER = {
    # Default ordering: accessed/modified/size are descending (newest/biggest first), name is ascending
    "accessed": "desc",
    "modified": "desc",
    "size": "desc",
    "name": "asc",
}


# Dynamically generate SortOptions enum from _SORT_KEYS
_sort_options_dict = {}
for key in sorted(_SORT_KEYS):
    _sort_options_dict[key] = key
    _sort_options_dict[f"{key}_asc"] = f"{key}:asc"
    _sort_options_dict[f"{key}_desc"] = f"{key}:desc"

SortOptions = Enum("SortOptions", _sort_options_dict, type=str, module=__name__) # type: ignore
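The loop above registers twelve members. A standalone reproduction (standard library only, with the `module=` argument dropped) shows the exact choice list Typer ends up offering for `--sort`, which should line up with the `--sort [...]` bullet in the package reference:

```python
from enum import Enum

SORT_KEYS = {"accessed", "modified", "name", "size"}

# Same construction as in cache.py: for each key (alphabetically), register the
# bare key plus explicit ':asc' and ':desc' variants.
options = {}
for key in sorted(SORT_KEYS):
    options[key] = key
    options[f"{key}_asc"] = f"{key}:asc"
    options[f"{key}_desc"] = f"{key}:desc"

SortOptions = Enum("SortOptions", options, type=str)

choices = [member.value for member in SortOptions]
print(choices)
# ['accessed', 'accessed:asc', 'accessed:desc', 'modified', 'modified:asc', 'modified:desc',
#  'name', 'name:asc', 'name:desc', 'size', 'size:asc', 'size:desc']
```

Keeping the bare key as its own member is what lets `--sort size` defer to the per-key default in `_SORT_DEFAULT_ORDER` instead of forcing users to pick a direction.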


@dataclass(frozen=True)
@@ -378,6 +397,60 @@ def _compare_numeric(left: Optional[float], op: str, right: float) -> bool:
    return comparisons[op]


def compile_cache_sort(sort_expr: str) -> tuple[Callable[[CacheEntry], tuple[Any, ...]], bool]:
    """Convert a `hf cache ls` sort expression into a key function for sorting entries.

    Returns:
        A tuple of (key_function, reverse_flag) where reverse_flag indicates whether
        to sort in descending order (True) or ascending order (False).
    """
    match = _SORT_PATTERN.match(sort_expr.strip().lower())
    if not match:
        raise ValueError(f"Invalid sort expression: '{sort_expr}'. Expected format: 'key' or 'key:asc' or 'key:desc'.")

    key = match.group("key").lower()
    explicit_order = match.group("order")

    if key not in _SORT_KEYS:
        raise ValueError(f"Unsupported sort key '{key}' in '{sort_expr}'. Must be one of {list(_SORT_KEYS)}.")

    # Use explicit order if provided, otherwise use default for the key
    order = explicit_order if explicit_order else _SORT_DEFAULT_ORDER[key]
    reverse = order == "desc"

    def _sort_key(entry: CacheEntry) -> tuple[Any, ...]:
        repo, revision = entry

        if key == "name":
            # Sort by cache_id (repo type/id)
            value: Any = repo.cache_id.lower()
            return (value,)

        if key == "size":
            # Use revision size if available, otherwise repo size
            value = revision.size_on_disk if revision is not None else repo.size_on_disk
            return (value,)

        if key == "accessed":
            # Access time is not tracked per revision, so both repos and revisions
            # use the repo's last_accessed
            value = repo.last_accessed if repo.last_accessed is not None else 0.0
            return (value,)

        if key == "modified":
            # Use revision's last_modified if available, otherwise repo's last_modified
            if revision is not None:
                value = revision.last_modified if revision.last_modified is not None else 0.0
            else:
                value = repo.last_modified if repo.last_modified is not None else 0.0
            return (value,)

        # Should never reach here due to validation above
        raise ValueError(f"Unsupported sort key: {key}")

    return _sort_key, reverse

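A self-contained approximation of how the returned `(key_function, reverse)` pair is consumed. `Repo` and `Revision` are hypothetical, trimmed-down stand-ins for the cache objects (only the fields `_sort_key` reads), and `compile_sort` mirrors the validation and defaults above rather than importing `huggingface_hub`:

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Optional

SORT_KEYS = {"accessed", "modified", "name", "size"}
SORT_PATTERN = re.compile(r"^(?P<key>[a-zA-Z_]+)(?::(?P<order>asc|desc))?$")
DEFAULT_ORDER = {"accessed": "desc", "modified": "desc", "size": "desc", "name": "asc"}


@dataclass
class Repo:  # hypothetical stand-in for a cached repo entry
    cache_id: str
    size_on_disk: int
    last_accessed: Optional[float] = None
    last_modified: Optional[float] = None


@dataclass
class Revision:  # hypothetical stand-in for a cached revision entry
    size_on_disk: int
    last_modified: Optional[float] = None


Entry = tuple[Repo, Optional[Revision]]


def compile_sort(expr: str) -> tuple[Callable[[Entry], tuple[Any, ...]], bool]:
    """Return (key_fn, reverse), applying the per-key default order when none is given."""
    match = SORT_PATTERN.match(expr.strip().lower())
    if not match or match.group("key") not in SORT_KEYS:
        raise ValueError(f"Invalid sort expression: {expr!r}")
    key = match.group("key")
    reverse = (match.group("order") or DEFAULT_ORDER[key]) == "desc"

    def sort_key(entry: Entry) -> tuple[Any, ...]:
        repo, revision = entry
        if key == "name":
            return (repo.cache_id.lower(),)
        if key == "size":
            return (revision.size_on_disk if revision is not None else repo.size_on_disk,)
        if key == "accessed":
            # Access time is only tracked per repo; missing values sort as 0.0
            return (repo.last_accessed if repo.last_accessed is not None else 0.0,)
        # key == "modified": prefer the revision's timestamp when listing revisions
        source = revision if revision is not None else repo
        return (source.last_modified if source.last_modified is not None else 0.0,)

    return sort_key, reverse


entries: list[Entry] = [
    (Repo("model/b", 10, last_accessed=200.0), None),
    (Repo("model/a", 30), None),  # never accessed
    (Repo("model/c", 20, last_accessed=100.0), None),
]
for expr in ("size", "accessed", "name:desc"):
    key_fn, reverse = compile_sort(expr)
    print(expr, [repo.cache_id for repo, _ in sorted(entries, key=key_fn, reverse=reverse)])
# Prints:
# size ['model/a', 'model/c', 'model/b']
# accessed ['model/b', 'model/c', 'model/a']
# name:desc ['model/c', 'model/b', 'model/a']
```

Because missing timestamps map to `0.0`, an entry that was never accessed sinks to the end under the default descending `accessed` order rather than raising a comparison error against `None`.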

def _resolve_deletion_targets(hf_cache_info: HFCacheInfo, targets: list[str]) -> _DeletionResolution:
    """Resolve the deletion targets into a deletion resolution."""
    repo_lookup, revision_lookup = build_cache_index(hf_cache_info)
@@ -458,13 +531,28 @@ def ls(
            help="Print only IDs (repo IDs or revision hashes).",
        ),
    ] = False,
    sort: Annotated[
        Optional[SortOptions],
        typer.Option(
            help="Sort entries by key. Supported keys: 'accessed', 'modified', 'name', 'size'. "
            "Append ':asc' or ':desc' to explicitly set the order (e.g., 'modified:asc'). "
            "Defaults: 'accessed', 'modified', 'size' default to 'desc' (newest/biggest first); "
            "'name' defaults to 'asc' (alphabetical).",
        ),
    ] = None,
    limit: Annotated[
        Optional[int],
        typer.Option(
            help="Limit the number of results returned. Returns only the top N entries after sorting.",
        ),
    ] = None,
) -> None:
    """List cached repositories or revisions."""
    try:
        hf_cache_info = scan_cache_dir(cache_dir)
    except CacheNotFound as exc:
        print(f"Cache directory not found: {str(exc.cache_dir)}")
        raise typer.Exit(code=1)
        raise typer.Exit(code=1) from exc

    filters = filter or []

@@ -478,6 +566,20 @@ def ls(
    for fn in filter_fns:
        entries = [entry for entry in entries if fn(entry[0], entry[1], now)]

    # Apply sorting if requested
    if sort:
        try:
            sort_key_fn, reverse = compile_cache_sort(sort.value)
            entries.sort(key=sort_key_fn, reverse=reverse)
        except ValueError as exc:
            raise typer.BadParameter(str(exc)) from exc

    # Apply limit if requested
    if limit is not None:
        if limit < 0:
            raise typer.BadParameter(f"Limit must be a non-negative integer, got {limit}.")
        entries = entries[:limit]

    if quiet:
        for repo, revision in entries:
            print(revision.commit_hash if revision is not None else repo.cache_id)
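The limit step is a plain slice applied after sorting, so `--limit 0` yields an empty listing and only negative values are rejected. A minimal sketch of that behaviour, with a hypothetical `apply_limit` helper standing in for the inline code in `ls` (which raises `typer.BadParameter` rather than `ValueError`):

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def apply_limit(entries: list[T], limit: Optional[int]) -> list[T]:
    """Keep the first `limit` entries; `None` means no limit (hypothetical helper)."""
    if limit is None:
        return entries
    if limit < 0:
        # cache.py raises typer.BadParameter here; ValueError keeps the sketch dependency-free
        raise ValueError(f"Limit must be a non-negative integer, got {limit}.")
    return entries[:limit]


sorted_ids = ["user/model3", "user/model2", "user/model1"]  # e.g. after --sort name:desc
print(apply_limit(sorted_ids, 2))  # ['user/model3', 'user/model2']
print(apply_limit(sorted_ids, 0))  # []
print(apply_limit(sorted_ids, None))  # all three entries
```

Because the slice runs after sorting, `--sort size --limit 5` returns the five largest entries rather than five arbitrary ones, which is the "top N entries after sorting" behaviour documented for `--limit`.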
25 changes: 25 additions & 0 deletions tests/test_cli.py
@@ -126,6 +126,31 @@ def test_ls_quiet_revisions(self, runner: CliRunner) -> None:
        assert result.exit_code == 0
        assert result.stdout.strip() == revision.commit_hash

    def test_ls_with_sort(self, runner: CliRunner) -> None:
        repo1 = _make_repo("user/model1", revisions=[_make_revision("d" * 40)])
        repo2 = _make_repo("user/model2", revisions=[_make_revision("e" * 40)])
        repo3 = _make_repo("user/model3", revisions=[_make_revision("f" * 40)])
        entries = [(repo1, None), (repo2, None), (repo3, None)]
        repo_refs_map = {repo1: frozenset(), repo2: frozenset(), repo3: frozenset()}

        with (
            patch("huggingface_hub.cli.cache.scan_cache_dir"),
            patch(
                "huggingface_hub.cli.cache.collect_cache_entries",
                return_value=(entries, repo_refs_map),
            ),
        ):
            result = runner.invoke(app, ["cache", "ls", "--sort", "name:desc", "--limit", "2"])

        assert result.exit_code == 0
        stdout = result.stdout

        # Check reverse-alphabetical (descending) order
        assert stdout.index("model3") < stdout.index("model2")

        # Check limit of 2 entries
        assert "model1" not in stdout

    def test_rm_revision_executes_strategy(self, runner: CliRunner) -> None:
        revision = _make_revision("c" * 40)
        repo = _make_repo("user/model", revisions=[revision])