Merged
Changes from 2 commits
2 changes: 1 addition & 1 deletion docs/source/en/guides/cli.md
@@ -614,7 +614,7 @@ model/microsoft/UserLM-8b be8f2069189bdf443e554c24e488ff3ff6952691 32.1G 4 da
Found 1 repo(s) for a total of 1 revision(s) and 32.1G on disk.
```

-The command supports several output formats for scripting: `--format json` prints structured objects, `--format csv` writes comma-separated rows, and `--quiet` prints only IDs. Combine these with `--cache-dir` to target alternative cache locations. See the [Manage your cache](./manage-cache) guide for advanced workflows.
+The command supports several output formats for scripting: `--format json` prints structured objects, `--format csv` writes comma-separated rows, and `--quiet` prints only IDs. Use `--sort` to order entries by `accessed`, `modified`, `name`, or `size` (append `:asc` or `:desc` to control order), and `--limit` to restrict results to the top N entries. Combine these with `--cache-dir` to target alternative cache locations. See the [Manage your cache](./manage-cache) guide for advanced workflows.
Contributor

personal preference to have --order asc|desc instead of having a suffix (more intuitive imo). wdyt?

hf cache ls --sort accessed --order desc 

Contributor Author

I tend to find it quite verbose to be honest.

At first I thought we should go for --sort=accessed / --sort=-accessed like some other CLIs but I find it less explicit, hence the solution I suggested in this PR. If you feel strongly about it I can change

Contributor

Not a strong opinion here, I was mainly suggesting this for better discoverability (and readability). Using --sort accessed --order desc reads more like plain English 😄 and fwiw, gh cache CLI uses a combination of --sort and --order options. Also I was thinking that having a separate --order flag could also help with shell completion (for ["asc", "desc"]) but it turns out we already have that covered:

Screen.Recording.2025-10-31.at.16.02.20.mov

( 👍 i definitely agree on not doing something like --sort=accessed / --sort=-accessed)

Contributor Author

@Wauplin Oct 31, 2025

> it turns out we already have that covered:

That's almost the only addition I've made manually to the Cursor output (the auto-generation of the string enum) 😄

> gh cache CLI uses a combination of --sort and --order options.

Do you know what's happening when --order is passed without --sort? Is it just ignored? Also if we have both arguments, someone could be tempted to do a --order size if they haven't seen the --sort parameter yet. (that was my reason to avoid having several args)

Contributor

> Do you know what's happening when --order is passed without --sort? Is it just ignored?

not sure and i can't test locally (i don't have any gh cache 😄) but i think gh will default the sort field to the default value (last_accessed_at) https://cli.github.com/manual/gh_cache_list

> Also if we have both arguments, someone could be tempted to do a --order size if they haven't seen the --sort parameter yet.

yes, makes sense!


Delete cache entries selected with `hf cache ls -q` by piping the IDs into `hf cache rm`:

2 changes: 1 addition & 1 deletion docs/source/en/guides/manage-cache.md
@@ -401,7 +401,7 @@ Found 2 repo(s) for a total of 2 revision(s) and 3.0G on disk.

Need machine-friendly output? Use `--format json` to get structured objects or
`--format csv` for spreadsheets. Alternatively `--quiet` prints only identifiers (one
-per line) so you can pipe them into other tooling. Combine these options with
+per line) so you can pipe them into other tooling. Use `--sort` to order entries by `accessed`, `modified`, `name`, or `size` (append `:asc` or `:desc` to control order), and `--limit` to restrict results to the top N entries. Combine these options with
`--cache-dir` when you need to inspect a cache stored outside of `HF_HOME`.
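
As a rough illustration of how these flags compose in a script, the sketch below assembles an argument list for `hf cache ls`. The helper `build_cache_ls_args` is hypothetical and not part of `huggingface_hub`; only the flag names (`--sort`, `--limit`, `--quiet`, `--cache-dir`) come from the text above.

```python
from typing import Optional


def build_cache_ls_args(
    sort: Optional[str] = None,
    limit: Optional[int] = None,
    quiet: bool = False,
    cache_dir: Optional[str] = None,
) -> list[str]:
    """Hypothetical helper: assemble an `hf cache ls` command line."""
    args = ["hf", "cache", "ls"]
    if sort is not None:
        args += ["--sort", sort]  # e.g. "size" or "modified:asc"
    if limit is not None:
        args += ["--limit", str(limit)]
    if quiet:
        args.append("--quiet")
    if cache_dir is not None:
        args += ["--cache-dir", cache_dir]
    return args


# Five largest entries, IDs only.
print(build_cache_ls_args(sort="size:desc", limit=5, quiet=True))
```

The resulting list can be handed to `subprocess.run` on a machine where the `hf` CLI is installed.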

**Filter with common shell tools**
2 changes: 2 additions & 0 deletions docs/source/en/package_reference/cli.md
@@ -170,6 +170,8 @@ $ hf cache ls [OPTIONS]
* `-f, --filter TEXT`: Filter entries (e.g. 'size>1GB', 'type=model', 'accessed>7d'). Can be used multiple times.
* `--format [table|json|csv]`: Output format. [default: table]
* `-q, --quiet`: Print only IDs (repo IDs or revision hashes).
* `--sort [accessed|accessed:asc|accessed:desc|modified|modified:asc|modified:desc|name|name:asc|name:desc|size|size:asc|size:desc]`: Sort entries by key. Supported keys: 'accessed', 'modified', 'name', 'size'. Append ':asc' or ':desc' to explicitly set the order (e.g., 'modified:asc'). Defaults: 'accessed', 'modified', 'size' default to 'desc' (newest/biggest first); 'name' defaults to 'asc' (alphabetical).
* `--limit INTEGER`: Limit the number of results returned. Returns only the top N entries after sorting.
* `--help`: Show this message and exit.

### `hf cache prune`
107 changes: 106 additions & 1 deletion src/huggingface_hub/cli/cache.py
@@ -62,6 +62,18 @@ class _DeletionResolution:
_FILTER_PATTERN = re.compile(r"^(?P<key>[a-zA-Z_]+)\s*(?P<op>==|!=|>=|<=|>|<|=)\s*(?P<value>.+)$")
_ALLOWED_OPERATORS = {"=", "!=", ">", "<", ">=", "<="}
_FILTER_KEYS = {"accessed", "modified", "refs", "size", "type"}
_SORT_KEYS = {"accessed", "modified", "name", "size"}
_SORT_PATTERN = re.compile(r"^(?P<key>[a-zA-Z_]+)(?::(?P<order>asc|desc))?$")


# Dynamically generate SortOptions enum from _SORT_KEYS
_sort_options_dict = {}
for key in sorted(_SORT_KEYS):
    _sort_options_dict[key] = key
    _sort_options_dict[f"{key}_asc"] = f"{key}:asc"
    _sort_options_dict[f"{key}_desc"] = f"{key}:desc"

SortOptions = Enum("SortOptions", _sort_options_dict, type=str, module=__name__) # type: ignore


@dataclass(frozen=True)
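
To see what the dynamically generated enum in this hunk contains, here is a minimal standalone sketch of the same construction. It copies the key set from the diff but uses local names (`SORT_KEYS`, `options`) and omits the `module=__name__` argument, so treat it as an illustration and not the library's exact code.

```python
from enum import Enum

SORT_KEYS = {"accessed", "modified", "name", "size"}

# Build member-name -> value pairs: "size" -> "size", "size_asc" -> "size:asc", ...
options = {}
for key in sorted(SORT_KEYS):
    options[key] = key
    options[f"{key}_asc"] = f"{key}:asc"
    options[f"{key}_desc"] = f"{key}:desc"

# The functional Enum API accepts a mapping; type=str makes members str subclasses.
SortOptions = Enum("SortOptions", options, type=str)

# Members can be looked up by the string a user would type on the command line.
choice = SortOptions("modified:asc")
print(choice.value)
print([member.value for member in SortOptions])
```

Each key yields three members (bare, `:asc`, `:desc`), which matches the twelve choices listed for `--sort` in the CLI reference change in this PR.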
@@ -378,6 +390,70 @@ def _compare_numeric(left: Optional[float], op: str, right: float) -> bool:
    return comparisons[op]


def compile_cache_sort(
    sort_expr: str, *, include_revisions: bool
) -> tuple[Callable[[CacheEntry], tuple[Any, ...]], bool]:
    """Convert a `hf cache ls` sort expression into a key function for sorting entries.

    Returns:
        A tuple of (key_function, reverse_flag) where reverse_flag indicates whether
        to sort in descending order (True) or ascending order (False).
    """
    match = _SORT_PATTERN.match(sort_expr.strip().lower())
    if not match:
        raise ValueError(f"Invalid sort expression: '{sort_expr}'. Expected format: 'key' or 'key:asc' or 'key:desc'.")

    key = match.group("key").lower()
    explicit_order = match.group("order")

    if key not in _SORT_KEYS:
        raise ValueError(f"Unsupported sort key '{key}' in '{sort_expr}'. Must be one of {list(_SORT_KEYS)}.")

    # Default ordering: accessed/modified/size are descending (newest/biggest first), name is ascending
    default_orders = {
        "accessed": "desc",
        "modified": "desc",
        "size": "desc",
        "name": "asc",
    }

    # Use explicit order if provided, otherwise use default for the key
    order = explicit_order if explicit_order else default_orders[key]
    reverse = order == "desc"

    def _sort_key(entry: CacheEntry) -> tuple[Any, ...]:
        repo, revision = entry

        if key == "name":
            # Sort by cache_id (repo type/id)
            value: Any = repo.cache_id.lower()
            return (value,)

        if key == "size":
            # Use revision size if available, otherwise repo size
            value = revision.size_on_disk if revision is not None else repo.size_on_disk
            return (value,)

        if key == "accessed":
            # For revisions, accessed is not available per-revision, use repo's last_accessed
            # For repos, use repo's last_accessed
            value = repo.last_accessed if repo.last_accessed is not None else 0.0
            return (value,)

        if key == "modified":
            # Use revision's last_modified if available, otherwise repo's last_modified
            if revision is not None:
                value = revision.last_modified if revision.last_modified is not None else 0.0
            else:
                value = repo.last_modified if repo.last_modified is not None else 0.0
            return (value,)

        # Should never reach here due to validation above
        raise ValueError(f"Unsupported sort key: {key}")

    return _sort_key, reverse


def _resolve_deletion_targets(hf_cache_info: HFCacheInfo, targets: list[str]) -> _DeletionResolution:
    """Resolve the deletion targets into a deletion resolution."""
    repo_lookup, revision_lookup = build_cache_index(hf_cache_info)
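
The parsing and default-order logic of `compile_cache_sort` can be tried in isolation. The sketch below is a simplified stand-in under stated assumptions: `FakeRepo` and `parse_sort` are hypothetical names, the real function works on `CacheEntry` tuples, and only the regular expression and the default-order table are copied from the diff.

```python
import re
from dataclasses import dataclass

SORT_PATTERN = re.compile(r"^(?P<key>[a-zA-Z_]+)(?::(?P<order>asc|desc))?$")
DEFAULT_ORDERS = {"accessed": "desc", "modified": "desc", "size": "desc", "name": "asc"}


@dataclass
class FakeRepo:
    """Stand-in for a cached repo; not the real huggingface_hub type."""

    cache_id: str
    size_on_disk: int


def parse_sort(expr: str) -> tuple[str, bool]:
    """Return (key, reverse) for expressions such as 'size' or 'name:desc'."""
    match = SORT_PATTERN.match(expr.strip().lower())
    if match is None or match.group("key") not in DEFAULT_ORDERS:
        raise ValueError(f"Invalid sort expression: {expr!r}")
    key = match.group("key")
    # An explicit ':asc' / ':desc' suffix wins; otherwise use the key's default.
    order = match.group("order") or DEFAULT_ORDERS[key]
    return key, order == "desc"


repos = [FakeRepo("model/b", 10), FakeRepo("model/a", 30), FakeRepo("dataset/c", 20)]

_, reverse = parse_sort("size")  # no suffix: size defaults to descending
by_size = sorted(repos, key=lambda r: r.size_on_disk, reverse=reverse)
print([r.cache_id for r in by_size])

_, reverse = parse_sort("name:desc")  # explicit suffix overrides the ascending default
by_name = sorted(repos, key=lambda r: r.cache_id.lower(), reverse=reverse)
print([r.cache_id for r in by_name])
```

Returning a `reverse` flag instead of negating the key keeps the key function usable for strings such as names, which cannot be negated.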
@@ -458,13 +534,28 @@ def ls(
            help="Print only IDs (repo IDs or revision hashes).",
        ),
    ] = False,
    sort: Annotated[
        Optional[SortOptions],
        typer.Option(
            help="Sort entries by key. Supported keys: 'accessed', 'modified', 'name', 'size'. "
            "Append ':asc' or ':desc' to explicitly set the order (e.g., 'modified:asc'). "
            "Defaults: 'accessed', 'modified', 'size' default to 'desc' (newest/biggest first); "
            "'name' defaults to 'asc' (alphabetical).",
        ),
    ] = None,
    limit: Annotated[
        Optional[int],
        typer.Option(
            help="Limit the number of results returned. Returns only the top N entries after sorting.",
        ),
    ] = None,
) -> None:
    """List cached repositories or revisions."""
    try:
        hf_cache_info = scan_cache_dir(cache_dir)
    except CacheNotFound as exc:
        print(f"Cache directory not found: {str(exc.cache_dir)}")
-        raise typer.Exit(code=1)
+        raise typer.Exit(code=1) from exc

filters = filter or []

@@ -478,6 +569,20 @@ def ls(
    for fn in filter_fns:
        entries = [entry for entry in entries if fn(entry[0], entry[1], now)]

    # Apply sorting if requested
    if sort:
        try:
            sort_key_fn, reverse = compile_cache_sort(sort.value, include_revisions=revisions)
            entries.sort(key=sort_key_fn, reverse=reverse)
        except ValueError as exc:
            raise typer.BadParameter(str(exc)) from exc

    # Apply limit if requested
    if limit is not None:
        if limit < 0:
            raise typer.BadParameter(f"Limit must be a positive integer, got {limit}.")
        entries = entries[:limit]

    if quiet:
        for repo, revision in entries:
            print(revision.commit_hash if revision is not None else repo.cache_id)
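
The order of operations in this hunk (filter, then sort, then truncate) is what makes `--limit` return the "top N" entries. A minimal sketch of the sort-then-slice step, using a hypothetical `top_entries` helper and a plain dict of sizes in place of real cache entries:

```python
from typing import Optional


def top_entries(sizes: dict[str, int], limit: Optional[int] = None) -> list[str]:
    """Hypothetical helper: names ordered by size (largest first), then truncated."""
    if limit is not None and limit < 0:
        raise ValueError(f"Limit must not be negative, got {limit}.")
    ordered = sorted(sizes, key=sizes.get, reverse=True)  # sort first...
    return ordered if limit is None else ordered[:limit]  # ...then keep the first N


sizes = {"model/a": 30, "model/b": 10, "dataset/c": 20}
print(top_entries(sizes, limit=2))
print(top_entries(sizes, limit=0))
```

As in the diff, only negative limits are rejected, so a limit of `0` passes validation and yields an empty result even though the error message there speaks of a "positive integer".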