@Wauplin Wauplin commented Oct 31, 2025

Disclaimer:

Code changes and PR description auto-generated by Cursor Composer (2.0 Agent mode). Initial prompt I gave:

in @cache.py we define `hf cache ls`, a CLI which --help is:

@zsh (17-69) 

I want to add two parameters to it:

--sort which can take as input "accessed" (sort by last accessed first), "modified" (sort by last modified first), name (sort by ID alphabetically), size (sort by size - biggest first). Each key can be appended with :asc or :desc to explicitly set the order. Example "modified:asc" == "modified the longest time ago first".


--limit => takes an integer as input and returns only the "top N results for that search"


Can you implement these 2 parameters and document them in @cli.md + @manage-cache.md.

Thanks!
  • then reviewed and asked for some tweaks myself.

What a pity, I had to run make style by myself 😄 (also, I've tested the CLI myself on my machine)


Summary

This PR adds two new parameters to the hf cache ls command to enable sorting and limiting cache entries, making it easier to find and manage cached repositories and revisions.

Changes

New Features

  1. --sort parameter: Sort cache entries by various keys

    • Supported keys: accessed, modified, name, size
    • Optional order suffix: :asc or :desc (e.g., modified:asc)
    • Default ordering:
      • accessed, modified, size → descending (newest/biggest first)
      • name → ascending (alphabetical)
  2. --limit parameter: Limit the number of results returned

    • Takes a positive integer
    • Applied after sorting to return top N entries
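The key and suffix rules above can be sketched in a few lines of Python. This is only an illustration under assumptions: `parse_sort`, `SORT_KEYS`, `SORT_PATTERN`, and `DEFAULT_DESCENDING` are hypothetical names, and the PR's actual `compile_cache_sort()` may be structured differently.

```python
import re

# Hypothetical names for illustration; not the PR's actual constants.
SORT_KEYS = ("accessed", "modified", "name", "size")
SORT_PATTERN = re.compile(r"(?P<key>[a-z]+)(?::(?P<order>asc|desc))?")
# Default direction per key when no :asc/:desc suffix is given.
DEFAULT_DESCENDING = {"accessed": True, "modified": True, "size": True, "name": False}


def parse_sort(expr: str) -> tuple[str, bool]:
    """Return (key, descending) for an expression such as 'modified:asc'."""
    match = SORT_PATTERN.fullmatch(expr)
    if match is None or match["key"] not in SORT_KEYS:
        raise ValueError(f"Invalid sort expression: {expr!r}")
    key, order = match["key"], match["order"]
    # An explicit suffix wins; otherwise fall back to the per-key default.
    descending = DEFAULT_DESCENDING[key] if order is None else order == "desc"
    return key, descending
```

With this sketch, `parse_sort("size")` yields a descending sort, `parse_sort("name")` an ascending one, and an unknown key such as `"date"` raises `ValueError`.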

Implementation Details

  • Added compile_cache_sort() function to parse and validate sort expressions
  • Added _SORT_KEYS and _SORT_PATTERN constants for validation
  • Sort and limit are applied after filtering but before output formatting
  • Proper error handling for invalid sort expressions and negative limits
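The ordering of the steps (filter, then sort, then limit) might look roughly like the sketch below. It is not the PR's code: `select_entries` and the dictionary-based entries are hypothetical stand-ins for the real cache objects, and rejecting only negative limits is an assumption taken from the wording above.

```python
from typing import Callable, Optional

Entry = dict  # hypothetical stand-in for a cached repo or revision


def select_entries(
    entries: list[Entry],
    keep: Callable[[Entry], bool],
    sort_key: str,
    descending: bool,
    limit: Optional[int] = None,
) -> list[Entry]:
    """Filter first, then sort, then keep only the top `limit` entries."""
    if limit is not None and limit < 0:
        raise ValueError("limit must not be negative")
    kept = [entry for entry in entries if keep(entry)]            # 1. filter
    kept.sort(key=lambda entry: entry[sort_key], reverse=descending)  # 2. sort
    return kept if limit is None else kept[:limit]                # 3. limit


# Made-up sample data, only to show the order of operations.
entries = [
    {"name": "model/a", "size": 300},
    {"name": "model/b", "size": 50},
    {"name": "model/c", "size": 900},
    {"name": "model/d", "size": 400},
]
top_two = select_entries(entries, lambda e: e["size"] > 100, "size", True, limit=2)
print([e["name"] for e in top_two])
```

Applying the limit last is what makes `--limit N` mean "top N for this sort"; truncating before sorting would return an arbitrary subset.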

Examples

# List repos sorted by size (biggest first)
hf cache ls --sort size

# List top 10 most recently accessed repos
hf cache ls --sort accessed --limit 10

# List oldest revisions first
hf cache ls --revisions --sort modified:asc

# List repos alphabetically
hf cache ls --sort name:asc

# Combine with filters to find largest old repos
hf cache ls --filter "accessed>1y" --sort size --limit 5

Testing

  • Added comprehensive test coverage for:
    • Basic sorting functionality
    • Sort with explicit order (:asc/:desc)
    • Limit functionality
    • Error handling for invalid sort keys
    • Error handling for negative limits

Documentation

  • Updated docs/source/en/guides/cli.md with brief mention of new options
  • Updated docs/source/en/guides/manage-cache.md with usage examples

Benefits

  • Improved usability: Users can now easily find the largest or most recently accessed cached items
  • Better cache management: Limit results to focus on top N entries after sorting
  • Flexible queries: Combine with existing filters for powerful cache inspection workflows

Backward Compatibility

✅ Fully backward compatible - all new parameters are optional

@Wauplin Wauplin requested a review from hanouticelina October 31, 2025 08:42
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

The command supports several output formats for scripting: `--format json` prints structured objects, `--format csv` writes comma-separated rows, and `--quiet` prints only IDs. Combine these with `--cache-dir` to target alternative cache locations. See the [Manage your cache](./manage-cache) guide for advanced workflows.
The command supports several output formats for scripting: `--format json` prints structured objects, `--format csv` writes comma-separated rows, and `--quiet` prints only IDs. Use `--sort` to order entries by `accessed`, `modified`, `name`, or `size` (append `:asc` or `:desc` to control order), and `--limit` to restrict results to the top N entries. Combine these with `--cache-dir` to target alternative cache locations. See the [Manage your cache](./manage-cache) guide for advanced workflows.
Contributor

personal preference to have --order asc|desc instead of having a suffix (more intuitive imo). wdyt?

hf cache ls --sort accessed --order desc 

Contributor Author

I tend to find it quite verbose to be honest.

At first I thought we should go for --sort=accessed / --sort=-accessed like some other CLIs, but I find it less explicit, hence the solution I suggested in this PR. If you feel strongly about it, I can change it.

Contributor

Not a strong opinion here, I was mainly suggesting this for better discoverability (and readability). Using --sort accessed --order desc reads more like plain English 😄 and fwiw, gh cache CLI uses a combination of --sort and --order options. Also I was thinking that having a separate --order flag could also help with shell completion (for ["asc", "desc"]) but it turns out we already have that covered:

Screen.Recording.2025-10-31.at.16.02.20.mov

( 👍 i definitely agree on not doing something like --sort=accessed / --sort=-accessed)

Contributor Author

@Wauplin Wauplin Oct 31, 2025

> it turns out we already have that covered:

That's almost the only addition I made manually to the Cursor output (the auto-generation of the string enum) 😄

> gh cache CLI uses a combination of --sort and --order options.

Do you know what's happening when --order is passed without --sort? Is it just ignored? Also if we have both arguments, someone could be tempted to do a --order size if they haven't seen the --sort parameter yet. (that was my reason to avoid having several args)

Contributor

> Do you know what's happening when --order is passed without --sort? Is it just ignored?

not sure and i can't test locally (i don't have any gh cache 😄) but i think gh will default the sort field to the default value (last_accessed_at) https://cli.github.com/manual/gh_cache_list

> Also if we have both arguments, someone could be tempted to do a --order size if they haven't seen the --sort parameter yet.

yes, makes sense!

@hanouticelina hanouticelina left a comment

very nice! could you ask cursor to generate a small test for at least one sorting key? 😄

Wauplin commented Nov 3, 2025

> very nice! could you ask cursor to generate a small test for at least one sorting key? 😄

As a matter of fact, it generated plenty of tests that I removed (waaaayyyy too specific), but I was sure I had kept one. Anyway, I added one back manually.

@Wauplin Wauplin requested a review from hanouticelina November 4, 2025 13:01
@hanouticelina hanouticelina left a comment

Thank you!

@hanouticelina hanouticelina merged commit e2f6f26 into main Nov 4, 2025
23 checks passed
@hanouticelina hanouticelina deleted the add-sort-and-limit-parameters-in-cache-ls branch November 4, 2025 13:11