ArthurZucker
Collaborator

The motivation behind this change is:

  • As of today, you cannot write a generation loop from the tokenizer alone without this information, because it has to be obtained from somewhere else.
  • Ideally, the Rust crate should expose everything a tokenizer "has", and a mapping from standard names is currently missing.
  • You have to download and parse the tokenizer_config.json generated by transformers, when it should be fairly easy to keep track of that information directly!
  • transformers v5 refactors its tokenization backend to be minimal, and this change is aligned with removing tokenizer_config.json.
