This is a complicated change. At the moment we hold all feature activations in RAM while caching, which becomes problematic once we're dealing with millions of tokens.
I think the way we want to do this is to use something like Hugging Face `datasets`, which backs datasets with memory-mapped Arrow files on disk instead of keeping everything in RAM.
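Roughly what I have in mind is the sketch below (not a final design): the `get_activation_batches()` generator and the field names are hypothetical stand-ins for our real caching loop. `Dataset.from_generator` writes each yielded record to an on-disk Arrow cache as it's produced, so peak memory stays at about one batch instead of the whole activation store, and reloading with `load_from_disk` memory-maps the file so rows are read lazily.

```python
import numpy as np
from datasets import Dataset, load_from_disk

def get_activation_batches():
    # Hypothetical stand-in for our real caching loop: yields one
    # record per batch instead of accumulating everything in RAM.
    for _ in range(10):
        yield {
            "tokens": np.random.randint(0, 50_000, size=128).tolist(),
            "activations": np.random.randn(128, 512).astype(np.float32),
        }

# Each yielded record is flushed to an Arrow cache file on disk as it
# arrives, so memory usage is bounded by a single batch.
ds = Dataset.from_generator(get_activation_batches)
ds.save_to_disk("activation_cache")

# Reloading memory-maps the Arrow file; rows are only read on access.
ds = load_from_disk("activation_cache").with_format("numpy")
print(ds[0]["activations"].shape)  # (128, 512)
```

That would also give us shuffling/filtering over the cached activations for free, though we'd want to benchmark the Arrow read path before committing to it.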