
Conversation

jordandekraker

This is a pretty simple PR: it just removes the embedder so people can feed in other (flattened) data types, such as embedded video frames or audio. We also remove the softmax and logits head, and from the sampler we remove min_p_filter and gumbel_sample.
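For reference, the two removed sampler helpers look roughly like this. This is a minimal sketch following the common definitions of min-p filtering and Gumbel sampling, not necessarily the exact implementations from this repo:

```python
import torch

def min_p_filter(logits, min_p=0.1):
    # mask out (set to -inf) tokens whose probability falls below
    # min_p times the probability of the most likely token
    probs = logits.softmax(dim=-1)
    max_probs = probs.amax(dim=-1, keepdim=True)
    return logits.masked_fill(probs < min_p * max_probs, float('-inf'))

def gumbel_sample(logits, temperature=1.0):
    # argmax over Gumbel-noised logits is equivalent to sampling
    # from the softmax distribution (the Gumbel-max trick)
    uniform = torch.rand_like(logits).clamp(min=1e-20)
    gumbel_noise = -torch.log(-torch.log(uniform).clamp(min=1e-20))
    return ((logits / max(temperature, 1e-10)) + gumbel_noise).argmax(dim=-1)

logits = torch.tensor([[2.0, 1.0, -3.0]])
sampled = gumbel_sample(min_p_filter(logits, min_p=0.3))
```

With min_p=0.3 here, the third token is filtered out, so sampling only ever returns index 0 or 1.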

For tokens, it is recommended to do embedding and logits outside mac_transformer.py. I wasn't able to run train_mac.py due to incompatible dependencies, so I left it alone, but it should be simple to add an embedder and logits head. min_p_filter and gumbel_sample could possibly be added back in somewhere else (utils?)
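Sketching what that wrapping could look like, the class name, dims, and wrapper are illustrative assumptions here, not the actual titans API:

```python
import torch
from torch import nn

class TokenWrapper(nn.Module):
    # hypothetical wrapper: restores token support around a transformer
    # that now only consumes and produces (batch, seq, dim) float features
    def __init__(self, transformer, num_tokens, dim):
        super().__init__()
        self.embed = nn.Embedding(num_tokens, dim)
        self.transformer = transformer
        self.to_logits = nn.Linear(dim, num_tokens, bias=False)

    def forward(self, token_ids):
        x = self.embed(token_ids)        # (b, n) ints -> (b, n, dim) floats
        x = self.transformer(x)          # (b, n, dim) -> (b, n, dim)
        return self.to_logits(x)         # (b, n, dim) -> (b, n, num_tokens)

# quick shape check with an identity stand-in for the transformer
wrapper = TokenWrapper(nn.Identity(), num_tokens=256, dim=64)
out = wrapper(torch.randint(0, 256, (2, 16)))
print(tuple(out.shape))  # (2, 16, 256)
```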

@jordandekraker
Author

Again, I could not test this, but dae41d2 is how I would organize train_mac.py. Consider it only a suggestion!

Notes: we're cutting the dense, bias-free logits layer off the end and swapping cross_entropy for l1_loss. That's a price I would pay for being able to run on non-token data.
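As a minimal sketch of that swap, assuming the model now maps (batch, seq, features) floats to floats (the Linear layer here is just a stand-in for the modified MAC transformer):

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(64, 64)      # stand-in for the modified transformer
frames = torch.randn(2, 16, 64)      # e.g. flattened/embedded video frames

# next-step prediction on continuous features:
# l1_loss on float targets replaces cross_entropy on token ids
pred = model(frames[:, :-1])
loss = F.l1_loss(pred, frames[:, 1:])
loss.backward()
```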

@lucidrains
Owner

lucidrains commented Apr 12, 2025

@jordandekraker hey Jordan, thanks for the pull request

any chance you could disentangle this so it can support language modeling and your use-case (with a few tests)? are you seeing something with this architecture on continuous data?

@lucidrains
Owner

@jordandekraker it may be faster if i just build it for you, but you'll have to share with me what you are seeing. just reach out over Signal

@jordandekraker
Author

The README code worked for me, and sample was able to return an appropriately sized tensor of floats. I haven't yet tested whether it returns sensible results on, e.g., embedded video frames.

The changes needed to get away from tokens MAY be a bit deeper than I thought; I'm not sure whether additional classes and modules will need to be updated to take a features dimension. That is, even though I cannot run it all locally, I think tests/test_titans.py may still fail on test_flex with SegmentedAttention. I didn't touch the dimensionality of that code, but I'm just not sure it's implicitly being handled correctly. Some of the reshapes make me shaky.
In general I think it's nice to explicitly have a features dimension instead of tokens, but I would not be upset if you disagree.

I prefer to chat via GitHub, but I can move elsewhere if that is prohibitive for you.
