
Conversation

@varisd (Member) commented Jan 9, 2019

We can currently use the max_len parameter in decoders to avoid OOM exceptions at inference time.
However, this is not sufficient, for example when batch_size is specified in number of tokens.

For simplicity, imagine a decoder-only scenario with a token-level batch_size of 9, which barely fits into memory, and max_len=3. We can get a batch of shape [4, 2] (batch_size, seq_len), i.e. 8 tokens, which is within the budget. During inference, however, the decoder can easily produce an output of shape [4, 3], i.e. 12 tokens, which causes OOM.
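To illustrate the mismatch (this is only a hypothetical sketch, not the change made in this PR), one way to keep inference within the token budget is to charge each example max(len(example), max_len) tokens when forming token-level batches, so the worst-case decoded length is already accounted for:

```python
# Hypothetical sketch: token-level batching that reserves space for the
# decoder's max_len, so a batch that fits at input lengths cannot grow
# past the token budget during inference.
from typing import Iterable, List, Sequence


def token_batches(sentences: Iterable[Sequence[str]],
                  token_budget: int,
                  max_len: int) -> Iterable[List[Sequence[str]]]:
    """Group sentences so that batch_size * worst-case length <= token_budget.

    Each sentence is charged max(len(sentence), max_len) tokens, because the
    decoder may expand any example up to max_len at inference time.
    """
    batch: List[Sequence[str]] = []
    longest = 0
    for sent in sentences:
        cost = max(len(sent), max_len)
        new_longest = max(longest, cost)
        # Charge the whole (padded) batch at its worst-case length.
        if batch and (len(batch) + 1) * new_longest > token_budget:
            yield batch
            batch, longest = [], 0
            new_longest = cost
        batch.append(sent)
        longest = new_longest
    if batch:
        yield batch
```

With token_budget=9 and max_len=3, every example costs at least 3 tokens, so batches are capped at 3 examples and the [4, 3] inference-time blow-up from the scenario above cannot occur.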

This PR suggests one possible solution and is open for discussion.

@varisd force-pushed the batching_workaround branch from 299c1bc to 0f5649c on February 21, 2019 16:13