@soumith (Contributor) commented Nov 6, 2017

This brings the rewritten, more efficient model from https://github.com/zackchase/mxnet-the-straight-dope/tree/master/chapter09_natural-language-processing

  • On PyTorch 0.2, this is about 50% faster on CPU (OMP_NUM_THREADS=1)
  • On PyTorch master, this is about 3x faster on CPU (OMP_NUM_THREADS=1)
    On CUDA you'll see decent speedups as well.

This PR needs a careful review to make sure the model computes the same thing before and after the change (I've verified that the input and output shapes are all the same).

@dasguptar
Owner

Hi @soumith 😮

Thanks for taking the time to send this PR. Unfortunately, after your previous PR, I kind of went back and looked at the model and realised I did many things in a dumb way.

Since this was the first model I implemented while learning PyTorch, it contained a fair amount of unnecessary code, such as F.torch.squeeze(tensor) instead of tensor.squeeze(). I tried refactoring the model myself yesterday and optimised it a bit, reaching roughly a 2x speedup on CPU (from 5 minutes 30 seconds down to around 2 minutes 50 seconds). I think some of the changes are similar to what you have done here, e.g. computing batched embeddings and combining linear layers.
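
To make "combining linear layers" concrete, here is a minimal sketch of the idea, not the code from either branch. It assumes four separate gate projections of the same input, and uses plain Python lists in place of tensors so it runs without PyTorch. The sizes `D` and `C` and the helpers `linear` and `rand_layer` are hypothetical names chosen for illustration. Stacking the four weight matrices row-wise gives one projection with the same outputs, which is the role a single `F.linear(inputs, i2h_weight, i2h_bias)` call with a `(4*C, D)` weight plays in the diff.

```python
import random


def linear(x, weight, bias):
    """Compute y = W x + b for one input vector; weight is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def rand_layer(out_dim, in_dim):
    """Build a random (weight, bias) pair; stands in for a learned layer."""
    weight = [[random.uniform(-1, 1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [random.uniform(-1, 1) for _ in range(out_dim)]
    return weight, bias


random.seed(0)
D, C = 3, 2  # hypothetical input size and hidden size

# Four separate projections, one per gate.
layers = [rand_layer(C, D) for _ in range(4)]
x = [random.uniform(-1, 1) for _ in range(D)]

# Unfused: four small projections, outputs concatenated afterwards.
separate = []
for weight, bias in layers:
    separate.extend(linear(x, weight, bias))

# Fused: stack the weight rows and biases, then project once.
fused_weight = [row for weight, _ in layers for row in weight]  # (4*C, D)
fused_bias = [b for _, bias in layers for b in bias]            # (4*C,)
fused = linear(x, fused_weight, fused_bias)

print(fused == separate)  # the two paths compute identical values
```

The fused version does the same arithmetic per output row, so the results match exactly; the speedup in a real framework comes from issuing one larger matrix multiply instead of four small ones.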

Since the model file changed in the latest commit, I cannot merge this PR directly, and I am quite inexperienced with rebasing and resolving conflicts. If and when you have the time, could you take a look at the current model.py and decide whether to rebase the PR on current master, or whether current master is already good enough?

# FC for i, f, u, o gates (N, 4*C), from input to hidden
i2h = F.linear(inputs, i2h_weight, i2h_bias)
i2h_slices = torch.split(i2h, i2h.size(1) // 4, dim=1) # (N, C)*4
i2h_iuo = torch.cat([i2h_slices[0], i2h_slices[2], i2h_slices[3]], dim=1) # (N, C*3)
Owner

Why are the indices 0,2,3 and not 0,1,2? Why is i2h_f_slice = i2h_slices[1]?

Owner

Also, why is it iuo and not iou? Is there some rationale behind this? I am using iou, so I am wondering whether I am making a mistake...
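
For reference, a small sketch of how the slice indices follow from the gate layout. It assumes the fused projection is laid out in the order given by the comment in the diff (i, f, u, o). Plain Python lists stand in for tensors, and `split_row`, `gate_indices`, and the alternative layout `"iouf"` are hypothetical names used only for illustration. Under that assumption, slice 1 is the f gate and slices 0, 2, 3 are i, u, o; a layout that stores the gates as i, o, u, f would instead give indices 0, 1, 2 for iou.

```python
GATE_LAYOUT = "ifuo"  # order taken from the diff comment: i, f, u, o


def split_row(row, num_gates=4):
    """Split one fused row of length 4*C into num_gates chunks of length C."""
    chunk = len(row) // num_gates
    return [row[k * chunk:(k + 1) * chunk] for k in range(num_gates)]


def gate_indices(layout, wanted):
    """Return the slice index of each wanted gate under the given layout."""
    return [layout.index(g) for g in wanted]


C = 2  # hypothetical hidden size
# Fake fused output for one sample; each entry is tagged with its gate name.
fused_row = [f"{g}{j}" for g in GATE_LAYOUT for j in range(C)]

slices = split_row(fused_row)
iuo = [v for k in gate_indices(GATE_LAYOUT, "iuo") for v in slices[k]]
f_slice = slices[gate_indices(GATE_LAYOUT, "f")[0]]

print(gate_indices(GATE_LAYOUT, "iuo"))  # [0, 2, 3]
print(iuo)                               # ['i0', 'i1', 'u0', 'u1', 'o0', 'o1']
print(f_slice)                           # ['f0', 'f1']

# With a different (hypothetical) layout, iou sits at the first three slices.
print(gate_indices("iouf", "iou"))       # [0, 1, 2]
```

Because the projection weights are learned, either ordering should work as long as the slicing matches the layout the weights were created with and every later use of the gates follows the same order.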
