"<h2>GLU Variants</h2>\n<p>These are variants with gated hidden layers for the FFN as introduced in paper <a href=\"https://arxiv.org/abs/2002.05202\">GLU Variants Improve Transformer</a>. We have omitted the bias terms as specified in the paper. </p>\n": "<h2>GLU \u53d8\u4f53</h2>\n<p>\u8fd9\u4e9b\u662f\u7528\u4e8eFFN\u7684\u5c01\u95ed\u9690\u85cf\u5c42\u7684\u53d8\u4f53\uff0c\u5982\u7eb8\u8d28 <a href=\"https://arxiv.org/abs/2002.05202\">GLU\u53d8\u4f53\u6539\u8fdb\u53d8\u538b\u5668</a>\u4e2d\u6240\u8ff0\u3002\u6211\u4eec\u7701\u7565\u4e86\u672c\u6587\u4e2d\u6307\u5b9a\u7684\u504f\u5dee\u672f\u8bed\u3002</p>\n",
"<h2>GLU Variants</h2>\n<p>These are variants with gated hidden layers for the FFN as introduced in paper <a href=\"https://arxiv.org/abs/2002.05202\">GLU Variants Improve Transformer</a>. We have omitted the bias terms as specified in the paper. </p>\n": "<h2>GLU \u53d8\u4f53</h2>\n<p>\u8fd9\u4e9b\u662f\u5728\u8bba\u6587 <a href=\"https://arxiv.org/abs/2002.05202\">\u300aGLU Variants Improve Transformer \u300b</a>\u4e2d\u5305\u542b\u7684\u5404\u79cd\u5e26\u95e8\u63a7\u9690\u85cf\u5c42\u7684 ffn \u53d8\u4f53\u3002\u6211\u4eec\u5df2\u6309\u7167\u8bba\u6587\u89c4\u5b9a\u7701\u7565\u4e86\u504f\u7f6e\u9879\u3002</p>\n",
4
4
"<h3>FFN with Bilinear hidden layer</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>\u5e26\u53cc\u7ebf\u6027\u9690\u85cf\u5c42\u7684 FFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>FFN with Gated Linear Units</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>\u5e26\u95e8\u63a7\u7ebf\u6027\u5355\u5143\u7684 FFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
"<h3>FFN with Swish gate</h3>\n<p><span translate=no>_^_0_^_</span> where <span translate=no>_^_1_^_</span> </p>\n": "<h3>\u5e26 Swish \u95e8\u7684 FFN</h3>\n<p><span translate=no>_^_0_^_</span>\u5176\u4e2d\uff0c<span translate=no>_^_1_^_</span></p>\n",
"<h3>Fixed Positional Embeddings</h3>\n<p>Source embedding with fixed positional encodings</p>\n": "<h3>\u56fa\u5b9a\u4f4d\u7f6e\u5d4c\u5165</h3>\n<p>\u4f7f\u7528\u56fa\u5b9a\u4f4d\u7f6e\u7f16\u7801\u8fdb\u884c\u6e90\u5d4c\u5165</p>\n",
-"<h3>GELU activation</h3>\n<p><span translate=no>_^_0_^_</span> where <span translate=no>_^_1_^_</span></p>\n<p>It was introduced in paper <a href=\"https://arxiv.org/abs/1606.08415\">Gaussian Error Linear Units</a>.</p>\n": "<h3>GELU \u6fc0\u6d3b</h3>\n<p><span translate=no>_^_0_^_</span>\u5728\u54ea\u91cc<span translate=no>_^_1_^_</span></p>\n<p>\u5b83\u662f\u5728\u8bba\u6587\u4e2d\u4ecb\u7ecd\u7684\u201c<a href=\"https://arxiv.org/abs/1606.08415\">\u9ad8\u65af\u8bef\u5dee\u7ebf\u6027\u5355\u4f4d</a>\u201d\u3002</p>\n",
-"<h3>Learned Positional Embeddings</h3>\n<p>Source embedding with learned positional encodings</p>\n": "<h3>\u5b66\u4e60\u8fc7\u7684\u4f4d\u7f6e\u5d4c\u5165</h3>\n<p>\u4f7f\u7528\u5b66\u4e60\u7684\u4f4d\u7f6e\u7f16\u7801\u8fdb\u884c\u6e90\u5d4c\u5165</p>\n",
"<h3>GELU activation</h3>\n<p><span translate=no>_^_0_^_</span> where <span translate=no>_^_1_^_</span></p>\n<p>It was introduced in paper <a href=\"https://arxiv.org/abs/1606.08415\">Gaussian Error Linear Units</a>.</p>\n": "<h3>GELU \u6fc0\u6d3b\u51fd\u6570</h3>\n<p><span translate=no>_^_0_^_</span>\u5176\u4e2d\uff0c<span translate=no>_^_1_^_</span></p>\n<p>\u8fd9\u662f\u5728\u8bba\u6587<a href=\"https://arxiv.org/abs/1606.08415\">\u300a Gaussian Error Linear Units \u300b</a>\u4e2d\u4ecb\u7ecd\u7684\u3002</p>\n",
11
+
"<h3>Learned Positional Embeddings</h3>\n<p>Source embedding with learned positional encodings</p>\n": "<h3>\u53ef\u5b66\u4e60\u7684\u4f4d\u7f6e\u5d4c\u5165</h3>\n<p>\u4f7f\u7528\u53ef\u5b66\u4e60\u7684\u4f4d\u7f6e\u7f16\u7801\u8fdb\u884c\u5d4c\u5165</p>\n",
"<p> <a id=\"FFN\"></a></p>\n<h2>FFN Configurations</h2>\n<p>Creates a Position-wise FeedForward Network defined in <a href=\"feed_forward.html\"><span translate=no>_^_0_^_</span></a>.</p>\n": "<p><a id=\"FFN\"></a></p>\n<h2>FFN \u914d\u7f6e</h2>\n<p>\u521b\u5efa\u5728\u4e2d\u5b9a\u4e49\u7684\u4f4d\u7f6e\u524d\u9988\u7f51\u7edc<a href=\"feed_forward.html\"><span translate=no>_^_0_^_</span></a>\u3002</p>\n",
17
-
"<p> <a id=\"TransformerConfigs\"></a></p>\n<h2>Transformer Configurations</h2>\n<p>This defines configurations for a transformer. The configurations are calculate using option functions. These are lazy loaded and therefore only the necessary modules are calculated.</p>\n": "<p><a id=\"TransformerConfigs\"></a></p>\n<h2>\u53d8\u538b\u5668\u914d\u7f6e</h2>\n<p>\u8fd9\u5b9a\u4e49\u4e86\u53d8\u538b\u5668\u7684\u914d\u7f6e\u3002\u914d\u7f6e\u662f\u4f7f\u7528\u9009\u9879\u51fd\u6570\u8ba1\u7b97\u7684\u3002\u8fd9\u4e9b\u662f\u5ef6\u8fdf\u52a0\u8f7d\u7684\uff0c\u56e0\u6b64\u53ea\u8ba1\u7b97\u5fc5\u8981\u7684\u6a21\u5757\u3002</p>\n",
16
+
"<p> <a id=\"FFN\"></a></p>\n<h2>FFN Configurations</h2>\n<p>Creates a Position-wise FeedForward Network defined in <a href=\"feed_forward.html\"><span translate=no>_^_0_^_</span></a>.</p>\n": "<p><a id=\"FFN\"></a></p>\n<h2>FFN \u914d\u7f6e</h2>\n<p>\u5728<a href=\"feed_forward.html\"><span translate=no>_^_0_^_</span></a>\u4e2d\u5b9a\u4e49\u4e86\u4e00\u4e2a\u4f4d\u7f6e\u524d\u9988\u7f51\u7edc\u3002</p>\n",
17
+
"<p> <a id=\"TransformerConfigs\"></a></p>\n<h2>Transformer Configurations</h2>\n<p>This defines configurations for a transformer. The configurations are calculate using option functions. These are lazy loaded and therefore only the necessary modules are calculated.</p>\n": "<p><a id=\"TransformerConfigs\"></a></p>\n<h2>Transformer \u914d\u7f6e</h2>\n<p>\u8fd9\u5b9a\u4e49\u4e86 Transformer \u7684\u914d\u7f6e\u3002\u8fd9\u4e9b\u914d\u7f6e\u662f\u901a\u8fc7\u53ef\u9009\u62e9\u7684\u51fd\u6570\u8fdb\u884c\u8ba1\u7b97\u7684\u3002\u5b83\u4eec\u662f\u60f0\u6027\u52a0\u8f7d\u7684\uff0c\u56e0\u6b64\u53ea\u6709\u5fc5\u8981\u7684\u6a21\u5757\u624d\u4f1a\u88ab\u8ba1\u7b97\u3002</p>\n",
"<p>Logit generator for prediction </p>\n": "<p>\u7528\u4e8e\u9884\u6d4b\u7684 Logit \u751f\u6210\u5668</p>\n",
-"<p>Number of attention heads </p>\n": "<p>\u6ce8\u610f\u5934\u6570\u91cf</p>\n",
-"<p>Number of features in in the hidden layer </p>\n": "<p>\u9690\u85cf\u56fe\u5c42\u4e2d\u7684\u8981\u7d20\u6570\u91cf</p>\n",
-"<p>Number of features in the embedding </p>\n": "<p>\u5d4c\u5165\u4e2d\u7684\u8981\u7d20\u6570\u91cf</p>\n",
+"<p>Number of attention heads </p>\n": "<p>\u6ce8\u610f\u529b\u5934\u6570\u91cf</p>\n",
+"<p>Number of features in in the hidden layer </p>\n": "<p>\u9690\u85cf\u5c42\u4e2d\u7684\u7279\u5f81\u6570\u91cf</p>\n",
+"<p>Number of features in the embedding </p>\n": "<p>\u5d4c\u5165\u7684\u7279\u5f81\u6570\u91cf</p>\n",
"<p>Number of layers </p>\n": "<p>\u5c42\u6570</p>\n",
-"<p>Number of tokens in the source vocabulary (for token embeddings) </p>\n": "<p>\u6e90\u8bcd\u6c47\u8868\u4e2d\u7684\u6807\u8bb0\u6570\uff08\u7528\u4e8e\u4ee4\u724c\u5d4c\u5165\uff09</p>\n",
-"<p>Number of tokens in the target vocabulary (to generate logits for prediction) </p>\n": "<p>\u76ee\u6807\u8bcd\u6c47\u8868\u4e2d\u7684\u6807\u8bb0\u6570\uff08\u7528\u4e8e\u751f\u6210\u9884\u6d4b\u7684\u5bf9\u6570\uff09</p>\n",
+"<p>Number of tokens in the source vocabulary (for token embeddings) </p>\n": "<p>\u6e90\u8bcd\u6c47\u8868\u4e2d\u7684 token \u6570\u91cf\uff08\u7528\u4e8e token \u5d4c\u5165\uff09</p>\n",
+"<p>Number of tokens in the target vocabulary (to generate logits for prediction) </p>\n": "<p>\u76ee\u6807\u8bcd\u6c47\u8868\u4e2d\u7684 token \u6570\u91cf\uff08\u7528\u4e8e\u751f\u6210\u9884\u6d4b\u7684 logits \uff09</p>\n",
"<p>Whether the FFN layer should be gated </p>\n": "<p>\u662f\u5426\u5e94\u5bf9 FFN \u5c42\u8fdb\u884c\u95e8\u63a7</p>\n",
-"<p>Whether the first fully connected layer should have a learnable bias </p>\n": "<p>\u7b2c\u4e00\u4e2a\u5b8c\u5168\u8fde\u63a5\u7684\u5c42\u662f\u5426\u5e94\u8be5\u6709\u53ef\u5b66\u4e60\u7684\u504f\u5dee</p>\n",
-"<p>Whether the fully connected layer for the gate should have a learnable bias </p>\n": "<p>\u6805\u6781\u7684\u5168\u8fde\u63a5\u5c42\u662f\u5426\u5e94\u5177\u6709\u53ef\u5b66\u4e60\u7684\u504f\u5dee</p>\n",
-"<p>Whether the second fully connected layer should have a learnable bias </p>\n": "<p>\u7b2c\u4e8c\u4e2a\u5168\u8fde\u63a5\u5c42\u662f\u5426\u5e94\u8be5\u6709\u53ef\u5b66\u4e60\u7684\u504f\u5dee</p>\n",
+"<p>Whether the first fully connected layer should have a learnable bias </p>\n": "<p>\u7b2c\u4e00\u4e2a\u5168\u8fde\u63a5\u5c42\u662f\u5426\u5177\u6709\u53ef\u5b66\u4e60\u7684\u504f\u7f6e</p>\n",
+"<p>Whether the fully connected layer for the gate should have a learnable bias </p>\n": "<p>\u95e8\u63a7\u7684\u5168\u8fde\u63a5\u5c42\u662f\u5426\u5177\u6709\u53ef\u5b66\u4e60\u7684\u504f\u7f6e</p>\n",
+"<p>Whether the second fully connected layer should have a learnable bias </p>\n": "<p>\u7b2c\u4e8c\u4e2a\u5168\u8fde\u63a5\u5c42\u662f\u5426\u5177\u6709\u53ef\u5b66\u4e60\u7684\u504f\u7f6e</p>\n",
"These are configurable components that can be re-used quite easily.": "\u8fd9\u4e9b\u662f\u53ef\u914d\u7f6e\u7684\u7ec4\u4ef6\uff0c\u53ef\u4ee5\u5f88\u5bb9\u6613\u5730\u91cd\u590d\u4f7f\u7528\u3002"