docs/source/3x/PT_MixedPrecision.md (1 addition, 1 deletion)
@@ -18,7 +18,7 @@ The 4th Gen Intel® Xeon® Scalable processor supports FP16 instruction set arch
 Further details can be found in the [Intel AVX512 FP16 Guide](https://www.intel.com/content/www/us/en/content-details/669773/intel-avx-512-fp16-instruction-set-for-intel-xeon-processor-based-products-technology-guide.html) published by Intel.

 The latest Intel Xeon processors deliver the flexibility of Intel Advanced Matrix Extensions (Intel AMX), an accelerator that improves the performance of deep learning (DL) training and inference, making it ideal for workloads like NLP, recommender systems, and image recognition. Developers can code AI functionality to take advantage of the Intel AMX instruction set, and they can code non-AI functionality to use the processor's instruction set architecture (ISA). Intel has integrated the Intel® oneAPI Deep Neural Network Library (oneDNN), its oneAPI DL engine, into PyTorch.
-Further details can be found in the [Intel AMX Document](https://www.intel.com/content/www/us/en/content-details/785250/accelerate-artificial-intelligence-ai-workloads-with-intel-advanced-matrix-extensions-intel-amx.html) published by Intel.
+Further details can be found in the [Intel AMX Document](https://www.intel.com/content/www/us/en/content-details/785250/accelerate-artificial-intelligence-workloads-with-intel-advanced-matrix-extensions.html) published by Intel.
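The link fix above sits in a passage explaining how FP16 work reaches the AVX512-FP16 and AMX units through oneDNN in PyTorch. As a rough illustration only (plain PyTorch autocast, not the neural_compressor API this document covers; the model and input are made up, and FP16 CPU autocast requires a recent PyTorch build):

```python
import torch

# Hypothetical FP32 model and input; any torch.nn.Module works the same way.
model = torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()).eval()
x = torch.randn(1, 64)

# Autocast-eligible ops run in FP16; on 4th Gen Xeon these kernels can be
# dispatched by oneDNN to the AVX512-FP16 / Intel AMX instruction sets.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.float16):
    y = model(x)

print(y.dtype)  # torch.float16 for ops covered by autocast
```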
-- Refer to this [example](https://github.com/intel/neural-compressor/blob/master/examples/onnxrt/body_analysis/onnx_model_zoo/ultraface/quantization/ptq_static) for how to define a customised dataloader.
+- Refer to this [example](https://github.com/intel/neural-compressor/blob/master/examples/deprecated/onnxrt/body_analysis/onnx_model_zoo/ultraface/quantization/ptq_static) for how to define a customised dataloader.

-- Refer to this [example](https://github.com/intel/neural-compressor/blob/master/examples/onnxrt/nlp/bert/quantization/ptq_static) for how to use internal dataloader.
+- Refer to this [example](https://github.com/intel/neural-compressor/blob/master/examples/deprecated/onnxrt/nlp/bert/quantization/ptq_static) for how to use internal dataloader.
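Both bullets concern calibration dataloaders for the deprecated ONNX Runtime static PTQ recipes. As a hedged sketch of the convention a customised dataloader follows in the legacy neural_compressor API (an iterable yielding (input, label) pairs with a batch_size attribute; the class name, shapes, and data below are made up):

```python
import numpy as np

class CustomDataloader:
    """Minimal calibration dataloader: iterable of (input, label) pairs
    plus a batch_size attribute, as the legacy API convention expects."""

    def __init__(self, batch_size=1, num_samples=8):
        self.batch_size = batch_size
        self.num_samples = num_samples

    def __iter__(self):
        for _ in range(self.num_samples):
            # Hypothetical NCHW image batch; real code would read a dataset.
            image = np.random.rand(self.batch_size, 3, 240, 320).astype(np.float32)
            yield image, None  # labels are unused during calibration
```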
-- Refer to this [example](https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/body_analysis/onnx_model_zoo/arcface/quantization/ptq_static) for how to define a customised metric.
+- Refer to this [example](https://github.com/intel/neural-compressor/tree/master/examples/deprecated/onnxrt/body_analysis/onnx_model_zoo/arcface/quantization/ptq_static) for how to define a customised metric.

-- Refer to this [example](https://github.com/intel/neural-compressor/blob/master/examples/tensorflow/image_recognition/tensorflow_models/efficientnet-b0/quantization/ptq) for how to use internal metric.
+- Refer to this [example](https://github.com/intel/neural-compressor/tree/master/examples/deprecated/tensorflow/image_recognition/tensorflow_models/efficientnet-b0/quantization/ptq) for how to use internal metric.
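For context, a customised metric in the legacy neural_compressor API is conventionally a small class exposing update, reset, and result methods. A hedged, hypothetical top-1 accuracy sketch (not taken from the arcface example):

```python
class CustomMetric:
    """Sketch of the update/reset/result protocol for a user-defined metric."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, labels):
        # Hypothetical top-1 comparison; the real logic depends on the task.
        for pred, label in zip(preds, labels):
            self.correct += int(pred.argmax() == label)
            self.total += 1

    def reset(self):
        self.correct = 0
        self.total = 0

    def result(self):
        return self.correct / self.total if self.total else 0.0
```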
docs/source/pruning.md (1 addition, 1 deletion)
@@ -107,7 +107,7 @@ Pruning patterns defines the rules of pruned weights' arrangements in space. Int
 - Multi-head Attention Pruning

-The multi-head attention mechanism boosts transformer models' capability for contextual information analysis. However, different heads' contributions to the final output vary. In most situations, a number of heads can be removed without causing an accuracy drop. Head pruning can be applied in a wide range of scenarios, including BERT, GPT, and other large language models. **We do not support it in pruning yet, but we provide an experimental feature in Model Auto Slim.** Please refer to the [multi-head attention auto slim examples](https://github.com/intel/neural-compressor/blob/master/examples/pytorch/nlp/huggingface_models/question-answering/model_slim)
+The multi-head attention mechanism boosts transformer models' capability for contextual information analysis. However, different heads' contributions to the final output vary. In most situations, a number of heads can be removed without causing an accuracy drop. Head pruning can be applied in a wide range of scenarios, including BERT, GPT, and other large language models. **We do not support it in pruning yet, but we provide an experimental feature in Model Auto Slim.** Please refer to the [multi-head attention auto slim examples](https://github.com/intel/neural-compressor/blob/master/deprecated/examples/pytorch/nlp/huggingface_models/question-answering/model_slim)
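Head pruning itself can be illustrated independently of Model Auto Slim with the Hugging Face transformers API, which the linked examples build on. A hedged sketch; the layer and head indices are arbitrary:

```python
from transformers import BertModel

# Load a stock BERT; transformers PreTrainedModel supports prune_heads.
model = BertModel.from_pretrained("bert-base-uncased")

# Map of layer index -> head indices to remove (arbitrary choices here).
model.prune_heads({0: [0, 2], 5: [1]})

# Layer 0 now computes 10 of its original 12 attention heads.
print(model.encoder.layer[0].attention.self.num_attention_heads)
```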
-To get more information, please refer to [examples](https://github.com/intel/neural-compressor/blob/master/examples/pytorch/nlp/huggingface_models/language-modeling/quantization/llm).
+To get more information, please refer to [examples](https://github.com/intel/neural-compressor/blob/master/examples/deprecated/pytorch/nlp/huggingface_models/language-modeling/quantization/llm).
neural_compressor/compression/pruner/README.md (1 addition, 1 deletion)
@@ -107,7 +107,7 @@ Pruning patterns defines the rules of pruned weights' arrangements in space. Int
 - Multi-head Attention Pruning

-The multi-head attention mechanism boosts transformer models' capability for contextual information analysis. However, different heads' contributions to the final output vary. In most situations, a number of heads can be removed without causing an accuracy drop. Head pruning can be applied in a wide range of scenarios, including BERT, GPT, and other large language models. **We do not support it in pruning yet, but we provide an experimental feature in Model Auto Slim.** Please refer to the [multi-head attention auto slim examples](https://github.com/intel/neural-compressor/blob/master/examples/pytorch/nlp/huggingface_models/question-answering/model_slim)
+The multi-head attention mechanism boosts transformer models' capability for contextual information analysis. However, different heads' contributions to the final output vary. In most situations, a number of heads can be removed without causing an accuracy drop. Head pruning can be applied in a wide range of scenarios, including BERT, GPT, and other large language models. **We do not support it in pruning yet, but we provide an experimental feature in Model Auto Slim.** Please refer to the [multi-head attention auto slim examples](https://github.com/intel/neural-compressor/blob/master/examples/deprecated/pytorch/nlp/huggingface_models/question-answering/model_slim)