
Commit 09505bb

[Doc] Release Note and Known Issue List for IPEX XPU 2.8 (#5745)

1 parent: c6427e5
2 files changed: 41 additions, 20 deletions

docs/tutorials/known_issues.md

Lines changed: 3 additions & 20 deletions

````diff
@@ -81,26 +81,9 @@ Troubleshooting
 - **Cause**: The C++ compiler is not activated. `torch.compile` needs to find the correct `cl.exe` path.
 - **Solution**: Open "Developer Command Prompt for VS 2022" or follow [Visual Studio Developer Command Prompt and Developer PowerShell](https://learn.microsoft.com/en-us/visualstudio/ide/reference/command-prompt-powershell?view=vs-2022#developer-command-prompt) to activate the Visual Studio environment.
 
-- **Problem**: LoweringException: ImportError: cannot import name 'intel' from 'triton._C.libtriton'
-- **Cause**: Installing Triton causes pytorch-triton-xpu to stop working.
-- **Solution**: Resolve the issue with the following commands:
-
-  ```bash
-  pip list | grep triton
-  # If triton related packages are listed, remove them
-  pip uninstall triton
-  pip uninstall pytorch-triton-xpu
-  # Reinstall the correct version of pytorch-triton-xpu
-  pip install pytorch-triton-xpu==3.3.0 --index-url https://download.pytorch.org/whl/xpu
-  ```
-
-- **Problem**: RuntimeError: oneCCL: ze_handle_manager.cpp:226 get_ptr: EXCEPTION: unknown memory type, when executing DLRMv2 BF16 training on a 4-card Intel® Data Center GPU Max platform.
-- **Cause**: The issue exists in the default SYCL path of oneCCL 2021.14, which uses two IPC exchanges.
-- **Solution**: Use `export CCL_ATL_TRANSPORT=ofi` to work around it.
-
-- **Problem**: Segmentation fault when executing LLaMa2-70B inference on the Intel® Data Center GPU Max platform, based on online quantization.
-- **Cause**: The issue exists in Intel Neural Compressor (INC) v3.3: during the initial import of INC, the accelerator is cached with `lru_cache`, so setting `INC_TARGET_DEVICE` in the INC transformers-like API afterwards does not take effect. This results in two devices being present in the model, leading to the memory-related errors seen in the error messages.
-- **Solution**: Run the workload with `INC_TARGET_DEVICE="cpu" python` to work around it, if using online quantization.
+- **Problem**: System hang when executing llama3-8b and phi3-mini FSDP fine-tuning cases based on the XCCL backend on the Intel® Data Center GPU Max platform; the hang occurs after workload completion and before the process exits.
+- **Cause**: Compatibility issue between accelerate v1.8.1 and transformers v4.51.3.
+- **Solution**: Use torch-ccl in place of XCCL as a workaround.
 
 ## Performance Issue
````
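The workaround in the new known-issues entry, replacing XCCL with torch-ccl, can be sketched at the model-script level. This is a hypothetical helper, not part of IPEX: `pick_backend` is an illustrative name, and the `ccl` path additionally requires the `oneccl_bindings_for_pytorch` package to be installed before the process group is initialized.

```python
import torch.distributed as dist

def pick_backend(prefer_xccl: bool = True) -> str:
    """Return the process-group backend name for Intel GPU runs.

    Hypothetical helper: prefer PyTorch's built-in XCCL backend, but
    allow forcing torch-ccl's 'ccl' backend, e.g. to work around the
    FSDP hang described above.
    """
    # is_xccl_available appears in recent PyTorch; guard for older builds.
    if prefer_xccl and getattr(dist, "is_xccl_available", lambda: False)():
        return "xccl"
    return "ccl"

backend = pick_backend(prefer_xccl=False)  # force the torch-ccl fallback
if backend == "ccl":
    # torch-ccl registers the 'ccl' backend when imported (assumes the
    # oneccl_bindings_for_pytorch package is installed):
    # import oneccl_bindings_for_pytorch
    pass
# dist.init_process_group(backend=backend)  # then run FSDP fine-tuning as usual
```
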

docs/tutorials/releases.md

Lines changed: 38 additions & 0 deletions
````diff
@@ -1,6 +1,44 @@
 Releases
 =============
 
+We launched Intel® Extension for PyTorch\* in 2020 with the goal of extending the official PyTorch\* to simplify achieving high performance on Intel® CPU and GPU platforms. Over the years, we have successfully upstreamed most of our features and optimizations for Intel® platforms into PyTorch\*. Moving forward, our strategy is to focus on developing new features and supporting upcoming platform launches directly within PyTorch\*. We are discontinuing active development on Intel® Extension for PyTorch\*, effective immediately after the 2.8 release. We will continue to provide critical bug fixes and security patches throughout the PyTorch\* 2.9 timeframe to ensure a smooth transition for our partners and the community.
+
+## 2.8.10+xpu
+
+Intel® Extension for PyTorch\* v2.8.10+xpu is the new release which supports Intel® GPU platforms (Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 Mobile Processors and Intel® Data Center GPU Max Series) based on PyTorch\* 2.8.0.
+
+### Highlights
+
+- Intel® oneDNN v3.8.1 integration
+- Intel® Deep Learning Essentials 2025.1.3 compatibility
+- Large Language Model (LLM) optimization
+
+  Intel® Extension for PyTorch\* optimizes the performance of Qwen3, along with other typical LLM models, on Intel® GPU platforms, with the supported transformers version upgraded to [4.51.3](https://github.com/huggingface/transformers/releases/tag/v4.51.3). A full list of optimized LLM models is available in the [LLM Optimizations Overview](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/llm.html). Intel® Extension for PyTorch\* also adds support for more custom kernels, such as `selective_scan_fn`, `causal_conv1d_fn` and `causal_conv1d_update`, to support the functionality of the [Jamba](https://arxiv.org/abs/2403.19887) model.
+
+- PyTorch\* XCCL adoption for distributed scenarios
+
+  Intel® Extension for PyTorch\* adopts the PyTorch\* XCCL backend for distributed scenarios on the Intel® GPU platform. We observed that the scaling performance using PyTorch\* XCCL is on par with oneCCL Bindings for PyTorch\* (torch-ccl) for validated AI workloads. As a result, we will discontinue active development of torch-ccl immediately after the 2.8 release.
+
+  A pseudocode example illustrating the transition from torch-ccl to PyTorch\* XCCL at the model script level is shown below:
+
+  ```python
+  import torch
+
+  if torch.distributed.is_xccl_available():
+      torch.distributed.init_process_group(backend='xccl')
+  else:
+      import oneccl_bindings_for_pytorch
+      torch.distributed.init_process_group(backend='ccl')
+  ```
+
+- Redundant code removal
+
+  Intel® Extension for PyTorch\* no longer overrides the device allocator; it is recommended to use the allocator provided by PyTorch\* instead. Intel® Extension for PyTorch\* also removes all overridden oneMKL and oneDNN related operators except GEMM and SDPA.
+
+### Known Issues
+
+Please refer to the [Known Issues webpage](./known_issues.md).
+
 ## 2.7.10+xpu
 
 Intel® Extension for PyTorch\* v2.7.10+xpu is the new release which supports Intel® GPU platforms (Intel® Arc™ Graphics family, Intel® Core™ Ultra Processors with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 with Intel® Arc™ Graphics, Intel® Core™ Ultra Series 2 Mobile Processors and Intel® Data Center GPU Max Series) based on PyTorch\* 2.7.0.
````
