-
Notifications
You must be signed in to change notification settings - Fork 13.7k
CANN: Use smart pointers to manage ACL objects #17238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Previously, ACL objects were managed via manual destruction, which led to multiple memory-leak issues during runtime. This patch replaces manual memory management with smart pointers so that ACL objects are properly released and ownership is clearly defined. Note that the ownership of an ACL object belongs to the function that creates it. Other internal functions should operate on these ACL objects using raw pointers to avoid unintended ownership transfers. Additionally, since aclTensorList automatically frees its contained aclTensor objects, any aclTensor added to a tensor list must release ownership to avoid double free operations. This PR also removes the asynchronous task submission mechanism. Due to changes in recent CANN versions, tiling time has significantly decreased. Even with a dual-thread submission model, the dispatch overhead still falls on the critical path, making async submission less beneficial. Moreover, aclGraph support provides a much better path to reducing operator dispatch latency.
9efc21c to
8981848
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the CANN backend to replace manual memory management of ACL objects with smart pointers (std::unique_ptr with custom deleters), eliminating memory leaks and clarifying ownership semantics. Additionally, it removes the asynchronous task submission mechanism which is no longer beneficial due to CANN version improvements.
Key changes:
- Introduces smart pointer wrappers (
acl_tensor_ptr,acl_scalar_ptr,acl_int_array_ptr,acl_tensor_list_ptr) with custom deleters for automatic ACL resource cleanup - Removes task queue infrastructure (
cann_task_queue,cann_task,aclnn_task, etc.) - Replaces all manual
aclDestroy*calls with automatic cleanup via smart pointers - Replaces async helper functions with direct ACL API calls
- Removes the Doxyfile configuration (documentation generation file)
Reviewed Changes
Copilot reviewed 5 out of 7 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| acl_tensor.h | Defines smart pointer types and custom deleters for ACL objects; adds factory functions returning smart pointers |
| acl_tensor.cpp | Implements factory functions returning smart pointers; removes unreachable return statement |
| aclnn_ops.h | Updates function signatures to use smart pointers; removes task queue classes and async helper functions |
| aclnn_ops.cpp | Converts all ACL object creation/usage to smart pointers; removes manual resource cleanup calls |
| ggml-cann.cpp | Updates tensor creation calls to use smart pointers; replaces async helpers with direct ACL calls; removes task queue usage |
| common.h | Removes task queue and async task infrastructure; updates constructor |
| Doxyfile | Removes Doxygen configuration file (2579 lines deleted) |
|
Test case passwd: 2701/2701 tests passed |
noemotiovon
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your contribution! This PR is really exciting — we finally don’t have to manually release host-side ACL tensors anymore!
|
Additionally, I think we still need to verify that there are currently no memory leaks. |
5ad0e56 to
4dd3c07
Compare
I ran Valgrind on llama-server and didn’t find any leaks. There were a few false positives, though, coming from inside CANN. |
Previously, ACL objects were managed via manual destruction, which led to multiple memory-leak issues during runtime.
This patch replaces manual memory management with smart pointers so that ACL objects are properly released and ownership is clearly defined.
Note that the ownership of an ACL object belongs to the function that creates it.
Other internal functions should operate on these ACL objects using raw pointers to avoid unintended ownership transfers.
Additionally, since aclTensorList automatically frees its contained aclTensor objects, any aclTensor added to a tensor list must release ownership to avoid double free operations.
This PR also removes the asynchronous task submission mechanism.
Due to changes in recent CANN versions, tiling time has significantly decreased. Even with a dual-thread submission model, the dispatch overhead still falls on the critical path, making async submission less beneficial.
Moreover, aclGraph support provides a much better path to reducing operator dispatch latency.
Make sure to read the contributing guidelines before submitting a PR