Skip to content

Conversation

@hipudding
Copy link
Collaborator

@hipudding hipudding commented Nov 13, 2025

Previously, ACL objects were managed via manual destruction, which led to multiple memory-leak issues during runtime.
This patch replaces manual memory management with smart pointers so that ACL objects are properly released and ownership is clearly defined.

Note that the ownership of an ACL object belongs to the function that creates it.
Other internal functions should operate on these ACL objects using raw pointers to avoid unintended ownership transfers.

Additionally, since aclTensorList automatically frees its contained aclTensor objects, any aclTensor added to a tensor list must release ownership to avoid double free operations.

This PR also removes the asynchronous task submission mechanism.
Due to changes in recent CANN versions, tiling time has significantly decreased. Even with a dual-thread submission model, the dispatch overhead still falls on the critical path, making async submission less beneficial.
Moreover, aclGraph support provides a much better path to reducing operator dispatch latency.

Make sure to read the contributing guidelines before submitting a PR

@github-actions github-actions bot added ggml changes relating to the ggml tensor library for machine learning Ascend NPU issues specific to Ascend NPUs labels Nov 13, 2025
Previously, ACL objects were managed via manual destruction, which
led to multiple memory-leak issues during runtime. This patch replaces
manual memory management with smart pointers so that ACL objects
are properly released and ownership is clearly defined.

Note that the ownership of an ACL object belongs to the function
that creates it. Other internal functions should operate on these ACL
objects using raw pointers to avoid unintended ownership transfers.

Additionally, since aclTensorList automatically frees its contained
aclTensor objects, any aclTensor added to a tensor list must release
ownership to avoid double free operations.

This PR also removes the asynchronous task submission mechanism.
Due to changes in recent CANN versions, tiling time has significantly
decreased. Even with a dual-thread submission model, the dispatch
overhead still falls on the critical path, making async submission
less beneficial. Moreover, aclGraph support provides a much better
path to reducing operator dispatch latency.
@hipudding hipudding changed the title CANN: use unique_ptr for Ascend Tensors CANN: Use smart pointers to manage ACL objects Nov 14, 2025
@hipudding hipudding marked this pull request as ready for review November 14, 2025 06:02
@hipudding hipudding self-assigned this Nov 14, 2025
Copilot finished reviewing on behalf of hipudding November 14, 2025 06:07
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the CANN backend to replace manual memory management of ACL objects with smart pointers (std::unique_ptr with custom deleters), eliminating memory leaks and clarifying ownership semantics. Additionally, it removes the asynchronous task submission mechanism which is no longer beneficial due to CANN version improvements.

Key changes:

  • Introduces smart pointer wrappers (acl_tensor_ptr, acl_scalar_ptr, acl_int_array_ptr, acl_tensor_list_ptr) with custom deleters for automatic ACL resource cleanup
  • Removes task queue infrastructure (cann_task_queue, cann_task, aclnn_task, etc.)
  • Replaces all manual aclDestroy* calls with automatic cleanup via smart pointers
  • Replaces async helper functions with direct ACL API calls
  • Removes the Doxyfile configuration (documentation generation file)

Reviewed Changes

Copilot reviewed 5 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
acl_tensor.h Defines smart pointer types and custom deleters for ACL objects; adds factory functions returning smart pointers
acl_tensor.cpp Implements factory functions returning smart pointers; removes unreachable return statement
aclnn_ops.h Updates function signatures to use smart pointers; removes task queue classes and async helper functions
aclnn_ops.cpp Converts all ACL object creation/usage to smart pointers; removes manual resource cleanup calls
ggml-cann.cpp Updates tensor creation calls to use smart pointers; replaces async helpers with direct ACL calls; removes task queue usage
common.h Removes task queue and async task infrastructure; updates constructor
Doxyfile Removes Doxygen configuration file (2579 lines deleted)

@hipudding
Copy link
Collaborator Author

Test case passwd:

2701/2701 tests passed
Backend CANN0: OK
Backend 2/3: CANN1
Skipping
Backend 3/3: CPU
Skipping
3/3 backends

Copy link
Collaborator

@noemotiovon noemotiovon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! This PR is really exciting — we finally don’t have to manually release host-side ACL tensors anymore!

@noemotiovon
Copy link
Collaborator

Additionally, I think we still need to verify that there are currently no memory leaks.

@hipudding
Copy link
Collaborator Author

Additionally, I think we still need to verify that there are currently no memory leaks.

I ran Valgrind on llama-server and didn’t find any leaks. There were a few false positives, though, coming from inside CANN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Ascend NPU issues specific to Ascend NPUs ggml changes relating to the ggml tensor library for machine learning

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants