Skip to content

Reduce the number of copy call in residual norm and allow pinned memory for criterion #1897

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

yhmtsai
Copy link
Member

@yhmtsai yhmtsai commented Jul 21, 2025

This PR uses master executor to allocate the one_changed + one additional workspace (for all_converged).
When the master executor uses device-specific host allocator like cudaMallocHost to allocate the pinned memory, we can get better bandwidth than default pageable memory. Also, it reduces the number of copy call in residual norm.

@yhmtsai yhmtsai requested a review from a team July 21, 2025 08:54
@yhmtsai yhmtsai self-assigned this Jul 21, 2025
@yhmtsai yhmtsai added 1:ST:ready-for-review This PR is ready for review 1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. labels Jul 21, 2025
@ginkgo-bot ginkgo-bot added reg:testing This is related to testing. reg:example This is related to the examples. type:solver This is related to the solvers type:stopping-criteria This is related to the stopping criteria mod:all This touches all Ginkgo modules. labels Jul 21, 2025
Copy link
Member

@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for small nits! I would like to suggest that it would probably be safer if we don't have to leave a comment at every place where we set the flags, and instead put the two bool values into a struct. If that's too much work, we should put it into an issue somewhere 😄

@@ -41,12 +41,12 @@ __global__ __launch_bounds__(default_block_size) void residual_norm_kernel(
if (tidx < num_cols) {
if (tau[tidx] <= rel_residual_goal * orig_tau[tidx]) {
stop_status[tidx].converge(stoppingId, setFinalized);
device_storage[1] = true;
device_storage[0] = true;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we put this into a struct or behind a wrapper to avoid mixing up the two values?

Comment on lines 227 to +228
exec->run(ir::make_initialize(&stop_status));
exec->run(ir::make_initialize(&stop_status));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate?

// use indicators in each critirion such that we can reduce the
// #copy_call
indicators[0] = false;
one_converged |=
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bitwise or, maybe it should be a logical one?

gko::array<gko::stopping_status> stop_status(this->exec_, 1);
stop_status.get_data()[0].reset();

EXPECT_TRUE(criterion->update().residual_norm(res_norm).check(
RelativeStoppingId, true, &stop_status, &one_changed));
RelativeStoppingId, true, &stop_status,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General thought: This boolean parameter doesn't tell us what it does. Maybe it could be an enum?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
1:ST:need-feedback The PR is somewhat ready but feedback on a blocking topic is required before a proper review. 1:ST:ready-for-review This PR is ready for review mod:all This touches all Ginkgo modules. reg:example This is related to the examples. reg:testing This is related to testing. type:solver This is related to the solvers type:stopping-criteria This is related to the stopping criteria
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants