-
Notifications
You must be signed in to change notification settings - Fork 97
Reduce the number of copy call in residual norm and allow pinned memory for criterion #1897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except for small nits! I would like to suggest that it would probably be safer if we don't have to leave a comment at every place where we set the flags, and instead put the two bool
values into a struct
. If that's too much work, we should put it into an issue somewhere 😄
@@ -41,12 +41,12 @@ __global__ __launch_bounds__(default_block_size) void residual_norm_kernel( | |||
if (tidx < num_cols) { | |||
if (tau[tidx] <= rel_residual_goal * orig_tau[tidx]) { | |||
stop_status[tidx].converge(stoppingId, setFinalized); | |||
device_storage[1] = true; | |||
device_storage[0] = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we put this into a struct or behind a wrapper to avoid mixing up the two values?
exec->run(ir::make_initialize(&stop_status)); | ||
exec->run(ir::make_initialize(&stop_status)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
duplicate?
// use indicators in each critirion such that we can reduce the | ||
// #copy_call | ||
indicators[0] = false; | ||
one_converged |= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bitwise or, maybe it should be a logical one?
gko::array<gko::stopping_status> stop_status(this->exec_, 1); | ||
stop_status.get_data()[0].reset(); | ||
|
||
EXPECT_TRUE(criterion->update().residual_norm(res_norm).check( | ||
RelativeStoppingId, true, &stop_status, &one_changed)); | ||
RelativeStoppingId, true, &stop_status, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General thought: This boolean parameter doesn't tell us what it does. Maybe it could be an enum?
This PR uses master executor to allocate the one_changed + one additional workspace (for all_converged).
When the master executor uses device-specific host allocator like
cudaMallocHost
to allocate the pinned memory, we can get better bandwidth than default pageable memory. Also, it reduces the number of copy call in residual norm.