Is your feature request related to a problem? Please describe.
Related to #19707
The multithreaded host loop in compute_data_page_mask can be replaced with a simple custom CUDA kernel. Besides speeding up the computation, this would let us avoid copying the row_mask column data to the host.
Describe the solution you'd like
A GPU-accelerated implementation of compute_data_page_mask that reads row_mask directly from device memory.
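
A rough sketch of the per-page logic such a kernel could run, not the actual cudf implementation. It assumes the row mask is a flat array of per-row flags and that each page covers the row range between two consecutive entries of an offsets array. The names `page_has_selected_row`, `compute_page_mask_reference`, and `PAGE_MASK_HD` are hypothetical and exist only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Compiles as plain C++. Under nvcc the helper below also becomes
// callable from device code, so a kernel could reuse it unchanged.
#ifdef __CUDACC__
#define PAGE_MASK_HD __host__ __device__
#else
#define PAGE_MASK_HD
#endif

// Hypothetical helper: returns true if any row in [row_begin, row_end)
// is selected (non-zero) in the row mask.
PAGE_MASK_HD inline bool page_has_selected_row(std::uint8_t const* row_mask,
                                               std::size_t row_begin,
                                               std::size_t row_end)
{
  for (std::size_t r = row_begin; r < row_end; ++r) {
    if (row_mask[r] != 0) { return true; }
  }
  return false;
}

// Host reference loop (assumed layout: page_row_offsets has num_pages + 1
// entries). In a kernel, `page` would instead be derived from
// blockIdx.x * blockDim.x + threadIdx.x, with one thread per page.
inline std::vector<std::uint8_t> compute_page_mask_reference(
  std::vector<std::uint8_t> const& row_mask,
  std::vector<std::size_t> const& page_row_offsets)
{
  if (page_row_offsets.size() < 2) { return {}; }
  std::size_t const num_pages = page_row_offsets.size() - 1;
  std::vector<std::uint8_t> page_mask(num_pages, 0);
  for (std::size_t page = 0; page < num_pages; ++page) {
    page_mask[page] = page_has_selected_row(
      row_mask.data(), page_row_offsets[page], page_row_offsets[page + 1]);
  }
  return page_mask;
}
```

Since each page is independent, the loop body maps naturally onto one GPU thread per page. For pages with many rows, a warp- or block-level reduction per page may be a better fit than a serial scan in a single thread.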
Describe alternatives you've considered
The multithreaded CPU solution implemented in #19707.
Additional context
Originally posted by @mhaseeb123 in https://github.com/rapidsai/cudf/pull/19602/files#r2286725034