Skip to content

Conversation

@Przemog1
Copy link
Contributor

No description provided.

@devshgraphicsprogramming
Copy link
Member

@Przemog1 the RWMC stuff should be neatly packaged up and go into nbl/builtin/hlsl/rwmc

@Przemog1 Przemog1 changed the title Updated examples RWMC Oct 18, 2025
}

// most of this code is stolen from https://cg.ivd.kit.edu/publications/2018/rwmc/tool/split.cpp
void addSample(uint32_t sampleIndex, float32_t3 sample)
Copy link
Member

@devshgraphicsprogramming devshgraphicsprogramming Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sample is a reserved keyword, use _sample (see the highlight

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also why float32t_3 here and not CascadeLayerType

Comment on lines +16 to +34
uint32_t size;
uint32_t start;
uint32_t base;
};

template<typename CascadeLayerType, uint32_t CascadeSize>
struct CascadeEntry
{
CascadeLayerType data[CascadeSize];
};

template<typename CascadeLayerType, uint32_t CascadeSize>
struct CascadeAccumulator
{
using output_storage_type = CascadeEntry<CascadeLayerType, CascadeSize>;
using initialization_data = CascadeSettings;
output_storage_type accumulation;
uint32_t cascadeSampleCounter[CascadeSize];
CascadeSettings cascadeSettings;
Copy link
Member

@devshgraphicsprogramming devshgraphicsprogramming Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CascadeSize (which should be called CascadeCount) is already a template parameter, why is there a CascadeSettings::size ?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also separate the aliases from member declarations (even better put member declarations at the bottom of struct) so its easier to read

Comment on lines +3 to +5
#include "nbl/builtin/hlsl/cpp_compat.hlsl"
#include <nbl/builtin/hlsl/vector_utils/vector_traits.hlsl>
#include <nbl/builtin/hlsl/colorspace/encodeCIEXYZ.hlsl>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use <> for HLSL includes in the Nabla headers (prioritizes looking in builtin DLL / most important search paths before doing anything local)

uint32_t cascadeSampleCounter[CascadeSize];
CascadeSettings cascadeSettings;

void initialize(in CascadeSettings settings)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static create maybe better as a factory?

Also use NBL_CONST_REF_ARG get out of habit of using in/out/inout in HLSL

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idk i like initialize better as it indicates that CascadeAccumulator has to be initialized. static create just looks like plain constructor and we have no option to disable default constructor forcing user to use this one.

{
for (int i = 0; i < CascadeSize; ++i)
{
accumulation.data[i] = (CascadeLayerType)0.0f;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

promote

uint32_t base;
};

template<typename CascadeLayerType, uint32_t CascadeSize>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

require the LayerType just be a vector

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw if you want we can nest this struct inside the CascadeAccumulator because nobody will ever use it in an SSBO or Texture as is

Comment on lines +44 to +46
cascadeSettings.size = settings.size;
cascadeSettings.start = settings.start;
cascadeSettings.base = settings.base;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you don't need to assign member by member, struct to struct assignment works just fine


uint32_t higherCascadeIndex = lowerCascadeIndex + 1u;

const uint32_t sampleCount = sampleIndex + 1u;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take sampleCount from the outside already, instead of sampleIndex

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i don't get it

cascadeSettings.base = settings.base;
}

typename vector_traits<CascadeLayerType>::scalar_type getLuma(NBL_CONST_REF_ARG(CascadeLayerType) col)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make a using alias for the typename vector_traits<CascadeLayerType>::scalar_type

Comment on lines +16 to +18
uint32_t size;
uint32_t start;
uint32_t base;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need completely different members, see the old GLSL code

Also put the structs in separate headers (so easier to share with C++ without sharing the accumulator and resolve):

  • splatting parameters
  • resolve parameters
    etc.

Comment on lines +57 to +60
float lowerScale = cascadeSettings.start;
float upperScale = lowerScale * cascadeSettings.base;

const float luma = getLuma(sample);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of float use the alias https://github.com/Devsh-Graphics-Programming/Nabla/pull/936/files#r2473742745 for vector_traits<CascadeLayerType>::scalar_type

Comment on lines +62 to +68
uint32_t lowerCascadeIndex = 0u;
while (!(luma < upperScale) && lowerCascadeIndex < cascadeSettings.size - 2)
{
lowerScale = upperScale;
upperScale *= cascadeSettings.base;
++lowerCascadeIndex;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can get the floating point cascade index by dealing with logarithms

clamp((log2(luma) - log2(Start))*1.f/log2(Base), 0 , CascadeSize-1)

this is why I say that the Splatting Parameters need to be precomputing different things, like log2_start, and rcp_log2_base

Comment on lines +62 to +78
uint32_t lowerCascadeIndex = 0u;
while (!(luma < upperScale) && lowerCascadeIndex < cascadeSettings.size - 2)
{
lowerScale = upperScale;
upperScale *= cascadeSettings.base;
++lowerCascadeIndex;
}

float lowerCascadeLevelWeight;
float higherCascadeLevelWeight;

if (luma <= lowerScale)
lowerCascadeLevelWeight = 1.0f;
else if (luma < upperScale)
lowerCascadeLevelWeight = max(0.0f, (lowerScale / luma - lowerScale / upperScale) / (1.0f - lowerScale / upperScale));
else // Inf, NaN ...
lowerCascadeLevelWeight = 0.0f;
Copy link
Member

@devshgraphicsprogramming devshgraphicsprogramming Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can get the floating point cascade index by dealing with logarithms

const scalar_t log2Luma= log2<scalar_t>(luma);
const scalar_t cascade = max(log2Luma*1.f/log2(Base) - log2(Start)/log2(Base), 0, CascadeCount-1);
const scalar_t clampedCascade = clamp(cascade, 0, CascadeCount-1);
// c<=0 -> 0, c>=Count-1 -> Count-1 
uint16_t lowerCascadeIndex = floor<scalar_t>(clampedCascade);
// 0 whenever clamped or `cascade` is integer (when `clampedCascade` is integer)
scalar_t higherCascadeWeight = cascade-floor<scalar_t>(clampedCascade);
// never 0 thanks to magic of `1-fract(x)`
scalar_t lowerCascadeWeight = scalar_type(1)-higherCascadeWeight;

// handle super bright sample case
if (cascade>CascadeCount-1)
   lowerCascadeWeight = exp2(log2(Start)+log2(Base)*(CascadeCount-1)-log2Luma);

// only deal with splatting the higher cascade if you're not out of bound
// when out of bounds negatively, we use lower cascade as the only cascade (follow the math above)
if (clampedCascade==cascade) // equivalent condition is `higherCascadeWeight!=0`
{
   const uint16_t higherCascadeIndex = lowerCascadeIndex+1;
   // do stuff for higher cascade
}

this is why I say that the Splatting Parameters need to be precomputing different things, like logBaseStart, rcpLog2Base, and log2CascadeEnd

using output_storage_type = CascadeEntry<CascadeLayerType, CascadeSize>;
using initialization_data = CascadeSettings;
output_storage_type accumulation;
uint32_t cascadeSampleCounter[CascadeSize];

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stick sample count into the CascadeEntry

Comment on lines +89 to +92
accumulation.data[lowerCascadeIndex] += (sample * lowerCascadeLevelWeight - (sampleCount - (cascadeSampleCounter[lowerCascadeIndex])) * accumulation.data[lowerCascadeIndex]) * reciprocalSampleCount;
accumulation.data[higherCascadeIndex] += (sample * higherCascadeLevelWeight - (sampleCount - (cascadeSampleCounter[higherCascadeIndex])) * accumulation.data[higherCascadeIndex]) * reciprocalSampleCount;
cascadeSampleCounter[lowerCascadeIndex] = sampleCount;
cascadeSampleCounter[higherCascadeIndex] = sampleCount;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really long in a line and reads badly, could be turned into a method you call per cascade.

Also remove the extra () around cascadeSampleCounter[...]

Comment on lines +13 to +63
namespace impl
{

struct CascadeSample
{
float32_t3 centerValue;
float normalizedCenterLuma;
float normalizedNeighbourhoodAverageLuma;
};

// TODO: figure out what values should pixels outside have, 0.0f is incorrect
float32_t3 sampleCascadeTexel(int32_t2 currentCoord, int32_t2 offset, in RWTexture2DArray<float32_t4> cascade, uint32_t cascadeIndex)
{
const int32_t2 texelCoord = currentCoord + offset;
if (any(texelCoord < int32_t2(0, 0)))
return float32_t3(0.0f, 0.0f, 0.0f);

float32_t4 output = cascade.Load(int32_t3(texelCoord, int32_t(cascadeIndex)));
return float32_t3(output.r, output.g, output.b);
}

float32_t calcLuma(in float32_t3 col)
{
return hlsl::dot<float32_t3>(hlsl::transpose(colorspace::scRGBtoXYZ)[1], col);
}

CascadeSample SampleCascade(in int32_t2 coord, in RWTexture2DArray<float32_t4> cascade, in uint cascadeIndex, in float reciprocalBaseI)
{
float32_t3 neighbourhood[9];
neighbourhood[0] = sampleCascadeTexel(coord, int32_t2(-1, -1), cascade, cascadeIndex);
neighbourhood[1] = sampleCascadeTexel(coord, int32_t2(0, -1), cascade, cascadeIndex);
neighbourhood[2] = sampleCascadeTexel(coord, int32_t2(1, -1), cascade, cascadeIndex);
neighbourhood[3] = sampleCascadeTexel(coord, int32_t2(-1, 0), cascade, cascadeIndex);
neighbourhood[4] = sampleCascadeTexel(coord, int32_t2(0, 0), cascade, cascadeIndex);
neighbourhood[5] = sampleCascadeTexel(coord, int32_t2(1, 0), cascade, cascadeIndex);
neighbourhood[6] = sampleCascadeTexel(coord, int32_t2(-1, 1), cascade, cascadeIndex);
neighbourhood[7] = sampleCascadeTexel(coord, int32_t2(0, 1), cascade, cascadeIndex);
neighbourhood[8] = sampleCascadeTexel(coord, int32_t2(1, 1), cascade, cascadeIndex);

// numerical robustness
float32_t3 excl_hood_sum = ((neighbourhood[0] + neighbourhood[1]) + (neighbourhood[2] + neighbourhood[3])) +
((neighbourhood[5] + neighbourhood[6]) + (neighbourhood[7] + neighbourhood[8]));

CascadeSample retval;
retval.centerValue = neighbourhood[4];
retval.normalizedNeighbourhoodAverageLuma = retval.normalizedCenterLuma = calcLuma(neighbourhood[4]) * reciprocalBaseI;
retval.normalizedNeighbourhoodAverageLuma = (calcLuma(excl_hood_sum) * reciprocalBaseI + retval.normalizedNeighbourhoodAverageLuma) / 9.f;
return retval;
}

} // namespace impl

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of using an impl namespace, roll this up into a struct

Copy link
Member

@devshgraphicsprogramming devshgraphicsprogramming Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could roll it up into a ResolveAccessorAdaptor (and also make a ResolveAccessor concept which is LoadableImage + the extra methods like calcLuma and the get with an offset for a particular cascade which does bounds checking)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

general note, you should allow using any vector<scalar,N> for the color values (the default adaptor can be hardcoded for vector<scalar,3>)

Comment on lines +1 to +2
#ifndef _NBL_BUILTIN_HLSL_RWMC_RWMC_HLSL_INCLUDED_
#define _NBL_BUILTIN_HLSL_RWMC_RWMC_HLSL_INCLUDED_

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this header very clearly deals with Resolving, so call it resolve.hlsl not rwmc.hlsl

Comment on lines +23 to +32
// TODO: figure out what values should pixels outside have, 0.0f is incorrect
float32_t3 sampleCascadeTexel(int32_t2 currentCoord, int32_t2 offset, in RWTexture2DArray<float32_t4> cascade, uint32_t cascadeIndex)
{
const int32_t2 texelCoord = currentCoord + offset;
if (any(texelCoord < int32_t2(0, 0)))
return float32_t3(0.0f, 0.0f, 0.0f);

float32_t4 output = cascade.Load(int32_t3(texelCoord, int32_t(cascadeIndex)));
return float32_t3(output.r, output.g, output.b);
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no weird input textures, make this an Accessor Adaptor (does out of bound checks) which requires concepts::accessors::LoadableImage

btw nothe that the names of the concepts need fixing https://github.com/Devsh-Graphics-Programming/Nabla/blob/aa5d6cb90b383e68b52c6abe69c6efe96281c7dd/include/nbl/builtin/hlsl/concepts/accessors/loadable_image.hlsl

I copy pasted and they have Storable in the name for this reason, should be Loadable

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rwmc::ResolveAccessorAdaptor should fullfill LoadableImage itself

Comment on lines +39 to +50
CascadeSample SampleCascade(in int32_t2 coord, in RWTexture2DArray<float32_t4> cascade, in uint cascadeIndex, in float reciprocalBaseI)
{
float32_t3 neighbourhood[9];
neighbourhood[0] = sampleCascadeTexel(coord, int32_t2(-1, -1), cascade, cascadeIndex);
neighbourhood[1] = sampleCascadeTexel(coord, int32_t2(0, -1), cascade, cascadeIndex);
neighbourhood[2] = sampleCascadeTexel(coord, int32_t2(1, -1), cascade, cascadeIndex);
neighbourhood[3] = sampleCascadeTexel(coord, int32_t2(-1, 0), cascade, cascadeIndex);
neighbourhood[4] = sampleCascadeTexel(coord, int32_t2(0, 0), cascade, cascadeIndex);
neighbourhood[5] = sampleCascadeTexel(coord, int32_t2(1, 0), cascade, cascadeIndex);
neighbourhood[6] = sampleCascadeTexel(coord, int32_t2(-1, 1), cascade, cascadeIndex);
neighbourhood[7] = sampleCascadeTexel(coord, int32_t2(0, 1), cascade, cascadeIndex);
neighbourhood[8] = sampleCascadeTexel(coord, int32_t2(1, 1), cascade, cascadeIndex);
Copy link
Member

@devshgraphicsprogramming devshgraphicsprogramming Oct 29, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SampleCascade should be a method of your ResolveAccessor (default impl and concept)
Actually should be a pseudo-private method of a Resolve Struct

sampleCascadeTexel can be a pseudo private __sampleCascadeTexel in the ResolveAccessorAdaptor (the default impl)

Comment on lines +27 to +28
if (any(texelCoord < int32_t2(0, 0)))
return float32_t3(0.0f, 0.0f, 0.0f);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't check for being out of bounds with texelCoord+1, you're only protecting against texelCoord-1

float normalizedNeighbourhoodAverageLuma;
};

// TODO: figure out what values should pixels outside have, 0.0f is incorrect

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you use accessors, you make it user-configurable, all that you can write is that MIRROR and CLAMP will falsely make similar neighbour pixels reducing RWMC effect on the border, while CLAMP_TO_BORDER (returning 0) will boost the RWMC effect because samples think they have more 0 neighbours

You usually want vignetting, so what you're doing now (CLAMP_TO_BORDER with black border) is probably desirable

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wirte it all in a comment


} // namespace impl

struct ReweightingParameters

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to ResolveParameters

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also divide into two structs

ResolveConstantParams - things that can't change during rendering, like:

  • last Cascade Index
  • reciprocalBase

and ResolveRuntimeParams which are things you can either change with imgui slider or changes with accumulation

  • initialEmin
  • rcplN
  • rcpKappa

so ResolvePAramters is two of those structs plus any free floating dependent variables like colorReliabilityFactor and NoverKappa

Comment on lines +93 to +94
float32_t3 reweight(in ReweightingParameters params, in RWTexture2DArray<float32_t4> cascade, in int32_t2 coord)
{

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make this into a Resolve struct templated on the Accessor you'll use instead of RWTexture2DArray (which really should only be read only)

you Resolve::create from the ResolveParameters and you have an operator()(NBL_REF_ARG(CascadeAccessor) acc, const int32_t2 coord) which does the reweighting

Comment on lines +95 to +98
float reciprocalBaseI = 1.f;
impl::CascadeSample curr = impl::SampleCascade(coord, cascade, 0u, reciprocalBaseI);

float32_t3 accumulation = float32_t3(0.0f, 0.0f, 0.0f);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to template everything on the vector type and its scalar

Comment on lines +393 to +394
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/rwmc/rwmc.hlsl")
LIST_BUILTIN_RESOURCE(NBL_RESOURCES_TO_EMBED "hlsl/rwmc/CascadeAccumulator.hlsl")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need to separate parameter structs passed and initialized between c++ and HLSL from the HLSL algorithms for splatting and resolve, so more headers

Chop it up!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants