Skip to content

adding discrete option to ppc_rootogram #362

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 2 commits into
base: master
Choose a base branch
from
Draft

Conversation

behramulukir
Copy link
Collaborator

@behramulukir behramulukir commented Jul 30, 2025

In #360, I explained the rationale behind adding a new discrete option to the style argument of ppc_rootogram() to offer a new plot aesthetic which puts more emphasis on the discreteness of the data. This PR is about my work on that issue. The only difference between my implementation and what @TeemuSailynoja proposed is that I decided to use rhombuses instead of dots for Observed values for better distinction.

This implementation passes all existing tests, so I think the current style options for ppc_rootogram() remain unchanged.

  • Add discrete option to style argument
  • Implement new tests
  • Update the documentation

Examples

Here are some examples of how ppc_rootogram() works with discrete style.

y <- rpois(100, 20)
yrep <- matrix(rpois(10000, 20), ncol = 100)
color_scheme_set("brightblue")
ppc_rootogram(y, yrep, style = "discrete")
rootogram1
ppc_rootogram(y, yrep, style = "discrete", prob=0.7)
rootogram2
ppc_rootogram(y, yrep, style = "discrete", size=2)
rootogram3

@behramulukir behramulukir linked an issue Jul 30, 2025 that may be closed by this pull request
@behramulukir behramulukir self-assigned this Jul 30, 2025
@codecov-commenter
Copy link

codecov-commenter commented Jul 30, 2025

Codecov Report

❌ Patch coverage is 43.24324% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 98.24%. Comparing base (5372336) to head (a0034ad).

Files with missing lines Patch % Lines
R/ppc-discrete.R 43.24% 21 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #362      +/-   ##
==========================================
- Coverage   98.60%   98.24%   -0.37%     
==========================================
  Files          35       35              
  Lines        5661     5689      +28     
==========================================
+ Hits         5582     5589       +7     
- Misses         79      100      +21     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jgabry
Copy link
Member

jgabry commented Jul 30, 2025

Thanks @behramulukir. I like this new style. One question for now: should we give users control over the color of the observed points since these colors may not be optimal for all the available color schemes?

In most of the other plots the colors for everything come from the color scheme, but I see that it would be difficult to do that here if we want to really distinguish between observations that fall inside and outside the interval. We don't usually have arguments that control the colors, but I guess we could do that here.

Another option would be to use the shape of the point (instead of the color) to distinguish whether it falls inside and outside the interval and then use a color that comes from the chosen color scheme.

@behramulukir
Copy link
Collaborator Author

Yes @jgabry, I think it would be better to offer more colour options, which would be more inclusive. The current colour choices may be indistinguishable for some types of colour blindness. The only issue here is not confusing users about that colour control arguments can only be used with discrete style and not other styles. This can be achieved by correct documentation, but I need to think about it.

Distinguishing by the shape of the point also makes sense, but then we need to include both the inside interval and outside interval shapes in the legend to make things clear.

I'll think about which implementation is less confusing for the users and try out different implementations.

@behramulukir
Copy link
Collaborator Author

Based on our discussions with @TeemuSailynoja, we decided that it is a better choice to indicate observations within and out of the bounds by shapes of the points instead of colours, because as you said those colors should be available for each color scheme, however, adding a dedicated color argument that would only work if the style = "discrete" would be confusing for users in addition to the fact that bayesplot doesn't generally have a such argument. Instead, having determined shapes for observations within and out of the bounds and following the colour scheme chosen by the user for all points and lines seemed more logical.

However, there are nuances to these changes which we need to decide on before merging this PR to main.

Shapes

As we decided to indicate whether observed points fall into intervals or not, the first shape option that we picked was triangles. So, we used downward triangles for observation points that fall into intervals and upward triangles for observation points that fall outside of the intervals. Here is an example of it:

discrete_triangle

We also had the idea of using squares to indicate that observed points are within the intervals and rhombuses to indicate they are not. Here is an example of it:

discrete_square

We didn't settle on any of these options, so please tell me which one you think is better looking.

Colors

Another point of building this new style of rootograms was deciding on how to colour different elements. Since we decided not to use colour to indicate the position of observed points, all of the visual elements should follow from the colour scheme that users pick. To do that, there were two main paths: following from other rootogram styles (standing, hanging, suspended), where observed points are light and expected points are dark or following from PPC intervals, where observed points are dark and expected points are light.

The argument for following other rootogram styles, where observed points are light and expected points are dark, is that we have the same colouring practice for all kinds of rootograms. On the other hand, it is worth noting that discrete style is fairly different from other styles because other styles has bars for observed points which occupy considerable amount of the graph space and having them darker would make the graph visually heavy. Since this is not a case, we don't need to follow from other rootograms and instead follow ppc_intervals which has darker color for observed points and lighter color for expected points. The argument for this is ppc_intervals and ppc_rootogram(style="discrete") has more in common visually as both functions utilise geom_pointrange().

Here is the example for lighter observed - darker expected:

discrete_colornormal

Here is the example for darker observed - lighter expected:

discrete_colorreversed

One thing we also decided to change was removing the change in alpha of the lines where until the halfway, line had higher alpha and from halfway to the end of line it was lower alpha. This wasn't contributing much to user's understanding of the situation and adding another visual effect to a graph that already has lots of different visual cues. Removing this and having uniform alpha for the full lines made the graph easier to understand.

@jgabry
Copy link
Member

jgabry commented Jul 31, 2025

Thanks for the detailed summary. Here are a few comments:

  1. Shape: I don't have a strong preference between the triangles and the squares/rhombuses. Both seem good. If anyone else (you, Teemu, Aki, TJ, etc.) has a preference we can go with that.

  2. Color: I think your point about the similarity to ppc_intervals() is a good one. I would vote for copying ppc_intervals() (darker observed - lighter expected) instead of the other rootogram styles. Also, since we might want to eventually deprecate the other rootogram styles (right?), it makes less sense to copy them.

  3. Alpha: that change seems good to me.

@behramulukir
Copy link
Collaborator Author

Thanks for your comments, @jgabry!

I have a slight preference towards using squares and rhombuses instead of triangles, since the directions of triangles don't have a meaning on their own -nothing would change if we swap the directions- but it might signal users that there actually is a meaning, so if we use squares and rhombuses, we can avoid that. So, I'll go with them instead -unless anyone has an argument against it.

For the colour, I agree with your comment. I am also in favour of copying ppc_intervals() because of consistency, and also because it is easier to distinguish that way. Observed points are plotted on top of expected points, and it is easier to distinguish overlapping points in a darker foreground and a lighter background compared to a lighter foreground and a darker background. And yes, I think it makes sense to slowly depreciate other rootogram styles, so I can add a depreciation warning to my code in the next commit about this.

@aloctavodia
Copy link
Collaborator

I have a slight preference for circles (in), triangles (out). I think those are visually easier to differentiate. And having an option to enable/disable the distinction.
Another nitpick, the circle for the expected values should be on top of the line (as in plot_ppc_intervals), not behind (or it should be solid, alpha=1). Additionally, having the edge and face of the circle to be the same color as the line contributes to a plot that looks less cluttered.

@behramulukir
Copy link
Collaborator Author

To be honest, I am in favour of using circles only for expected points, as the colours we use for distinction are not completely different colours but shades of the same colour, which might be harder to distinguish for some people, and especially for people with visual impairments. I also think this can be a future update to ppc_intervals() to enhance its visuality.

I agree with your comment that the expected points should be on top of their lines. I checked ppc_intervals(), so my guess is in our case, having alpha = 1 would solve the issue, but anyways, I'll look into this point in detail.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Emphasizing discreteness of data in rootograms
4 participants