Skip to content

Conversation

brendad8
Copy link

Adds new functions for tunable parameters to support adding DBSCAN and GMMs to tidyclust, see tidymodels/tidyclust#209

@kbodwin

Merge commit '889d214e07889ad33e7f241a68b77af5fd5a81ac'

#Conflicts:
#	NAMESPACE
#	tests/testthat/test-params.R
@hfrick
Copy link
Member

hfrick commented Jun 20, 2025

Hi @brendad8 thanks for the PR! I've kicked off the actions on it but I'll hold of reviewing until Emil has had a chance to look at the tidyclust PR.

Copy link
Member

@hfrick hfrick left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Could you please resolve the review comments directly on the code as well as

  • collect the parameters for a sepecifc engine into one file, similar to the existing R/param_engine_*.R files
  • fix the pkgdown action by adding the new parameters to _pkgdown.yml
  • add a NEWS bullet for your contribution in the style of https://style.tidyverse.org/news.html

Thanks for your contribution!

@hfrick
Copy link
Member

hfrick commented Jun 24, 2025

Note to self: run air over it before merging!

@hfrick
Copy link
Member

hfrick commented Jun 24, 2025

@brendad8 you can ignore the failing format-suggest action :)

@brendad8
Copy link
Author

Thanks for the feedback @hfrick! I was able to address a few of the comments and am still working on the rest.

@kbodwin
Copy link

kbodwin commented Jun 29, 2025

@hfrick can we get a little clarification on the param_engine_* suggestion? It seems that this is used for engine-specific params, but I wouldn't say that these ones are engine-specific.

For DBSCAN, the exposed parameters radius and min_points will need to be exposed for any engine implementing this method, and I'd anticipate other future methods needing one or both of these as well. With your approval, I'd suggest keeping these in their own param_* files.

For GMMs, the five shape/orientation parameters (circular, shared_orientation, shared_shape, shared_size, and zero_covariance) are more ambiguous. On the one hand, they are a little tailored towards the model options of mclust specifically, so I could see the value in moving these five to param_engine_mclust. On the other hand, future engines for GMM will need to use some or all of these - for example, in mixtools, the function mvnormalmixEM has parameters arbmean and arbvar which would be controlled by a particular combination of our params.

We're happy to defer to your preference, but wanted to make it clear that these are params for brand new specifications in tidyclust, not engine params for adding engines to an existing spec.

@hfrick
Copy link
Member

hfrick commented Jul 3, 2025

Ah, I see. Thanks for the additional context @kbodwin!

I can see how the reference to the engines made this a bit ambiguous. Where I'm coming from: the number of parameters in dials has grown quite a bit over time, so I think adding some more grouping is helpful. That grouping does not have to be engine-specific; here, it seems to make sense to collect the parameters for gm_clust() in one file and name it param-model-gm_clust.R.

Detail: I see in the tidyclust PR that gm_clust() also uses num_clusters(). Since it's also used elsewhere, I'd leave that in its own file, while circular(), shared_orientation(), shared_shape(), shared_size(), and zero_covariance() go into param-model-gm_clust.R.

I saw that you currently link them via @seealso. That is a nice idea! For the 5 parameter that go into one file, could you please document them together on one page (via @name and @rdname) instead of @seealso? That links them even closer together, which seems fine given that the dials documentation is kinda spare anyway.

@hfrick
Copy link
Member

hfrick commented Jul 3, 2025

@brendad8, could you also put references to functions in tidyclust in backticks only, rather than in []? The [] activate cross-referencing with tidyclust. That is generally nice but since dials does not depend on tidyclust, this then leads to a warning in R CMD check which we'd like to avoid. (Do keep the () you added in b5e2265, this is only about [] vs backticks.)

❯ checking Rd cross-references ... WARNING
  Missing link(s) in Rd file 'circular.Rd':
    ‘[tidyclust:gm_clust]{tidyclust::gm_clust()}’
  
  Missing link(s) in Rd file 'min_points.Rd':
    ‘[tidyclust:db_clust]{tidyclust::db_clust()}’
  
  Missing link(s) in Rd file 'radius.Rd':
    ‘[tidyclust:db_clust]{tidyclust::db_clust()}’
  
  Missing link(s) in Rd file 'shared_orientation.Rd':
    ‘[tidyclust:gm_clust]{tidyclust::gm_clust()}’
  
  Missing link(s) in Rd file 'shared_shape.Rd':
    ‘[tidyclust:gm_clust]{tidyclust::gm_clust()}’
  
  Missing link(s) in Rd file 'shared_size.Rd':
    ‘[tidyclust:gm_clust]{tidyclust::gm_clust()}’
  
  Missing link(s) in Rd file 'zero_covariance.Rd':
    ‘[tidyclust:gm_clust]{tidyclust::gm_clust()}’
  
  See section 'Cross-references' in the 'Writing R Extensions' manual.

@brendad8
Copy link
Author

brendad8 commented Jul 3, 2025

Thanks for the extra clarification @hfrick. Ill get working on these changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants