Skip to content

Conversation

cetagostini
Copy link

@cetagostini cetagostini commented May 24, 2025

PR Description:

This PR introduces Bayesian Structural Time Series (BSTS) modeling to CausalPy, enabling advanced time series analysis, forecasting, and causal impact estimation. Inspired by principles from structural time series modeling basic with only trend/seasonal components (e.g., TensorFlow Probability's STS first example), this work decomposes time series into trend, seasonality, and optional regressor components within a Bayesian framework.

Key Changes:

  1. BayesianStructuralTimeSeries Model (causalpy/pymc_models.py):

    • New PyMC model for BSTS. Defaults to pymc-marketing components (LinearTrend, YearlyFourier), supports custom components, and optional exogenous regressors (X).
    • Adapted _data_setter, predict, score, and fit for time-dependent forecasting and ensuring mu (sum of components) is sampled.
  2. StructuredTimeSeries Experiment (causalpy/experiments/structured_time_series.py):

    • New experiment class for causal inference with BSTS.
    • Handles DatetimeIndex data, treatment_time, and patsy formulas for exogenous regressors.
      • Intercept Logic: Correctly manages formulas like y ~ 0 or y ~ 1 by ensuring the BSTS model's trend component provides the baseline, avoiding redundant patsy-generated intercepts for exogenous regressors.
    • Provides summary(), plot() (3-panel: fit/counterfactual, pointwise impact, cumulative impact), and get_plot_data().
  3. Testing & Integration:

    • Added comprehensive integration tests (test_bayesian_structural_time_series) covering various scenarios, including custom components and error handling.
    • Fixed a plotting issue in plot_xY (causalpy/plot_utils.py).
  4. API & Docs:

    • StructuredTimeSeries available as causalpy.StructuredTimeSeries.
    • Added backward-compatible wrapper in causalpy.pymc_experiments.py.
    • Updated relevant docstrings.

📚 Documentation preview 📚: https://causalpy--473.org.readthedocs.build/en/473/

@cetagostini cetagostini requested a review from drbenvincent May 24, 2025 12:22
@cetagostini cetagostini self-assigned this May 24, 2025
@cetagostini cetagostini added documentation Improvements or additions to documentation enhancement New feature or request labels May 24, 2025
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@cetagostini
Copy link
Author

Thinking to bring HSGP to here, and be able to replicate the second example.

@drbenvincent
Copy link
Collaborator

WOAH!

Ultra quick review / questions until I've got some more time:

  • Is this actually BSTS? It doesn't seem the use the BSTS stuff from @jessegrabowski in pymc-extras. If it's leaning heavily on linear trend and Fourier bases (from pymc-marketing, is this more like a prophet approach? No autoregressive components as far as I can tell? Still would be very very useful and an advancement on what we have so far.
  • Not able to take a deep dive at this point, but my hope in terms of the API would be that we could use the existing InterruptedTimeSeries experiment class and simply inject a new time series based pymc model class. Will aim to look into this in more detail - if there was a reason for a new experiment class then we can look into whether we can adapt it to make it more general.
  • Need to edit docs/source/notebooks/index.md to get the new notebook to render in the docs.
  • The new notebook is a great start. Could be good to compare the previous approach "y ~ 1 + t + C(month)" in its_pymc.ipynb to see how things differ.

Sorry for the relatively superficial review at this point - family weekend time. Will enjoy diving into the details soon.

@cetagostini
Copy link
Author

Hey @drbenvincent

Still BSTS, the Structural time series (STS) models are a family of probability models for time series that includes and generalizes many standard time-series modeling ideas, including:

  • autoregressive processes,
  • moving averages,
  • local linear trends,
  • seasonality, and regression and variable selection on external covariates (other time series potentially related to the series of interest).

Definitely, it's handy and popular to talk about state-space, but my understanding is that BSTS in the broad sense does not require a true state‐space; it merely requires a Bayesian, additive decomposition into trend, seasonality, and optional regressors. Unless you insist on the classic DLM/state‐space definition from a Kalman‐Filter sense.

In the Google blog the first model its very similar to this, the only difference is that the slope from the linear trend is "evolving slowly over time following a random walk".

Thanks to the HSGP class, we can add something similar if we want to.

@cetagostini
Copy link
Author

Not able to take a deep dive at this point, but my hope in terms of the API would be that we could use the existing InterruptedTimeSeries experiment class and simply inject a new time series based pymc model class. Will aim to look into this in more detail - if there was a reason for a new experiment class then we can look into whether we can adapt it to make it more general.

Happy to do it, my only opinion is that the API for the new class is quite different and allow for arbitrary components for trend and seasonality. IF they follow pymc-marketing protocols.

I'm doing that because will help me to easily input HSGP and replace this fourier bases approach (just as example), for the same structural time series. Could probably help to just replace by the state-space.

I'll be working on that as well :) I feel should be easy to do.

@cetagostini
Copy link
Author

The new notebook is a great start. Could be good to compare the previous approach "y ~ 1 + t + C(month)" in its_pymc.ipynb to see how things differ.

Almost the same. The bsts fits more during training (high R2) and has less variance (lower sd).

@drbenvincent
Copy link
Collaborator

Nice. So I am a beginner in time series modelling - which is why I've not given this a go yet. But that's a good clarification BSTS != state space.

Do you think your implementation leaves the door open for extension into autoregressive components etc?

@cetagostini
Copy link
Author

Nice. So I am a beginner in time series modelling - which is why I've not given this a go yet. But that's a good clarification BSTS != state space.

Do you think your implementation leaves the door open for extension into autoregressive components etc?

Indeed, I'll try to bring an example tomorrow!

Copy link
Collaborator

@drbenvincent drbenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Firstly, this is very cool! Will probably need a few rounds of review to get this merged, but it will happen :)

Let's try to make it fit into the current docs and codebase as seamlessly as possible. So far we've got a nice distinction between "experiments" and "models".

The new model class BayesianStructuralTimeSeries fits in with the current structure very nicely. Given the range of experiments we've got so far the obvious candidate is to work with the interrupted time series experiment, and maybe multi-observation differences in differences. So the natural way to make this work would be to make the InterruptedTimeSeries class accept your new BayesianStructuralTimeSeries class. That would be in an ideal world, but let's see what we can do quickly to move in that direction.

The API differences between InterruptedTimeSeries and StructuredTimeSeries are minimal. Although obviously what is done with the provided data diverges a lot. Trying to resolve this in one go might be a bit much, but at the moment I've got these concrete suggestions:

  • Move the new StructuredTimeSeries class into interrupted_time_series.py and delete structured_time_series.py.
  • I relatively strongly think we should avoid the duplication of the plotting code. Like I say, we're trying to make the interrupted time series experiment work with a BSTS model, I don't think there is any reason why any custom plot logic is needed. We could extract the plot code into a mixin class (called something like ITSPlotter) which can be injected into both StructuredTimeSeries and InterruptedTimeSeries.

I think this already goes a long way towards keeping the nice distinction between experiment classes and model classes. The PR is already not far off that - the main issue I see is that StructuredTimeSeries is currently treated like a new experiment when in fact it's exactly the same as interrupted time series, but just separate in order to handle the different model input.

The new notebook is clearly applied to the interrupted time series experiment, so I think the notebook contents can be appended to the existing its_pymc.ipynb. That could be quite nice - the notebook would then be a kind of "how to do ITS from simple linear model to proper time series modeling".

@AlexAndorra
Copy link

This looks really cool and useful!!
Just dropping in as I see you guys are thinking of making this broader, with HSGP and broader TS decomposition (which I love!): any reason you're not plugging into pymc-extras.statespace? It should already have everything you need, and you can offload the model dev to it

@drbenvincent
Copy link
Collaborator

any reason you're not plugging into pymc-extras.statespace

Thanks @AlexAndorra - this is the direction I was hoping we'd go in. I've done some basic experimentation with it, but I'm a time series newbie and am not confident enough to know when I need to use what component, how to deal with non-stationarity etc. There's a lot more lower hanging fruit for me to work on in CausalPy, so I'm happy that this has been dropped on my lap by @cetagostini :)

I have a very crude understanding of how BSTS and state space approaches relate to each other (though @cetagostini explained it a bit above).

Would it make sense to end up having both this kind of BSTS model class and (eventually) the pymc-extras state space stuff? If that is silly then I guess the proposed implementation could be over-written by a state space implementation? Very happy to take guidance on this.

@drbenvincent
Copy link
Collaborator

@cetagostini The BayesianStructuralTimeSeries.build_model method seems like a reasonably simple bit of pymc code. Though as far as I can tell, this seems to be build as an out of the box solution. Nothing wrong with that, but the TFP blog post you linked has a very modular API where users can build their own model from components.

I don't think having a fixed out of the box BSTS model is bad, and in some ways it is good because it is what it is and doesn't require a user to meddle much. That said, it could be pretty cool to think about an API where users can modularly build up a pymc model to pass into the experiment class. Though I imagine this would probably be a considerable bit more work?

@AlexAndorra
Copy link

Yeah, the statespace module has a submodule unlocking STS. I think it's doing everything you guys are trying to do here. The one thing we're missing (and actively working on; almost done!) is vectorizing the Kalman filter, to allow for batched time series

@cetagostini
Copy link
Author

cetagostini commented May 26, 2025

This looks really cool and useful!!
Just dropping in as I see you guys are thinking of making this broader, with HSGP and broader TS decomposition (which I love!): any reason you're not plugging into pymc-extras.statespace? It should already have everything you need, and you can offload the model dev to it

I'll take a look here! @AlexAndorra Thanks for jumping!

@cetagostini The BayesianStructuralTimeSeries.build_model method seems like a reasonably simple bit of pymc code. Though as far as I can tell, this seems to be build as an out of the box solution. Nothing wrong with that, but the TFP blog post you linked has a very modular API where users can build their own model from components.

I'm already thinking no this, thats why I left the parameters trend_component and seasonal_component with a simple wrapper with a similar pymc-marketing signature, then should be easy to add them and replace by a state-space trend or seasonal.

That said, it could be pretty cool to think about an API where users can modularly build up a pymc model to pass into the experiment class

Thats cool, and seeing signatures, I feel could be. But not sure if for a first iteration? My guess, better to make a class that can manage different stuff like non state-spaces, and state spaces type of models, then we can move forward in order PR and say, lets bring any pymc time series model, and thats it. What do you think @drbenvincent ?

@drbenvincent
Copy link
Collaborator

Sure @cetagostini I'm happy with an iterative approach. Happy to proceed along the lines in my first review.

@cetagostini
Copy link
Author

@drbenvincent

Apply the changes:

  • Under experiements, change Interrupted Time Series name to Structural Time Series, more generic and adhoc to get Interrupted time and basis expansion time series. This will help to hold the same structural time series class able to get StateSpaceTimeSeries, BasisExpansionTimeSeries or InterruptedTimeSeries. All under same family.
  • Keep interrupted time series signatures to backward compatibility.
  • Move the example under the same ITS notebook.
  • Avoid duplication of plotting and stuff because we rely on the same class.

@cetagostini
Copy link
Author

@drbenvincent Implementing the StateSpace it's taking more effort than expected, will continue during week... But I think, it requires probably a following PR to not make this massive, instead of all in one.. Not sure, take a look and give me feedback. I already add a class draft in the notebook but need more work to be able to be used by the new experiment STS class.

@cetagostini
Copy link
Author

@drbenvincent

integration with state space ready, took more than expected because the state-spaces needed a few wrappers.... Can you modify and give me hand with the docs? All in my local looks okay, not sure whats missing here!

Currently:

  • Test are done.
  • Backward compatible.
  • Support for BSTS (state and non state spaces)

Important

We are using y_hat and mu in different places, I think, y_hat should be the one for the plots, right?

@drbenvincent
Copy link
Collaborator

Doc test failing for StructuralTimeSeries. Currently that docstring uses the LinearRegression model, which I guess is not a doable experiment/model combination? Do just need to replace with the right model I assume?

\text{seasonality} &\sim \text{YearlyFourier}(...) \\
\beta &\sim \mathrm{Normal}(0, \sigma_{\beta}) \quad \text{(if X is provided)} \\
\sigma &\sim \mathrm{HalfNormal}(\sigma_{err}) \\
\mu &= \text{trend_component} + \text{seasonality_component} [+ X \cdot \beta] \\
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line in the docstring is confusing. That + in the square brackets.

Copy link
Collaborator

@drbenvincent drbenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Could add pymc-extras to the autodoc_mock_imports in conf.py. Should help with autodoc when we refer to the time series stuff in pymc-extras. But consider as optional - can come back to polish this in another small and focussed PR.
  • Mentioned elsewhere that it would be super cool to get more intro information in the docs about time series modelling. Maybe it would be a lower bar to just expand on this in the docstrings of the new classes?

@NathanielF
Copy link
Contributor

Don't have complete familiarity with this PR, but just on the dependency question. For the IV experiment class, we recommended posterior predictive sampling with numpyro rather than the baseline pymc because MvNormal ppc was so slow.

I think recommending a sampler as an optional dependency seems more reasonable than a whole package....

Not saying we shouldn't but pymc-marketing is huge. Would avoid inflating dependencies if we could avoid it...

@drbenvincent
Copy link
Collaborator

On the desired merge StructuralTimeSeries into InterruptedTimeSeries

Bear in mind that since this PR was started all experiment classes (including InterruptedTimeSeries) has embraced using xarray data structures, rather than numpy. So at the very least, we need to embrace that change. Not looked in detail, but that might also eliminate some of the difference between the two classes? Could be much easier with some LLM help on that front. I can give it a go if you're not keen @cetagostini?

Looking at the diff between the two, I think it's doable. Most of the differences related to the numpy -> xarray change.

@cetagostini
Copy link
Author

@drbenvincent and @NathanielF I did pymc-extras and pymc-marketing optional dependencies with lazy imports. On top I merge ITS with the old STS and then delete the STS.

I move the notebook to a new draft folder, only to keep it, but we need to work in the notebooks in the future. Maybe another PR?

Copy link

codecov bot commented Aug 11, 2025

Codecov Report

❌ Patch coverage is 63.48684% with 222 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.26%. Comparing base (673e805) to head (b5ec091).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
causalpy/pymc_models.py 54.16% 132 Missing ⚠️
causalpy/tests/test_integration_pymc_examples.py 52.55% 65 Missing ⚠️
...salpy/tests/test_integration_its_new_timeseries.py 58.33% 20 Missing ⚠️
causalpy/experiments/interrupted_time_series.py 96.74% 4 Missing ⚠️
causalpy/plot_utils.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #473      +/-   ##
==========================================
- Coverage   95.43%   89.26%   -6.17%     
==========================================
  Files          28       30       +2     
  Lines        2586     3168     +582     
==========================================
+ Hits         2468     2828     +360     
- Misses        118      340     +222     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@cetagostini-wise
Copy link

@drbenvincent @NathanielF any feedback?

@NathanielF
Copy link
Contributor

Sorry @cetagostini will try to review this evening. Been a crazy few weeks.

@NathanielF
Copy link
Contributor

Sorry @cetagostini on this, this week!

Copy link
Contributor

@NathanielF NathanielF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it took me so long. There is some really good and well developed functionality here. I agree we should merge it and maybe re-visit some things after e.g. the intervention stuff goes in, and the Prior class becomes available.

Nice job @cetagostini

if not request.config.getoption("--doctest-modules", default=False):
return

if not _HAVE_PYMC_TESTING or mock_sample is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is good to have.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename the file i guess.

Notebook looks good, re: three examples.

Maybe do some more explanatory writing around this comment

Exogenous regressors are optional

If that's the framing, isn't it cool that we can absorb some of this variation into really compelling latent components? Be nice to motivate it that way...maybe?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice., Yeah, move the notebooks out of draft. Add to the How to sections and maybe add a little story telling of how adding structure is a good example of bayesian workflow. Y

n_order : int, optional
The number of Fourier components for the yearly seasonality. Defaults to 3.
Only used if seasonality_component is None.
n_changepoints_trend : int, optional
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking but i think after we merge this and @JeanVanDyk 's intervention detection module we should think about change points more holistically... to see if there are any options for consolidation/streamlining

from pymc_marketing.mmm import LinearTrend
except ImportError as err:
raise ImportError(
"BayesianBasisExpansionTimeSeries requires pymc-marketing when default trend "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. This is a nice way of doing it.

dims="obs_ind",
)

# Get validated components (no more ugly imports in build_model!)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

mu_ = mu_ + pm.math.dot(X_data, beta)

# Make mu_ an explicit deterministic variable named "mu"
mu = pm.Deterministic("mu", mu_, dims="obs_ind")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. This is neat.

with pm.Model(coords=coordinates) as self.second_model:
# Add coords for statespace (includes 'time' and 'state' dims)
P0_diag = pm.Gamma("P0_diag", alpha=2, beta=1, dims=P0_dims[0])
_P0 = pm.Deterministic("P0", pt.diag(P0_diag), dims=P0_dims)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just out of curiosity, can these priors be changed easily, like would the use the prior class allow us to swap them in and out e.g. sigma = 80 in ZeroSumNormal... is there a reason we want fixed at that magnitude?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I'll make an issue and get familiar how we are allowing Prior class to allow to modify!


class StateSpaceTimeSeries(PyMCModel):
"""
State-space time series model using pymc_extras.statespace.structural.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe link to some @jessegrabowski 's more developed notebooks or docs for the interested reader to find out more about statespace generally

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

@cetagostini
Copy link
Author

Thanks @NathanielF I'll be working on this during the week to finish all before Friday!

@cetagostini
Copy link
Author

@drbenvincent could you give an stamp here? @NathanielF already did, and everything is in place to get the part started 🙌🏻

@drbenvincent
Copy link
Collaborator

@cetagostini Turns out I will have some capacity this weekend! I'll aim to get some CausalPy progress made.

@drbenvincent drbenvincent requested a review from Copilot October 11, 2025 12:28
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements Bayesian Structural Time Series (BSTS) modeling for CausalPy, providing advanced time series analysis and forecasting capabilities. The implementation includes two new PyMC models and integrates them with the existing experiment framework.

  • Adds two new time series models: BayesianBasisExpansionTimeSeries and StateSpaceTimeSeries
  • Introduces comprehensive integration for BSTS models within the InterruptedTimeSeries experiment framework
  • Updates plotting utilities to handle new model types and improves compatibility across different model outputs

Reviewed Changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
pyproject.toml Adds pymc-marketing dependency for BSTS components
docs/source/conf.py Adds pymc-extras to documentation requirements
causalpy/tests/test_integration_pymc_examples.py Adds comprehensive integration tests for new BSTS and state-space models
causalpy/tests/test_integration_its_new_timeseries.py New test file for ITS integration with new time series models
causalpy/tests/conftest.py Adds compatibility handling for PyMC testing helpers
causalpy/pymc_models.py Implements BayesianBasisExpansionTimeSeries and StateSpaceTimeSeries models
causalpy/plot_utils.py Fixes plotting issue with fill_kwargs handling
causalpy/experiments/interrupted_time_series.py Updates ITS experiment to support BSTS models
causalpy/experiments/init.py Adds experiment imports to module

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +1183 to +1185
print(
f"Warning: Discrepancy in 'coeffs'. Using derived: {self._exog_var_names} over input: {model_coords['coeffs']}"
)
Copy link

Copilot AI Oct 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using print() for warnings is not recommended. Consider using Python's warnings module with warnings.warn() for proper warning handling that can be controlled by users.

Copilot uses AI. Check for mistakes.

Comment on lines +1363 to +1366
print(
"Warning: X_pred provided exogenous variables, but the model was not "
"built with exogenous variables. These will be ignored."
)
Copy link

Copilot AI Oct 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using print() for warnings is not recommended. Consider using Python's warnings module with warnings.warn() for proper warning handling that can be controlled by users.

Copilot uses AI. Check for mistakes.

Comment on lines +1373 to +1375
print(
"Warning: Model expects no exog vars, but X_exog_pred_vals has columns. Forcing to 0 columns."
)
Copy link

Copilot AI Oct 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using print() for warnings is not recommended. Consider using Python's warnings module with warnings.warn() for proper warning handling that can be controlled by users.

Copilot uses AI. Check for mistakes.

Copy link
Collaborator

@drbenvincent drbenvincent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments/thoughts from me in conjunction with LLM. Focus on maintainability etc...

Enhance feature visibility

  1. Can I request that you split the new notebook content into a new notebook. That way we'll have much greater visibility of this awesome functionality. As in still in the Interrupted Time Series section in the section navigation, but a new notebook in there.

  2. Please update the features table on the docs home page. This table is focussed on experiment designs, so maybe we just add text about the availability of BSTS models under the existing ITS row.

Breaking Change to Base Class Signature

Changing base PyMCModel.predict signature breaks the Liskov Substitution Principle. The base class now has parameters it doesn't use, added solely to accommodate subclasses. The **kwargs is particularly problematic as it's a code smell indicating unclear interface design.

Better approach: The BSTS models should override these methods with their own signatures rather than polluting the base class.

Repetitive isinstance Checks Throughout InterruptedTimeSeries

The experiment file has 5 identical checks for is_bsts_like:

is_bsts_like = isinstance(
    self.model, (BayesianBasisExpansionTimeSeries, StateSpaceTimeSeries)
)

Problem: This violates DRY (Don't Repeat Yourself) and makes the code hard to maintain. Every method that interacts with the model has branching logic.

Better approach:

  • Use polymorphism - BSTS models should conform to the existing interface
  • Or create a helper method to abstract the data transformation
  • Or use a strategy pattern/adapter pattern

Create adapter pattern: Add a BSTSAdapter or TimeSeriesDataAdapter class to handle data format conversions

Overly Complex Coordinate Handling in BSTS Models

The _prepare_time_and_exog_features() and build_model() methods have excessive complexity handling edge cases:

# Ensure exog_names_from_coords is a list for internal processing
processed_exog_names = []
if exog_names_from_coords is not None:
    if isinstance(exog_names_from_coords, str):
        processed_exog_names = [exog_names_from_coords]
    elif isinstance(exog_names_from_coords, (list, tuple)):
        processed_exog_names = list(exog_names_from_coords)
    else:
        raise TypeError(...)

Then later there's complex logic to reconcile model_coords, self._exog_var_names, and input coords. Lines 1150-1200 are particularly convoluted.

Problem: Too many code paths, unclear responsibilities, verbose comments explaining the complexity. The code includes comments like "Logic error: processed_exog_names should be set..." which suggests defensive programming against the code's own complexity.

Better approach: Simplify the interface - require consistent input formats, validate early, fail fast.

Missing pymc-marketing in Mock Imports

The PR adds pymc-marketing as a required dependency but the Sphinx autodoc_mock_imports list was updated to include only pymc-extras. Looking at the conf.py, I don't see pymc-marketing in the mock imports.

Problem: Documentation build might fail if pymc-marketing isn't installed or has import issues.

Complex Plotting Logic with Defensive Checks

Lines 300-450 in interrupted_time_series.py have extensive defensive checks:

pre_mu_plot = (
    pre_mu.isel(treated_units=0) if "treated_units" in pre_mu.dims else pre_mu
)

This pattern repeats ~10 times in the plotting code.

Problem: The plot method shouldn't need to handle both cases - this suggests the data format should be normalized earlier.

elif "r2" in self.score.index:
r2_val = self.score["r2"]
r2_std_val = self.score.get("r2_std", None)
except Exception:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more specific about the exception type

from pymc.testing import mock_sample, mock_sample_setup_and_teardown # type: ignore

_HAVE_PYMC_TESTING = True
except Exception: # pragma: no cover
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you be more specific about the exception type

@cetagostini
Copy link
Author

cetagostini commented Oct 11, 2025

Hey @drbenvincent agree with your comments!

I know currently design probably is not the most optimal, and we are missing the a particular better notebook, but as in other projects I was expecting to work incrementally here and not saturate PR. I'll move this to draft, and see if I can take it in the next weeks, otherwise, probably I'll take this quarter as feedback, close this PR and try to make another smaller PR applying the recommendations and see if we can move from there in the future.

@cetagostini cetagostini marked this pull request as draft October 11, 2025 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants